A Little Noise

11Jul/171

On slave_parallel_workers and the logical clock

How can you tell if a given workload on the master could be replicated with many parallel workers on the slave?

The slave_parallel_type=LOGICAL_CLOCK is an implementation of a Lamport clock. The implementation is described in WL #7165 - including a neat little ASCII-art graphic.

Each event in the binary log is stamped with two values:
- The sequence_number increments for each commit
- The last_committed is the sequence_number which was in effect when this transaction entered the prepare phase.

A slave worker cannot begin doing a transaction until the last_committed value is smaller than the sequence_number of all other running threads.

mysqlbinlog mysql-bin.0000x | grep last_committed

#160118 15:31:34 server id 3  end_log_pos 1527 CRC32 0xcdf6bd8d         GTID    last_committed=0        sequence_number=1
#160118 15:31:34 server id 3  end_log_pos 2627 CRC32 0x72e5fa80         GTID    last_committed=1        sequence_number=2
#160119  9:59:17 server id 3  end_log_pos 2952 CRC32 0x7290d02f         GTID    last_committed=2        sequence_number=3
#160119  9:59:17 server id 3  end_log_pos 3248 CRC32 0x227fe513         GTID    last_committed=3        sequence_number=4
#160119  9:59:17 server id 3  end_log_pos 3544 CRC32 0x8a6a2b78         GTID    last_committed=4        sequence_number=5
#160119  9:59:17 server id 3  end_log_pos 5315 CRC32 0x35bca421         GTID    last_committed=4        sequence_number=6
#160119  9:59:17 server id 3  end_log_pos 7094 CRC32 0x75584a42         GTID    last_committed=4        sequence_number=7
#160119  9:59:17 server id 3  end_log_pos 8865 CRC32 0xe4ee1b54         GTID    last_committed=6        sequence_number=8
#160119  9:59:17 server id 3  end_log_pos 10636 CRC32 0xed1f7c48        GTID    last_committed=6        sequence_number=9
#160119  9:59:17 server id 3  end_log_pos 12409 CRC32 0x7802f625        GTID    last_committed=9        sequence_number=10

In the first transaction, last_committed=0, sequence_number=1
The first worker thread takes that transaction and goes to work.

In the second transaction, last_committed=1, sequence_number=2
This transaction cannot start until the first one finishes, because last_committed=1 is not smaller than the previous sequence_number=1. So the first two transactions must be done in series.

In the third transaction, last_committed=2, sequence_number=3. Again, no worker could start this transaction until that previous trx with sequence_number=2 finishes.

Although these first four transactions might actually be assigned to different workers, effectively they execute in series, just like single-threaded replication would work.

Now, we come to the transactions with last_committed=4. As soon as that transaction with sequence_number=4 finishes, these next three transactions can go in parallel.

last_committed=4        sequence_number=5
last_committed=4        sequence_number=6
last_committed=4        sequence_number=7

Once the first two finish, two more can start:

last_committed=6        sequence_number=8
last_committed=6        sequence_number=9

They can run even while the one with sequence_number=7 is still running.

But this one has to wait for everything before it:

last_committed=9        sequence_number=10

We can see that for these 10 transactions, only three workers could ever be operating at the same time.

You might have spotted a pattern: if the difference (sequence_number - last_committed) = 1, then the transaction has to wait for everything before it to finish. If the difference is 2, then the transactions can run in parallel with just the previous transaction, and so on.

You can count how much of your transaction history falls into each category:

mysqlbinlog mysql-bin.0000x | grep -o 'last_committed.*' | sed 's/=\|\s/ /g' | awk '{print $4-$2}' | sort -g | uniq -c

 20953 1
  8998 2
  6021 3
  4209 4
  3083 5
  2180 6
  1403 7
   820 8
   418 9
   176 10
     7 11
     2 12
     1 13
     1 15
     1 16

In that binary log, 20953 transactions had to wait for all previous transactions to commit before starting. 8998 could run in parallel with one previous transaction. Only a very few could run in parallel with more than 10 previous transactions. Jut 3 worker threads would cover 75% of the possible parallel transactions. 8 worker threads would cover 99%.

See also Percona's blog for discussion on how to view similar information via the performance_schema on a running slave.

Filed under: MySQL Leave a comment