A Little Noise

August 30, 2022

Latitude and Longitude swapped after upgrading MySQL

Filed under: MySQL — snoyes @ 11:49 am
Sideways Globe

MySQL supports geometry types. The basic type is the POINT, and from that you can build linestrings and polygons and various combinations of those.

Everybody wants to store latitude and longitude, and since those are two numbers representing a point on the planet, they seem like a natural fit to store in a POINT type.

However, here there be dragons. Coordinates on a flat piece of paper are different than on a sphere. On a flat piece of paper, (-180, 0) and (180, 0) are 360 apart. On a sphere, they are the same point. How is MySQL supposed to know which to use when you ask for ST_Distance?

The answer is to specify a spatial reference identifier (SRID). A popular SRID with latitude and longitude on our planet is 4326.

MySQL 5.7 acts like everything is a flat sheet of paper, even if you specify SRID 4326. MySQL 5.7 doesn’t know what ‘latitude’ and ‘longitude’ mean, but the effect is that the longitude is the first coordinate (X), the latitude is the second coordinate (Y). Since latitude and longitude are angle measures rather than linear distances, ST_Distance only kinda works for points that are quite close to each other and far from the international date line and the poles.

SET @point180W = 0xE6100000010100000000000000008066C00000000000000000;
SET @point180E = 0xE6100000010100000000000000008066400000000000000000;
ST_AsText(@point180W)   ST_SRID(@point180W)
POINT(-180 0)           4326
ST_AsText(@point180E)   ST_SRID(@point180E)
POINT(180 0)            4326
ST_Distance(@point180W, @point180E)
360

In 8.0, MySQL learned about SRIDs. For SRID 4326, MySQL says the first coordinate (X) is latitude, the second (Y) is longitude.

ST_AsText(@point180W)   ST_SRID(@point180W)
POINT(0 -180)           4326
ST_AsText(@point180E)   ST_SRID(@point180E)
POINT(0 180)            4326
ST_Distance(@point180W, @point180E)
0

If you had points in 5.7 with SRID 0 (meaning no SRID), then they should still work the same in 8.0, like everything is a flat piece of paper.

But, if you had points in 5.7 with SRID 4326, you’ll find that they are backwards from what you expect after you upgrade to 8.0. The effect is that 5.7 seemed like it stored them in long/lat, but really the underlying storage has not changed; 5.7 just didn’t bother to check which one was supposed to be which.

If you have upgraded to 8.0 and find that your points seem to be “backwards”, you can take any of several approaches:

  1. Leave it alone. 5.7 was wrong. 8.0 is now consistent with EPSG:4326. Adjust your expectations and queries accordingly.
  2. Since 5.7 behaved as if there was no SRID for everything except the ST_SRID function, you can update your data and remove the SRID.
    Since columns in 8.0 have an SRID associated, you can’t just do:
    UPDATE table SET column = ST_SRID(column, 0);
    You’ll get ERROR 3643 (HY000): The SRID of the geometry does not match the SRID of the column … Consider changing the SRID of the geometry or the SRID property of the column.
    First ALTER TABLE table MODIFY column LONGBLOB;
    Then use the above UPDATE to remove the SRID. Finally, use another ALTER TABLE to modify the column back to the appropriate geometry type with SRID 0.
  3. 5.7 had the effect of treating lat/long as long/lat. If you populated your columns with that expectation, then you can swap them:
    UPDATE table SET column = ST_SwapXY(column);
    This will take the Statue of Liberty out of central Greenland and put it back in New York where it belongs.

April 11, 2022

Date ranges in JSON

Filed under: MySQL — snoyes @ 9:50 am

Suppose you have these documents stored in a JSON column:

CREATE TABLE tableName (id int auto_increment primary key, data JSON);
INSERT INTO tableName (data) VALUES ('
{
    "2022/03/29": ["macguffin", "plot coupon"], 
    "2022/03/31": ["lorem", "ipsum"], 
    "2022/04/01": ["dolor", "sit amet"], 
    "2022/04/02": ["silent engine", "radioactive ore"], 
    "2022/04/03": ["pulpy briefcase", "Schmilblick"]
}
'), ('
{
    "2022/03/30": ["Big Dumb Object", "Unobtanium"], 
    "2022/03/31": "glowing orb", 
    "2022/04/02": "blinker fluid",
    "2022/04/03": "hunted snipe",
    "2022/04/04": ["wild goose", "alien space bat"]
}
');

And suppose you want the values for each row from the range 29 March through 01 April.

JSON has no idea what a “date” is, nor how to cover a “range” of values. But MySQL does. So, we can turn the keys into something that MySQL can work with, filter those keys with SQL, and then look up the values in JSON again.

SELECT
 JSON_ARRAYAGG(
  JSON_EXTRACT(data, CONCAT('$."', DATE_FORMAT(d, '%Y/%m/%d'), '"'))
 ) AS importantValues 
FROM
 tableName
 JOIN JSON_TABLE(
    REPLACE(JSON_KEYS(data), '/', '-'),
    "$[*]" COLUMNS (d DATE PATH "$")
 ) AS jt 
WHERE d BETWEEN '2022-03-29' AND '2022-04-01'
GROUP BY id;
+---------------------------------------------------------------------------+
| importantValues                                                           |
+---------------------------------------------------------------------------+
| [["macguffin", "plot coupon"], ["lorem", "ipsum"], ["dolor", "sit amet"]] |
| [["Big Dumb Object", "Unobtanium"], "glowing orb"]                        |
+---------------------------------------------------------------------------+

Let’s break that down.

JSON_KEYS gets an array of keys from a JSON object:

SELECT JSON_KEYS(data) FROM tableName;
+------------------------------------------------------------------------+
| JSON_KEYS(data)                                                        |
+------------------------------------------------------------------------+
| ["2022/03/29", "2022/03/31", "2022/04/01", "2022/04/02", "2022/04/03"] |
| ["2022/03/30", "2022/03/31", "2022/04/02", "2022/04/03", "2022/04/04"] |
+------------------------------------------------------------------------+

At the moment, MySQL still understands any delimiter character between the date parts, in this case a forward slash. However, as of 8.0.29, this allowance is deprecated, so to be proper the slashes must be replaced with dashes:

SELECT REPLACE(JSON_KEYS(data), '/', '-') FROM tableName;
+------------------------------------------------------------------------+
| REPLACE(JSON_KEYS(data), '/', '-')                                     |
+------------------------------------------------------------------------+
| ["2022-03-29", "2022-03-31", "2022-04-01", "2022-04-02", "2022-04-03"] |
| ["2022-03-30", "2022-03-31", "2022-04-02", "2022-04-03", "2022-04-04"] |
+------------------------------------------------------------------------+

JSON_TABLE turns JSON data into a table. The “$[*]” path selects the part of the expression that will be turned into the table. The COLUMNS section turns those values into MySQL dates.

SELECT * FROM 
 JSON_TABLE(
    '["2022/03/29", "2022/03/31", "2022/04/01", "2022/04/02", "2022/04/03"]',
    "$[*]" COLUMNS (d DATE PATH "$")
 ) AS jt;
+------------+
| d          |
+------------+
| 2022-03-29 |
| 2022-03-31 |
| 2022-04-01 |
| 2022-04-02 |
| 2022-04-03 |
+------------+

You can provide a column name to JSON_TABLE, provided it comes from a table listed earlier in the FROM clause. Filtering and grouping is ordinary SQL. JSON_ARRAYGG is an aggregate function that turns rows into a JSON array; it’s the JSON version of GROUP_CONCAT.

SELECT
 id, JSON_ARRAYAGG(d)
FROM
 tableName
 JOIN JSON_TABLE(
    JSON_KEYS(data),
    "$[*]" COLUMNS (d DATE PATH "$")
 ) AS jt 
WHERE d BETWEEN '2022-03-29' AND '2022-04-01'
GROUP BY id;

JSON_EXTRACT returns the values from a given path. It requires an exact match for looking up keys, so those dashes must be turned back into slashes – REPLACE would work, but DATE_FORMAT is more obvious. It also needs the key wrapped in double quotes, and prepend “$.” to make it into a proper path.

May 6, 2020

JSON in MySQL: Keys which do NOT match a value (the more elegant way)

Filed under: MySQL — snoyes @ 12:33 pm

Given a JSON object:
SET @j := '{"BOOSTER": 1, "RETRO": 2, "GUIDANCE": 0, "SURGEON": 1, "RECOVERY": 0}';

How do you find the keys which do NOT contain a value of 0?

A second approach to finding the non-0 values from a JSON object is to turn the keys and values into separate columns of a table. This is cleaner, but a little wordier.

We’ll get the keys in one table:

SELECT ord, keyname 
FROM JSON_TABLE(JSON_KEYS(@j), '$[*]' COLUMNS (
    ord FOR ORDINALITY,
    keyname VARCHAR(100) PATH '$')
) AS keyTable;
 +------+----------+
 | ord  | keyname  |
 +------+----------+
 |    1 | RETRO    |
 |    2 | BOOSTER  |
 |    3 | SURGEON  |
 |    4 | GUIDANCE |
 |    5 | RECOVERY |
 +------+----------+

And the values in a second table:

SELECT ord, keyvalue
FROM JSON_TABLE(@j, '$.*' COLUMNS (
    ord FOR ORDINALITY,
    keyvalue INT PATH '$')
) AS valueTable;
+------+----------+
| ord  | keyvalue |
+------+----------+
|    1 | 2        |
|    2 | 1        |
|    3 | 1        |
|    4 | 0        |
|    5 | 0        |
+------+----------+

And then join those two together:

SELECT keyname, keyvalue
 FROM JSON_TABLE(JSON_KEYS(@j), '$[*]' COLUMNS (
    ord FOR ORDINALITY,
    keyname VARCHAR(100) PATH '$')) AS keyTable
 JOIN JSON_TABLE(@j, '$.*' COLUMNS (
    ord FOR ORDINALITY,
    keyvalue INT PATH '$')
) AS valueTable USING (ord);
 +----------+----------+
 | keyname  | keyvalue |
 +----------+----------+
 | RETRO    | 2        |
 | BOOSTER  | 1        |
 | SURGEON  | 1        |
 | GUIDANCE | 0        |
 | RECOVERY | 0        |
 +----------+----------+

Now you can use ordinary operations like a WHERE clause to do your filtering, and then JSON_ARRAYAGG() to put them all back into JSON at the end.

SELECT JSON_ARRAYAGG(keyname) AS j
 FROM JSON_TABLE(JSON_KEYS(@j), '$[*]' COLUMNS (
    ord FOR ORDINALITY,
    keyname VARCHAR(100) PATH '$')) AS keyTable
 JOIN JSON_TABLE(@j, '$.*' COLUMNS (
    ord FOR ORDINALITY,
    keyvalue INT PATH '$')
) AS valueTable USING (ord)
 WHERE keyvalue != 0;
 +---------------------------------+
 | j                               |
 +---------------------------------+
 | ["RETRO", "BOOSTER", "SURGEON"] |
 +---------------------------------+

May 5, 2020

JSON in MySQL: Keys which do NOT match a value

Filed under: MySQL — snoyes @ 12:18 pm

Given a JSON object:
SET @j := '{"BOOSTER": 1, "RETRO": 2, "GUIDANCE": 0, "SURGEON": 1, "RECOVERY": 0}';

How do you find the keys which do NOT contain a value of 0?

One approach is to find the keys which do match 0, and then remove them.

Sounds like JSON_SEARCH and JSON_REMOVE? But there’s a gotcha: JSON_SEARCH works only with strings. Ok, so we REPLACE(@j, ‘0’, ‘”0″‘) – but that doesn’t help, because JSON_REMOVE can’t accept an array of paths like JSON_SEARCH would return; it requires each path as a separate parameter.

Instead, JSON_MERGE_PATCH will suffice. It has the feature that values from the second parameter are preserved only if they are not NULL. So, the approach becomes turning the 0s into NULLs, and then merging that document into an empty one.

SELECT REPLACE(@j, 0, 'null') AS j; -- case matters! null, not NULL
+------------------------------------------------------------------------------+
| j                                                                            |
+------------------------------------------------------------------------------+
| {"BOOSTER": 1, "RETRO": 2, "GUIDANCE": null, "SURGEON": 1, "RECOVERY": null} |
+------------------------------------------------------------------------------+

SELECT JSON_MERGE_PATCH('{}', REPLACE(@j, 0, 'null')) AS j;
+------------------------------------------+
| j                                        |
+------------------------------------------+
| {"RETRO": 2, "BOOSTER": 1, "SURGEON": 1} |
+------------------------------------------+

And since I technically asked for just the keys, not the key/value pairs,

SELECT JSON_KEYS(JSON_MERGE_PATCH('{}', REPLACE(@j, 0, 'null'))) AS j;
+---------------------------------+
| j                               |
+---------------------------------+
| ["RETRO", "BOOSTER", "SURGEON"] |
+---------------------------------+

But, there’s a cleaner way.

January 9, 2020

Aggregate JSON arrays

Filed under: MySQL — snoyes @ 11:03 am

Given: a table with JSON arrays

CREATE TABLE t (id int auto_increment primary key, d json);
INSERT INTO t VALUES (1, '["apple", "apple", "orange"]');
INSERT INTO t VALUES (2, '["apple", "banana", "orange", "orange", "orange"]');

The desired output is each row with a count of the unique objects:

+------+----------------------------------------+
| id   | fruitCount                             |
+------+----------------------------------------+
|    1 | {"apple": 2, "orange": 1}              |
|    2 | {"apple": 1, "banana": 1, "orange": 3} |
+------+----------------------------------------+

JSON_TABLE() can transform the array into rows.

SELECT id, fruit
FROM t,
JSON_TABLE(d,
    "$[*]" COLUMNS (
        fruit VARCHAR(100) PATH "$"
    )
) AS dt;

+----+--------+
| id | fruit  |
+----+--------+
|  1 | apple  |
|  1 | apple  |
|  1 | orange |
|  2 | apple  |
|  2 | banana |
|  2 | orange |
|  2 | orange |
|  2 | orange |
+----+--------+

(If you got an error there about either permissions or an unknown table/column in the JSON_TABLE, upgrade to at least 8.0.14 to get past some bugs.)

We can then use the ordinary aggregation functions:

SELECT
    id,
    fruit,
    COUNT(*)
FROM t,
JSON_TABLE(d,
    "$[*]" COLUMNS (
        fruit VARCHAR(100) PATH "$"
    )
) AS dt
GROUP BY id, fruit;

+----+--------+----------+
| id | fruit  | COUNT(*) |
+----+--------+----------+
|  1 | apple  |        2 |
|  1 | orange |        1 |
|  2 | apple  |        1 |
|  2 | banana |        1 |
|  2 | orange |        3 |
+----+--------+----------+

With a subquery or a common table expression, we can aggregate individual rows at a time (meaning you can also select other columns from t). Then JSON_OBJECTAGG will combine those results into a single object.

SELECT
    id,
    (
        WITH cte AS (
            SELECT fruit, COUNT(*) AS c
            FROM JSON_TABLE(d, 
                "$[*]" COLUMNS (
                    fruit VARCHAR(100) PATH "$"
                )
            ) AS dt
            GROUP BY fruit
        )
        SELECT JSON_OBJECTAGG(fruit, c) FROM cte
    ) AS fruitCount
FROM t;

+------+----------------------------------------+
| id   | fruitCount                             |
+------+----------------------------------------+
|    1 | {"apple": 2, "orange": 1}              |
|    2 | {"apple": 1, "banana": 1, "orange": 3} |
+------+----------------------------------------+

et voilà

December 13, 2019

Using map with class methods – why map(split) doesn’t work

Filed under: Python,Technical — snoyes @ 11:19 am

I have some sentences.

text = [
  "Call me Ishmael.",
  "Some years ago, never mind how long precisely, having little or no money..."
]

How many words are in each sentence?

for sentence in text:
  sentenceLength = len(sentence.split())
  print(sentenceLength)

3
13

But I want to do it all at once the functional programming way, with maps.

list(map(len, map(split, text)))

NameError: name 'split' is not defined

Why does that produce an error? Because “split” isn’t a function. It’s a method of strings: str.split(), not split(str).

So how do we use map with a class method?

from operator import methodcaller
split = methodcaller("split")

That means, “Create a function. I’ll pass in an object. Call its ‘split’ method.”

Now it works.

list(map(len, map(split, text)))
[3, 13]

Of course, there are other ways. Don’t even need map.

[len(s.split()) for s in text]

November 8, 2018

Window Functions with Unusual Boundaries

Filed under: MySQL — snoyes @ 12:44 pm

Somebody on Freenode wanted this:

   Source            Result
+----+------+    +----+------+
| id | x    |    | id | c    |
+----+------+    +----+------+
|  1 |    1 |    |  1 |    2 |
|  2 |    1 |    |  2 |    2 |
|  3 | NULL |    |  3 | NULL |
|  4 | NULL | -> |  4 | NULL |
|  5 |    1 |    |  5 |    1 |
|  6 | NULL |    |  6 | NULL |
|  7 |    1 |    |  7 |    3 |
|  9 |    1 |    |  9 |    3 |
| 10 |    1 |    | 10 |    3 |
+----+------+    +----+------+

The result uses the NULL values in x as boundaries of windows, and counts the number of rows within each window. I don’t know why anyone wants such a thing; it is not ours to reason why…

Anyway, the point is that you can use arbitrary expressions, even subqueries, to define your window partitions.

SELECT 
    id, 
    -- Count of rows in windows bound by NULL values in x
    IF(
      x IS NULL, 
      NULL, 
      COUNT(*) OVER (PARTITION BY (
        -- Partition by the number of earlier NULLs
        SELECT COUNT(*) FROM t AS t1 
        WHERE t1.id < t.id AND t1.x IS NULL
        ), 
        -- Exclude the end boundary "NULL" from the window
        x IS NULL
        ORDER BY id
      )
    ) AS c 
FROM t;

How does it work?

First, let's see what that subquery is all about:

SELECT id, x, 
(SELECT COUNT(*) FROM t AS t1 WHERE t1.id < t.id AND t1.x IS NULL) AS p 
FROM t;
+----+------+---+
| id | x    | p |
+----+------+---+
|  1 |    1 | 0 |
|  2 |    1 | 0 |
|  3 | NULL | 0 |
|  4 | NULL | 1 |
|  5 |    1 | 2 |
|  6 | NULL | 2 |
|  7 |    1 | 3 |
|  9 |    1 | 3 |
| 10 |    1 | 3 |
+----+------+---+

By counting the number of "NULL rows" appearing earlier in the table, we get a value we can use to find the starting point of each window.

That alone goes one row too far, though - the "NULL row" which should end each window gets included in the window. However, you can use multiple expressions to partition windows.

SELECT id, x, 
(SELECT COUNT(*) FROM t AS t1 WHERE t1.id < t.id AND t1.x IS NULL) AS p1, 
x IS NULL AS p2 
FROM t;
+----+------+------+----+
| id | x    | p1   | p2 |
+----+------+------+----+
|  1 |    1 |    0 |  0 |
|  2 |    1 |    0 |  0 |
|  3 | NULL |    0 |  1 |
|  4 | NULL |    1 |  1 |
|  5 |    1 |    2 |  0 |
|  6 | NULL |    2 |  1 |
|  7 |    1 |    3 |  0 |
|  9 |    1 |    3 |  0 |
| 10 |    1 |    3 |  0 |
+----+------+------+----+

The combination of (p1, p2) neatly partition the rows, so each "NULL row" is by itself, and non-NULL rows are together.

November 4, 2017

GTID_INTERSECT

Filed under: MySQL — snoyes @ 4:33 pm

There’s a GTID_SUBTRACT function, and the manual shows how to write your own cheap GTID_UNION, but not a GTID_INTERSECT. Fortunately it’s easy to write your own, as it’s just subtracting twice.

CREATE FUNCTION GTID_INTERSECT(g1 TEXT, g2 TEXT)
RETURNS TEXT DETERMINISTIC
RETURN GTID_SUBTRACT(g1, GTID_SUBTRACT(g1, g2));

What use is it?

SET @slave_executed = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-681577,
421d139e-04b9-11e7-b702-0050569935dc:1-13764443,
52b9a949-d79d-11e3-80dd-0050568d193e:1-1207378:1207380-1261803:1261805-1267098:1267100-1267416:1267418-1589733';

SET @master_executed = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-730294,
421d139e-04b9-11e7-b702-0050569935dc:1-13764443,
52b9a949-d79d-11e3-80dd-0050568d193e:1-1207378:1207380-1261803:1261805-1589733';

SET @master_purged = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-681582,
421d139e-04b9-11e7-b702-0050569935dc:1-13077260,
52b9a949-d79d-11e3-80dd-0050568d193e:1-1207378:1207380-1261803:1261805-1589733';

SELECT @slave_needs := GTID_SUBTRACT(@master_executed, @slave_executed);
/*-----------------------------------------------------+
| 33738f8c-c1a5-11e7-8fc3-0a002700000f:681578-730294, |
| 52b9a949-d79d-11e3-80dd-0050568d193e:1267099:1267417 |
+-----------------------------------------------------*/

SELECT @slave_will_not_get := GTID_INTERSECT(@master_purged, @slave_needs);
/*-----------------------------------------------------+
| 33738f8c-c1a5-11e7-8fc3-0a002700000f:681578-681582, |
| 52b9a949-d79d-11e3-80dd-0050568d193e:1267099:1267417 |
+-----------------------------------------------------*/

While we’re on the subject, the manual’s GTID_UNION just concatenates the two sets, which means the output will list values twice. A small improvement will clean that up:

CREATE FUNCTION GTID_UNION(g1 TEXT, g2 TEXT)
RETURNS TEXT DETERMINISTIC
RETURN GTID_SUBTRACT(CONCAT(g1,',',g2), '');

July 18, 2017

ERROR 3037 (22023): Invalid GIS data provided to function st_geometryfromtext.

Filed under: MySQL — snoyes @ 1:42 pm

1. Watch the parentheses. It’s:

ST_GeomFromText('POLYGON((outerRing), (innerRing), (innerRing), ...)')

The inner rings are optional. If you have just the outer ring, then it’s still:

ST_GeomFromText('POLYGON((outerRing))')

and not:

ST_GeomFromText('POLYGON(outerRing)')

2. Polygons have to start and end at the same point.

3. Watch the commas. Rings are comma-delimited sets of of whitespace-delimited coordinate pairs:

x1 y1, x2 y2, x3 y3, x4 y4

not:
x1 y1 x2 y2 x3 y3
x1, y1, x2, y2, x3, y3, x4, y4
(x1, y1), (x2, y2), (x3, y3)
or other variations on that theme.

July 11, 2017

On slave_parallel_workers and the logical clock

Filed under: MySQL — snoyes @ 10:36 am

How can you tell if a given workload on the master could be replicated with many parallel workers on the slave?

The slave_parallel_type=LOGICAL_CLOCK is an implementation of a Lamport clock. The implementation is described in WL #7165 – including a neat little ASCII-art graphic.

Each event in the binary log is stamped with two values:
– The sequence_number increments for each commit
– The last_committed is the sequence_number which was in effect when this transaction entered the prepare phase.

A slave worker cannot begin doing a transaction until the last_committed value is smaller than the sequence_number of all other running threads.

mysqlbinlog mysql-bin.0000x | grep last_committed

#160118 15:31:34 server id 3  end_log_pos 1527 CRC32 0xcdf6bd8d         GTID    last_committed=0        sequence_number=1
#160118 15:31:34 server id 3  end_log_pos 2627 CRC32 0x72e5fa80         GTID    last_committed=1        sequence_number=2
#160119  9:59:17 server id 3  end_log_pos 2952 CRC32 0x7290d02f         GTID    last_committed=2        sequence_number=3
#160119  9:59:17 server id 3  end_log_pos 3248 CRC32 0x227fe513         GTID    last_committed=3        sequence_number=4
#160119  9:59:17 server id 3  end_log_pos 3544 CRC32 0x8a6a2b78         GTID    last_committed=4        sequence_number=5
#160119  9:59:17 server id 3  end_log_pos 5315 CRC32 0x35bca421         GTID    last_committed=4        sequence_number=6
#160119  9:59:17 server id 3  end_log_pos 7094 CRC32 0x75584a42         GTID    last_committed=4        sequence_number=7
#160119  9:59:17 server id 3  end_log_pos 8865 CRC32 0xe4ee1b54         GTID    last_committed=6        sequence_number=8
#160119  9:59:17 server id 3  end_log_pos 10636 CRC32 0xed1f7c48        GTID    last_committed=6        sequence_number=9
#160119  9:59:17 server id 3  end_log_pos 12409 CRC32 0x7802f625        GTID    last_committed=9        sequence_number=10

In the first transaction, last_committed=0, sequence_number=1
The first worker thread takes that transaction and goes to work.

In the second transaction, last_committed=1, sequence_number=2
This transaction cannot start until the first one finishes, because last_committed=1 is not smaller than the previous sequence_number=1. So the first two transactions must be done in series.

In the third transaction, last_committed=2, sequence_number=3. Again, no worker could start this transaction until that previous trx with sequence_number=2 finishes.

Although these first four transactions might actually be assigned to different workers, effectively they execute in series, just like single-threaded replication would work.

Now, we come to the transactions with last_committed=4. As soon as that transaction with sequence_number=4 finishes, these next three transactions can go in parallel.

last_committed=4        sequence_number=5
last_committed=4        sequence_number=6
last_committed=4        sequence_number=7

Once the first two finish, two more can start:

last_committed=6        sequence_number=8
last_committed=6        sequence_number=9

They can run even while the one with sequence_number=7 is still running.

But this one has to wait for everything before it:

last_committed=9        sequence_number=10

We can see that for these 10 transactions, only three workers could ever be operating at the same time.

You might have spotted a pattern: if the difference (sequence_number – last_committed) = 1, then the transaction has to wait for everything before it to finish. If the difference is 2, then the transactions can run in parallel with just the previous transaction, and so on.

You can count how much of your transaction history falls into each category:

mysqlbinlog mysql-bin.0000x | grep -o 'last_committed.*' | sed 's/=\|\s/ /g' | awk '{print $4-$2}' | sort -g | uniq -c

 20953 1
  8998 2
  6021 3
  4209 4
  3083 5
  2180 6
  1403 7
   820 8
   418 9
   176 10
     7 11
     2 12
     1 13
     1 15
     1 16

In that binary log, 20953 transactions had to wait for all previous transactions to commit before starting. 8998 could run in parallel with one previous transaction. Only a very few could run in parallel with more than 10 previous transactions. Jut 3 worker threads would cover 75% of the possible parallel transactions. 8 worker threads would cover 99%.

See also Percona’s blog for discussion on how to view similar information via the performance_schema on a running slave.

Older Posts »

Powered by WordPress