Category: MySQL

Things related to http://www.mysql.com

  • GTID_INTERSECT

    There’s a GTID_SUBTRACT function, and the manual shows how to write your own cheap GTID_UNION, but not a GTID_INTERSECT. Fortunately it’s easy to write your own, as it’s just subtracting twice.

    CREATE FUNCTION GTID_INTERSECT(g1 TEXT, g2 TEXT)
    RETURNS TEXT DETERMINISTIC
    RETURN GTID_SUBTRACT(g1, GTID_SUBTRACT(g1, g2));

    What use is it?

    SET @slave_executed = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-681577,
    421d139e-04b9-11e7-b702-0050569935dc:1-13764443,
    52b9a949-d79d-11e3-80dd-0050568d193e:1-1207378:1207380-1261803:1261805-1267098:1267100-1267416:1267418-1589733';

    SET @master_executed = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-730294,
    421d139e-04b9-11e7-b702-0050569935dc:1-13764443,
    52b9a949-d79d-11e3-80dd-0050568d193e:1-1207378:1207380-1261803:1261805-1589733';

    SET @master_purged = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-681582,
    421d139e-04b9-11e7-b702-0050569935dc:1-13077260,
    52b9a949-d79d-11e3-80dd-0050568d193e:1-1207378:1207380-1261803:1261805-1589733';

    SELECT @slave_needs := GTID_SUBTRACT(@master_executed, @slave_executed);
    /*-----------------------------------------------------+
    | 33738f8c-c1a5-11e7-8fc3-0a002700000f:681578-730294, |
    | 52b9a949-d79d-11e3-80dd-0050568d193e:1267099:1267417 |
    +-----------------------------------------------------*/

    SELECT @slave_will_not_get := GTID_INTERSECT(@master_purged, @slave_needs);
    /*-----------------------------------------------------+
    | 33738f8c-c1a5-11e7-8fc3-0a002700000f:681578-681582, |
    | 52b9a949-d79d-11e3-80dd-0050568d193e:1267099:1267417 |
    +-----------------------------------------------------*/

    While we’re on the subject, the manual’s GTID_UNION just concatenates the two sets, which means the output will list values twice. A small improvement will clean that up:

    CREATE FUNCTION GTID_UNION(g1 TEXT, g2 TEXT)
    RETURNS TEXT DETERMINISTIC
    RETURN GTID_SUBTRACT(CONCAT(g1,',',g2), '');

  • ERROR 3037 (22023): Invalid GIS data provided to function st_geometryfromtext.

    1. Watch the parentheses. It’s:

    ST_GeomFromText('POLYGON((outerRing), (innerRing), (innerRing), ...)')

    The inner rings are optional. If you have just the outer ring, then it’s still:

    ST_GeomFromText('POLYGON((outerRing))')

    and not:

    ST_GeomFromText('POLYGON(outerRing)')

    2. Polygons have to start and end at the same point.

    3. Watch the commas. Rings are comma-delimited sets of of whitespace-delimited coordinate pairs:

    x1 y1, x2 y2, x3 y3, x4 y4

    not:
    x1 y1 x2 y2 x3 y3
    x1, y1, x2, y2, x3, y3, x4, y4
    (x1, y1), (x2, y2), (x3, y3)
    or other variations on that theme.

  • On slave_parallel_workers and the logical clock

    How can you tell if a given workload on the master could be replicated with many parallel workers on the slave?

    The slave_parallel_type=LOGICAL_CLOCK is an implementation of a Lamport clock. The implementation is described in WL #7165 – including a neat little ASCII-art graphic.

    Each event in the binary log is stamped with two values:
    – The sequence_number increments for each commit
    – The last_committed is the sequence_number which was in effect when this transaction entered the prepare phase.

    A slave worker cannot begin doing a transaction until the last_committed value is smaller than the sequence_number of all other running threads.

    mysqlbinlog mysql-bin.0000x | grep last_committed

    #160118 15:31:34 server id 3  end_log_pos 1527 CRC32 0xcdf6bd8d         GTID    last_committed=0        sequence_number=1
    #160118 15:31:34 server id 3  end_log_pos 2627 CRC32 0x72e5fa80         GTID    last_committed=1        sequence_number=2
    #160119  9:59:17 server id 3  end_log_pos 2952 CRC32 0x7290d02f         GTID    last_committed=2        sequence_number=3
    #160119  9:59:17 server id 3  end_log_pos 3248 CRC32 0x227fe513         GTID    last_committed=3        sequence_number=4
    #160119  9:59:17 server id 3  end_log_pos 3544 CRC32 0x8a6a2b78         GTID    last_committed=4        sequence_number=5
    #160119  9:59:17 server id 3  end_log_pos 5315 CRC32 0x35bca421         GTID    last_committed=4        sequence_number=6
    #160119  9:59:17 server id 3  end_log_pos 7094 CRC32 0x75584a42         GTID    last_committed=4        sequence_number=7
    #160119  9:59:17 server id 3  end_log_pos 8865 CRC32 0xe4ee1b54         GTID    last_committed=6        sequence_number=8
    #160119  9:59:17 server id 3  end_log_pos 10636 CRC32 0xed1f7c48        GTID    last_committed=6        sequence_number=9
    #160119  9:59:17 server id 3  end_log_pos 12409 CRC32 0x7802f625        GTID    last_committed=9        sequence_number=10

    In the first transaction, last_committed=0, sequence_number=1
    The first worker thread takes that transaction and goes to work.

    In the second transaction, last_committed=1, sequence_number=2
    This transaction cannot start until the first one finishes, because last_committed=1 is not smaller than the previous sequence_number=1. So the first two transactions must be done in series.

    In the third transaction, last_committed=2, sequence_number=3. Again, no worker could start this transaction until that previous trx with sequence_number=2 finishes.

    Although these first four transactions might actually be assigned to different workers, effectively they execute in series, just like single-threaded replication would work.

    Now, we come to the transactions with last_committed=4. As soon as that transaction with sequence_number=4 finishes, these next three transactions can go in parallel.

    last_committed=4        sequence_number=5
    last_committed=4        sequence_number=6
    last_committed=4        sequence_number=7

    Once the first two finish, two more can start:

    last_committed=6        sequence_number=8
    last_committed=6        sequence_number=9

    They can run even while the one with sequence_number=7 is still running.

    But this one has to wait for everything before it:

    last_committed=9        sequence_number=10

    We can see that for these 10 transactions, only three workers could ever be operating at the same time.

    You might have spotted a pattern: if the difference (sequence_number – last_committed) = 1, then the transaction has to wait for everything before it to finish. If the difference is 2, then the transactions can run in parallel with just the previous transaction, and so on.

    You can count how much of your transaction history falls into each category:

    mysqlbinlog mysql-bin.0000x | grep -o 'last_committed.*' | sed 's/=\|\s/ /g' | awk '{print $4-$2}' | sort -g | uniq -c

     20953 1
      8998 2
      6021 3
      4209 4
      3083 5
      2180 6
      1403 7
       820 8
       418 9
       176 10
         7 11
         2 12
         1 13
         1 15
         1 16

    In that binary log, 20953 transactions had to wait for all previous transactions to commit before starting. 8998 could run in parallel with one previous transaction. Only a very few could run in parallel with more than 10 previous transactions. Just 3 worker threads would cover 75% of the possible parallel transactions. 8 worker threads would cover 99%.

    See also Percona’s blog for discussion on how to view similar information via the performance_schema on a running slave.

  • LOAD DATA INFILE into a BIT field

    https://dev.mysql.com/doc/en/load-data.html

    BIT values cannot be loaded using binary notation (for example, b’011010′).

    $ cat test.txt
    
    b'101010'
    0b111000
    
    -----
    
    CREATE TABLE loadTest (b BIT(6));
    
    LOAD DATA LOCAL INFILE 'test.txt' INTO TABLE loadTest;
    
    /*--------+------+---------------------------------------+
    | Level   | Code | Message                               |
    +---------+------+---------------------------------------+
    | Warning | 1406 | Data too long for column 'b' at row 1 |
    | Warning | 1406 | Data too long for column 'b' at row 2 |
    +---------+------+--------------------------------------*/
    
    -- Note the wrong values:
    
    SELECT BIN(b) FROM loadTest;
    /*-------+
    | BIN(b) |
    +--------+
    | 111111 |
    | 111111 |
    +-------*/
    
    TRUNCATE loadTest;
    
    LOAD DATA LOCAL INFILE 'test.txt' INTO TABLE loadTest (@b) 
      SET b = CAST(CONV(TRIM('''' FROM SUBSTRING_INDEX(@b, 'b', -1)), 2, 10) AS UNSIGNED);
    
    SELECT BIN(b) FROM loadTest;
    /*-------+
    | BIN(b) |
    +--------+
    | 101010 |
    | 111000 |
    +-------*/
    

    Deconstruction:

    SUBSTRING_INDEX(@b, 'b', -1)

    removes the leading ‘b’ or ‘0b’.

    TRIM('''' FROM ...)

    removes the quotes if there are any.

    CONV(..., 2, 10)

    converts the string of 0s and 1s into a string of decimal digits.

    CAST(... AS UNSIGNED)

    turns the string of decimal digits into an integer.
    MySQL automatically casts integers into bits when inserting into a BIT type.

  • Quiz: Drop non-global users

    Somebody asked on Freenode. I don’t know why they wanted it. How would you drop all MySQL users who do not have “GRANT ALL ON *.* … WITH GRANT OPTION”? That is, drop any users who have ‘N’ in any of the privilege columns in `mysql`.`user`.

    My solution shown below. Did you think of a different approach?

    I used SQL to build SQL which built more SQL. Get the list of ‘priv’ columns from information_schema.columns. Build a query that looks for ‘N’ in any of those columns (it feels funny to search for a constant in a list of fields instead of a field in a list of constants, but it’s perfectly legal). Use the results to build the DROP USER statement.

    mysql -BNe "SELECT CONCAT('SELECT CONCAT(''DROP USER '', QUOTE(user), ''@'', QUOTE(Host), '';'') FROM mysql.user WHERE ''N'' IN (', GROUP_CONCAT(column_name), ')') FROM information_schema.columns WHERE table_schema = 'mysql' AND table_name = 'user' AND column_name LIKE '%priv';" | mysql -BN | mysql

  • Debugging Large Data with Rewriter

    A customer showed that a particular client reported a less-than-helpful error message when it tried to display some meta-data about a table.

    A less-than-helpful error message that hints at an int.

    I couldn’t repeat the behavior with just a copy of the schema, so I suspected it was because of the size of data in the customer’s server – somebody had used an int where they needed a long.

    The customer’s data was quite large – many hundreds of GB – more than I could easily whip up on my laptop to test. But, I didn’t really need all that data, or even any data at all; I just needed MySQL to pretend it had all that data. Specifically, I needed information_schema to report a large data_length.

    Enter Rewriter, the query rewrite plugin that ships with MySQL 5.7 and later.

    First, the general query log gave the exact query sent by the client:

    select * FROM information_schema.partitions WHERE TABLE_SCHEMA = 'schemaNameHere' AND TABLE_NAME = 'tableNameHere'

    Create a copy of that table:

    CREATE DATABASE debug_schema;
    CREATE TABLE debug_schema.partitions LIKE information_schema.partitions;
    INSERT INTO debug_schema.partitions SELECT * FROM information_schema.partitions;

    Adjust the copy to report a different value:

    UPDATE debug_schema.partitions SET data_length = POW(2, 32);

    Install the Rewriter plugin, and add a rule to modify the schema name:

    INSERT INTO query_rewrite.rewrite_rules(pattern, replacement) VALUES (
    'select * FROM information_schema.partitions WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?',
    'select * FROM debug_schema.partitions WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?'
    );

    CALL query_rewrite.flush_rewrite_rules();

    Now when I try the client, it thinks information_schema has reported a huge data_length, and I get the same error which my customer had reported. Success!

  • Aggregate JSON function in MySQL

    There is not yet an equivalent to GROUP_CONCAT that produces a JSON array. (There is in MySQL 8, but that’s not GA yet.) Until then, you can hack it together with string functions:

    SELECT * FROM t;
    +------+--------+
    | id   | data   |
    +------+--------+
    |    1 | First  |
    |    2 | Second |
    +------+--------+
    
    SELECT CONCAT('[', GROUP_CONCAT(JSON_OBJECT('id', id, 'value', data) SEPARATOR ', '), ']') AS j FROM t;
    +-------------------------------------------------------------+
    | j                                                           |
    +-------------------------------------------------------------+
    | [{"id": 1, "value": "First"}, {"id": 2, "value": "Second"}] |
    +-------------------------------------------------------------+
    

    Or you can use all JSON functions but hack the grouping:

    SELECT j FROM (
           SELECT
             @c := @c + 1 AS c,
             @j := JSON_MERGE(@j, JSON_OBJECT('id', id, 'value', data)) AS j
           FROM t
           JOIN (SELECT @c := 0,  @j := JSON_ARRAY()) dt1
         ) dt2 ORDER BY c DESC LIMIT 1;
    +-------------------------------------------------------------+
    | j                                                           |
    +-------------------------------------------------------------+
    | [{"id": 1, "value": "First"}, {"id": 2, "value": "Second"}] |
    +-------------------------------------------------------------+
    
  • I’m really quite good with maps

    Workbench announced support for a spatial view in 6.2, but examples are somewhat lacking. Just how do you get a SHP into MySQL?

    worldmap

    Download and unpack a SHP file such as these country boundaries.

    In the Workbench installation directory, you’ll find a program “ogr2ogr” that can convert .shp or GeoJSON and load it straight into MySQL.

    “C:\Program Files\MySQL\MySQL Workbench 8.0\ogr2ogr.exe” -f MySQL MySQL:databaseNameHere,user=userNameHere,password=passwordHere World_Countries_Generalized.shp -nln worldmap -lco GEOMETRY_NAME=p

    (It might give the error ERROR 1: PROJ: proj_identify: Cannot find proj.db which it seems can be ignored.)

    Now just select rows of interest in Workbench, click the Spatial View format button, and there’s your world map.

    You can run multiple selects (such as the citylot data from yesterday’s post) to overlay on top of the world map.

    worldmap_overlay

  • The world is not in your books and maps.

    MySQL 5.7 came out with support for JSON, improved geometry, and virtual columns. Here’s an example showing them all playing together.

    click to embiggen

    Download citylots.json.

    It comes as one big object, so we’ll break it up into separate lines:
    grep "^{ .type" citylots.json > properties.json

    Connect to a 5.7 instance of MySQL.

    CREATE TABLE citylots (id serial, j json, p geometry as (ST_GeomFromGeoJSON(j, 2)));
    LOAD DATA LOCAL INFILE 'properties.json' INTO TABLE citylots (j);

    A few of the rows don’t contain useful data:
    DELETE FROM citylots WHERE j->'$.geometry.type' IS NULL;

    In MySQL Workbench, do:
    SELECT id, p FROM citylots;

    Then click on Spatial View. It takes a couple of minutes for 200k rows, but there’s a map of San Francisco.

    The default projection, ‘Robinson’, is designed for showing the whole world at once and so is pretty distorted for this particular data set. Mercator or Equirectangular are better choices. Fortunately, Workbench repaints the data in just a few seconds.

    If you selected some other fields, you can click on the map and see the relevant data for that particular geometry.

  • Quarto

    mysql < quarto.sql

    Example game play:
    
    mysql> -- Start the game and pass the first piece in the lower nibble
    mysql> CALL SetupGame(0x0A);
    -----------------------------------------------------------------------+
    | rules                                                                |
    -----------------------------------------------------------------------+
    | Quarto: 4 in a line (row, column, or long diagonal) with at least one bit in common wins.
    CALL Play(move); -- high 4 bits are board position, low 4 bits are piece for next player
    CALL PrintBoard(base); -- to display the board. Useful bases are 16 and 2. |
    -----------------------------------------------------------------------+
    
    +------------------------+
    | instructions           |
    +------------------------+
    | Player 1, play piece A |
    +------------------------+
    
    mysql> CALL Play(0x00);
    +---------+
    | board   |
    +---------+
    | A| | |  |
    |  | | |  |
    |  | | |  |
    |  | | |  |
    +---------+
    
    +------------------------+
    | instructions           |
    +------------------------+
    | Player 0, play piece 0 |
    +------------------------+
    
    mysql> CALL Play(0x1C);
    +---------+
    | board   |
    +---------+
    | A|0| |  |
    |  | | |  |
    |  | | |  |
    |  | | |  |
    +---------+
    
    +------------------------+
    | instructions           |
    +------------------------+
    | Player 1, play piece C |
    +------------------------+
    
    mysql> CALL Play(0x2E);
    +---------+
    | board   |
    +---------+
    | A|0|C|  |
    |  | | |  |
    |  | | |  |
    |  | | |  |
    +---------+
    
    +------------------------+
    | instructions           |
    +------------------------+
    | Player 0, play piece E |
    +------------------------+
    
    mysql> CALL Play(0x3F);
    +---------+
    | board   |
    +---------+
    | A|0|C|E |
    |  | | |  |
    |  | | |  |
    |  | | |  |
    +---------+
    
    +---------------+
    | result        |
    +---------------+
    | player 0 wins |
    +---------------+