A Little Noise

May 6, 2020

JSON in MySQL: Keys which do NOT match a value (the more elegant way)

Filed under: MySQL — snoyes @ 12:33 pm

A second approach to finding the non-0 values from a JSON object is to turn the keys and values into separate columns of a table. This is cleaner, but a little wordier.

We’ll get the keys in one table:

SELECT ord, keyname 
FROM JSON_TABLE(JSON_KEYS(@j), '$[*]' COLUMNS (
    ord FOR ORDINALITY,
    keyname VARCHAR(100) PATH '$')
) AS keyTable;
 +------+----------+
 | ord  | keyname  |
 +------+----------+
 |    1 | RETRO    |
 |    2 | BOOSTER  |
 |    3 | SURGEON  |
 |    4 | GUIDANCE |
 |    5 | RECOVERY |
 +------+----------+

And the values in a second table:

SELECT ord, keyvalue
FROM JSON_TABLE(@j, '$.*' COLUMNS (
    ord FOR ORDINALITY,
    keyvalue INT PATH '$')
) AS valueTable;
+------+----------+
| ord  | keyvalue |
+------+----------+
|    1 | 2        |
|    2 | 1        |
|    3 | 1        |
|    4 | 0        |
|    5 | 0        |
+------+----------+

And then join those two together:

SELECT keyname, keyvalue
 FROM JSON_TABLE(JSON_KEYS(@j), '$[*]' COLUMNS (
    ord FOR ORDINALITY,
    keyname VARCHAR(100) PATH '$')) AS keyTable
 JOIN JSON_TABLE(@j, '$.*' COLUMNS (
    ord FOR ORDINALITY,
    keyvalue INT PATH '$')
) AS valueTable USING (ord);
 +----------+----------+
 | keyname  | keyvalue |
 +----------+----------+
 | RETRO    | 2        |
 | BOOSTER  | 1        |
 | SURGEON  | 1        |
 | GUIDANCE | 0        |
 | RECOVERY | 0        |
 +----------+----------+

Now you can use ordinary operations like a WHERE clause to do your filtering, and then JSON_ARRAYAGG() to put them all back into JSON at the end.

SELECT JSON_ARRAYAGG(keyname) AS j
 FROM JSON_TABLE(JSON_KEYS(@j), '$[*]' COLUMNS (
    ord FOR ORDINALITY,
    keyname VARCHAR(100) PATH '$')) AS keyTable
 JOIN JSON_TABLE(@j, '$.*' COLUMNS (
    ord FOR ORDINALITY,
    keyvalue INT PATH '$')
) AS valueTable USING (ord)
 WHERE keyvalue != 0;
 +---------------------------------+
 | j                               |
 +---------------------------------+
 | ["RETRO", "BOOSTER", "SURGEON"] |
 +---------------------------------+

May 5, 2020

JSON in MySQL: Keys which do NOT match a value

Filed under: MySQL — snoyes @ 12:18 pm

Given a JSON object:
SET @j := '{"BOOSTER": 1, "RETRO": 2, "GUIDANCE": 0, "SURGEON": 1, "RECOVERY": 0}';

How do you find the keys which do NOT contain a value of 0?

One approach is to find the keys which do match 0, and then remove them.

Sounds like JSON_SEARCH and JSON_REMOVE? But there’s a gotcha: JSON_SEARCH works only with strings. Ok, so we REPLACE(@j, ‘0’, ‘”0″‘) – but that doesn’t help, because JSON_REMOVE can’t accept an array of paths like JSON_SEARCH would return; it requires each path as a separate parameter.

Instead, JSON_MERGE_PATCH will suffice. It has the feature that values from the second parameter are preserved only if they are not NULL. So, the approach becomes turning the 0s into NULLs, and then merging that document into an empty one.

SELECT REPLACE(@j, 0, 'null') AS j; -- case matters! null, not NULL
+------------------------------------------------------------------------------+
| j                                                                            |
+------------------------------------------------------------------------------+
| {"BOOSTER": 1, "RETRO": 2, "GUIDANCE": null, "SURGEON": 1, "RECOVERY": null} |
+------------------------------------------------------------------------------+

SELECT JSON_MERGE_PATCH('{}', REPLACE(@j, 0, 'null')) AS j;
+------------------------------------------+
| j                                        |
+------------------------------------------+
| {"RETRO": 2, "BOOSTER": 1, "SURGEON": 1} |
+------------------------------------------+

And since I technically asked for just the keys, not the key/value pairs,

SELECT JSON_KEYS(JSON_MERGE_PATCH('{}', REPLACE(@j, 0, 'null'))) AS j;
+---------------------------------+
| j                               |
+---------------------------------+
| ["RETRO", "BOOSTER", "SURGEON"] |
+---------------------------------+

But, there’s a cleaner way.

January 9, 2020

Aggregate JSON arrays

Filed under: MySQL — snoyes @ 11:03 am

Given: a table with JSON arrays

CREATE TABLE t (id int auto_increment primary key, d json);
INSERT INTO t VALUES (1, '["apple", "apple", "orange"]');
INSERT INTO t VALUES (2, '["apple", "banana", "orange", "orange", "orange"]');

The desired output is each row with a count of the unique objects:

+------+----------------------------------------+
| id   | fruitCount                             |
+------+----------------------------------------+
|    1 | {"apple": 2, "orange": 1}              |
|    2 | {"apple": 1, "banana": 1, "orange": 3} |
+------+----------------------------------------+

JSON_TABLE() can transform the array into rows.

SELECT id, fruit
FROM t,
JSON_TABLE(d,
    "$[*]" COLUMNS (
        fruit VARCHAR(100) PATH "$"
    )
) AS dt;

+----+--------+
| id | fruit  |
+----+--------+
|  1 | apple  |
|  1 | apple  |
|  1 | orange |
|  2 | apple  |
|  2 | banana |
|  2 | orange |
|  2 | orange |
|  2 | orange |
+----+--------+

(If you got an error there about either permissions or an unknown table/column in the JSON_TABLE, upgrade to at least 8.0.14 to get past some bugs.)

We can then use the ordinary aggregation functions:

SELECT
    id,
    fruit,
    COUNT(*)
FROM t,
JSON_TABLE(d,
    "$[*]" COLUMNS (
        fruit VARCHAR(100) PATH "$"
    )
) AS dt
GROUP BY id, fruit;

+----+--------+----------+
| id | fruit  | COUNT(*) |
+----+--------+----------+
|  1 | apple  |        2 |
|  1 | orange |        1 |
|  2 | apple  |        1 |
|  2 | banana |        1 |
|  2 | orange |        3 |
+----+--------+----------+

With a subquery or a common table expression, we can aggregate individual rows at a time (meaning you can also select other columns from t). Then JSON_OBJECTAGG will combine those results into a single object.

SELECT
    id,
    (
        WITH cte AS (
            SELECT fruit, COUNT(*) AS c
            FROM JSON_TABLE(d, 
                "$[*]" COLUMNS (
                    fruit VARCHAR(100) PATH "$"
                )
            ) AS dt
            GROUP BY fruit
        )
        SELECT JSON_OBJECTAGG(fruit, c) FROM cte
    ) AS fruitCount
FROM t;

+------+----------------------------------------+
| id   | fruitCount                             |
+------+----------------------------------------+
|    1 | {"apple": 2, "orange": 1}              |
|    2 | {"apple": 1, "banana": 1, "orange": 3} |
+------+----------------------------------------+

et voilĂ 

November 8, 2018

Window Functions with Unusual Boundaries

Filed under: MySQL — snoyes @ 12:44 pm

Somebody on Freenode wanted this:

   Source            Result
+----+------+    +----+------+
| id | x    |    | id | c    |
+----+------+    +----+------+
|  1 |    1 |    |  1 |    2 |
|  2 |    1 |    |  2 |    2 |
|  3 | NULL |    |  3 | NULL |
|  4 | NULL | -> |  4 | NULL |
|  5 |    1 |    |  5 |    1 |
|  6 | NULL |    |  6 | NULL |
|  7 |    1 |    |  7 |    3 |
|  9 |    1 |    |  9 |    3 |
| 10 |    1 |    | 10 |    3 |
+----+------+    +----+------+

The result uses the NULL values in x as boundaries of windows, and counts the number of rows within each window. I don’t know why anyone wants such a thing; it is not ours to reason why…

Anyway, the point is that you can use arbitrary expressions, even subqueries, to define your window partitions.

SELECT 
    id, 
    -- Count of rows in windows bound by NULL values in x
    IF(
      x IS NULL, 
      NULL, 
      COUNT(*) OVER (PARTITION BY (
        -- Partition by the number of earlier NULLs
        SELECT COUNT(*) FROM t AS t1 
        WHERE t1.id < t.id AND t1.x IS NULL
        ), 
        -- Exclude the end boundary "NULL" from the window
        x IS NULL
        ORDER BY id
      )
    ) AS c 
FROM t;

How does it work?

First, let's see what that subquery is all about:

SELECT id, x, 
(SELECT COUNT(*) FROM t AS t1 WHERE t1.id < t.id AND t1.x IS NULL) AS p 
FROM t;
+----+------+---+
| id | x    | p |
+----+------+---+
|  1 |    1 | 0 |
|  2 |    1 | 0 |
|  3 | NULL | 0 |
|  4 | NULL | 1 |
|  5 |    1 | 2 |
|  6 | NULL | 2 |
|  7 |    1 | 3 |
|  9 |    1 | 3 |
| 10 |    1 | 3 |
+----+------+---+

By counting the number of "NULL rows" appearing earlier in the table, we get a value we can use to find the starting point of each window.

That alone goes one row too far, though - the "NULL row" which should end each window gets included in the window. However, you can use multiple expressions to partition windows.

SELECT id, x, 
(SELECT COUNT(*) FROM t AS t1 WHERE t1.id < t.id AND t1.x IS NULL) AS p1, 
x IS NULL AS p2 
FROM t;
+----+------+------+----+
| id | x    | p1   | p2 |
+----+------+------+----+
|  1 |    1 |    0 |  0 |
|  2 |    1 |    0 |  0 |
|  3 | NULL |    0 |  1 |
|  4 | NULL |    1 |  1 |
|  5 |    1 |    2 |  0 |
|  6 | NULL |    2 |  1 |
|  7 |    1 |    3 |  0 |
|  9 |    1 |    3 |  0 |
| 10 |    1 |    3 |  0 |
+----+------+------+----+

The combination of (p1, p2) neatly partition the rows, so each "NULL row" is by itself, and non-NULL rows are together.

November 4, 2017

GTID_INTERSECT

Filed under: MySQL — snoyes @ 4:33 pm

There’s a GTID_SUBTRACT function, and the manual shows how to write your own cheap GTID_UNION, but not a GTID_INTERSECT. Fortunately it’s easy to write your own, as it’s just subtracting twice.

CREATE FUNCTION GTID_INTERSECT(g1 TEXT, g2 TEXT)
RETURNS TEXT DETERMINISTIC
RETURN GTID_SUBTRACT(g1, GTID_SUBTRACT(g1, g2));

What use is it?

SET @slave_executed = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-681577,
421d139e-04b9-11e7-b702-0050569935dc:1-13764443,
52b9a949-d79d-11e3-80dd-0050568d193e:1-1207378:1207380-1261803:1261805-1267098:1267100-1267416:1267418-1589733';

SET @master_executed = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-730294,
421d139e-04b9-11e7-b702-0050569935dc:1-13764443,
52b9a949-d79d-11e3-80dd-0050568d193e:1-1207378:1207380-1261803:1261805-1589733';

SET @master_purged = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-681582,
421d139e-04b9-11e7-b702-0050569935dc:1-13077260,
52b9a949-d79d-11e3-80dd-0050568d193e:1-1207378:1207380-1261803:1261805-1589733';

SELECT @slave_needs := GTID_SUBTRACT(@master_executed, @slave_executed);
/*-----------------------------------------------------+
| 33738f8c-c1a5-11e7-8fc3-0a002700000f:681578-730294, |
| 52b9a949-d79d-11e3-80dd-0050568d193e:1267099:1267417 |
+-----------------------------------------------------*/

SELECT @slave_will_not_get := GTID_INTERSECT(@master_purged, @slave_needs);
/*-----------------------------------------------------+
| 33738f8c-c1a5-11e7-8fc3-0a002700000f:681578-681582, |
| 52b9a949-d79d-11e3-80dd-0050568d193e:1267099:1267417 |
+-----------------------------------------------------*/

While we’re on the subject, the manual’s GTID_UNION just concatenates the two sets, which means the output will list values twice. A small improvement will clean that up:

CREATE FUNCTION GTID_UNION(g1 TEXT, g2 TEXT)
RETURNS TEXT DETERMINISTIC
RETURN GTID_SUBTRACT(CONCAT(g1,',',g2), '');

July 18, 2017

ERROR 3037 (22023): Invalid GIS data provided to function st_geometryfromtext.

Filed under: MySQL — snoyes @ 1:42 pm

1. Watch the parentheses. It’s:

ST_GeomFromText('POLYGON((outerRing), (innerRing), (innerRing), ...)')

The inner rings are optional. If you have just the outer ring, then it’s still:

ST_GeomFromText('POLYGON((outerRing))')

and not:

ST_GeomFromText('POLYGON(outerRing)')

2. Polygons have to start and end at the same point.

3. Watch the commas. Rings are comma-delimited sets of of whitespace-delimited coordinate pairs:

x1 y1, x2 y2, x3 y3, x4 y4

not:
x1 y1 x2 y2 x3 y3
x1, y1, x2, y2, x3, y3, x4, y4
(x1, y1), (x2, y2), (x3, y3)
or other variations on that theme.

July 11, 2017

On slave_parallel_workers and the logical clock

Filed under: MySQL — snoyes @ 10:36 am

How can you tell if a given workload on the master could be replicated with many parallel workers on the slave?

The slave_parallel_type=LOGICAL_CLOCK is an implementation of a Lamport clock. The implementation is described in WL #7165 – including a neat little ASCII-art graphic.

Each event in the binary log is stamped with two values:
– The sequence_number increments for each commit
– The last_committed is the sequence_number which was in effect when this transaction entered the prepare phase.

A slave worker cannot begin doing a transaction until the last_committed value is smaller than the sequence_number of all other running threads.

mysqlbinlog mysql-bin.0000x | grep last_committed

#160118 15:31:34 server id 3  end_log_pos 1527 CRC32 0xcdf6bd8d         GTID    last_committed=0        sequence_number=1
#160118 15:31:34 server id 3  end_log_pos 2627 CRC32 0x72e5fa80         GTID    last_committed=1        sequence_number=2
#160119  9:59:17 server id 3  end_log_pos 2952 CRC32 0x7290d02f         GTID    last_committed=2        sequence_number=3
#160119  9:59:17 server id 3  end_log_pos 3248 CRC32 0x227fe513         GTID    last_committed=3        sequence_number=4
#160119  9:59:17 server id 3  end_log_pos 3544 CRC32 0x8a6a2b78         GTID    last_committed=4        sequence_number=5
#160119  9:59:17 server id 3  end_log_pos 5315 CRC32 0x35bca421         GTID    last_committed=4        sequence_number=6
#160119  9:59:17 server id 3  end_log_pos 7094 CRC32 0x75584a42         GTID    last_committed=4        sequence_number=7
#160119  9:59:17 server id 3  end_log_pos 8865 CRC32 0xe4ee1b54         GTID    last_committed=6        sequence_number=8
#160119  9:59:17 server id 3  end_log_pos 10636 CRC32 0xed1f7c48        GTID    last_committed=6        sequence_number=9
#160119  9:59:17 server id 3  end_log_pos 12409 CRC32 0x7802f625        GTID    last_committed=9        sequence_number=10

In the first transaction, last_committed=0, sequence_number=1
The first worker thread takes that transaction and goes to work.

In the second transaction, last_committed=1, sequence_number=2
This transaction cannot start until the first one finishes, because last_committed=1 is not smaller than the previous sequence_number=1. So the first two transactions must be done in series.

In the third transaction, last_committed=2, sequence_number=3. Again, no worker could start this transaction until that previous trx with sequence_number=2 finishes.

Although these first four transactions might actually be assigned to different workers, effectively they execute in series, just like single-threaded replication would work.

Now, we come to the transactions with last_committed=4. As soon as that transaction with sequence_number=4 finishes, these next three transactions can go in parallel.

last_committed=4        sequence_number=5
last_committed=4        sequence_number=6
last_committed=4        sequence_number=7

Once the first two finish, two more can start:

last_committed=6        sequence_number=8
last_committed=6        sequence_number=9

They can run even while the one with sequence_number=7 is still running.

But this one has to wait for everything before it:

last_committed=9        sequence_number=10

We can see that for these 10 transactions, only three workers could ever be operating at the same time.

You might have spotted a pattern: if the difference (sequence_number – last_committed) = 1, then the transaction has to wait for everything before it to finish. If the difference is 2, then the transactions can run in parallel with just the previous transaction, and so on.

You can count how much of your transaction history falls into each category:

mysqlbinlog mysql-bin.0000x | grep -o 'last_committed.*' | sed 's/=\|\s/ /g' | awk '{print $4-$2}' | sort -g | uniq -c

 20953 1
  8998 2
  6021 3
  4209 4
  3083 5
  2180 6
  1403 7
   820 8
   418 9
   176 10
     7 11
     2 12
     1 13
     1 15
     1 16

In that binary log, 20953 transactions had to wait for all previous transactions to commit before starting. 8998 could run in parallel with one previous transaction. Only a very few could run in parallel with more than 10 previous transactions. Jut 3 worker threads would cover 75% of the possible parallel transactions. 8 worker threads would cover 99%.

See also Percona’s blog for discussion on how to view similar information via the performance_schema on a running slave.

March 22, 2017

LOAD DATA INFILE into a BIT field

Filed under: MySQL,One liners — snoyes @ 11:35 am

https://dev.mysql.com/doc/en/load-data.html

BIT values cannot be loaded using binary notation (for example, b’011010′).

$ cat test.txt

b'101010'
0b111000

-----

CREATE TABLE loadTest (b BIT(6));

LOAD DATA LOCAL INFILE 'test.txt' INTO TABLE loadTest;

/*--------+------+---------------------------------------+
| Level   | Code | Message                               |
+---------+------+---------------------------------------+
| Warning | 1406 | Data too long for column 'b' at row 1 |
| Warning | 1406 | Data too long for column 'b' at row 2 |
+---------+------+--------------------------------------*/

-- Note the wrong values:

SELECT BIN(b) FROM loadTest;
/*-------+
| BIN(b) |
+--------+
| 111111 |
| 111111 |
+-------*/

TRUNCATE loadTest;

LOAD DATA LOCAL INFILE 'test.txt' INTO TABLE loadTest (@b) 
  SET b = CAST(CONV(TRIM('''' FROM SUBSTRING_INDEX(@b, 'b', -1)), 2, 10) AS UNSIGNED);

SELECT BIN(b) FROM loadTest;
/*-------+
| BIN(b) |
+--------+
| 101010 |
| 111000 |
+-------*/

Deconstruction:

SUBSTRING_INDEX(@b, 'b', -1)

removes the leading ‘b’ or ‘0b’.

TRIM('''' FROM ...)

removes the quotes if there are any.

CONV(..., 2, 10)

converts the string of 0s and 1s into a string of decimal digits.

CAST(... AS UNSIGNED)

turns the string of decimal digits into an integer.
MySQL automatically casts integers into bits when inserting into a BIT type.

March 16, 2017

Quiz: Drop non-global users

Filed under: MySQL,One liners — snoyes @ 10:18 am

Somebody asked on Freenode. I don’t know why they wanted it. How would you drop all MySQL users who do not have “GRANT ALL ON *.* … WITH GRANT OPTION”? That is, drop any users who have ‘N’ in any of the privilege columns in `mysql`.`user`.

My solution shown below. Did you think of a different approach?

My solution ▼

September 20, 2016

Debugging Large Data with Rewriter

Filed under: MySQL — snoyes @ 7:43 pm

A customer showed that a particular client reported a less-than-helpful error message when it tried to display some meta-data about a table.

A less-than-helpful error message that hints at an int.

I couldn’t repeat the behavior with just a copy of the schema, so I suspected it was because of the size of data in the customer’s server – somebody had used an int where they needed a long.

The customer’s data was quite large – many hundreds of GB – more than I could easily whip up on my laptop to test. But, I didn’t really need all that data, or even any data at all; I just needed MySQL to pretend it had all that data. Specifically, I needed information_schema to report a large data_length.

Enter Rewriter, the query rewrite plugin that ships with MySQL 5.7 and later.

First, the general query log gave the exact query sent by the client:

select * FROM information_schema.partitions WHERE TABLE_SCHEMA = 'schemaNameHere' AND TABLE_NAME = 'tableNameHere'

Create a copy of that table:

CREATE DATABASE debug_schema;
CREATE TABLE debug_schema.partitions LIKE information_schema.partitions;
INSERT INTO debug_schema.partitions SELECT * FROM information_schema.partitions;

Adjust the copy to report a different value:

UPDATE debug_schema.partitions SET data_length = POW(2, 32);

Install the Rewriter plugin, and add a rule to modify the schema name:

INSERT INTO query_rewrite.rewrite_rules(pattern, replacement) VALUES (
'select * FROM information_schema.partitions WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?',
'select * FROM debug_schema.partitions WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?'
);

CALL query_rewrite.flush_rewrite_rules();

Now when I try the client, it thinks information_schema has reported a huge data_length, and I get the same error which my customer had reported. Success!

Older Posts »

Powered by WordPress