A Little Noise


Aggregate JSON arrays

Given: a table with JSON arrays

CREATE TABLE t (id int auto_increment primary key, d json);
INSERT INTO t VALUES (1, '["apple", "apple", "orange"]');
INSERT INTO t VALUES (2, '["apple", "banana", "orange", "orange", "orange"]');

The desired output is each row with a count of the unique objects:

| id   | fruitCount                             |
|    1 | {"apple": 2, "orange": 1}              |
|    2 | {"apple": 1, "banana": 1, "orange": 3} |

JSON_TABLE() can transform the array into rows.

SELECT id, fruit
    "$[*]" COLUMNS (
        fruit VARCHAR(100) PATH "$"
) AS dt;

| id | fruit  |
|  1 | apple  |
|  1 | apple  |
|  1 | orange |
|  2 | apple  |
|  2 | banana |
|  2 | orange |
|  2 | orange |
|  2 | orange |

(If you got an error there about either permissions or an unknown table/column in the JSON_TABLE, upgrade to at least 8.0.14 to get past some bugs.)

We can then use the ordinary aggregation functions:

    "$[*]" COLUMNS (
        fruit VARCHAR(100) PATH "$"
) AS dt
GROUP BY id, fruit;

| id | fruit  | COUNT(*) |
|  1 | apple  |        2 |
|  1 | orange |        1 |
|  2 | apple  |        1 |
|  2 | banana |        1 |
|  2 | orange |        3 |

With a subquery or a common table expression, we can aggregate individual rows at a time (meaning you can also select other columns from t). Then JSON_OBJECTAGG will combine those results into a single object.

        WITH cte AS (
            SELECT fruit, COUNT(*) AS c
            FROM JSON_TABLE(d, 
                "$[*]" COLUMNS (
                    fruit VARCHAR(100) PATH "$"
            ) AS dt
            GROUP BY fruit
        SELECT JSON_OBJECTAGG(fruit, c) FROM cte
    ) AS fruitCount

| id   | fruitCount                             |
|    1 | {"apple": 2, "orange": 1}              |
|    2 | {"apple": 1, "banana": 1, "orange": 3} |

et voilĂ 

Filed under: MySQL No Comments

Using map with class methods – why map(split) doesn’t work

I have some sentences.

text = [
  "Call me Ishmael.",
  "Some years ago, never mind how long precisely, having little or no money..."

How many words are in each sentence?

for sentence in text:
  sentenceLength = len(sentence.split())


But I want to do it all at once the functional programming way, with maps.

list(map(len, map(split, text)))

NameError: name 'split' is not defined

Why does that produce an error? Because "split" isn't a function. It's a method of strings: str.split(), not split(str).

So how do we use map with a class method?

from operator import methodcaller
split = methodcaller("split")

That means, "Create a function. I'll pass in an object. Call its 'split' method."

Now it works.

list(map(len, map(split, text)))
[3, 13]

Of course, there are other ways. Don't even need map.

[len(s.split()) for s in text]
Filed under: Python, Technical No Comments

Window Functions with Unusual Boundaries

Somebody on Freenode wanted this:

   Source            Result
+----+------+    +----+------+
| id | x    |    | id | c    |
+----+------+    +----+------+
|  1 |    1 |    |  1 |    2 |
|  2 |    1 |    |  2 |    2 |
|  3 | NULL |    |  3 | NULL |
|  4 | NULL | -> |  4 | NULL |
|  5 |    1 |    |  5 |    1 |
|  6 | NULL |    |  6 | NULL |
|  7 |    1 |    |  7 |    3 |
|  9 |    1 |    |  9 |    3 |
| 10 |    1 |    | 10 |    3 |
+----+------+    +----+------+

The result uses the NULL values in x as boundaries of windows, and counts the number of rows within each window. I don't know why anyone wants such a thing; it is not ours to reason why...

Anyway, the point is that you can use arbitrary expressions, even subqueries, to define your window partitions.

    -- Count of rows in windows bound by NULL values in x
      x IS NULL, 
        -- Partition by the number of earlier NULLs
        SELECT COUNT(*) FROM t AS t1 
        WHERE t1.id < t.id AND t1.x IS NULL
        -- Exclude the end boundary "NULL" from the window
        x IS NULL
        ORDER BY id
    ) AS c 

How does it work?

First, let's see what that subquery is all about:

SELECT id, x, 
(SELECT COUNT(*) FROM t AS t1 WHERE t1.id < t.id AND t1.x IS NULL) AS p 
| id | x    | p |
|  1 |    1 | 0 |
|  2 |    1 | 0 |
|  3 | NULL | 0 |
|  4 | NULL | 1 |
|  5 |    1 | 2 |
|  6 | NULL | 2 |
|  7 |    1 | 3 |
|  9 |    1 | 3 |
| 10 |    1 | 3 |

By counting the number of "NULL rows" appearing earlier in the table, we get a value we can use to find the starting point of each window.

That alone goes one row too far, though - the "NULL row" which should end each window gets included in the window. However, you can use multiple expressions to partition windows.

SELECT id, x, 
(SELECT COUNT(*) FROM t AS t1 WHERE t1.id < t.id AND t1.x IS NULL) AS p1, 
x IS NULL AS p2 
| id | x    | p1   | p2 |
|  1 |    1 |    0 |  0 |
|  2 |    1 |    0 |  0 |
|  3 | NULL |    0 |  1 |
|  4 | NULL |    1 |  1 |
|  5 |    1 |    2 |  0 |
|  6 | NULL |    2 |  1 |
|  7 |    1 |    3 |  0 |
|  9 |    1 |    3 |  0 |
| 10 |    1 |    3 |  0 |

The combination of (p1, p2) neatly partition the rows, so each "NULL row" is by itself, and non-NULL rows are together.

Filed under: MySQL No Comments


There's a GTID_SUBTRACT function, and the manual shows how to write your own cheap GTID_UNION, but not a GTID_INTERSECT. Fortunately it's easy to write your own, as it's just subtracting twice.


What use is it?

SET @slave_executed = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-681577,

SET @master_executed = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-730294,

SET @master_purged = '33738f8c-c1a5-11e7-8fc3-0a002700000f:1-681582,

SELECT @slave_needs := GTID_SUBTRACT(@master_executed, @slave_executed);
| 33738f8c-c1a5-11e7-8fc3-0a002700000f:681578-730294, |
| 52b9a949-d79d-11e3-80dd-0050568d193e:1267099:1267417 |

SELECT @slave_will_not_get := GTID_INTERSECT(@master_purged, @slave_needs);
| 33738f8c-c1a5-11e7-8fc3-0a002700000f:681578-681582, |
| 52b9a949-d79d-11e3-80dd-0050568d193e:1267099:1267417 |

While we're on the subject, the manual's GTID_UNION just concatenates the two sets, which means the output will list values twice. A small improvement will clean that up:


Filed under: MySQL No Comments

Change Switched on Schoolhouse client host

Our homeschool curriculum choice has been Switched On Schoolhouse, by Alpha Omega Publications. AOP offers the same curriculum in an online format called "Monarch"; Switched On Schoolhouse is their disc-based version. We chose SOS instead of Monarch because it allows our students to do their work anywhere - in the car or a doctor's waiting room - regardless of the available Internet connection.

SOS has two program interfaces, "student" and "teacher", and two installation modes, which they call "full" and "client". If you only have one computer, then you do a "full" install of both "student" and "teacher", and both the student's work and any grading are done all in the same place. However, that means that for me to do any grading, my students can't be using their computers, and I also have to do be seated at their computer rather than using my own, which isn't ideal for us.

AOP has attempted to resolve this inconvenience with their "client" offering. The intended layout is a "full" installation on some host computer (probably the teacher's machine), and then a "client" installation of the student interface on each student's machine. This setup allows the teacher to see all students in the school within a single interface, switching between them with a simple dropdown menu. However, it also requires that the host computer be available on a local network to the students at all times, which would be even worse for us than the online requirement of Monarch (we'd have to set up a local network in the car, and I'd have to bring my computer when I would not normally).

It's possible to reverse the installation: do a full install on the student's machine, and a client install on the teacher's. This layout allows the students to work from anywhere, and when we are on the same local network, I can access their machine from mine and do grading, even while they are using their computer for something else. For a single student, it's ideal, but there's one snag: we have multiple students, each with their own computer. During client installation, you must specify which machine is running the full installation, and there is nothing in the interface that allows you to easily change it later. So I can set it up to connect to Iona's computer, but then can't switch to grade Eden's work.

However, all is not lost. The client stores the name of the computer to which it connects in a configuration file. Edit the file, change the name of the computer, restart, and you're now connected to the other student.

The configuration file is C:\Program Files (x86)\AOP\Switched-On Schoolhouse 2016 Home\SOSHApp.exe.config

(throughout this text, anywhere it says "2016", adjust for the current year.)

As plain text, it can be edited with Notepad, Wordpad, vim, or any similar program. I usually avoid using Microsoft Word or similar word processors to edit config files, because they often try to modify the encoding, replace straight quotes with "pretty" quotes, and other similar things which make the program which is trying to read that file very unhappy. It's a protected file, which means the editor must be running with elevated privileges (right click, "Run as administrator").

About halfway down the file is a line that looks like this:

<add key="ConnectionString" value="Initial Catalog=SOSHOME2016;Data Source=IONA-LAPTOP\SOSHOME80;" />

Change IONA-LAPTOP to EDEN-LAPTOP, save the file, start up SOS, and now I can grade Eden's work instead.

Editing a config file manually every time I want to switch students is tiresome, boring, and repetitive - exactly the kind of things computers do way better than humans. A script beckons.

Download SOS-Switcher.zip (you might be warned by your browser or anti-virus that this is dangerous program. They're right; it has the potential to be dangerous. You should examine the contents and know what they do before blindly trusting them with your computer.)

Since Windows doesn't haveat that time I didn't yet know how to use Powershell to get a nice regular expression replace program like sed built in, we'll need our own - that's what replace.vbs does. Then we'll need a batch file to call it with the correct parameters - that's what switch.bat does.

Extract these two files and put them in the C:\Program Files (x86)\AOP\Switched-On Schoolhouse 2016 Home\ directory (which you'll have to do as administrator).

Now, for each student, right-click on switch.bat and choose "Create Shortcut". It will ask if it can create the shortcut on the desktop; that's fine.

Right click, Create Shortcut

Right-click on the newly created shortcut, and choose Properties. On the Shortcut tab, there's a Target box, which should say something like:

"C:\Program Files (x86)\AOP\Switched-On Schoolhouse 2016 Home\switch.bat"

Add the name of the student's computer to that, outside the quotes:

"C:\Program Files (x86)\AOP\Switched-On Schoolhouse 2016 Home\switch.bat" IONA-LAPTOP

Click "Advanced" and check "Run as administrator".

You may also choose "Change Icon" - it will tell you that switch.bat doesn't contain any icons, and bring up a window with a bunch of standard Windows icons to choose from. To get the SOS icon, click on Browse..., navigate to C:\Program Files (x86)\AOP\Switched-On Schoolhouse 2016 Home, and open SOSHApp.exe.

On the General tab, you can change the shortcut name. It's probably "switch.bat - Shortcut"; I changed it to "SOS Iona".

Repeat the process for the other students - create a shortcut, modify properties, change the target:

"C:\Program Files (x86)\AOP\Switched-On Schoolhouse 2016 Home\switch.bat" EDEN-LAPTOP

Now, make sure that SOS is not running. Double click on one of the shortcuts. After a moment (and possibly a question if you want to let it change settings on your computer - choose Yes or OK), it should open SOS, and you'll find that you have access to one student's work. Close SOS, launch the other shortcut, and verify that you now have access to the other student's work.

If there's a very long pause, and then an error message saying it can't connect, either you did not put the correct computer name in the shortcut target, or that computer is not running or has a firewall preventing access. The SOS installation disc 2 has some utilities to help you configure all that, and their FAQ and tech support can walk you through that bit.

Filed under: Kids 2 Comments

ERROR 3037 (22023): Invalid GIS data provided to function st_geometryfromtext.

1. Watch the parentheses. It's:

ST_GeomFromText('POLYGON((outerRing), (innerRing), (innerRing), ...)')

The inner rings are optional. If you have just the outer ring, then it's still:


and not:


2. Polygons have to start and end at the same point.

3. Watch the commas. Rings are comma-delimited sets of of whitespace-delimited coordinate pairs:

x1 y1, x2 y2, x3 y3, x4 y4

x1 y1 x2 y2 x3 y3
x1, y1, x2, y2, x3, y3, x4, y4
(x1, y1), (x2, y2), (x3, y3)
or other variations on that theme.

Filed under: MySQL No Comments

On slave_parallel_workers and the logical clock

How can you tell if a given workload on the master could be replicated with many parallel workers on the slave?

The slave_parallel_type=LOGICAL_CLOCK is an implementation of a Lamport clock. The implementation is described in WL #7165 - including a neat little ASCII-art graphic.

Each event in the binary log is stamped with two values:
- The sequence_number increments for each commit
- The last_committed is the sequence_number which was in effect when this transaction entered the prepare phase.

A slave worker cannot begin doing a transaction until the last_committed value is smaller than the sequence_number of all other running threads.

mysqlbinlog mysql-bin.0000x | grep last_committed

#160118 15:31:34 server id 3  end_log_pos 1527 CRC32 0xcdf6bd8d         GTID    last_committed=0        sequence_number=1
#160118 15:31:34 server id 3  end_log_pos 2627 CRC32 0x72e5fa80         GTID    last_committed=1        sequence_number=2
#160119  9:59:17 server id 3  end_log_pos 2952 CRC32 0x7290d02f         GTID    last_committed=2        sequence_number=3
#160119  9:59:17 server id 3  end_log_pos 3248 CRC32 0x227fe513         GTID    last_committed=3        sequence_number=4
#160119  9:59:17 server id 3  end_log_pos 3544 CRC32 0x8a6a2b78         GTID    last_committed=4        sequence_number=5
#160119  9:59:17 server id 3  end_log_pos 5315 CRC32 0x35bca421         GTID    last_committed=4        sequence_number=6
#160119  9:59:17 server id 3  end_log_pos 7094 CRC32 0x75584a42         GTID    last_committed=4        sequence_number=7
#160119  9:59:17 server id 3  end_log_pos 8865 CRC32 0xe4ee1b54         GTID    last_committed=6        sequence_number=8
#160119  9:59:17 server id 3  end_log_pos 10636 CRC32 0xed1f7c48        GTID    last_committed=6        sequence_number=9
#160119  9:59:17 server id 3  end_log_pos 12409 CRC32 0x7802f625        GTID    last_committed=9        sequence_number=10

In the first transaction, last_committed=0, sequence_number=1
The first worker thread takes that transaction and goes to work.

In the second transaction, last_committed=1, sequence_number=2
This transaction cannot start until the first one finishes, because last_committed=1 is not smaller than the previous sequence_number=1. So the first two transactions must be done in series.

In the third transaction, last_committed=2, sequence_number=3. Again, no worker could start this transaction until that previous trx with sequence_number=2 finishes.

Although these first four transactions might actually be assigned to different workers, effectively they execute in series, just like single-threaded replication would work.

Now, we come to the transactions with last_committed=4. As soon as that transaction with sequence_number=4 finishes, these next three transactions can go in parallel.

last_committed=4        sequence_number=5
last_committed=4        sequence_number=6
last_committed=4        sequence_number=7

Once the first two finish, two more can start:

last_committed=6        sequence_number=8
last_committed=6        sequence_number=9

They can run even while the one with sequence_number=7 is still running.

But this one has to wait for everything before it:

last_committed=9        sequence_number=10

We can see that for these 10 transactions, only three workers could ever be operating at the same time.

You might have spotted a pattern: if the difference (sequence_number - last_committed) = 1, then the transaction has to wait for everything before it to finish. If the difference is 2, then the transactions can run in parallel with just the previous transaction, and so on.

You can count how much of your transaction history falls into each category:

mysqlbinlog mysql-bin.0000x | grep -o 'last_committed.*' | sed 's/=\|\s/ /g' | awk '{print $4-$2}' | sort -g | uniq -c

 20953 1
  8998 2
  6021 3
  4209 4
  3083 5
  2180 6
  1403 7
   820 8
   418 9
   176 10
     7 11
     2 12
     1 13
     1 15
     1 16

In that binary log, 20953 transactions had to wait for all previous transactions to commit before starting. 8998 could run in parallel with one previous transaction. Only a very few could run in parallel with more than 10 previous transactions. Jut 3 worker threads would cover 75% of the possible parallel transactions. 8 worker threads would cover 99%.

See also Percona's blog for discussion on how to view similar information via the performance_schema on a running slave.

Filed under: MySQL 1 Comment



BIT values cannot be loaded using binary notation (for example, b'011010').

$ cat test.txt



CREATE TABLE loadTest (b BIT(6));


| Level   | Code | Message                               |
| Warning | 1406 | Data too long for column 'b' at row 1 |
| Warning | 1406 | Data too long for column 'b' at row 2 |

-- Note the wrong values:

SELECT BIN(b) FROM loadTest;
| BIN(b) |
| 111111 |
| 111111 |

TRUNCATE loadTest;

LOAD DATA LOCAL INFILE 'test.txt' INTO TABLE loadTest (@b) 
  SET b = CAST(CONV(TRIM('''' FROM SUBSTRING_INDEX(@b, 'b', -1)), 2, 10) AS UNSIGNED);

SELECT BIN(b) FROM loadTest;
| BIN(b) |
| 101010 |
| 111000 |


SUBSTRING_INDEX(@b, 'b', -1)

removes the leading 'b' or '0b'.

TRIM('''' FROM ...)

removes the quotes if there are any.

CONV(..., 2, 10)

converts the string of 0s and 1s into a string of decimal digits.


turns the string of decimal digits into an integer.
MySQL automatically casts integers into bits when inserting into a BIT type.

Filed under: MySQL, One liners No Comments

Quiz: Drop non-global users

Somebody asked on Freenode. I don't know why they wanted it. How would you drop all MySQL users who do not have "GRANT ALL ON *.* ... WITH GRANT OPTION"? That is, drop any users who have 'N' in any of the privilege columns in `mysql`.`user`.

My solution shown below. Did you think of a different approach?

My solution ▼

Filed under: MySQL, One liners No Comments

Square Root of a Complex Number using Compass and Straight Edge Geometry

Also available as an interactive applet (scrollwheel or pinch to zoom so you can adjust the axis how you want).

Descartes allowed a line segment of unit length to find roots. The rest of the construction is completed with just Euclid's compass and straight edge.

Given: a complex number C plotted on the complex plane.
Bisect the angle formed between C, the origin O, and the positive X (real) axis.
Extend OC past O by a unit line segment to U.
Find the midpoint, M, between U and C.
Construct a half-circle centered at M through U and C.
Draw the perpendicular to UC through O to intersect the half-circle at I.
Draw a circle centered at O through I.
The circle intersects the bisector at S.
S is a square root of C.

Filed under: Technical No Comments