A Little Noise

18Jul/170

ERROR 3037 (22023): Invalid GIS data provided to function st_geometryfromtext.

1. Watch the parentheses. It's:

ST_GeomFromText('POLYGON((outerRing), (innerRing), (innerRing), ...)')

The inner rings are optional. If you have just the outer ring, then it's still:

ST_GeomFromText('POLYGON((outerRing))')

and not:

ST_GeomFromText('POLYGON(outerRing)')

2. Polygons have to start and end at the same point.

3. Watch the commas. Rings are comma-delimited sets of of whitespace-delimited coordinate pairs:

x1 y1, x2 y2, x3 y3, x4 y4

not:
x1 y1 x2 y2 x3 y3
x1, y1, x2, y2, x3, y3, x4, y4
(x1, y1), (x2, y2), (x3, y3)
or other variations on that theme.

Filed under: MySQL No Comments
11Jul/171

On slave_parallel_workers and the logical clock

How can you tell if a given workload on the master could be replicated with many parallel workers on the slave?

The slave_parallel_type=LOGICAL_CLOCK is an implementation of a Lamport clock. The implementation is described in WL #7165 - including a neat little ASCII-art graphic.

Each event in the binary log is stamped with two values:
- The sequence_number increments for each commit
- The last_committed is the sequence_number which was in effect when this transaction entered the prepare phase.

A slave worker cannot begin doing a transaction until the last_committed value is smaller than the sequence_number of all other running threads.

mysqlbinlog mysql-bin.0000x | grep last_committed

#160118 15:31:34 server id 3  end_log_pos 1527 CRC32 0xcdf6bd8d         GTID    last_committed=0        sequence_number=1
#160118 15:31:34 server id 3  end_log_pos 2627 CRC32 0x72e5fa80         GTID    last_committed=1        sequence_number=2
#160119  9:59:17 server id 3  end_log_pos 2952 CRC32 0x7290d02f         GTID    last_committed=2        sequence_number=3
#160119  9:59:17 server id 3  end_log_pos 3248 CRC32 0x227fe513         GTID    last_committed=3        sequence_number=4
#160119  9:59:17 server id 3  end_log_pos 3544 CRC32 0x8a6a2b78         GTID    last_committed=4        sequence_number=5
#160119  9:59:17 server id 3  end_log_pos 5315 CRC32 0x35bca421         GTID    last_committed=4        sequence_number=6
#160119  9:59:17 server id 3  end_log_pos 7094 CRC32 0x75584a42         GTID    last_committed=4        sequence_number=7
#160119  9:59:17 server id 3  end_log_pos 8865 CRC32 0xe4ee1b54         GTID    last_committed=6        sequence_number=8
#160119  9:59:17 server id 3  end_log_pos 10636 CRC32 0xed1f7c48        GTID    last_committed=6        sequence_number=9
#160119  9:59:17 server id 3  end_log_pos 12409 CRC32 0x7802f625        GTID    last_committed=9        sequence_number=10

In the first transaction, last_committed=0, sequence_number=1
The first worker thread takes that transaction and goes to work.

In the second transaction, last_committed=1, sequence_number=2
This transaction cannot start until the first one finishes, because last_committed=1 is not smaller than the previous sequence_number=1. So the first two transactions must be done in series.

In the third transaction, last_committed=2, sequence_number=3. Again, no worker could start this transaction until that previous trx with sequence_number=2 finishes.

Although these first four transactions might actually be assigned to different workers, effectively they execute in series, just like single-threaded replication would work.

Now, we come to the transactions with last_committed=4. As soon as that transaction with sequence_number=4 finishes, these next three transactions can go in parallel.

last_committed=4        sequence_number=5
last_committed=4        sequence_number=6
last_committed=4        sequence_number=7

Once the first two finish, two more can start:

last_committed=6        sequence_number=8
last_committed=6        sequence_number=9

They can run even while the one with sequence_number=7 is still running.

But this one has to wait for everything before it:

last_committed=9        sequence_number=10

We can see that for these 10 transactions, only three workers could ever be operating at the same time.

You might have spotted a pattern: if the difference (sequence_number - last_committed) = 1, then the transaction has to wait for everything before it to finish. If the difference is 2, then the transactions can run in parallel with just the previous transaction, and so on.

You can count how much of your transaction history falls into each category:

mysqlbinlog mysql-bin.0000x | grep -o 'last_committed.*' | sed 's/=\|\s/ /g' | awk '{print $4-$2}' | sort -g | uniq -c

 20953 1
  8998 2
  6021 3
  4209 4
  3083 5
  2180 6
  1403 7
   820 8
   418 9
   176 10
     7 11
     2 12
     1 13
     1 15
     1 16

In that binary log, 20953 transactions had to wait for all previous transactions to commit before starting. 8998 could run in parallel with one previous transaction. Only a very few could run in parallel with more than 10 previous transactions. Jut 3 worker threads would cover 75% of the possible parallel transactions. 8 worker threads would cover 99%.

See also Percona's blog for discussion on how to view similar information via the performance_schema on a running slave.

Filed under: MySQL 1 Comment
22Mar/170

LOAD DATA INFILE into a BIT field

https://dev.mysql.com/doc/en/load-data.html

BIT values cannot be loaded using binary notation (for example, b'011010').

$ cat test.txt

b'101010'
0b111000

-----

CREATE TABLE loadTest (b BIT(6));

LOAD DATA LOCAL INFILE 'test.txt' INTO TABLE loadTest;

/*--------+------+---------------------------------------+
| Level   | Code | Message                               |
+---------+------+---------------------------------------+
| Warning | 1406 | Data too long for column 'b' at row 1 |
| Warning | 1406 | Data too long for column 'b' at row 2 |
+---------+------+--------------------------------------*/

-- Note the wrong values:

SELECT BIN(b) FROM loadTest;
/*-------+
| BIN(b) |
+--------+
| 111111 |
| 111111 |
+-------*/

TRUNCATE loadTest;

LOAD DATA LOCAL INFILE 'test.txt' INTO TABLE loadTest (@b) 
  SET b = CAST(CONV(TRIM('''' FROM SUBSTRING_INDEX(@b, 'b', -1)), 2, 10) AS UNSIGNED);

SELECT BIN(b) FROM loadTest;
/*-------+
| BIN(b) |
+--------+
| 101010 |
| 111000 |
+-------*/

Deconstruction:

SUBSTRING_INDEX(@b, 'b', -1)

removes the leading 'b' or '0b'.

TRIM('''' FROM ...)

removes the quotes if there are any.

CONV(..., 2, 10)

converts the string of 0s and 1s into a string of decimal digits.

CAST(... AS UNSIGNED)

turns the string of decimal digits into an integer.
MySQL automatically casts integers into bits when inserting into a BIT type.

Filed under: MySQL, One liners No Comments
16Mar/170

Quiz: Drop non-global users

Somebody asked on Freenode. I don't know why they wanted it. How would you drop all MySQL users who do not have "GRANT ALL ON *.* ... WITH GRANT OPTION"? That is, drop any users who have 'N' in any of the privilege columns in `mysql`.`user`.

My solution shown below. Did you think of a different approach?

My solution ▼

Filed under: MySQL, One liners No Comments
20Sep/161

Debugging Large Data with Rewriter

A customer showed that a particular client reported a less-than-helpful error message when it tried to display some meta-data about a table.

A less-than-helpful error message that hints at an int.

I couldn't repeat the behavior with just a copy of the schema, so I suspected it was because of the size of data in the customer's server - somebody had used an int where they needed a long.

The customer's data was quite large - many hundreds of GB - more than I could easily whip up on my laptop to test. But, I didn't really need all that data, or even any data at all; I just needed MySQL to pretend it had all that data. Specifically, I needed information_schema to report a large data_length.

Enter Rewriter, the query rewrite plugin that ships with MySQL 5.7 and later.

First, the general query log gave the exact query sent by the client:

select * FROM information_schema.partitions WHERE TABLE_SCHEMA = 'schemaNameHere' AND TABLE_NAME = 'tableNameHere'

Create a copy of that table:

CREATE DATABASE debug_schema;
CREATE TABLE debug_schema.partitions LIKE information_schema.partitions;
INSERT INTO debug_schema.partitions SELECT * FROM information_schema.partitions;

Adjust the copy to report a different value:

UPDATE debug_schema.partitions SET data_length = POW(2, 32);

Install the Rewriter plugin, and add a rule to modify the schema name:

INSERT INTO query_rewrite.rewrite_rules(pattern, replacement) VALUES (
'select * FROM information_schema.partitions WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?',
'select * FROM debug_schema.partitions WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?'
);

CALL query_rewrite.flush_rewrite_rules();

Now when I try the client, it thinks information_schema has reported a huge data_length, and I get the same error which my customer had reported. Success!

Filed under: MySQL 1 Comment
8Sep/161

Aggregate JSON function in MySQL

There is not yet an equivalent to GROUP_CONCAT that produces a JSON array. (There is in MySQL 8, but that's not GA yet.) Until then, you can hack it together with string functions:

SELECT * FROM t;
+------+--------+
| id   | data   |
+------+--------+
|    1 | First  |
|    2 | Second |
+------+--------+

SELECT CONCAT('[', GROUP_CONCAT(JSON_OBJECT('id', id, 'value', data) SEPARATOR ', '), ']') AS j FROM t;
+-------------------------------------------------------------+
| j                                                           |
+-------------------------------------------------------------+
| [{"id": 1, "value": "First"}, {"id": 2, "value": "Second"}] |
+-------------------------------------------------------------+

Or you can use all JSON functions but hack the grouping:

SELECT j FROM (
       SELECT
         @c := @c + 1 AS c,
         @j := JSON_MERGE(@j, JSON_OBJECT('id', id, 'value', data)) AS j
       FROM t
       JOIN (SELECT @c := 0,  @j := JSON_ARRAY()) dt1
     ) dt2 ORDER BY c DESC LIMIT 1;
+-------------------------------------------------------------+
| j                                                           |
+-------------------------------------------------------------+
| [{"id": 1, "value": "First"}, {"id": 2, "value": "Second"}] |
+-------------------------------------------------------------+
Filed under: MySQL FAQ 1 Comment
3Nov/151

I’m really quite good with maps

Workbench announced support for a spatial view in 6.2, but examples are somewhat lacking. Just how do you get a SHP into MySQL?

worldmap

Download and unpack a SHP file such as these country boundaries.

In the Workbench installation directory, you'll find a program "ogr2ogr" that can convert .shp to .csv. Run it like this:

"C:\Program Files\MySQL\MySQL Workbench 6.3\ogr2ogr.exe" -f CSV countries.csv countries.shp -lco GEOMETRY=AS_WKT

Now create a table and load the CSV.

CREATE TABLE worldmap (
	OBJECTID smallint unsigned,
	NAME varchar(50),
	ISO3 char(3),
	ISO2 char(2),
	FIPS varchar(5),
	COUNTRY varchar(50),
	ENGLISH varchar(50),
	FRENCH varchar(50),
	SPANISH varchar(50),
	LOCAL varchar(50),
	FAO varchar(50),
	WAS_ISO varchar(3),
	SOVEREIGN varchar(50),
	CONTINENT varchar(15),
	UNREG1 varchar(30),
	UNREG2 varchar(15),
	EU boolean,
	SQKM decimal(20,11),
	g geometry
);

LOAD DATA LOCAL INFILE 'countries.csv' INTO TABLE worldmap FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' IGNORE 1 LINES
(@WKT, OBJECTID, NAME, ISO3, ISO2, FIPS, COUNTRY, ENGLISH, FRENCH, SPANISH, LOCAL, FAO, WAS_ISO, SOVEREIGN, CONTINENT, UNREG1, UNREG2, EU, SQKM)
SET g = ST_GeomCollFromText(@WKT);

Now just select rows of interest in Workbench, click the Spatial View format button, and there's your world map.

You can run multiple selects (such as the citylot data from yesterday's post) to overlay on top of the world map.

worldmap_overlay

Filed under: MySQL 1 Comment
2Nov/150

The world is not in your books and maps.

MySQL 5.7 came out with support for JSON, improved geometry, and virtual columns. Here's an example showing them all playing together.

click to embiggen

Download citylots.json.

It comes as one big object, so we'll break it up into separate lines:
grep "^{ .type" citylots.json > properties.json

Connect to a 5.7 instance of MySQL.

CREATE TABLE citylots (id serial, j json, p geometry as (ST_GeomFromGeoJSON(j, 2)));
LOAD DATA LOCAL INFILE 'properties.json' INTO TABLE citylots (j);

A few of the rows don't contain useful data:
DELETE FROM citylots WHERE j->'$.geometry.type' IS NULL;

In MySQL Workbench, do:
SELECT id, p FROM citylots;

Then click on Spatial View. It takes a couple of minutes for 200k rows, but there's a map of San Francisco.

The default projection, 'Robinson', is designed for showing the whole world at once and so is pretty distorted for this particular data set. Mercator or Equirectangular are better choices. Fortunately, Workbench repaints the data in just a few seconds.

If you selected some other fields, you can click on the map and see the relevant data for that particular geometry.

Filed under: MySQL No Comments
9Oct/151

Quarto

mysql < quarto.sql

Example game play:

mysql> -- Start the game and pass the first piece in the lower nibble
mysql> CALL SetupGame(0x0A);
-----------------------------------------------------------------------+
| rules                                                                |
-----------------------------------------------------------------------+
| Quarto: 4 in a line (row, column, or long diagonal) with at least one bit in common wins.
CALL Play(move); -- high 4 bits are board position, low 4 bits are piece for next player
CALL PrintBoard(base); -- to display the board. Useful bases are 16 and 2. |
-----------------------------------------------------------------------+

+------------------------+
| instructions           |
+------------------------+
| Player 1, play piece A |
+------------------------+

mysql> CALL Play(0x00);
+---------+
| board   |
+---------+
| A| | |  |
|  | | |  |
|  | | |  |
|  | | |  |
+---------+

+------------------------+
| instructions           |
+------------------------+
| Player 0, play piece 0 |
+------------------------+

mysql> CALL Play(0x1C);
+---------+
| board   |
+---------+
| A|0| |  |
|  | | |  |
|  | | |  |
|  | | |  |
+---------+

+------------------------+
| instructions           |
+------------------------+
| Player 1, play piece C |
+------------------------+

mysql> CALL Play(0x2E);
+---------+
| board   |
+---------+
| A|0|C|  |
|  | | |  |
|  | | |  |
|  | | |  |
+---------+

+------------------------+
| instructions           |
+------------------------+
| Player 0, play piece E |
+------------------------+

mysql> CALL Play(0x3F);
+---------+
| board   |
+---------+
| A|0|C|E |
|  | | |  |
|  | | |  |
|  | | |  |
+---------+

+---------------+
| result        |
+---------------+
| player 0 wins |
+---------------+
Filed under: MySQL 1 Comment
7Oct/150

DATE_TRUNC for MySQL

Because somebody asked for it on Freenode:

CREATE FUNCTION DATE_TRUNC(field ENUM('microsecond', 'millisecond', 'second', 'minute', 'hour', 'day', 'week', 'month', 'quarter', 'year', 'decade', 'century', 'millennium'), source datetime(6))
RETURNS datetime(6)
DETERMINISTIC
BEGIN
  IF field IN ('millisecond') THEN SET source = source - INTERVAL MICROSECOND(source) % 1000 MICROSECOND; END IF;
  IF field NOT IN ('microsecond', 'millisecond') THEN SET source = source - INTERVAL MICROSECOND(source) MICROSECOND; END IF;
  IF field NOT IN ('microsecond', 'millisecond', 'second') THEN SET source = source - INTERVAL SECOND(source) SECOND; END IF;
  IF field NOT IN ('microsecond', 'millisecond', 'second', 'minute') THEN SET source = source - INTERVAL MINUTE(source) MINUTE; END IF;
  IF field NOT IN ('microsecond', 'millisecond', 'second', 'minute', 'hour') THEN SET source = source - INTERVAL HOUR(source) HOUR; END IF;
  IF field NOT IN ('microsecond', 'millisecond', 'second', 'minute', 'hour', 'day') THEN SET source = source - INTERVAL DAYOFWEEK(source) - 1 DAY; END IF;
  IF field NOT IN ('microsecond', 'millisecond', 'second', 'minute', 'hour', 'day', 'week') THEN SET source = source - INTERVAL DAY(source) - 1 DAY; END IF;
  IF field IN ('quarter') THEN SET source = source - INTERVAL MONTH(source) % 3 - 1 MONTH; END IF;
  IF field NOT IN ('microsecond', 'millisecond', 'second', 'minute', 'hour', 'week', 'day', 'month', 'quarter') THEN SET source = source - INTERVAL MONTH(source) - 1 MONTH; END IF;

  -- Year ranges go from 1 - 10, e.g. 1961-1970, not 1960-1969. The third millenium started 2001, not 2000. If you want it the other way, remove the "- 1" from each of the following.
  IF field IN ('decade') THEN SET source = source - INTERVAL YEAR(source) % 10 - 1 YEAR; END IF;
  IF field IN ('century') THEN SET source = source - INTERVAL YEAR(source) % 100  - 1 YEAR; END IF;
  IF field IN ('millennium') THEN SET source = source - INTERVAL YEAR(source) % 1000 - 1 YEAR; END IF;
 
  RETURN source;
END

When called with the date '1996-02-29 12:28:53.123456', returns the following:

FIELD Returned value
MICROSECOND 1996-02-29 12:28:53.123456
MILLISECOND 1996-02-29 12:28:53.123000
SECOND 1996-02-29 12:28:53.000000
MINUTE 1996-02-29 12:28:00.000000
HOUR 1996-02-29 12:00:00.000000
DAY 1996-02-29 00:00:00.000000
WEEK 1996-02-25 00:00:00.000000
MONTH 1996-02-01 00:00:00.000000
QUARTER 1996-01-01 00:00:00.000000
YEAR 1996-01-01 00:00:00.000000
DECADE 1991-01-01 00:00:00.000000
CENTURY 1901-01-01 00:00:00.000000
MILLENNIUM 1001-01-01 00:00:00.000000
Filed under: MySQL No Comments