Category: Technical

Anything that uses 1s and 0s

  • UNION and ORDER and LIMIT

    From a discussion on Freenode (and don’t feel bad if you don’t get it; I had to ask Sinisa and Monty to explain it to me. Sinisa explained how it worked, and Monty found a bug.):

    USE test;
    DROP TABLE IF EXISTS `t`;
    CREATE TABLE `t` (
    `a` int(11) DEFAULT NULL
    );

    INSERT INTO t VALUES (RAND() * 50);
    INSERT INTO t SELECT RAND() * 50 FROM t;

    Repeat that last INSERT a few times until you have about 1000 rows.
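
    Each INSERT ... SELECT doubles the table, so nine more repetitions of that last statement gets you 1024 rows:

    INSERT INTO t SELECT RAND() * 50 FROM t;  -- 2 rows become 4
    INSERT INTO t SELECT RAND() * 50 FROM t;  -- 4 rows become 8
    -- ...and so on. Then check:
    SELECT COUNT(*) FROM t;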

    What would you expect from the following?

    SELECT * FROM t WHERE false UNION SELECT * FROM t ORDER BY a LIMIT 19, 1;
    (SELECT * FROM t WHERE false) UNION (SELECT * FROM t ORDER BY a LIMIT 19, 1);
    (SELECT * FROM t WHERE false) UNION (SELECT * FROM t ORDER BY a) LIMIT 19, 1;

    Two things to remember:

    – UNION is equivalent to UNION DISTINCT
    – an ORDER BY with no LIMIT is optimized away in UNIONed queries

    The first query:

    SELECT * FROM t WHERE false UNION SELECT * FROM t ORDER BY a LIMIT 19, 1;

    is equivalent to:

    (SELECT * FROM t WHERE false UNION SELECT * FROM t) ORDER BY a LIMIT 19, 1;

    Since the UNION is by default DISTINCT, get each distinct value from the table, then order by `a`. Take the row at offset 19. Assuming even distribution of our random data, your value should be ’19’.

    The second query:

    (SELECT * FROM t WHERE false) UNION (SELECT * FROM t ORDER BY a LIMIT 19, 1);

    Order all the rows by `a`. That’s roughly 20 rows of value ‘1’, then 20 rows of value ‘2’, then 20 rows of value ‘3’, etc. Take the row at offset 19. Assuming even distribution of our random data, your value should be ‘1’.

    The third query:

    (SELECT * FROM t WHERE false) UNION (SELECT * FROM t ORDER BY a) LIMIT 19, 1;

    The ORDER BY is discarded. We get distinct values in random order. Take the row at offset 19. Since the values are in random order, the value is not predictable.
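
    If you actually want the predictable ordering here, hoist the ORDER BY outside the parentheses so it applies to the whole UNION result – which is just the first query again:

    (SELECT * FROM t WHERE false) UNION (SELECT * FROM t) ORDER BY a LIMIT 19, 1;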

  • Multi-master replication

    Implement multi-master to single-slave replication by periodically saving state and changing masters, and do it all in SQL.*

    Just a proof of concept. I suggest you add some error checking (like, don’t make the switch if the slave isn’t running).

    Remember that all this stuff goes on the SLAVE. If you absent-mindedly log in to the master, like I did a few times, you might wonder why your relay_log.info file isn’t read.

    Set up a table to keep track of each master’s log file and position. Mine uses the same user name and password for all hosts; you can add those fields and adjust the procedure accordingly if you like.

    USE mysql;
    DROP TABLE IF EXISTS rotate_master;
    CREATE TABLE rotate_master (
    id int auto_increment primary key,
    master_host varchar(255),
    master_port int unsigned,
    master_log_file varchar(255),
    master_log_pos int unsigned,
    in_use boolean DEFAULT 0
    );

    Insert the list of masters. The one that is currently master should have `in_use` set to 1; all the others should be 0. The order you insert them is the order they will rotate.

    INSERT INTO rotate_master VALUES (NULL, 'localhost', 3306, 'bin.000001', 0, 1);
    INSERT INTO rotate_master VALUES (NULL, 'localhost', 3308, 'bin.000001', 0, 0);

    Then, a procedure to grab the current position (by reading from the relay_log.info file, since the slave position is not otherwise exposed in any accessible manner), record it in the above table, and change masters to the next one in the list.
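
    If you're wondering about those nested SUBSTRING_INDEX calls: in the 5.1-era format, the first four lines of relay_log.info hold the relay log file, the relay log position, the master log file, and the master log position, so the pair of calls plucks lines 3 and 4. A standalone sketch with made-up file contents:

    SET @info = './relay-bin.000002\n4923\nbin.000001\n4711';  -- hypothetical relay_log.info
    SELECT
    SUBSTRING_INDEX(SUBSTRING_INDEX(@info, '\n', 3), '\n', -1) AS master_log_file,  -- 'bin.000001'
    SUBSTRING_INDEX(SUBSTRING_INDEX(@info, '\n', 4), '\n', -1) AS master_log_pos;   -- '4711'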

    DROP PROCEDURE IF EXISTS rotate_master;
    DELIMITER //

    CREATE PROCEDURE rotate_master()
    BEGIN
    DECLARE _info text;
    DECLARE _master_file varchar(255);
    DECLARE _master_pos int unsigned;
    DECLARE _master_host varchar(255);
    DECLARE _master_port int unsigned;
    DECLARE _id int;

    STOP SLAVE;

    -- fetch and store current position
    SELECT LOAD_FILE(@@relay_log_info_file) INTO _info;
    SELECT
    SUBSTRING_INDEX(SUBSTRING_INDEX(_info, '\n', 3), '\n', -1),
    SUBSTRING_INDEX(SUBSTRING_INDEX(_info, '\n', 4), '\n', -1)
    INTO _master_file, _master_pos;
    UPDATE mysql.rotate_master SET master_log_file = _master_file, master_log_pos = _master_pos, id = LAST_INSERT_ID(id) WHERE in_use = 1;

    -- fetch next host
    SELECT
    id,
    master_host,
    master_port,
    master_log_file,
    master_log_pos
    INTO _id, _master_host, _master_port, _master_file, _master_pos
    FROM rotate_master
    ORDER BY id <= LAST_INSERT_ID(), id
    LIMIT 1;

    -- advance to next host
    SET @sql := CONCAT(
    'CHANGE MASTER TO master_host=', QUOTE(_master_host),
    ', master_port=', _master_port,
    ', master_log_file=', QUOTE(_master_file),
    ', master_log_pos=', _master_pos);
    PREPARE myStmt FROM @sql;
    EXECUTE myStmt;

    -- mark host as changed
    UPDATE mysql.rotate_master SET in_use = 0 WHERE in_use = 1;
    UPDATE mysql.rotate_master SET in_use = 1 WHERE id = _id;

    START SLAVE;
    END//
    DELIMITER ;
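
    Two tricks in that procedure deserve a note: the first UPDATE smuggles the current master’s id out through LAST_INSERT_ID(id), and ORDER BY id <= LAST_INSERT_ID(), id sorts every id greater than the current one first (the comparison yields 0), so LIMIT 1 picks the next host and wraps around to the lowest id when the current master is last. A toy demonstration with made-up ids:

    -- suppose the hosts have ids 1, 2, 3 and the current master is id 2
    SELECT id FROM (SELECT 1 AS id UNION SELECT 2 UNION SELECT 3) hosts
    ORDER BY id <= 2, id LIMIT 1;  -- returns 3; with "id <= 3" it wraps around and returns 1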

    Finally, schedule it to rotate. I used 10 seconds just to play with; I imagine a few minutes would be more practical.

    *If you don't have 5.1 or later, you'll have to schedule this part externally, with cron or Task Scheduler.

    DROP EVENT IF EXISTS rotate_master;
    CREATE EVENT rotate_master
    ON SCHEDULE EVERY 10 SECOND
    DO CALL mysql.rotate_master();
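
    If the event never seems to fire, check that the event scheduler is actually on:

    SET GLOBAL event_scheduler = ON;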

  • LOAD DATA and recovery

    A little two-part quiz. If you get the first one without peeking, you’re worth your pay as a DBA. If you get the second one without peeking, you may tell your boss that some random guy on the Internet says you deserve a raise.

    Start with a text file, ‘test.txt’, with these three lines:

    1
    1
    2

    Set up the test in MySQL:

    DROP TABLE IF EXISTS t1;
    CREATE TABLE t1 (id int primary key);
    LOAD DATA INFILE 'test.txt' INTO TABLE t1;


  • Restore from mysqldump --all-databases in parallel

    csplit backup.sql '/^-- Current Database/' '{*}'; for f in xx*; do cat xx00 "$f" | mysql & done

    and then rm xx* when the background jobs have finished.

    I haven’t done any benchmarking, and I haven’t worked out what to do if the number of databases exceeds the number of available connections.

  • mysql’s --xml and some XSLT (now with JSON!)

    Somebody asked on Freenode the other day how to get their data out of MySQL into a specific XML format.

    Both mysqldump and the mysql client have --xml options. The output from the CLI looks like this:

    <resultset statement="SELECT here">
      <row>
        <field name="field1">value a</field>
        <field name="field2">value b</field>
      </row>
      <row>
        <field name="field1">value c</field>
        <field name="field2">value d</field>
      </row>
    </resultset>

    I wanted to change those <field name="fieldName"> into <fieldName> tags, and all it takes is a little XSLT, which I’ve uploaded here.

    An XSLT processor lets you do something like this:

    mysql --xml -e "SELECT here" | xsltproc mysql.xsl -

    The result will be something like:

    <resultset statement="SELECT here">
      <row>
        <field1>value a</field1>
        <field2>value b</field2>
      </row>
      <row>
        <field1>value c</field1>
        <field2>value d</field2>
      </row>
    </resultset>

    If JSON is your thing, you can use xml2json to take that output a step farther:

    mysql --xml -e "SELECT here" | xsltproc mysql.xsl - | xsltproc xml2json.xsl -

    Result (whitespace added):

    {"resultset" : 
      {
        "statement" : "SELECT here", 
        "row" : [
          {"field1" : "value a", "field2" : "value b"},
          {"field1" : "value c", "field2" : "value d"}
        ]
      }
    }

    That will suffice until the mysql client has a --json parameter.

  • Make an ALTER TABLE script with MySQL Workbench

    So you have two versions of a database sitting around – maybe you’ve been trying things out on your development box, and now you want to move it all over to production. If it’s just a new table here and an extra column there, you could do it by hand, but if you’ve made lots of changes, it would be nice to let the computer do it for you.

    Enter MySQL Workbench (or mysqldiff).

  • Trigger on firing table

    CREATE TABLE t1 (id int);
    CREATE TRIGGER t1_ai AFTER INSERT ON t1 FOR EACH ROW 
      INSERT INTO t1 VALUES (NEW.id);
    

    That’s not allowed; it won’t let you create the trigger.

    CREATE TABLE t1 (id int);
    CREATE TABLE f1 (id int) ENGINE=FEDERATED CONNECTION='localhost/test/t1';
    CREATE TRIGGER t1_ai AFTER INSERT ON t1 FOR EACH ROW 
      INSERT INTO f1 VALUES (NEW.id);

    The server lets that through. Don’t do it: the FEDERATED insert lands right back in t1, which fires the trigger again, looping until something gives out.

  • On unions and derivatives

    On occasion I need to generate a number of rows. I don’t care about the contents, just how many rows there are. Maybe I want to create a table filled with a sequence of numbers. One common way to do this is to join a bunch of subqueries built out of unions:


    SET @count := 0;
    INSERT INTO `table` SELECT @count := @count + 1
    FROM (SELECT 1 UNION SELECT 2) a
    JOIN (SELECT 1 UNION SELECT 2) b
    JOIN (SELECT 1 UNION SELECT 2) c;

    Of course, I could union more than two selects (e.g. SELECT 1 UNION SELECT 2 UNION SELECT 3). That makes each derived table longer, but reduces how many must be joined.

    Question: What’s the optimal size of each derived table?

    Defining “optimal”: since this is a one-off query, I don’t much care about performance (within reason), but I do care about how much typing I have to do. So I will favor “fewer characters” over “fewer joins”.

    Each derived table in the query takes up 15 * numSelects + 2 characters.

    To generate a given number of rows requires CEILING(LOG(numRows) / LOG(numSelects)) derived tables.

    If I want 1000 rows in the result, I’d need 10 derived tables of two rows each (320 characters), or 7 of three rows (329 characters), or 3 of ten rows (456 characters).

    For any given numRows, what’s the minimum of CEILING(LOG(numRows) / LOG(numSelects)) * (15 * numSelects + 2) ?

    The answer varies with numRows – 2, 3, or 4 are alternately best when generating fewer than 130 rows, but above that the best choice usually turns out to be 3.

    (SELECT 1 UNION SELECT 2 UNION SELECT 3)
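
    Putting it together: a sketch that fills a (hypothetical) nums table with the sequence 1 to 1000. Seven three-row derived tables give 3^7 = 2187 rows, and the LIMIT trims the excess:

    SET @count := 0;
    CREATE TABLE nums (n int primary key);
    INSERT INTO nums
    SELECT @count := @count + 1
    FROM (SELECT 1 UNION SELECT 2 UNION SELECT 3) a
    JOIN (SELECT 1 UNION SELECT 2 UNION SELECT 3) b
    JOIN (SELECT 1 UNION SELECT 2 UNION SELECT 3) c
    JOIN (SELECT 1 UNION SELECT 2 UNION SELECT 3) d
    JOIN (SELECT 1 UNION SELECT 2 UNION SELECT 3) e
    JOIN (SELECT 1 UNION SELECT 2 UNION SELECT 3) f
    JOIN (SELECT 1 UNION SELECT 2 UNION SELECT 3) g
    LIMIT 1000;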

  • group date

    A non-rigorous, non-scientific, totally off-the-cuff test of which function to pick when you need to group by year and month.

    I populated a table with 262K rows of random dates, and then ran

    SELECT %s, COUNT(*) FROM table GROUP BY %s ORDER BY NULL

    with various functions, which should all result in the same grouping. I repeated each query five times and report the average time for each of three column types: DATE, DATETIME, and TIMESTAMP.

    expression                         DATE   DATETIME  TIMESTAMP
    EXTRACT(YEAR_MONTH FROM d)         0.362  0.369     0.581
    LAST_DAY(d)                        0.374  0.389     0.582
    DATE_SUB(d, INTERVAL DAY(d) DAY)   0.429  0.882     1.452
    d - INTERVAL DAY(d) DAY            0.429  0.887     1.535
    SUBSTRING(d, 1, 7)                 0.454  0.488     0.719
    YEAR(d), MONTH(d)                  1.046  1.126     2.045
    MONTH(d), YEAR(d)                  1.116  1.196     2.112
    LEFT(d, 7)                         1.307  1.405     2.123
    DATE_FORMAT(d, '%Y%m')             1.480  1.565     2.112
    DATE_FORMAT(d, '%Y-%m')            1.514  1.615     2.564
    DATE_FORMAT(d, '%m%Y')             1.517  1.604     2.420
    DATE_FORMAT(d, '%m-%Y')            1.562  1.656     2.465
    MONTHNAME(d), YEAR(d)              1.613  1.713     2.812
    YEAR(d), MONTHNAME(d)              1.663  1.766     2.873
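
    So EXTRACT(YEAR_MONTH FROM d) wins across all three column types. Plugged into the template above (table and column names are placeholders):

    SELECT EXTRACT(YEAR_MONTH FROM d) AS ym, COUNT(*)
    FROM `table`
    GROUP BY ym
    ORDER BY NULL;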

    And just in case you want to extract a full date (which really only makes sense for datetime and timestamp):

    expression                                       DATE   DATETIME  TIMESTAMP
    DATE(d)                                          0.357  0.374     0.591
    EXTRACT(YEAR_MONTH FROM d), EXTRACT(DAY FROM d)  0.377  0.407     0.730
    EXTRACT(DAY FROM d), EXTRACT(YEAR_MONTH FROM d)  0.395  0.422     0.751
    MONTH(d), YEAR(d), DAY(d)                        0.395  0.426     0.870
    YEAR(d), DAY(d), MONTH(d)                        0.398  0.440     0.862
    YEAR(d), MONTH(d), DAY(d)                        0.406  0.441     0.867
    DAY(d), YEAR(d), MONTH(d)                        0.409  0.444     0.859
    LEFT(d, 10)                                      0.437  0.487     0.728
    SUBSTRING_INDEX(d, ' ', 1)                       0.439  0.475     0.743
    DAY(d), MONTH(d), YEAR(d)                        0.441  0.477     0.901
    MONTH(d), DAY(d), YEAR(d)                        0.442  0.472     0.914
    SUBSTRING(d, 1, 10)                              0.460  0.496     0.729
    DATE_FORMAT(d, '%d%Y%m')                         0.539  0.571     0.852
    DATE_FORMAT(d, '%m%Y%d')                         0.542  0.570     0.841
    DATE_FORMAT(d, '%m%d%Y')                         0.543  0.572     0.846
    DATE_FORMAT(d, '%Y%m%d')                         0.544  0.570     0.842
    DATE_FORMAT(d, '%Y%d%m')                         0.544  0.572     0.843
    DATE_FORMAT(d, '%d-%Y-%m')                       0.547  0.574     0.848
    DATE_FORMAT(d, '%d%m%Y')                         0.549  0.573     0.842
    DATE_FORMAT(d, '%m-%Y-%d')                       0.552  0.583     0.869
    DATE_FORMAT(d, '%Y-%m-%d')                       0.558  0.583     0.854
    DATE_FORMAT(d, '%Y-%d-%m')                       0.569  0.593     0.867
    d - INTERVAL HOUR(d) HOUR - INTERVAL MINUTE(d) MINUTE - INTERVAL SECOND(d) SECOND  0.573  0.653  1.249
    DATE_FORMAT(d, '%d-%m-%Y')                       0.598  0.637     0.920
    DATE_FORMAT(d, '%m-%d-%Y')                       0.601  0.633     0.904

  • GROUP UNCONCAT

    Dunno why you’d rather do this in SQL than in your application layer, but if you do, here’s one way to turn a delimited string of values back into multiple rows – just the opposite of GROUP_CONCAT:

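    The target table isn’t shown; any one-column table will do, something like:

    CREATE TABLE t (v varchar(10));
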
    SET @sourceString = 'a,b,c,d,e';
    SET @sql = CONCAT('INSERT INTO t VALUES (\'', REPLACE(@sourceString, ',', '\'),(\''), '\')');
    PREPARE myStmt FROM @sql;
    EXECUTE myStmt;

    Just to show what’s going on:

    mysql> SELECT @sql;
    +----------------------------------------------------+
    | @sql                                               |
    +----------------------------------------------------+
    | INSERT INTO t VALUES ('a'),('b'),('c'),('d'),('e') |
    +----------------------------------------------------+