LOAD DATA and recovery

A little two-part quiz. If you get the first one without peeking, you’re worth your pay as a DBA. If you get the second one without peeking, you may tell your boss that some random guy on the Internet says you deserve a raise.

Start with a text file, ‘test.txt’, with these three lines:

1
1
2

Set up the test in MySQL:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id int primary key);
LOAD DATA INFILE 'test.txt' INTO TABLE t1;

This gives “ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'“, which is expected.

What’s in the table?

It depends. If the engine is MyISAM, then you’ll have one row: the first ‘1’ from the file was inserted, everything else was skipped. If the engine is InnoDB, you’ll have no rows, because the transaction would rollback. So either 1 row or 0 rows.

Now, pretend you’re setting up a slave, or there was a crash and you’re recovering from binary logs:

mysqlbinlog bin.000001 | mysql

How many rows are in t1 now?


mysql> SELECT * FROM t1;
+----+
| id |
+----+
| 1 |
| 2 |
+----+
2 rows in set (0.00 sec)

Why? The manual says, “mysqlbinlog converts LOAD DATA INFILE statements to LOAD DATA LOCAL INFILE statements,” and, “with LOCAL, the default duplicate-key handling behavior is the same as if IGNORE is specified.”

http://dev.mysql.com/doc/refman/5.1/en/mysqlbinlog.html

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

Note that a replicating slave will handle it correctly – if the master used LOCAL (and therefore IGNORE), the slave will do IGNORE. If the master did not use LOCAL or IGNORE and so got the error above, the slave will do the same, and so the data will match. So be advised: replication and mysqlbinlog | mysql may not give the same results.

Comments

3 responses to “LOAD DATA and recovery”

  1. […] LOAD DATA and recovery « A Little Noise […]

  2. Justin Swanhart Avatar

    This probably falls under the category:
    “It isn’t a bug if we document it…”

    MySQL is really bad about documenting bad behavior instead of fixing it.

  3. Patrice Levesque Avatar
    Patrice Levesque

    Wouldn’t the second answer depend on the type of replication (row/statement)?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.