Multi-master replication
Implement multi-master to single-slave replication by periodically saving state and changing masters, and do it all in SQL.*
Just a proof of concept. I suggest you add some error checking (like, don't make the switch if the slave isn't running).
Remember that all this stuff goes on the SLAVE. If you absent-mindedly log in to the master, like I did a few times, you might wonder why your relay_log.info file isn't read.
Set up a table to keep track of each master file. Mine use the same user name and password for all hosts; you can add those fields and adjust the procedure accordingly if you like.
USE mysql;
DROP TABLE IF EXISTS rotate_master;
CREATE TABLE rotate_master (
id int auto_increment primary key,
master_host varchar(255),
master_port int unsigned,
master_log_file varchar(255),
master_log_pos int unsigned,
in_use boolean DEFAULT 0
);
Insert the list of masters. The one that is currently master should have `in_use` set to 1; all the others should be 0. The order you insert them is the order they will rotate.
INSERT INTO rotate_master VALUES (NULL, 'localhost', 3306, 'bin.000001', 0, 1);
INSERT INTO rotate_master VALUES (NULL, 'localhost', 3308, 'bin.000001', 0, 0);
Then, a procedure to grab the current position (by reading from the relay_log.info file, since the slave position is not otherwise exposed in any accessible manner), record it in the above table, and change masters to the next one in the list.
DROP PROCEDURE IF EXISTS rotate_master;
DELIMITER //
CREATE PROCEDURE rotate_master()
BEGIN
DECLARE _info text;
DECLARE _master_file varchar(255);
DECLARE _master_pos int unsigned;
DECLARE _master_host varchar(255);
DECLARE _master_port int unsigned;
DECLARE _id int;
STOP SLAVE;
-- fetch and store current position
SELECT LOAD_FILE(@@relay_log_info_file) INTO _info;
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(_info, '\n', 3), '\n', -1),
SUBSTRING_INDEX(SUBSTRING_INDEX(_info, '\n', 4), '\n', -1)
INTO _master_file, _master_pos;
UPDATE mysql.rotate_master SET master_log_file = _master_file, master_log_pos = _master_pos, id = LAST_INSERT_ID(id) WHERE in_use = 1;
-- fetch next host
SELECT
id,
master_host,
master_port,
master_log_file,
master_log_pos
INTO _id, _master_host, _master_port, _master_file, _master_pos
FROM rotate_master
ORDER BY id <= LAST_INSERT_ID(), id LIMIT 1;
-- advance to next host
SET @sql := CONCAT(
'CHANGE MASTER TO master_host=', QUOTE(_master_host),
', master_port=', _master_port,
', master_log_file=', QUOTE(_master_file),
', master_log_pos=', _master_pos);
PREPARE myStmt FROM @sql;
EXECUTE myStmt;
-- mark host as changed
UPDATE mysql.rotate_master SET in_use = 0 WHERE in_use = 1;
UPDATE mysql.rotate_master SET in_use = 1 WHERE id = _id;
START SLAVE;
END//
DELIMITER ;
Finally, schedule it to rotate. I used 10 seconds just to play with; I imagine a few minutes would be more practical. *If you don't have 5.1 or later, you'll have to schedule this part externally, with cron or task scheduler.
DROP EVENT IF EXISTS rotate_master;
CREATE EVENT rotate_master
ON SCHEDULE EVERY 10 SECOND
DO CALL mysql.rotate_master();
LOAD DATA and recovery
A little two-part quiz. If you get the first one without peeking, you're worth your pay as a DBA. If you get the second one without peeking, you may tell your boss that some random guy on the Internet says you deserve a raise.
Start with a text file, 'test.txt', with these three lines:
1
1
2
Set up the test in MySQL:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id int primary key);
LOAD DATA INFILE 'test.txt' INTO TABLE t1;
Restore from mysqldump --all-databases in parallel
csplit backup.sql /"^-- Current Database"/ {*}; for f in xx*; do cat xx00 "$f" | mysql& done;
and then rm xx* when the background jobs have finished.
I haven't done any benchmarking, and I haven't worked out what to do if the number of databases exceeds the number of available connections.
mysql’s --xml and some XSLT
Somebody asked in Freenode the other day how to get their data out of MySQL into a specific XML format.
Both mysqldump and the mysql client have --xml options. The output from the cli looks like this:
<resultset statement="SELECT here">
<row>
<field name="field1">value</field>
<field name="field2">value</field>
</row>
<row>
<field name="field1">value</field>
<field name="field2">value</field>
</row>
</resultset>
I wanted to change those <field name="fieldName"> into <fieldName> tags, and all it takes is a little XSLT, which I've uploaded here.
On Linux you can do something like this:
mysql -e "SELECT * FROM dbname.tablename" | xsltproc mysql.xsl -
Make an ALTER TABLE script with MySQL Workbench
So you have two versions of a database sitting around - maybe you've been trying things out on your development box, and now you want to move it all over to production. If it's just a new table here and an extra column there, you could do it by hand, but if you've made lots of changes, it would be nice to let the computer do it for you.
Enter MySQL Workbench.
Trigger on firing table
CREATE TABLE t1 (id int); CREATE TRIGGER t1_ai AFTER INSERT ON t1 FOR EACH ROW INSERT INTO t1 VALUES (NEW.id);
That's not allowed; it won't let you create the trigger.
CREATE TABLE t1 (id int); CREATE TABLE f1 (id int) ENGINE=FEDERATED CONNECTION='localhost/test/t1'; CREATE TRIGGER t1_ai AFTER INSERT ON t1 FOR EACH ROW INSERT INTO f1 VALUES (NEW.id);
The server lets that through. Don't do it.
On unions and derivatives
On occasion I need to generate a number of rows. I don't care about the contents, just how many rows there are. Maybe I want to create a table filled with a sequence of numbers. One common way to do this is to join a bunch of subqueries built out of unions:
SET @count := 0;
INSERT INTO table SELECT @count := @count + 1
FROM (SELECT 1 UNION SELECT 2) a
JOIN (SELECT 1 UNION SELECT 2) b
JOIN (SELECT 1 UNION SELECT 2) c
...
Of course, I could union more than two selects (e.g. SELECT 1 UNION SELECT 2 UNION SELECT 3). That makes each derived table longer, but reduces how many must be joined.
Question: What's the optimal size of each derived table?
Defining "optimal": since this is a one-off query, I don't much care about performance (within reason), but I do care about how much typing I have to do. So I will favor "fewer characters" over "fewer joins".
Each derived table in the query takes up 15 * numSelects + 2 characters.
To generate a given number of rows requires CEILING(LOG(numRows) / LOG(numSelects)) derived tables.
If I want 1000 rows in the result, I'd need 10 derived tables of two rows each (320 characters), or 7 of three rows (329 characters), or 3 of ten rows (456 characters).
For any given numRows, what's the minimum of CEILING(LOG(numRows) / LOG(numSelects)) * (15 * numSelects + 2) ?
The answer varies with numRows - 2, 3, or 4 are alternately best when generating less than 130 rows, but any higher than that and the best choice usually turns out to be 3.
(SELECT 1 UNION SELECT 2 UNION SELECT 3)
group date
A non-rigorous, non-scientific, totally off-the-cuff test of which function to pick when you need to group by year and month.
I populated a table with 262K rows of random dates, and then ran
SELECT %s, COUNT(*) FROM table GROUP BY %s ORDER BY NULL
with various functions, which should all result in the same grouping. I repeated each query five times and show the average time, using three different column types DATE, DATETIME, and TIMESTAMP.
| expression | DATE | DATETIME | TIMESTAMP |
|---|---|---|---|
| EXTRACT(YEAR_MONTH FROM d) | 0.362 | 0.369 | 0.581 |
| LAST_DAY(d) | 0.374 | 0.389 | 0.582 |
| DATE_SUB(d, INTERVAL DAY(d) DAY) | 0.429 | 0.882 | 1.452 |
| d - INTERVAL DAY(d) DAY | 0.429 | 0.887 | 1.535 |
| SUBSTRING(d, 1, 7) | 0.454 | 0.488 | 0.719 |
| YEAR(d), MONTH(d) | 1.046 | 1.126 | 2.045 |
| MONTH(d), YEAR(d) | 1.116 | 1.196 | 2.112 |
| LEFT(d, 7) | 1.307 | 1.405 | 2.123 |
| DATE_FORMAT(d, '%Y%m') | 1.480 | 1.565 | 2.112 |
| DATE_FORMAT(d, '%Y-%m') | 1.514 | 1.615 | 2.564 |
| DATE_FORMAT(d, '%m%Y') | 1.517 | 1.604 | 2.420 |
| DATE_FORMAT(d, '%m-%Y') | 1.562 | 1.656 | 2.465 |
| MONTHNAME(d), YEAR(d) | 1.613 | 1.713 | 2.812 |
| YEAR(d), MONTHNAME(d) | 1.663 | 1.766 | 2.873 |
And just in case you want to extract a full date (which really only makes sense for datetime and timestamp):
| expression | DATE | DATETIME | TIMESTAMP |
|---|---|---|---|
| DATE(d) | 0.357 | 0.374 | 0.591 |
| EXTRACT(YEAR_MONTH FROM d), EXTRACT(DAY FROM d) | 0.377 | 0.407 | 0.730 |
| EXTRACT(DAY FROM d), EXTRACT(YEAR_MONTH FROM d) | 0.395 | 0.422 | 0.751 |
| MONTH(d), YEAR(d), DAY(d) | 0.395 | 0.426 | 0.870 |
| YEAR(d), DAY(d), MONTH(d) | 0.398 | 0.440 | 0.862 |
| YEAR(d), MONTH(d), DAY(d) | 0.406 | 0.441 | 0.867 |
| DAY(d), YEAR(d), MONTH(d) | 0.409 | 0.444 | 0.859 |
| LEFT(d, 10) | 0.437 | 0.487 | 0.728 |
| SUBSTRING_INDEX(d, ' ', 1) | 0.439 | 0.475 | 0.743 |
| DAY(d), MONTH(d), YEAR(d) | 0.441 | 0.477 | 0.901 |
| MONTH(d), DAY(d), YEAR(d) | 0.442 | 0.472 | 0.914 |
| SUBSTRING(d, 1, 10) | 0.460 | 0.496 | 0.729 |
| DATE_FORMAT(d, '%d%Y%m') | 0.539 | 0.571 | 0.852 |
| DATE_FORMAT(d, '%m%Y%d') | 0.542 | 0.570 | 0.841 |
| DATE_FORMAT(d, '%m%d%Y') | 0.543 | 0.572 | 0.846 |
| DATE_FORMAT(d, '%Y%m%d') | 0.544 | 0.570 | 0.842 |
| DATE_FORMAT(d, '%Y%d%m') | 0.544 | 0.572 | 0.843 |
| DATE_FORMAT(d, '%d-%Y-%d') | 0.547 | 0.574 | 0.848 |
| DATE_FORMAT(d, '%d%m%Y') | 0.549 | 0.573 | 0.842 |
| DATE_FORMAT(d, '%m-%Y-%d') | 0.552 | 0.583 | 0.869 |
| DATE_FORMAT(d, '%Y-%m-%d') | 0.558 | 0.583 | 0.854 |
| DATE_FORMAT(d, '%Y-%d-%m') | 0.569 | 0.593 | 0.867 |
| d - INTERVAL HOUR(d) HOUR - INTERVAL MINUTE(d) MINUTE - INTERVAL SECOND(d) SECOND | 0.573 | 0.653 | 1.249 |
| DATE_FORMAT(d, '%d-%m-%Y') | 0.598 | 0.637 | 0.920 |
| DATE_FORMAT(d, '%m-%d-%Y') | 0.601 | 0.633 | 0.904 |
GROUP UNCONCAT
Dunno why you'd rather do this in SQL than in your application layer, but if you do, here's one way to turn a delimited string of values back into multiple rows - just the opposite of GROUP_CONCAT:
SET @sourceString = 'a,b,c,d,e';
SET @sql = CONCAT('INSERT INTO t VALUES (\\'', REPLACE(@sourceString, ',', '\\'),(\\''), '\\')');
PREPARE myStmt FROM @sql;
EXECUTE myStmt;
Just to show what's going on:
mysql> SELECT @sql;
+----------------------------------------------------+
| @sql |
+----------------------------------------------------+
| INSERT INTO t VALUES ('a'),('b'),('c'),('d'),('e') |
+----------------------------------------------------+
Fine-tuning FULLTEXT: Adding word characters
Perhaps you want "." or ":" or "-" or something to be treated as a word character so that you can do a fulltext match against an IP or MAC address.
If you want to change the set of characters that are considered word characters, you can do so in two ways...
* Modify the MySQL source ... and recompile MySQL.
* Modify a character set file: This requires no recompilation.
Thus sayeth the Manual. What it doesn't say is just what to modify in that character set file. I had to ask MySQL developer Alexander Barkov for help to get it right.
In the user comments on that manual page, John Navratil has provided most of the answer, but there's an important step missing: adding a new collation.