Today I saw a query like this:
SELECT d FROM t;
Performance was terrible. I ran out of patience after several minutes and killed the thread.
I changed the query to this:
(SELECT d FROM t) UNION ALL (SELECT NULL LIMIT 0);
It completed in under 3 seconds.
Can you explain how a no-op UNION so dramatically improved performance? (I couldn’t have, without help from Jesper Krogh and James Day).
Hint #1 ▼
http://dev.mysql.com/doc/refman/5.5/en/cursor-restrictions.html
Hint #2 ▼
The field `d` is a varchar and is bigger than it needs to be.
Hint #3 ▼
http://dev.mysql.com/doc/refman/5.5/en/memory-storage-engine.html
Hint #4 ▼
http://dev.mysql.com/doc/refman/5.5/en/internal-temporary-tables.html
Answer ▼
The query is in a cursor. MySQL materializes cursors to a temporary table, using the MEMORY engine if it can, then converting to MyISAM if it gets too big. The MEMORY engine doesn’t support variable length rows, so VARCHARs are converted to CHARs. The field `d` only contains one character in each row, but is defined as a VARCHAR(20000) – which means the temporary table will grow very large. Even when converted to MyISAM, it retains that fixed width format, so the temp table is huge.
The “presence of any column larger than 512 bytes in the SELECT list, if UNION or UNION ALL is used,” prevents use of an in-memory temporary table. That means the temporary table will be created directly as MyISAM on the disk, and will keep the VARCHAR format, so it will be much smaller. Converting the field to TEXT would have the same effect.
Related
[…] PlanetMySQL Voting: Vote UP / Vote DOWN Original Source: Planet MySQL […]
Pingback by Quiz: A More Perfect UNION « Fast Ninja Blog by Freelanceful – Web Design | Coding | Freelancing — November 9, 2011 @ 4:20 pm
Simple answer: Caches
If it is innodb, your first one loaded the table into the buffer pool. If not, most likely, you have disk caches, os caches, mysql caches, et al. These would have been populated the caches between the data and you.
Comment by David — November 9, 2011 @ 4:38 pm
Caches are not relevant here. I can run these two versions in either order, multiple times, and performance is the same as initially stated.
Comment by Scott Noyes — November 9, 2011 @ 6:44 pm
I’m guessing, as there is not much info:
The UNION ALL query will first write your data into temporary table; only then will the data return to you.
This means that if your result set is huge, and your network is slow, query #1 suffers from the time it takes to send the results over to you; query #2 does not, since the temporary table releases the table.
Can you try SELECT SQL_BUFFER_RESULT d FROM t; to see if this is indeed the case?
Otherwise more is needed; myisam? innodb? write contention? size?
Comment by Shlomi Noach — November 9, 2011 @ 10:09 pm
Not a network issue; this is on localhost.
Storage engine of table `t` can be either MyISAM or InnoDB, and we’ll get the same results. But storage engines and temporary tables are nevertheless key to the quiz!
Size is important, but size of what? For my test, I had a million rows in `t`, but I can reduce that to 65K, and the first approach still takes over a minute, while the second approach with the UNION takes a quarter of a second.
Comment by Scott Noyes — November 9, 2011 @ 11:47 pm