Quiz: A More Perfect UNION

Today I saw a query like this:
SELECT d FROM t;

Performance was terrible. I ran out of patience after several minutes and killed the thread.

I changed the query to this:
(SELECT d FROM t) UNION ALL (SELECT NULL LIMIT 0);

It completed in under 3 seconds.

Can you explain how a no-op UNION so dramatically improved performance? (I couldn’t have, without help from Jesper Krogh and James Day).

http://dev.mysql.com/doc/refman/5.5/en/cursor-restrictions.html
The field `d` is a varchar and is bigger than it needs to be.
http://dev.mysql.com/doc/refman/5.5/en/memory-storage-engine.html
http://dev.mysql.com/doc/refman/5.5/en/internal-temporary-tables.html
The query is in a cursor. MySQL materializes cursors to a temporary table, using the MEMORY engine if it can, then converting to MyISAM if it gets too big. The MEMORY engine doesn’t support variable length rows, so VARCHARs are converted to CHARs. The field `d` only contains one character in each row, but is defined as a VARCHAR(20000) – which means the temporary table will grow very large. Even when converted to MyISAM, it retains that fixed width format, so the temp table is huge.

The “presence of any column larger than 512 bytes in the SELECT list, if UNION or UNION ALL is used,” prevents use of an in-memory temporary table. That means the temporary table will be created directly as MyISAM on the disk, and will keep the VARCHAR format, so it will be much smaller. Converting the field to TEXT would have the same effect.

Comments

5 responses to “Quiz: A More Perfect UNION”

  1. […] PlanetMySQL Voting: Vote UP / Vote DOWN Original Source: Planet MySQL […]

  2. David Avatar
    David

    Simple answer: Caches

    If it is innodb, your first one loaded the table into the buffer pool. If not, most likely, you have disk caches, os caches, mysql caches, et al. These would have been populated the caches between the data and you.

  3. Scott Noyes Avatar
    Scott Noyes

    Caches are not relevant here. I can run these two versions in either order, multiple times, and performance is the same as initially stated.

  4. Shlomi Noach Avatar

    I’m guessing, as there is not much info:
    The UNION ALL query will first write your data into temporary table; only then will the data return to you.
    This means that if your result set is huge, and your network is slow, query #1 suffers from the time it takes to send the results over to you; query #2 does not, since the temporary table releases the table.
    Can you try SELECT SQL_BUFFER_RESULT d FROM t; to see if this is indeed the case?

    Otherwise more is needed; myisam? innodb? write contention? size?

  5. Scott Noyes Avatar
    Scott Noyes

    Not a network issue; this is on localhost.

    Storage engine of table `t` can be either MyISAM or InnoDB, and we’ll get the same results. But storage engines and temporary tables are nevertheless key to the quiz!

    Size is important, but size of what? For my test, I had a million rows in `t`, but I can reduce that to 65K, and the first approach still takes over a minute, while the second approach with the UNION takes a quarter of a second.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.