A Little Noise

December 4, 2009

On unions and derivatives

Filed under: MySQL — snoyes @ 4:25 pm

On occasion I need to generate a number of rows. I don’t care about the contents, just how many rows there are. Maybe I want to create a table filled with a sequence of numbers. One common way to do this is to join a bunch of subqueries built out of unions:


SET @count := 0;
INSERT INTO table SELECT @count := @count + 1
FROM (SELECT 1 UNION SELECT 2) a
JOIN (SELECT 1 UNION SELECT 2) b
JOIN (SELECT 1 UNION SELECT 2) c

Of course, I could union more than two selects (e.g. SELECT 1 UNION SELECT 2 UNION SELECT 3). That makes each derived table longer, but reduces how many must be joined.

Question: What’s the optimal size of each derived table?

Defining “optimal”: since this is a one-off query, I don’t much care about performance (within reason), but I do care about how much typing I have to do. So I will favor “fewer characters” over “fewer joins”.

Each derived table in the query takes up 15 * numSelects + 2 characters.

To generate a given number of rows requires CEILING(LOG(numRows) / LOG(numSelects)) derived tables.

If I want 1000 rows in the result, I’d need 10 derived tables of two rows each (320 characters), or 7 of three rows (329 characters), or 3 of ten rows (456 characters).

For any given numRows, what’s the minimum of CEILING(LOG(numRows) / LOG(numSelects)) * (15 * numSelects + 2) ?

The answer varies with numRows – 2, 3, or 4 are alternately best when generating less than 130 rows, but any higher than that and the best choice usually turns out to be 3.

(SELECT 1 UNION SELECT 2 UNION SELECT 3)

3 Comments »

  1. […] than that and the best choice usually turns out to be 3. (SELECT 1 UNION SELECT 2 UNION SELECT 3) Read the full article the source. Share Code mysql How to tell if someone is bullshitting Eben Moglen, Lazarus […]

    Pingback by On unions and derivatives | Web Developer — December 5, 2009 @ 6:48 am

  2. Hi,

    I’ve noticed that after a few dozen UNION operations, things get *really* slow. In fact, just to open the table definition can take long seconds.

    Kindly take a look at my post: Generating numbers out of seemingly thin air, where I utilize mysql/INFORMATION_SCHEMA tables to generate running sequences

    Regards

    Comment by Shlomi Noach — December 5, 2009 @ 11:49 pm

  3. I’ll just bet that if we did the math and allowed for non-integer numbers of unions, the answer is e (~2.7182).

    Comment by snoyes — February 23, 2010 @ 12:04 am

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress