Month: December 2009

  • On unions and derivatives

    On occasion I need to generate a number of rows. I don’t care about the contents, just how many rows there are. Maybe I want to create a table filled with a sequence of numbers. One common way to do this is to join a bunch of subqueries built out of unions:


    SET @count := 0;
    INSERT INTO table SELECT @count := @count + 1
    FROM (SELECT 1 UNION SELECT 2) a
    JOIN (SELECT 1 UNION SELECT 2) b
    JOIN (SELECT 1 UNION SELECT 2) c

    Of course, I could union more than two selects (e.g. SELECT 1 UNION SELECT 2 UNION SELECT 3). That makes each derived table longer, but reduces how many must be joined.

    Question: What’s the optimal size of each derived table?

    Defining “optimal”: since this is a one-off query, I don’t much care about performance (within reason), but I do care about how much typing I have to do. So I will favor “fewer characters” over “fewer joins”.

    Each derived table in the query takes up 15 * numSelects + 2 characters.

    To generate a given number of rows requires CEILING(LOG(numRows) / LOG(numSelects)) derived tables.

    If I want 1000 rows in the result, I’d need 10 derived tables of two rows each (320 characters), or 7 of three rows (329 characters), or 3 of ten rows (456 characters).

    For any given numRows, what’s the minimum of CEILING(LOG(numRows) / LOG(numSelects)) * (15 * numSelects + 2) ?

    The answer varies with numRows – 2, 3, or 4 are alternately best when generating less than 130 rows, but any higher than that and the best choice usually turns out to be 3.

    (SELECT 1 UNION SELECT 2 UNION SELECT 3)