A Little Noise

November 8, 2018

Window Functions with Unusual Boundaries

Filed under: MySQL — snoyes @ 12:44 pm

Somebody on Freenode wanted this:

   Source            Result
+----+------+    +----+------+
| id | x    |    | id | c    |
+----+------+    +----+------+
|  1 |    1 |    |  1 |    2 |
|  2 |    1 |    |  2 |    2 |
|  3 | NULL |    |  3 | NULL |
|  4 | NULL | -> |  4 | NULL |
|  5 |    1 |    |  5 |    1 |
|  6 | NULL |    |  6 | NULL |
|  7 |    1 |    |  7 |    3 |
|  9 |    1 |    |  9 |    3 |
| 10 |    1 |    | 10 |    3 |
+----+------+    +----+------+

The result uses the NULL values in x as boundaries of windows, and counts the number of rows within each window. I don’t know why anyone wants such a thing; it is not ours to reason why…

Anyway, the point is that you can use arbitrary expressions, even subqueries, to define your window partitions.

SELECT 
    id, 
    -- Count of rows in windows bound by NULL values in x
    IF(
      x IS NULL, 
      NULL, 
      COUNT(*) OVER (PARTITION BY (
        -- Partition by the number of earlier NULLs
        SELECT COUNT(*) FROM t AS t1 
        WHERE t1.id < t.id AND t1.x IS NULL
        ), 
        -- Exclude the end boundary "NULL" from the window
        x IS NULL
        ORDER BY id
      )
    ) AS c 
FROM t;

How does it work?

First, let's see what that subquery is all about:

SELECT id, x, 
(SELECT COUNT(*) FROM t AS t1 WHERE t1.id < t.id AND t1.x IS NULL) AS p 
FROM t;
+----+------+---+
| id | x    | p |
+----+------+---+
|  1 |    1 | 0 |
|  2 |    1 | 0 |
|  3 | NULL | 0 |
|  4 | NULL | 1 |
|  5 |    1 | 2 |
|  6 | NULL | 2 |
|  7 |    1 | 3 |
|  9 |    1 | 3 |
| 10 |    1 | 3 |
+----+------+---+

By counting the number of "NULL rows" appearing earlier in the table, we get a value we can use to find the starting point of each window.

That alone goes one row too far, though - the "NULL row" which should end each window gets included in the window. However, you can use multiple expressions to partition windows.

SELECT id, x, 
(SELECT COUNT(*) FROM t AS t1 WHERE t1.id < t.id AND t1.x IS NULL) AS p1, 
x IS NULL AS p2 
FROM t;
+----+------+------+----+
| id | x    | p1   | p2 |
+----+------+------+----+
|  1 |    1 |    0 |  0 |
|  2 |    1 |    0 |  0 |
|  3 | NULL |    0 |  1 |
|  4 | NULL |    1 |  1 |
|  5 |    1 |    2 |  0 |
|  6 | NULL |    2 |  1 |
|  7 |    1 |    3 |  0 |
|  9 |    1 |    3 |  0 |
| 10 |    1 |    3 |  0 |
+----+------+------+----+

The combination of (p1, p2) neatly partition the rows, so each "NULL row" is by itself, and non-NULL rows are together.

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress