A Little Noise

April 25, 2013

Five Number Summary

Filed under: MySQL — snoyes @ 12:10 pm

From Freenode: how do you generate a five number summary in MySQL? There is no “median” aggregate function built in. You could do some clever things involving self joins or temporary tables, or build an aggregate UDF – see the comments section in the manual for those approaches.

Here’s another way using a single query. Be sure to set group_concat_max_len high enough for your data, and since it relies on string manipulation, it’s probably not a good choice if your data is millions of rows.

First, a helper function to get the Nth element of a comma-delimited string, just to make the query shorter:

CREATE FUNCTION LIST_ELEM(inString text, pos int) 
RETURNS TEXT DETERMINISTIC 
RETURN SUBSTRING_INDEX(SUBSTRING_INDEX(inString, ',', pos), ',', -1);

Now, fetching the min, max, median, and first and third quartiles (computing method 2) for each group:

SELECT 

  groupId,

  GROUP_CONCAT(data ORDER BY data) AS dataSet,

  MIN(data) AS min,

  (
    LIST_ELEM(GROUP_CONCAT(data ORDER BY data), CEIL(COUNT(*)/4))
    + LIST_ELEM(GROUP_CONCAT(data ORDER BY data), FLOOR(COUNT(*)/4 + 1))
  ) / 2 AS q1,

  (
    LIST_ELEM(GROUP_CONCAT(data ORDER BY data), CEIL(COUNT(*)/2))
    + LIST_ELEM(GROUP_CONCAT(data ORDER BY data), FLOOR(COUNT(*)/2 + 1))
  ) / 2 AS median,

  (
    LIST_ELEM(GROUP_CONCAT(data ORDER BY data DESC), CEIL(COUNT(*)/4))
    + LIST_ELEM(GROUP_CONCAT(data ORDER BY data DESC), FLOOR(COUNT(*)/4 + 1))
  ) / 2 AS q3,

  MAX(data) AS max

FROM t 
GROUP BY groupId;
+---------+---------------------+------+------+--------+------+------+
| groupId | dataSet             | min  | q1   | median | q3   | max  |
+---------+---------------------+------+------+--------+------+------+
|       1 | 0,0,1,2,13,27,61,63 |    0 |  0.5 |    7.5 |   44 |   63 |
|       2 | 0,0,1,2,25          |    0 |    0 |      1 |    2 |   25 |
+---------+---------------------+------+------+--------+------+------+

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress