From Freenode: how do you generate a five number summary in MySQL? There is no “median” aggregate function built in. You could do some clever things involving self joins or temporary tables, or build an aggregate UDF – see the comments section in the manual for those approaches.
Here’s another way using a single query. Be sure to set group_concat_max_len high enough for your data, and since it relies on string manipulation, it’s probably not a good choice if your data is millions of rows.
First, a helper function to get the Nth element of a comma-delimited string, just to make the query shorter:
CREATE FUNCTION LIST_ELEM(inString text, pos int) RETURNS TEXT DETERMINISTIC RETURN SUBSTRING_INDEX(SUBSTRING_INDEX(inString, ',', pos), ',', -1);
Now, fetching the min, max, median, and first and third quartiles (computing method 2) for each group:
SELECT
groupId,
GROUP_CONCAT(data ORDER BY data) AS dataSet,
MIN(data) AS min,
(
LIST_ELEM(GROUP_CONCAT(data ORDER BY data), CEIL(COUNT(*)/4))
+ LIST_ELEM(GROUP_CONCAT(data ORDER BY data), FLOOR(COUNT(*)/4 + 1))
) / 2 AS q1,
(
LIST_ELEM(GROUP_CONCAT(data ORDER BY data), CEIL(COUNT(*)/2))
+ LIST_ELEM(GROUP_CONCAT(data ORDER BY data), FLOOR(COUNT(*)/2 + 1))
) / 2 AS median,
(
LIST_ELEM(GROUP_CONCAT(data ORDER BY data DESC), CEIL(COUNT(*)/4))
+ LIST_ELEM(GROUP_CONCAT(data ORDER BY data DESC), FLOOR(COUNT(*)/4 + 1))
) / 2 AS q3,
MAX(data) AS max
FROM t
GROUP BY groupId;
+---------+---------------------+------+------+--------+------+------+ | groupId | dataSet | min | q1 | median | q3 | max | +---------+---------------------+------+------+--------+------+------+ | 1 | 0,0,1,2,13,27,61,63 | 0 | 0.5 | 7.5 | 44 | 63 | | 2 | 0,0,1,2,25 | 0 | 0 | 1 | 2 | 25 | +---------+---------------------+------+------+--------+------+------+

Leave a Reply