21Aug/066
Top N of a Group
No doubt everybody has solved it before, but ten seconds on Google didn't show any obvious results, so here we go: How to pick the top 5 records of each group. And we'll do it without subqueries, for the pre-4.1 folks.
We'll define "top 5" as "the first 5 records when ordering by a field called `rating`".
SELECT @oldGroup := 0, @count := 0;
SELECT
@oldGroup := groupId,
theTable.*
FROM theTable
HAVING (@count := IF(groupId <=> @oldGroup, @count+1, 0)) < 5
ORDER BY groupId, rating;
UPDATE: At last, the version with a subquery and no user variables. Someday, when the 'LIMIT & IN/ALL/ANY/SOME subquery' restriction is lifted, I'll give that correlated subquery that Sheeri mentioned a shot.
SELECT
theTable.*
FROM
theTable
JOIN (
SELECT
t1.groupId,
t1.rating,
COUNT(t2.groupId) AS theCount
FROM theTable t1
LEFT JOIN theTable t2 ON t1.groupId = t2.groupId AND t1.rating > t2.rating
GROUP BY t1.groupId, t1.rating
HAVING theCount < 5
) AS dt USING (groupId, rating);
August 22nd, 2006 - 17:09
This is one of the numerous examples showing why MySQL is vastly inferior compared to real RDBMS … you can solve this on Oracle, for example, with a simple subquery.
August 23rd, 2006 - 03:18
Nah — you did not read the post — he said specifically “And we’ll do it without subqueries, for the pre-4.1 folks.” It can be done the same way in Oracle as it can in MySQL — with a correlated subquery that is ANSI SQL Standard.
February 12th, 2009 - 12:56
Here’s another form of the solution without subquery that also resolves matching ratings by using the primary key:
DROP DATABASE IF EXISTS topn;
CREATE DATABASE topn;
USE topn;
CREATE TABLE theTable (
pKey int PRIMARY KEY
, groupId int
, rating int
);
INSERT INTO theTable (
pKey
, groupId
, rating
) VALUES
( 1, 1, 55 )
, ( 2, 1, 53 )
, ( 3, 1, 51 )
, ( 4, 1, 59 )
, ( 5, 1, 58 )
, ( 6, 1, 58 )
, ( 7, 1, 53 )
, ( 8, 1, 55 )
, ( 9, 1, 55 )
, ( 10, 1, 53 )
, ( 11, 2, 52 )
, ( 12, 2, 53 )
, ( 13, 2, 54 )
, ( 14, 2, 59 )
, ( 15, 2, 58 )
, ( 16, 2, 57 )
, ( 17, 2, 56 )
;
SELECT t1.pKey
, t1.rating
, t1.groupId
, COUNT(t2.pKey) AS cnt
FROM theTable AS t1
LEFT JOIN theTable AS t2
ON (t1.rating, t1.pKey)
February 12th, 2009 - 12:57
Looks like the comment was cut off. Just the query:
SELECT t1.pKey
, t1.rating
, t1.groupId
, COUNT(t2.pKey) AS cnt
FROM theTable AS t1
LEFT JOIN theTable AS t2
ON (t1.rating, t1.pKey)
February 12th, 2009 - 12:59
Sorry. You have a problem with inserting greater than and less than characters in this blog. That’s what cut off the query.
SELECT t1.pKey
, t1.rating
, t1.groupId
, COUNT(t2.pKey) AS cnt
FROM theTable AS t1
LEFT JOIN theTable AS t2
ON (t1.rating, t1.pKey) >= (t2.rating, t2.pKey)
AND t1.groupId = t2.groupId
GROUP BY t1.pKey
, t1.rating
, t1.groupId
HAVING cnt >= 5
ORDER BY t1.groupId, cnt
;
February 12th, 2009 - 13:05
Sorry. Please remove all the above mess. This replaces it.
Here’s another form of the solution without subquery that also resolves matching ratings by using the primary key:
DROP DATABASE IF EXISTS topn;
CREATE DATABASE topn;
USE topn;
CREATE TABLE theTable (
pKey int PRIMARY KEY
, groupId int
, rating int
);
INSERT INTO theTable (
pKey
, groupId
, rating
) VALUES
( 1, 1, 55 )
, ( 2, 1, 53 )
, ( 3, 1, 51 )
, ( 4, 1, 59 )
, ( 5, 1, 58 )
, ( 6, 1, 58 )
, ( 7, 1, 53 )
, ( 8, 1, 55 )
, ( 9, 1, 55 )
, ( 10, 1, 53 )
, ( 11, 2, 52 )
, ( 12, 2, 53 )
, ( 13, 2, 54 )
, ( 14, 2, 59 )
, ( 15, 2, 58 )
, ( 16, 2, 57 )
, ( 17, 2, 56 )
;
SELECT t1.pKey
, t1.rating
, t1.groupId
, COUNT(t2.pKey) AS cnt
FROM theTable AS t1
LEFT JOIN theTable AS t2
ON (t1.rating, t1.pKey) <= (t2.rating, t2.pKey)
AND t1.groupId = t2.groupId
GROUP BY t1.pKey
, t1.rating
, t1.groupId
HAVING cnt <= 5
ORDER BY t1.groupId, cnt
;