A Little Noise

21Aug/066

Top N of a Group

No doubt everybody has solved it before, but ten seconds on Google didn't show any obvious results, so here we go: How to pick the top 5 records of each group. And we'll do it without subqueries, for the pre-4.1 folks.

We'll define "top 5" as "the first 5 records when ordering by a field called `rating`".

SELECT @oldGroup := 0, @count := 0;

SELECT 
    @oldGroup := groupId,
    theTable.*
FROM theTable 
HAVING (@count := IF(groupId <=> @oldGroup, @count+1, 0)) < 5
ORDER BY groupId, rating;

UPDATE: At last, the version with a subquery and no user variables. Someday, when the 'LIMIT & IN/ALL/ANY/SOME subquery' restriction is lifted, I'll give that correlated subquery that Sheeri mentioned a shot.

SELECT 
    theTable.* 
FROM 
    theTable 
    JOIN (
        SELECT 
            t1.groupId, 
            t1.rating, 
            COUNT(t2.groupId) AS theCount 
        FROM theTable t1 
        LEFT JOIN theTable t2 ON t1.groupId = t2.groupId AND t1.rating > t2.rating 
        GROUP BY t1.groupId, t1.rating 
        HAVING theCount < 5
    ) AS dt USING (groupId, rating);
Filed under: MySQL FAQ Leave a comment
Comments (6) Trackbacks (1)
  1. This is one of the numerous examples showing why MySQL is vastly inferior compared to real RDBMS … you can solve this on Oracle, for example, with a simple subquery.

  2. Nah — you did not read the post — he said specifically “And we’ll do it without subqueries, for the pre-4.1 folks.” It can be done the same way in Oracle as it can in MySQL — with a correlated subquery that is ANSI SQL Standard.

  3. Here’s another form of the solution without subquery that also resolves matching ratings by using the primary key:

    DROP DATABASE IF EXISTS topn;
    CREATE DATABASE topn;

    USE topn;

    CREATE TABLE theTable (
    pKey int PRIMARY KEY
    , groupId int
    , rating int
    );

    INSERT INTO theTable (
    pKey
    , groupId
    , rating
    ) VALUES
    ( 1, 1, 55 )
    , ( 2, 1, 53 )
    , ( 3, 1, 51 )
    , ( 4, 1, 59 )
    , ( 5, 1, 58 )
    , ( 6, 1, 58 )
    , ( 7, 1, 53 )
    , ( 8, 1, 55 )
    , ( 9, 1, 55 )
    , ( 10, 1, 53 )
    , ( 11, 2, 52 )
    , ( 12, 2, 53 )
    , ( 13, 2, 54 )
    , ( 14, 2, 59 )
    , ( 15, 2, 58 )
    , ( 16, 2, 57 )
    , ( 17, 2, 56 )
    ;

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey)

  4. Looks like the comment was cut off. Just the query:

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey)

  5. Sorry. You have a problem with inserting greater than and less than characters in this blog. That’s what cut off the query.

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey) >= (t2.rating, t2.pKey)
    AND t1.groupId = t2.groupId
    GROUP BY t1.pKey
    , t1.rating
    , t1.groupId
    HAVING cnt >= 5
    ORDER BY t1.groupId, cnt
    ;

  6. Sorry. Please remove all the above mess. This replaces it.

    Here’s another form of the solution without subquery that also resolves matching ratings by using the primary key:

    DROP DATABASE IF EXISTS topn;
    CREATE DATABASE topn;

    USE topn;

    CREATE TABLE theTable (
    pKey int PRIMARY KEY
    , groupId int
    , rating int
    );

    INSERT INTO theTable (
    pKey
    , groupId
    , rating
    ) VALUES
    ( 1, 1, 55 )
    , ( 2, 1, 53 )
    , ( 3, 1, 51 )
    , ( 4, 1, 59 )
    , ( 5, 1, 58 )
    , ( 6, 1, 58 )
    , ( 7, 1, 53 )
    , ( 8, 1, 55 )
    , ( 9, 1, 55 )
    , ( 10, 1, 53 )
    , ( 11, 2, 52 )
    , ( 12, 2, 53 )
    , ( 13, 2, 54 )
    , ( 14, 2, 59 )
    , ( 15, 2, 58 )
    , ( 16, 2, 57 )
    , ( 17, 2, 56 )
    ;

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey) <= (t2.rating, t2.pKey)
    AND t1.groupId = t2.groupId
    GROUP BY t1.pKey
    , t1.rating
    , t1.groupId
    HAVING cnt <= 5
    ORDER BY t1.groupId, cnt
    ;


Leave a comment