Top N of a Group

No doubt everybody has solved it before, but ten seconds on Google didn’t show any obvious results, so here we go: How to pick the top 5 records of each group.
We’ll define “top 5” as “the first 5 records when ordering by a field called `rating`”.

For MySQL 8.0 and later, use the RANK() window function inside either a common table expression or a derived table:

WITH cte AS (
SELECT *, RANK() OVER(PARTITION BY groupId ORDER BY rating) AS rank
FROM theTable
)
SELECT * FROM cte WHERE rank <= 5;

5.7 and earlier didn’t have CTEs. Subqueries will suffice.

SELECT 
    theTable.* 
FROM 
    theTable 
    JOIN (
        SELECT 
            t1.groupId, 
            t1.rating, 
            COUNT(t2.groupId) AS theCount 
        FROM theTable t1 
        LEFT JOIN theTable t2 ON t1.groupId = t2.groupId AND t1.rating > t2.rating 
        GROUP BY t1.groupId, t1.rating 
        HAVING theCount < 5
    ) AS dt USING (groupId, rating);

Pre-4.1, you couldn’t even use subqueries.

SELECT @oldGroup := 0, @count := 0;

SELECT 
    @oldGroup := groupId,
    theTable.*
FROM theTable 
HAVING (@count := IF(groupId <=> @oldGroup, @count+1, 0)) < 5
ORDER BY groupId, rating;

Comments

10 responses to “Top N of a Group”

  1. nah Avatar
    nah

    This is one of the numerous examples showing why MySQL is vastly inferior compared to real RDBMS … you can solve this on Oracle, for example, with a simple subquery.

  2. Sheeri Avatar

    Nah — you did not read the post — he said specifically “And we’ll do it without subqueries, for the pre-4.1 folks.” It can be done the same way in Oracle as it can in MySQL — with a correlated subquery that is ANSI SQL Standard.

  3. Jon Armstrong (Xgc) Avatar
    Jon Armstrong (Xgc)

    Here’s another form of the solution without subquery that also resolves matching ratings by using the primary key:

    DROP DATABASE IF EXISTS topn;
    CREATE DATABASE topn;

    USE topn;

    CREATE TABLE theTable (
    pKey int PRIMARY KEY
    , groupId int
    , rating int
    );

    INSERT INTO theTable (
    pKey
    , groupId
    , rating
    ) VALUES
    ( 1, 1, 55 )
    , ( 2, 1, 53 )
    , ( 3, 1, 51 )
    , ( 4, 1, 59 )
    , ( 5, 1, 58 )
    , ( 6, 1, 58 )
    , ( 7, 1, 53 )
    , ( 8, 1, 55 )
    , ( 9, 1, 55 )
    , ( 10, 1, 53 )
    , ( 11, 2, 52 )
    , ( 12, 2, 53 )
    , ( 13, 2, 54 )
    , ( 14, 2, 59 )
    , ( 15, 2, 58 )
    , ( 16, 2, 57 )
    , ( 17, 2, 56 )
    ;

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey)

  4. Jon Armstrong (Xgc) Avatar
    Jon Armstrong (Xgc)

    Looks like the comment was cut off. Just the query:

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey)

  5. Jon Armstrong (Xgc) Avatar
    Jon Armstrong (Xgc)

    Sorry. You have a problem with inserting greater than and less than characters in this blog. That’s what cut off the query.

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey) >= (t2.rating, t2.pKey)
    AND t1.groupId = t2.groupId
    GROUP BY t1.pKey
    , t1.rating
    , t1.groupId
    HAVING cnt >= 5
    ORDER BY t1.groupId, cnt
    ;

  6. Jon Armstrong (Xgc) Avatar
    Jon Armstrong (Xgc)

    Sorry. Please remove all the above mess. This replaces it.

    Here’s another form of the solution without subquery that also resolves matching ratings by using the primary key:

    DROP DATABASE IF EXISTS topn;
    CREATE DATABASE topn;

    USE topn;

    CREATE TABLE theTable (
    pKey int PRIMARY KEY
    , groupId int
    , rating int
    );

    INSERT INTO theTable (
    pKey
    , groupId
    , rating
    ) VALUES
    ( 1, 1, 55 )
    , ( 2, 1, 53 )
    , ( 3, 1, 51 )
    , ( 4, 1, 59 )
    , ( 5, 1, 58 )
    , ( 6, 1, 58 )
    , ( 7, 1, 53 )
    , ( 8, 1, 55 )
    , ( 9, 1, 55 )
    , ( 10, 1, 53 )
    , ( 11, 2, 52 )
    , ( 12, 2, 53 )
    , ( 13, 2, 54 )
    , ( 14, 2, 59 )
    , ( 15, 2, 58 )
    , ( 16, 2, 57 )
    , ( 17, 2, 56 )
    ;

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey) <= (t2.rating, t2.pKey)
    AND t1.groupId = t2.groupId
    GROUP BY t1.pKey
    , t1.rating
    , t1.groupId
    HAVING cnt <= 5
    ORDER BY t1.groupId, cnt
    ;

  7. […] subquery’ restriction. But there’s always some way around. Thx to Jon Armstrong from A little Noise for the code […]

  8. John Avatar
    John

    thank you,thank you,thank you,thank you….

  9. Jon Armstrong (Xgc) Avatar
    Jon Armstrong (Xgc)

    See also:

    ROW_NUMBER() OVER (PARTITION BY groupId ORDER BY rating DESC, pKey) AS n

    Place that in a CTE term or derived table. Now select all results with n <= 5.

    Easily done. Note: To use Window Functions and Common Table Expressions, MySQL 8.0 or at least MariaDB 10.2+ or 10.3 is required.

  10. Jon Armstrong (Xgc) Avatar
    Jon Armstrong (Xgc)

    Using the prior topn schema / data (hopefully, the logic / symbols will survive the paste, even if the format and spacing does not):

    — Top 5 Items (by rating) per Group – Using Window Functions and Common Table Expressions
    WITH ranks AS (
    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , ROW_NUMBER() OVER (PARTITION BY groupId ORDER BY rating DESC, pKey) AS n
    FROM theTable AS t1
    )
    SELECT *
    FROM ranks
    WHERE n <= 5
    ORDER BY groupId, n
    ;

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.