A Little Noise

August 21, 2006

Top N of a Group

Filed under: MySQL FAQ — snoyes @ 7:57 pm

No doubt everybody has solved it before, but ten seconds on Google didn’t show any obvious results, so here we go: How to pick the top 5 records of each group.
We’ll define “top 5” as “the first 5 records when ordering by a field called `rating`”.

For MySQL 8.0 and later, use the RANK() window function inside either a common table expression or a derived table:

WITH cte AS (
SELECT *, RANK() OVER(PARTITION BY groupId ORDER BY rating) AS rank
FROM theTable
)
SELECT * FROM cte WHERE rank <= 5;

5.7 and earlier didn’t have CTEs. Subqueries will suffice.

SELECT 
    theTable.* 
FROM 
    theTable 
    JOIN (
        SELECT 
            t1.groupId, 
            t1.rating, 
            COUNT(t2.groupId) AS theCount 
        FROM theTable t1 
        LEFT JOIN theTable t2 ON t1.groupId = t2.groupId AND t1.rating > t2.rating 
        GROUP BY t1.groupId, t1.rating 
        HAVING theCount < 5
    ) AS dt USING (groupId, rating);

Pre-4.1, you couldn’t even use subqueries.

SELECT @oldGroup := 0, @count := 0;

SELECT 
    @oldGroup := groupId,
    theTable.*
FROM theTable 
HAVING (@count := IF(groupId <=> @oldGroup, @count+1, 0)) < 5
ORDER BY groupId, rating;

10 Comments »

  1. This is one of the numerous examples showing why MySQL is vastly inferior compared to real RDBMS … you can solve this on Oracle, for example, with a simple subquery.

    Comment by nah — August 22, 2006 @ 5:09 pm

  2. Nah — you did not read the post — he said specifically “And we’ll do it without subqueries, for the pre-4.1 folks.” It can be done the same way in Oracle as it can in MySQL — with a correlated subquery that is ANSI SQL Standard.

    Comment by Sheeri — August 23, 2006 @ 3:18 am

  3. Here’s another form of the solution without subquery that also resolves matching ratings by using the primary key:

    DROP DATABASE IF EXISTS topn;
    CREATE DATABASE topn;

    USE topn;

    CREATE TABLE theTable (
    pKey int PRIMARY KEY
    , groupId int
    , rating int
    );

    INSERT INTO theTable (
    pKey
    , groupId
    , rating
    ) VALUES
    ( 1, 1, 55 )
    , ( 2, 1, 53 )
    , ( 3, 1, 51 )
    , ( 4, 1, 59 )
    , ( 5, 1, 58 )
    , ( 6, 1, 58 )
    , ( 7, 1, 53 )
    , ( 8, 1, 55 )
    , ( 9, 1, 55 )
    , ( 10, 1, 53 )
    , ( 11, 2, 52 )
    , ( 12, 2, 53 )
    , ( 13, 2, 54 )
    , ( 14, 2, 59 )
    , ( 15, 2, 58 )
    , ( 16, 2, 57 )
    , ( 17, 2, 56 )
    ;

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey)

    Comment by Jon Armstrong (Xgc) — February 12, 2009 @ 12:56 pm

  4. Looks like the comment was cut off. Just the query:

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey)

    Comment by Jon Armstrong (Xgc) — February 12, 2009 @ 12:57 pm

  5. Sorry. You have a problem with inserting greater than and less than characters in this blog. That’s what cut off the query.

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey) >= (t2.rating, t2.pKey)
    AND t1.groupId = t2.groupId
    GROUP BY t1.pKey
    , t1.rating
    , t1.groupId
    HAVING cnt >= 5
    ORDER BY t1.groupId, cnt
    ;

    Comment by Jon Armstrong (Xgc) — February 12, 2009 @ 12:59 pm

  6. Sorry. Please remove all the above mess. This replaces it.

    Here’s another form of the solution without subquery that also resolves matching ratings by using the primary key:

    DROP DATABASE IF EXISTS topn;
    CREATE DATABASE topn;

    USE topn;

    CREATE TABLE theTable (
    pKey int PRIMARY KEY
    , groupId int
    , rating int
    );

    INSERT INTO theTable (
    pKey
    , groupId
    , rating
    ) VALUES
    ( 1, 1, 55 )
    , ( 2, 1, 53 )
    , ( 3, 1, 51 )
    , ( 4, 1, 59 )
    , ( 5, 1, 58 )
    , ( 6, 1, 58 )
    , ( 7, 1, 53 )
    , ( 8, 1, 55 )
    , ( 9, 1, 55 )
    , ( 10, 1, 53 )
    , ( 11, 2, 52 )
    , ( 12, 2, 53 )
    , ( 13, 2, 54 )
    , ( 14, 2, 59 )
    , ( 15, 2, 58 )
    , ( 16, 2, 57 )
    , ( 17, 2, 56 )
    ;

    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , COUNT(t2.pKey) AS cnt
    FROM theTable AS t1
    LEFT JOIN theTable AS t2
    ON (t1.rating, t1.pKey) <= (t2.rating, t2.pKey)
    AND t1.groupId = t2.groupId
    GROUP BY t1.pKey
    , t1.rating
    , t1.groupId
    HAVING cnt <= 5
    ORDER BY t1.groupId, cnt
    ;

    Comment by Jon Armstrong (Xgc) — February 12, 2009 @ 1:05 pm

  7. […] subquery’ restriction. But there’s always some way around. Thx to Jon Armstrong from A little Noise for the code […]

    Pingback by ActiveSPLIT.com » MySQL – Get the Top N of a Group — March 26, 2010 @ 4:43 am

  8. thank you,thank you,thank you,thank you….

    Comment by John — September 28, 2014 @ 3:39 pm

  9. See also:

    ROW_NUMBER() OVER (PARTITION BY groupId ORDER BY rating DESC, pKey) AS n

    Place that in a CTE term or derived table. Now select all results with n <= 5.

    Easily done. Note: To use Window Functions and Common Table Expressions, MySQL 8.0 or at least MariaDB 10.2+ or 10.3 is required.

    Comment by Jon Armstrong (Xgc) — April 25, 2018 @ 9:10 pm

  10. Using the prior topn schema / data (hopefully, the logic / symbols will survive the paste, even if the format and spacing does not):

    — Top 5 Items (by rating) per Group – Using Window Functions and Common Table Expressions
    WITH ranks AS (
    SELECT t1.pKey
    , t1.rating
    , t1.groupId
    , ROW_NUMBER() OVER (PARTITION BY groupId ORDER BY rating DESC, pKey) AS n
    FROM theTable AS t1
    )
    SELECT *
    FROM ranks
    WHERE n <= 5
    ORDER BY groupId, n
    ;

    Comment by Jon Armstrong (Xgc) — April 25, 2018 @ 9:36 pm

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress