A Little Noise

November 24, 2009

group date

Filed under: MySQL — snoyes @ 8:49 am

A non-rigorous, non-scientific, totally off-the-cuff test of which function to pick when you need to group by year and month.

I populated a table with 262K rows of random dates, and then ran

SELECT %s, COUNT(*) FROM table GROUP BY %s ORDER BY NULL

with various functions, which should all result in the same grouping. I repeated each query five times and show the average time, using three different column types DATE, DATETIME, and TIMESTAMP.

expression DATE DATETIME TIMESTAMP
EXTRACT(YEAR_MONTH FROM d) 0.362 0.369 0.581
LAST_DAY(d) 0.374 0.389 0.582
DATE_SUB(d, INTERVAL DAY(d) DAY) 0.429 0.882 1.452
d – INTERVAL DAY(d) DAY 0.429 0.887 1.535
SUBSTRING(d, 1, 7) 0.454 0.488 0.719
YEAR(d), MONTH(d) 1.046 1.126 2.045
MONTH(d), YEAR(d) 1.116 1.196 2.112
LEFT(d, 7) 1.307 1.405 2.123
DATE_FORMAT(d, ‘%Y%m’) 1.480 1.565 2.112
DATE_FORMAT(d, ‘%Y-%m’) 1.514 1.615 2.564
DATE_FORMAT(d, ‘%m%Y’) 1.517 1.604 2.420
DATE_FORMAT(d, ‘%m-%Y’) 1.562 1.656 2.465
MONTHNAME(d), YEAR(d) 1.613 1.713 2.812
YEAR(d), MONTHNAME(d) 1.663 1.766 2.873

And just in case you want to extract a full date (which really only makes sense for datetime and timestamp):

expression DATE DATETIME TIMESTAMP
DATE(d) 0.357 0.374 0.591
EXTRACT(YEAR_MONTH FROM d), EXTRACT(DAY FROM d) 0.377 0.407 0.730
EXTRACT(DAY FROM d), EXTRACT(YEAR_MONTH FROM d) 0.395 0.422 0.751
MONTH(d), YEAR(d), DAY(d) 0.395 0.426 0.870
YEAR(d), DAY(d), MONTH(d) 0.398 0.440 0.862
YEAR(d), MONTH(d), DAY(d) 0.406 0.441 0.867
DAY(d), YEAR(d), MONTH(d) 0.409 0.444 0.859
LEFT(d, 10) 0.437 0.487 0.728
SUBSTRING_INDEX(d, ‘ ‘, 1) 0.439 0.475 0.743
DAY(d), MONTH(d), YEAR(d) 0.441 0.477 0.901
MONTH(d), DAY(d), YEAR(d) 0.442 0.472 0.914
SUBSTRING(d, 1, 10) 0.460 0.496 0.729
DATE_FORMAT(d, ‘%d%Y%m’) 0.539 0.571 0.852
DATE_FORMAT(d, ‘%m%Y%d’) 0.542 0.570 0.841
DATE_FORMAT(d, ‘%m%d%Y’) 0.543 0.572 0.846
DATE_FORMAT(d, ‘%Y%m%d’) 0.544 0.570 0.842
DATE_FORMAT(d, ‘%Y%d%m’) 0.544 0.572 0.843
DATE_FORMAT(d, ‘%d-%Y-%d’) 0.547 0.574 0.848
DATE_FORMAT(d, ‘%d%m%Y’) 0.549 0.573 0.842
DATE_FORMAT(d, ‘%m-%Y-%d’) 0.552 0.583 0.869
DATE_FORMAT(d, ‘%Y-%m-%d’) 0.558 0.583 0.854
DATE_FORMAT(d, ‘%Y-%d-%m’) 0.569 0.593 0.867
d – INTERVAL HOUR(d) HOUR – INTERVAL MINUTE(d) MINUTE – INTERVAL SECOND(d) SECOND 0.573 0.653 1.249
DATE_FORMAT(d, ‘%d-%m-%Y’) 0.598 0.637 0.920
DATE_FORMAT(d, ‘%m-%d-%Y’) 0.601 0.633 0.904

7 Comments »

  1. Out of interest how does ‘substring(d,1,7)’ compare on your test data?

    Thats what I tend to use.

    Comment by Brry Hunter — November 24, 2009 @ 9:19 am

  2. I would be interested to see the same test repeated grouping by year-month when storing times with timestamp and datetime – since this seems to be the common field users will store dates in.

    You could also try using ORDER BY NULL to remove any time that might be spent sorting. This might produce greater differences between the results.

    Comment by Morgan Tocker — November 24, 2009 @ 9:28 am

  3. Updated the post with those suggestions. Note that the SUBSTRING() approach is not so hot for the DATE field, but much better than most others for DATETIME and TIMESTAMP.

    Comment by snoyes — November 24, 2009 @ 10:54 am

  4. Thanks for the update!

    Comment by Morgan Tocker — November 24, 2009 @ 1:07 pm

  5. (ack – your wordpress cut my comment off at the first less than sign.. reposting)

    I primarily group by date on my datetime/timestamp fields. I compared some times of my own and found the speeds:

    DATE(created) quicker than substring(created,1,10)
    substring(created,1,10) is the same as TO_DAYS(created)
    DATE_FORMAT(created, ‘%Y-%m-%d’) is slower than substring(created,1,10)

    I was already using DATE(created) so I don’t get a performance boost today. Thanks for the idea though.

    The next time I have to create a table for data that is primarily going to be DATE aggregated, I am creating separate date and time fields and only indexing the date.

    Comment by Marques Johansson — November 25, 2009 @ 5:12 am

  6. Marques: Grouping by a simple indexed column will almost certainly be faster than grouping by any function call.

    You might consider storing a datetime and a date. It repeats data, which probably violates some academic normal form, but then you can group by the date and still do math against the datetime without having to concatenate the date and the time. It’s not very often that I find a need for just the time part.

    Comment by snoyes — November 25, 2009 @ 9:40 am

  7. here is a good website for helping you format dates using the mysql date_format function.

    Comment by mysql format date — January 8, 2010 @ 3:02 pm

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress