A non-rigorous, non-scientific, totally off-the-cuff test of which function to pick when you need to group by year and month.
I populated a table with 262K rows of random dates, and then ran
SELECT %s, COUNT(*) FROM table GROUP BY %s ORDER BY NULL
with various functions, which should all result in the same grouping. I repeated each query five times and show the average time, using three different column types DATE, DATETIME, and TIMESTAMP.
expression | DATE | DATETIME | TIMESTAMP |
---|---|---|---|
EXTRACT(YEAR_MONTH FROM d) | 0.362 | 0.369 | 0.581 |
LAST_DAY(d) | 0.374 | 0.389 | 0.582 |
DATE_SUB(d, INTERVAL DAY(d) DAY) | 0.429 | 0.882 | 1.452 |
d – INTERVAL DAY(d) DAY | 0.429 | 0.887 | 1.535 |
SUBSTRING(d, 1, 7) | 0.454 | 0.488 | 0.719 |
YEAR(d), MONTH(d) | 1.046 | 1.126 | 2.045 |
MONTH(d), YEAR(d) | 1.116 | 1.196 | 2.112 |
LEFT(d, 7) | 1.307 | 1.405 | 2.123 |
DATE_FORMAT(d, ‘%Y%m’) | 1.480 | 1.565 | 2.112 |
DATE_FORMAT(d, ‘%Y-%m’) | 1.514 | 1.615 | 2.564 |
DATE_FORMAT(d, ‘%m%Y’) | 1.517 | 1.604 | 2.420 |
DATE_FORMAT(d, ‘%m-%Y’) | 1.562 | 1.656 | 2.465 |
MONTHNAME(d), YEAR(d) | 1.613 | 1.713 | 2.812 |
YEAR(d), MONTHNAME(d) | 1.663 | 1.766 | 2.873 |
And just in case you want to extract a full date (which really only makes sense for datetime and timestamp):
expression | DATE | DATETIME | TIMESTAMP |
---|---|---|---|
DATE(d) | 0.357 | 0.374 | 0.591 |
EXTRACT(YEAR_MONTH FROM d), EXTRACT(DAY FROM d) | 0.377 | 0.407 | 0.730 |
EXTRACT(DAY FROM d), EXTRACT(YEAR_MONTH FROM d) | 0.395 | 0.422 | 0.751 |
MONTH(d), YEAR(d), DAY(d) | 0.395 | 0.426 | 0.870 |
YEAR(d), DAY(d), MONTH(d) | 0.398 | 0.440 | 0.862 |
YEAR(d), MONTH(d), DAY(d) | 0.406 | 0.441 | 0.867 |
DAY(d), YEAR(d), MONTH(d) | 0.409 | 0.444 | 0.859 |
LEFT(d, 10) | 0.437 | 0.487 | 0.728 |
SUBSTRING_INDEX(d, ‘ ‘, 1) | 0.439 | 0.475 | 0.743 |
DAY(d), MONTH(d), YEAR(d) | 0.441 | 0.477 | 0.901 |
MONTH(d), DAY(d), YEAR(d) | 0.442 | 0.472 | 0.914 |
SUBSTRING(d, 1, 10) | 0.460 | 0.496 | 0.729 |
DATE_FORMAT(d, ‘%d%Y%m’) | 0.539 | 0.571 | 0.852 |
DATE_FORMAT(d, ‘%m%Y%d’) | 0.542 | 0.570 | 0.841 |
DATE_FORMAT(d, ‘%m%d%Y’) | 0.543 | 0.572 | 0.846 |
DATE_FORMAT(d, ‘%Y%m%d’) | 0.544 | 0.570 | 0.842 |
DATE_FORMAT(d, ‘%Y%d%m’) | 0.544 | 0.572 | 0.843 |
DATE_FORMAT(d, ‘%d-%Y-%d’) | 0.547 | 0.574 | 0.848 |
DATE_FORMAT(d, ‘%d%m%Y’) | 0.549 | 0.573 | 0.842 |
DATE_FORMAT(d, ‘%m-%Y-%d’) | 0.552 | 0.583 | 0.869 |
DATE_FORMAT(d, ‘%Y-%m-%d’) | 0.558 | 0.583 | 0.854 |
DATE_FORMAT(d, ‘%Y-%d-%m’) | 0.569 | 0.593 | 0.867 |
d – INTERVAL HOUR(d) HOUR – INTERVAL MINUTE(d) MINUTE – INTERVAL SECOND(d) SECOND | 0.573 | 0.653 | 1.249 |
DATE_FORMAT(d, ‘%d-%m-%Y’) | 0.598 | 0.637 | 0.920 |
DATE_FORMAT(d, ‘%m-%d-%Y’) | 0.601 | 0.633 | 0.904 |
Out of interest how does ‘substring(d,1,7)’ compare on your test data?
Thats what I tend to use.
Comment by Brry Hunter — November 24, 2009 @ 9:19 am
I would be interested to see the same test repeated grouping by year-month when storing times with timestamp and datetime – since this seems to be the common field users will store dates in.
You could also try using ORDER BY NULL to remove any time that might be spent sorting. This might produce greater differences between the results.
Comment by Morgan Tocker — November 24, 2009 @ 9:28 am
Updated the post with those suggestions. Note that the SUBSTRING() approach is not so hot for the DATE field, but much better than most others for DATETIME and TIMESTAMP.
Comment by snoyes — November 24, 2009 @ 10:54 am
Thanks for the update!
Comment by Morgan Tocker — November 24, 2009 @ 1:07 pm
(ack – your wordpress cut my comment off at the first less than sign.. reposting)
I primarily group by date on my datetime/timestamp fields. I compared some times of my own and found the speeds:
DATE(created) quicker than substring(created,1,10)
substring(created,1,10) is the same as TO_DAYS(created)
DATE_FORMAT(created, ‘%Y-%m-%d’) is slower than substring(created,1,10)
I was already using DATE(created) so I don’t get a performance boost today. Thanks for the idea though.
The next time I have to create a table for data that is primarily going to be DATE aggregated, I am creating separate date and time fields and only indexing the date.
Comment by Marques Johansson — November 25, 2009 @ 5:12 am
Marques: Grouping by a simple indexed column will almost certainly be faster than grouping by any function call.
You might consider storing a datetime and a date. It repeats data, which probably violates some academic normal form, but then you can group by the date and still do math against the datetime without having to concatenate the date and the time. It’s not very often that I find a need for just the time part.
Comment by snoyes — November 25, 2009 @ 9:40 am
here is a good website for helping you format dates using the mysql date_format function.
Comment by mysql format date — January 8, 2010 @ 3:02 pm