A Little Noise

November 11, 2009

Choosing index prefix length

Filed under: MySQL — snoyes @ 9:12 am

It can be handy to index just part of a long character column. The strings might vary enough in just the first few characters to get the same cardinality out of a partial index as a full index would give, and the partial index takes up much less disk space and memory.

Indexes in The Manual

But how long should that prefix be? Should it be the same cardinality as the original data? Within 10%?

I dunno; somebody else can blog about that decision. I’m just interested in a query that finds the number which satisfies your requirements.

I have a table called ‘numbers’ with an int field ‘num’ that just holds values 1 through a large number; for this case, it needs to be no larger than the length of the longest string in the table.

Then:

SELECT
  numbers.num AS prefixLength,
  dt.originalCardinality,
  (SELECT COUNT(DISTINCT LEFT(stringField, numbers.num)) FROM yourTable) AS prefixCardinality
FROM
  numbers
  JOIN (SELECT COUNT(DISTINCT stringField) AS originalCardinality FROM yourTable) AS dt
WHERE
  (SELECT COUNT(DISTINCT LEFT(stringField, numbers.num)) FROM yourTable) >= .90 * originalCardinality
ORDER BY numbers.num
LIMIT 1;

1 Comment »

  1. For people last names, it tends to be 3. And that actually is a good baseline, if you don’t want to spend a lot of time on it.

    Comment by Arjen Lentz — November 11, 2009 @ 7:57 pm

RSS feed for comments on this post. TrackBack URL

Leave a comment

Powered by WordPress