Occasionally in #mysql the question pops up, “How can I get all the values from a table that satisfy all of multiple conditions?”
Then we beat the questioner with a large stick, until they give an example of what they really mean.
SELECT * FROM quizAnswers; +-------------+----------+ | studentName | question | +-------------+----------+ | seekwill | A | | seekwill | B | | seekwill | C | | roxlu | A | | fury | B | | fury | B | +-------------+----------+
Find all the students who have answered both questions ‘A’ and ‘B’.
First shot at it is to group them together, and count their answers:
SELECT studentName FROM quizAnswers GROUP BY studentName HAVING COUNT(*) = 2;
Sometimes that works, but it won’t in this case: fury has 2 answers, but to the same question. seekwill has 3 answers. We’ll get the wrong result.
UPDATE: As Arlen notes, this query works just fine if we change COUNT(*) to COUNT(DISTINCT question). Now back to the regularly scheduled exercise in obfuscation.
Next we turn to subqueries:
SELECT DISTINCT studentName FROM quizAnswers WHERE studentName IN ( SELECT studentName FROM quizAnswers WHERE question = 'A' ) AND studentNAme IN ( SELECT studentName FROM quizAnswers WHERE question = 'B' ); +-------------+ | studentName | +-------------+ | seekwill | +-------------+
Gets the right answer. But it doesn’t work in 4.0 or earlier (and people on Freenode always seem to use a shared host running 3.23). So we’ll rewrite it to use JOIN instead.
SELECT DISTINCT studentName FROM quizAnswers AS a JOIN quizAnswers AS b USING (studentName) WHERE a.question = 'A' AND b.question = 'B'; +-------------+ | studentName | +-------------+ | seekwill | +-------------+
“But wait!” they say. “This example only had 2 values. My real data has 22 possible values. I don’t want to self-join the table 22 times (or write 22 subqueries). And the real values are random strings, not nice and sequential, so don’t try building the query in a loop or anything!”
Never fear! Obscure and infrequently used string and aggregate functions to the rescue!
SELECT studentName FROM quizAnswers GROUP BY studentName HAVING BIT_OR(1 << FIELD(question, 'A', 'B') - 1) = 3; +-------------+ | studentName | +-------------+ | seekwill | +-------------+
Handling the rest of those 22 values is a simple matter of adding them to the parameters to FIELD(), and changing that 3 to (2^number_of_values - 1). We'll even put the comma-separated values into a user variable and let MySQL figure out the rest (note the switch to FIND_IN_SET here, since now we're using a single string of values):
SET @q = 'A,B'; SELECT studentName FROM quizAnswers GROUP BY studentName HAVING BIT_OR(1 << FIND_IN_SET(question, @q) - 1) = (1 << LENGTH(@q) - LENGTH(REPLACE(@q, ',', '')) + 1) - 1; -- This is 2^numValues - 1 +-------------+ | studentName | +-------------+ | seekwill | +-------------+
How does it work? I'll let you look up the functions on your own, but here are the values each generates:
SELECT studentName, question, FIND_IN_SET(question, 'A,B,C') AS position, 1 << FIND_IN_SET(question, 'A,B,C') - 1 AS bitmap FROM quizAnswers; +-------------+----------+----------+--------+ | studentName | question | position | bitmap | +-------------+----------+----------+--------+ | seekwill | A | 1 | 1 | | seekwill | B | 2 | 2 | | seekwill | C | 3 | 4 | | roxlu | A | 1 | 1 | | fury | B | 2 | 2 | | fury | B | 2 | 2 | +-------------+----------+----------+--------+
SELECT studentName, BIT_OR(1 << FIND_IN_SET(question, 'A,B,C') - 1) AS groupBitMap FROM quizAnswers GROUP BY studentName; +-------------+-------------+ | studentName | groupBitMap | +-------------+-------------+ | fury | 2 | | roxlu | 1 | | seekwill | 7 | +-------------+-------------+
Hi Scott. Liked the article – would like to feature it in the Dec issue of the Pipelines Newsletter. Here is our November issue:
http://www.quest-pipelines.com/newsletter-v8/newsletter_1107.htm
Look forward to hearing back from you.
Comment by Robin — November 15, 2007 @ 9:49 am