Occasionally in #mysql the question pops up, “How can I get all the values from a table that satisfy all of multiple conditions?”
Then we beat the questioner with a large stick, until they give an example of what they really mean.
SELECT * FROM quizAnswers; +-------------+----------+ | studentName | question | +-------------+----------+ | seekwill | A | | seekwill | B | | seekwill | C | | roxlu | A | | fury | B | | fury | B | +-------------+----------+
Find all the students who have answered both questions ‘A’ and ‘B’.
First shot at it is to group them together, and count their answers:
SELECT studentName FROM quizAnswers GROUP BY studentName HAVING COUNT(*) = 2;
Sometimes that works, but it won’t in this case: fury has 2 answers, but to the same question. seekwill has 3 answers. We’ll get the wrong result.
UPDATE: As Arlen notes, this query works just fine if we change COUNT(*) to COUNT(DISTINCT question). Now back to the regularly scheduled exercise in obfuscation.
Next we turn to subqueries:
SELECT DISTINCT studentName
FROM quizAnswers
WHERE
studentName IN (
SELECT studentName FROM quizAnswers WHERE question = 'A'
)
AND studentNAme IN (
SELECT studentName FROM quizAnswers WHERE question = 'B'
);
+-------------+
| studentName |
+-------------+
| seekwill |
+-------------+
Gets the right answer. But it doesn’t work in 4.0 or earlier (and people on Freenode always seem to use a shared host running 3.23). So we’ll rewrite it to use JOIN instead.
SELECT DISTINCT studentName
FROM
quizAnswers AS a
JOIN quizAnswers AS b USING (studentName)
WHERE
a.question = 'A'
AND b.question = 'B';
+-------------+
| studentName |
+-------------+
| seekwill |
+-------------+
“But wait!” they say. “This example only had 2 values. My real data has 22 possible values. I don’t want to self-join the table 22 times (or write 22 subqueries). And the real values are random strings, not nice and sequential, so don’t try building the query in a loop or anything!”
Never fear! Obscure and infrequently used string and aggregate functions to the rescue!
SELECT studentName FROM quizAnswers GROUP BY studentName HAVING BIT_OR(1 << FIELD(question, 'A', 'B') - 1) = 3; +-------------+ | studentName | +-------------+ | seekwill | +-------------+
Handling the rest of those 22 values is a simple matter of adding them to the parameters to FIELD(), and changing that 3 to (2^number_of_values - 1). We'll even put the comma-separated values into a user variable and let MySQL figure out the rest (note the switch to FIND_IN_SET here, since now we're using a single string of values):
SET @q = 'A,B';
SELECT studentName
FROM quizAnswers
GROUP BY studentName
HAVING
BIT_OR(1 << FIND_IN_SET(question, @q) - 1)
=
(1 << LENGTH(@q) - LENGTH(REPLACE(@q, ',', '')) + 1) - 1; -- This is 2^numValues - 1
+-------------+
| studentName |
+-------------+
| seekwill |
+-------------+
How does it work? I'll let you look up the functions on your own, but here are the values each generates:
SELECT
studentName,
question,
FIND_IN_SET(question, 'A,B,C') AS position,
1 << FIND_IN_SET(question, 'A,B,C') - 1 AS bitmap
FROM quizAnswers;
+-------------+----------+----------+--------+
| studentName | question | position | bitmap |
+-------------+----------+----------+--------+
| seekwill | A | 1 | 1 |
| seekwill | B | 2 | 2 |
| seekwill | C | 3 | 4 |
| roxlu | A | 1 | 1 |
| fury | B | 2 | 2 |
| fury | B | 2 | 2 |
+-------------+----------+----------+--------+
SELECT
studentName,
BIT_OR(1 << FIND_IN_SET(question, 'A,B,C') - 1) AS groupBitMap
FROM quizAnswers
GROUP BY studentName;
+-------------+-------------+
| studentName | groupBitMap |
+-------------+-------------+
| fury | 2 |
| roxlu | 1 |
| seekwill | 7 |
+-------------+-------------+

Hi Scott. Liked the article – would like to feature it in the Dec issue of the Pipelines Newsletter. Here is our November issue:
http://www.quest-pipelines.com/newsletter-v8/newsletter_1107.htm
Look forward to hearing back from you.
Comment by Robin — November 15, 2007 @ 9:49 am