Category: Technical

Anything that uses 1s and 0s

Audit Trail
More an exercise in stored procedures, triggers, and building queries culled from the information_schema than anything else, here’s an audit trail setup.

Audit trails let you see how data has changed over the course of time, so you know who to blame when something goes wrong.

This particular setup works by adding an AFTER UPDATE trigger to each table to be audited. That trigger looks at the OLD and NEW values of each field, and if there is a difference, inserts a record into the auditLog.
- auditTrail.sql – Creates the table to hold the audit data and the stored procedure which is called to add a trigger to each table you wish to audit.
- setupTest.sql – Creates an example data table such as you might want to audit, and calls the stored procedure to build the trigger for it. The output from that stored procedure should be piped back in to MySQL to actually create the trigger. Why doesn’t it do this automatically? Because prepared statements don’t support CREATE TRIGGER statements as of 5.0.27.
- runTest.sql – Inserts a few rows of data into the example data table, updates them, and displays the changes recorded in the audit table.
Here’s the Windows command line to run the test in the `test` database:
mysql test < auditTrail.sql && mysql -N test < setupTest.sql | mysql --delimiter=$ test && mysql -t test < runTest.sql

For my purposes, watching updates was sufficient. You could extend this to cover inserts and deletes too.

There are a number of drawbacks to this particular method:
- It's a bit unwieldy, since the SQL output has to be piped back in to create the triggers. But hey, if the Wizard can do it, it's ok for us mere mortals too. This could be wrapped in a script. Or maybe it's a good thing, because then you can go edit the output and exclude fields you don't want audited.
- The tigger has to be created for each table separately. Again, handy if there are only a couple tables you want to audit, but a real pain if you wanted to audit all 297 tables in your database.
- If the table structure changes, you'll need to drop and recreate the trigger.
- The stored procedure assumes the current database. Again, something trivial to change.
- Hope you didn't want to use your AFTER UPDATE trigger for something else.
My next trick will be to add something so that we know which user made the change. Not which MySQL user, mind you, since these changes are issued from PHP scripts which all run as the same MySQL user. Somehow I need to pass in my application user id. Any suggestions?
November 9, 2006

Here Be Dragons

I have a table of maps:

CREATE TABLE map (
    mapName varchar(255),
    mapType ENUM('continent', 'region', 'country'),
    continent varchar(255) NULL,
    region varchar(255) NULL,
    country varchar(255) NULL
);
INSERT INTO map VALUES 
    ('Europe', 'continent', 'Europe', NULL, NULL),
    ('North America', 'continent', 'North America', NULL, NULL),
    ('Northern Hemisphere', 'region', NULL, 'Northern Hemisphere', NULL),
    ('Sweden', 'country', NULL, NULL, 'Sweden'),
    ('Mexico', 'country', NULL, NULL, 'Mexico');

And a table of locations:

CREATE TABLE locations (
    locationId int auto_increment primary key,
    continent varchar(255),
    region varchar(255),
    country varchar(255)
);

INSERT INTO locations VALUES 
    (NULL, 'Europe', 'Northern Hemisphere', 'Sweden'),
    (NULL, 'North America',  'Northern Hemisphere', 'Canada'),
    (NULL, 'North America',  'Pacific Northwest', 'USA');

Obviously, this example is denormalized – in the real world, I’d use a surrogate key and store continentId, regionId, and countryId. Even then, it’s not the ideal layout for this application. But that’s just part of the challenge.

Here’s the goal: find the most specific map for each location. By most specific, I mean use a country map if available. If not, use a region map. Use a continent map as a last resort.

So, the result set should be:

+------------+---------------------+
| locationId | mapName             |
+------------+---------------------+
|          1 | Sweden              |
|          2 | Northern Hemisphere |
|          3 | North America       |
+------------+---------------------+

Bonus points if it works pre-4.1

Note the definition of map.mapType. In particular, the order of the values.

Then take a gander at section 3.6 of the manual.

SELECT 
    locationId,
    m.mapName
FROM 
    locations l
    JOIN map m ON (
        l.continent = m.continent
        OR l.region = m.region
        OR l.country = m.country
    )
    LEFT JOIN map m2 ON (
        (l.continent = m2.continent
        OR l.region = m2.region
        OR l.country = m2.country)
        AND CAST(m.mapType AS unsigned) < CAST(m2.mapType AS unsigned)
    )
WHERE
    m2.mapType IS NULL
GROUP BY 
    locationId;

October 6, 2006

Empty Set Equality

SELECT 1 LIMIT 0;
Empty set (0.00 sec)

SELECT 2 LIMIT 0;
Empty set (0.00 sec)

SELECT NULL LIMIT 0;
Empty set (0.00 sec)

Exactly what I’d expect. So here’s the question:

SELECT '1 = 1' AS equality, (SELECT 1 LIMIT 0) = (SELECT 1 LIMIT 0) AS result
UNION SELECT '1 = 2', (SELECT 1 LIMIT 0) = (SELECT 2 LIMIT 0)
UNION SELECT '1 = NULL', (SELECT 1 LIMIT 0) = (SELECT NULL LIMIT 0);

What do you think? Are all empty sets created identical?
MySQL says no.

+----------+--------+
| equality | result |
+----------+--------+
| 1 = 1    |      1 |
| 1 = 2    |      0 |
| 1 = NULL |   NULL |
+----------+--------+

September 27, 2006

Single-query UNION with LIMIT and ORDER
I have a query that behaves differently in every version of MySQL I’ve tried. Could you do me a favor? Run the following four lines and post a comment with your results. And if you know of a particular bug fix that addresses this issue, let me know about that too.
```
CREATE TABLE `test` (`id` tinyint);
INSERT INTO test VALUES (1), (2), (3), (4), (5);
SHOW VARIABLES LIKE 'version%';
(SELECT * from test ORDER BY id DESC LIMIT 3) ORDER BY id ASC;
```
(more…)
September 12, 2006
Auto_increment Indexing
It’s so common to set the auto_increment column to primary key, it has become common belief that it’s required. Not true; an auto_increment column can use any index type. It doesn’t even have to be unique:
```
CREATE TABLE testTable (
  id INT AUTO_INCREMENT,
  INDEX(id)
) ENGINE=MyISAM; -- Works for InnoDB too.

INSERT INTO testTable VALUES (1), (1), (NULL);

SELECT * FROM testTable;

+----+
| id |
+----+
|  1 |
|  1 |
|  2 |
+----+
```
August 31, 2006

Order By Puzzle

From a question on Freenode.

SELECT * FROM theTable;

+----+----------+
| id | data     |
+----+----------+
|  1 | middle   |
|  2 | first    |
|  3 | showLast |
+----+----------+

Fair enough. Let’s order it by `data`.

SELECT * FROM theTable ORDER BY data;

+----+----------+
| id | data     |
+----+----------+
|  3 | showLast |
|  1 | middle   |
|  2 | first    |
+----+----------+

What??? Maybe the server didn’t quite understand.

SELECT * FROM theTable ORDER BY data ASC;

+----+----------+
| id | data     |
+----+----------+
|  3 | showLast |
|  1 | middle   |
|  2 | first    |
+----+----------+

How did that happen? I assure you I have not switched to some ‘reverse order’ collation.

SHOW CREATE TABLE theTable;

CREATE TABLE `thetable` (
  `id` int(11),
  `data` enum('showLast','middle','first')
)

Ah. `data` is an enum. ORDER BY on an enum orders by the enum index, not the string. You can order by CONCAT(data) or by CAST(data AS char). Are there better ways?

August 23, 2006

LAST_INSERT_ID and multi-row inserts
A bit of a quiz to see if you’re paying attention:

First, we’ll set up a couple tables with some auto increment fields:
```
CREATE TABLE t1 (id INT AUTO_INCREMENT, INDEX(id));
CREATE TABLE t2 (id INT AUTO_INCREMENT, refT1 INT, INDEX(id));
```
Next, we’ll insert a row to one table:
```
INSERT INTO t1 VALUES (NULL);
```
Then, we’ll see what happens when we insert two rows using LAST_INSERT_ID():
```
INSERT INTO t2 VALUES (LAST_INSERT_ID(), 1), (LAST_INSERT_ID(), 2);
```
What happens here? Did the insert go ok, or did we get an error? What happens if we now issue
```
SELECT * FROM t2;
```
No errors, because we defined the auto_increment field for t2 not as a primary or unique key, but just with index. Remember that LAST_INSERT_ID() only picks up the last value you inserted using NULL, and not explicitly defined values. So we get two copies of the value ‘1’ in our id field.
```
+----+-------+
| id | refT1 |
+----+-------+
|  1 |     1 | 
|  1 |     2 | 
+----+-------+
```
August 22, 2006
Top N of a Group

No doubt everybody has solved it before, but ten seconds on Google didn’t show any obvious results, so here we go: How to pick the top 5 records of each group.
(more…)

August 21, 2006
Staging Table

Can MySQL handle a staging table, one where a fixed number of rows are reused in a round-robin basis (e.g., temporarily store AJAX-driven auto-saved forms, without overwriting the real values so the user can still abandon changes), without an external script?

First thoughts turned to events, triggers, stored procedures, and such toys in the most recent versions of MySQL – none of which worked quite right. Turns out we can do it with a little one-time setup and a single statement that will work as far back as version 4.0.4.
(more…)

August 8, 2006

Sum top N of a group

Trying to sum the top 5 entries in each group.

SELECT
    theId,
    SUM(theData) FROM (
        (SELECT
            @count := 0, 
            @oldId := 0 AS theId, 
            0 AS theData) 
        UNION 
        (SELECT 
            @count := if(@oldId = id, @count+1, 0), 
            @oldId := id, 
            if(@count < 5, a, 0) 
        FROM 
            (SELECT * FROM theTable ORDER BY id, a DESC) AS theTable 
        ) 
    ) AS theView 
WHERE theId != 0 
GROUP BY theId;

Table population:

INSERT INTO `thetable` VALUES 
    (1,1), (1,2), (1,3), (1,4),
    (1,5), (1,6), (1,7), (2,1),
    (2,2), (2,3), (2,4), (2,5);

August 1, 2006