A Little Noise

February 11, 2010

Restore from mysqldump --all-databases in parallel

Filed under: MySQL — snoyes @ 3:12 pm

csplit backup.sql /"^-- Current Database"/ {*}; for f in xx*; do cat xx00 "$f" | mysql& done;

and then rm xx* when the background jobs have finished.

I haven’t done any benchmarking, and I haven’t worked out what to do if the number of databases exceeds the number of available connections.

January 29, 2010

mysql’s --xml and some XSLT (now with JSON!)

Filed under: MySQL FAQ — snoyes @ 12:12 pm

Somebody asked in Freenode the other day how to get their data out of MySQL into a specific XML format.

Both mysqldump and the mysql client have --xml options. The output from the cli looks like this:

<resultset statement="SELECT here">
  <row>
    <field name="field1">value a</field>
    <field name="field2">value b</field>
  </row>
  <row>
    <field name="field1">value c</field>
    <field name="field2">value d</field>
  </row>
</resultset>

I wanted to change those <field name=”fieldName”> into <fieldName> tags, and all it takes is a little XSLT, which I’ve uploaded here.

An XSLT processor lets you do something like this:

mysql --xml -e "SELECT here" | xsltproc mysql.xsl -

The result will be something like:

<resultset statement="SELECT here">
  <row>
    <field1>value a</field1>
    <field2>value b</field2>
  </row>
  <row>
    <field1>value c</field1>
    <field2>value d</field2>
  </row>
</resultset>

If JSON is your thing, you can use xml2json to take that output a step farther:

mysql --xml -e "SELECT here" | xsltproc mysql.xsl - | xsltproc xml2json.xsl -

Result (whitespace added):

{"resultset" : 
  {
    "statement" : "SELECT here", 
    "row" : [
      {"field1" : "value a", "field2" : "value b"},
      {"field1" : "value c", "field2" : "value d"}
    ]
  }
}

That will suffice until the mysql client has a –json parameter.

January 28, 2010

Make an ALTER TABLE script with MySQL Workbench

Filed under: MySQL — snoyes @ 11:05 am

So you have two versions of a database sitting around – maybe you’ve been trying things out on your development box, and now you want to move it all over to production. If it’s just a new table here and an extra column there, you could do it by hand, but if you’ve made lots of changes, it would be nice to let the computer do it for you.

Enter MySQL Workbenchmysqldiff.

January 26, 2010

Trigger on firing table

Filed under: MySQL — snoyes @ 4:07 pm
CREATE TABLE t1 (id int);
CREATE TRIGGER t1_ai AFTER INSERT ON t1 FOR EACH ROW 
  INSERT INTO t1 VALUES (NEW.id);

That’s not allowed; it won’t let you create the trigger.

CREATE TABLE t1 (id int);
CREATE TABLE f1 (id int) ENGINE=FEDERATED CONNECTION='localhost/test/t1';
CREATE TRIGGER t1_ai AFTER INSERT ON t1 FOR EACH ROW 
  INSERT INTO f1 VALUES (NEW.id);

The server lets that through. Don’t do it.

December 4, 2009

On unions and derivatives

Filed under: MySQL — snoyes @ 4:25 pm

On occasion I need to generate a number of rows. I don’t care about the contents, just how many rows there are. Maybe I want to create a table filled with a sequence of numbers. One common way to do this is to join a bunch of subqueries built out of unions:


SET @count := 0;
INSERT INTO table SELECT @count := @count + 1
FROM (SELECT 1 UNION SELECT 2) a
JOIN (SELECT 1 UNION SELECT 2) b
JOIN (SELECT 1 UNION SELECT 2) c

Of course, I could union more than two selects (e.g. SELECT 1 UNION SELECT 2 UNION SELECT 3). That makes each derived table longer, but reduces how many must be joined.

Question: What’s the optimal size of each derived table?

Defining “optimal”: since this is a one-off query, I don’t much care about performance (within reason), but I do care about how much typing I have to do. So I will favor “fewer characters” over “fewer joins”.

Each derived table in the query takes up 15 * numSelects + 2 characters.

To generate a given number of rows requires CEILING(LOG(numRows) / LOG(numSelects)) derived tables.

If I want 1000 rows in the result, I’d need 10 derived tables of two rows each (320 characters), or 7 of three rows (329 characters), or 3 of ten rows (456 characters).

For any given numRows, what’s the minimum of CEILING(LOG(numRows) / LOG(numSelects)) * (15 * numSelects + 2) ?

The answer varies with numRows – 2, 3, or 4 are alternately best when generating less than 130 rows, but any higher than that and the best choice usually turns out to be 3.

(SELECT 1 UNION SELECT 2 UNION SELECT 3)

November 24, 2009

group date

Filed under: MySQL — snoyes @ 8:49 am

A non-rigorous, non-scientific, totally off-the-cuff test of which function to pick when you need to group by year and month.

I populated a table with 262K rows of random dates, and then ran

SELECT %s, COUNT(*) FROM table GROUP BY %s ORDER BY NULL

with various functions, which should all result in the same grouping. I repeated each query five times and show the average time, using three different column types DATE, DATETIME, and TIMESTAMP.

expression DATE DATETIME TIMESTAMP
EXTRACT(YEAR_MONTH FROM d) 0.362 0.369 0.581
LAST_DAY(d) 0.374 0.389 0.582
DATE_SUB(d, INTERVAL DAY(d) DAY) 0.429 0.882 1.452
d – INTERVAL DAY(d) DAY 0.429 0.887 1.535
SUBSTRING(d, 1, 7) 0.454 0.488 0.719
YEAR(d), MONTH(d) 1.046 1.126 2.045
MONTH(d), YEAR(d) 1.116 1.196 2.112
LEFT(d, 7) 1.307 1.405 2.123
DATE_FORMAT(d, ‘%Y%m’) 1.480 1.565 2.112
DATE_FORMAT(d, ‘%Y-%m’) 1.514 1.615 2.564
DATE_FORMAT(d, ‘%m%Y’) 1.517 1.604 2.420
DATE_FORMAT(d, ‘%m-%Y’) 1.562 1.656 2.465
MONTHNAME(d), YEAR(d) 1.613 1.713 2.812
YEAR(d), MONTHNAME(d) 1.663 1.766 2.873

And just in case you want to extract a full date (which really only makes sense for datetime and timestamp):

expression DATE DATETIME TIMESTAMP
DATE(d) 0.357 0.374 0.591
EXTRACT(YEAR_MONTH FROM d), EXTRACT(DAY FROM d) 0.377 0.407 0.730
EXTRACT(DAY FROM d), EXTRACT(YEAR_MONTH FROM d) 0.395 0.422 0.751
MONTH(d), YEAR(d), DAY(d) 0.395 0.426 0.870
YEAR(d), DAY(d), MONTH(d) 0.398 0.440 0.862
YEAR(d), MONTH(d), DAY(d) 0.406 0.441 0.867
DAY(d), YEAR(d), MONTH(d) 0.409 0.444 0.859
LEFT(d, 10) 0.437 0.487 0.728
SUBSTRING_INDEX(d, ‘ ‘, 1) 0.439 0.475 0.743
DAY(d), MONTH(d), YEAR(d) 0.441 0.477 0.901
MONTH(d), DAY(d), YEAR(d) 0.442 0.472 0.914
SUBSTRING(d, 1, 10) 0.460 0.496 0.729
DATE_FORMAT(d, ‘%d%Y%m’) 0.539 0.571 0.852
DATE_FORMAT(d, ‘%m%Y%d’) 0.542 0.570 0.841
DATE_FORMAT(d, ‘%m%d%Y’) 0.543 0.572 0.846
DATE_FORMAT(d, ‘%Y%m%d’) 0.544 0.570 0.842
DATE_FORMAT(d, ‘%Y%d%m’) 0.544 0.572 0.843
DATE_FORMAT(d, ‘%d-%Y-%d’) 0.547 0.574 0.848
DATE_FORMAT(d, ‘%d%m%Y’) 0.549 0.573 0.842
DATE_FORMAT(d, ‘%m-%Y-%d’) 0.552 0.583 0.869
DATE_FORMAT(d, ‘%Y-%m-%d’) 0.558 0.583 0.854
DATE_FORMAT(d, ‘%Y-%d-%m’) 0.569 0.593 0.867
d – INTERVAL HOUR(d) HOUR – INTERVAL MINUTE(d) MINUTE – INTERVAL SECOND(d) SECOND 0.573 0.653 1.249
DATE_FORMAT(d, ‘%d-%m-%Y’) 0.598 0.637 0.920
DATE_FORMAT(d, ‘%m-%d-%Y’) 0.601 0.633 0.904

November 20, 2009

GROUP UNCONCAT

Filed under: MySQL — snoyes @ 3:18 pm

Dunno why you’d rather do this in SQL than in your application layer, but if you do, here’s one way to turn a delimited string of values back into multiple rows – just the opposite of GROUP_CONCAT:

SET @sourceString = 'a,b,c,d,e';
SET @sql = CONCAT('INSERT INTO t VALUES (\\'', REPLACE(@sourceString, ',', '\\'),(\\''), '\\')');
PREPARE myStmt FROM @sql;
EXECUTE myStmt;

Just to show what’s going on:

mysql> SELECT @sql;
+----------------------------------------------------+
| @sql                                               |
+----------------------------------------------------+
| INSERT INTO t VALUES ('a'),('b'),('c'),('d'),('e') |
+----------------------------------------------------+

November 19, 2009

Fine-tuning FULLTEXT: Adding word characters

Filed under: MySQL FAQ — snoyes @ 8:27 am

Perhaps you want “.” or “:” or “-” or something to be treated as a word character so that you can do a fulltext match against an IP or MAC address.

If you want to change the set of characters that are considered word characters, you can do so in two ways…
* Modify the MySQL source … and recompile MySQL.
* Modify a character set file: This requires no recompilation.

Thus sayeth the Manual. What it doesn’t say is just what to modify in that character set file. (Edit: Now it does.) I had to ask MySQL developer Alexander Barkov for help to get it right.

In the user comments on that manual page, John Navratil has provided most of the answer, but there’s an important step missing: adding a new collation.
(more…)

November 11, 2009

Choosing index prefix length

Filed under: MySQL — snoyes @ 9:12 am

It can be handy to index just part of a long character column. The strings might vary enough in just the first few characters to get the same cardinality out of a partial index as a full index would give, and the partial index takes up much less disk space and memory.

Indexes in The Manual

But how long should that prefix be? Should it be the same cardinality as the original data? Within 10%?

I dunno; somebody else can blog about that decision. I’m just interested in a query that finds the number which satisfies your requirements.

I have a table called ‘numbers’ with an int field ‘num’ that just holds values 1 through a large number; for this case, it needs to be no larger than the length of the longest string in the table.

Then:

SELECT
  numbers.num AS prefixLength,
  dt.originalCardinality,
  (SELECT COUNT(DISTINCT LEFT(stringField, numbers.num)) FROM yourTable) AS prefixCardinality
FROM
  numbers
  JOIN (SELECT COUNT(DISTINCT stringField) AS originalCardinality FROM yourTable) AS dt
WHERE
  (SELECT COUNT(DISTINCT LEFT(stringField, numbers.num)) FROM yourTable) >= .90 * originalCardinality
ORDER BY numbers.num
LIMIT 1;

July 24, 2009

Careful with those required files in Ruby

Filed under: MySQL Gotchas — snoyes @ 10:53 am

Getting Ruby to talk to MySQL was pretty easy in irb:

C:\>irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'mysql'
=> true
irb(main):003:0> db = Mysql.connect('localhost', 'root', 'password', 'test')
=> #<Mysql:0x2cc5270>
irb(main):004:0>

But then I tried the very same code from a file:

C:\>ruby mysql.rb
./mysql.rb:3: uninitialized constant Mysql (NameError)
        from C:/Ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
        from C:/Ruby/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
        from mysql.rb:2

That require 'mysql' line looks in the current directory first … and I’d used the name mysql.rb for the file I was coding … so my code just included itself, and not the gem I wanted. Doh!

« Newer PostsOlder Posts »

Powered by WordPress