How to get Fast MyISAM table loads.

Tokuview (http://blogs.tokutek.com/tokuview/iibench/) is building a test scripts/program for 1B record insert performance. 

This is fine, but ultimately the logistics of the DB environment will dictate what ‘tricks’ can be pulled to insert records the fastest.  Can you access raw MyISAM files or not?  Are you running concurrent inserts?  Do you require ACID compliance?

There are 3 basic techniques I’ve found for inserting records into a MyISAM DB table are:

A) Record – By – Record, using Bounded Statments.

This is the slowest, but significant performance can be gained by creating ‘bulk’ insert SQLs like:
INSERT INTO xxxxx (col1, col2, col3,col4, etc…)
VALUES (? , ?, ?, ?, ?, …), (? , ?, ?, ?, ?, …), (? , ?, ?, ?, ?, …), (? , ?, ?, ?, ?, …), (? , ?, ?, ?, ?, …), and many more.

The number of VALUES() given should be dictated by the SIZE of each record inserted and the size of the data packet you want your program to send over the wire. I’ve seen peaks of 6MB/sec of network activity (via MySql Admin) when loading a fairly wide table with this.

B) File-Level copy of MyISAM files.

If you are attempting to simply DUPLICATE a MyISAM table’s data, then file-level COPY of the .frm, .MYI, and .MYD files is beyond reproach.  Use ‘cp’ for local databases, or FTP (not rcp) for inter-server table copies.  rcp requires ’secured’ ports of which there are less than 1024 of those unused.  Millions of records can be copied is seconds.

Step 1) LOCK for WRITE both source and output tables.
Step 2) FLUSH both source and output tables.  (Your DB could hang here if it needs a bounce).
Step 3) Perform the file-level copy of the 3 files that make up an MyISAM table.
Step 4) Re-flush output table.
Step 5) Unlock all the tables.

C) Loading data from flat textual files.

Using the SQL statment “LOAD DATA INFILE ‘xxxxx’ INTO xxxx (col1,col2,col3)” is very fast, much faster than inserting records most other techniques. 

D) A note about inserting into different tables…

When you are running a heavily parallized environment, with many different processes pushing millings of records in/out of the DB, then you need to be aware of the “Default Key Cache” bottleneck (explained @ dev.mysql.com). I expected more details to eventually appear on the AnalyticsArts blog with some backup evidence.

To deal with this, create yourself a different key-cache for each table.  This will avoid single-threading key-cache updates when inserting records into a MyISAM table. 

 

Overall insert performance can be further improved by using an ‘altered’ or ‘creative’ the table structure changes:

Creative) Use non-unique keys upon the table (Even if the data is unique).  By turning off/on the indexes (ALTER TABLE x DISABLE KEYS), the actual load of data and rebuild of the indexes is fastest.
Note: The Primary and Unique keys will NOT be disabled with this.

step 1) ALTER TABLE x DISABLE KEYS
step 2) load data infile ‘xxxx\xxxx’
step 3) ALTER TABLE x ENABLE KEYS 

Altered) A more complex technique that involves file-level copy of the .MYI, and .frm files.
This technique is effectively the same as disabling the keys, but for tables that require unique/primary keys. 

step 1) Create a ‘cloned’ table without any indexes, and load that table with the necessary data (using your favorite technique: rec-by-rec or “load data infile” from above).
step 2) Lock both tables for write.
step 3) flush both tables (critical)
step 4) file-copy the .frm and .MYI from the original table ontop of the existing .frm and .MYI files of the cloned table.
step 5) flush output table (critical)
step 6) perform a REPAIR QUICK sql against the output table.
step 7) rename (or drop) original table.
step 8) rename ‘cloned’ table into.

The REPAIR QUICK rebuilds the indexes for the table without rebuilding the actual data file.
The data file is rebuilt when you issue an ALTER TABLE ADD [UNIQUE | PRIMARY] INDEX (col1,col2) and added to the total load time of the table.

In conclusion, using “Load data infile” against a table with disabled keys and an assigned key-cache, is the fastest method I know of for loading huge volumes of data into any MyISAM table.

– JJ –

Leave a Reply

You must be logged in to post a comment.