| author: | Cimarron Taylor |
|---|---|
| version: | 0.1 |
| date: | January 16, 2009 |
From time to time I need to process the .sql files generated by mysqldump so I wrote a tool to help make this easy.
mydp is a program for processing mysqldump files in the manner of ‘awk’.
It reads mysqldump sql statements from standard input and executes Lua callback functions as table definitions and insert statements are encountered. The Lua callback functions are specified in a file passed to mydp via the command line, e.g:
% mydp callbacks.lua < dump.sql > output
The callbacks are typically used to filter and process only a small subset of the mysql dump data and/or save it for further processing.
mydp is written in Lex and Lua. It is released under the 2-clause FreeBSD licence.
The source is kept in a Mercurial repository over at bitbucket which you can get via:
% hg clone http://bitbucket.org/cdtmydp/mydp
You can also browse the source at bitbucket here: http://bitbucket.org/cdtmydp/mydp/src
Checking out mydp from bitbucket requires Mercurial
Building mydp requires make, gcc, flex and lua version 5.1
Re-building the mydp documentation from its reStructuredText sources requires Python and Sphinx
Mydp expects the lua file passed on the command line to define the following Lua functions:
This function is called whenever a new CREATE TABLE statement is encountered in the dump file. The name of the table and the byte position in the stream are passed as arguments.
If create returns true, mydp will scan and process the data for the table. Otherwise it will skip forward until it exhausts its input or encounters the next CREATE TABLE statement.
This function is called whenever a column definition within a CREATE TABLE is encountered. It is passed the name of the column as an argument.
if column returns true, mydp will scan and process the data for the column via the values() callback. Otherwise the column will be skipped.
This function is called when mydp encounters an INSERT INTO statement. It is useful when the processing involves initializing state between the calling of the column() callbacks and the values() callbacks.
This function is called whenever mydp encounters a row of data. It is passed the values of only those columns for which the corresponding call to the column() callback returned true.
This function is called after all the data is processed.
Here are a few selected examples to show how the callbacks work.
This is a trivial example which simply processes all tables and columns, printing the values on separate lines:
function finish() -- cleanup any resources
end
function create(tablename,position) -- handle the start of a new 'CREATE TABLE'
print( '-- processing ' .. tablename)
return true
end
function column(columnname) -- handle the next column definition in the CREATE table
return true -- tell the lexer we want to process the column
end
function insert(tablename) -- handle the start of a new 'INSERT INTO'
end
function values(...) -- handle the next set of values
for i=1, select('#', ...) do -- for each value passed
local v = select(i,...) -- get the value
print(v) -- print the value
end
print() -- print a blank line in between the records
end
In this example we split the mysql data into separate column files. This creates one directory for each table in the dump file with one column file holding the data for the corresponding column in the table:
table_insert = table.insert
TARGETDIR = 'work/example3'
cur_table = nil -- the current table we're processing
cur_columns = nil -- list of columns in the current table
cur_files = nil -- file handles
function finish() -- cleanup any resources
if cur_files then -- \ if we have any files open from the previous
for i,f in ipairs(cur_files) do -- \ create table statement then we close them
assert(f:close()) -- >
end -- /
cur_files = nil -- /
end
end
function create(tablename,position) -- handle the start of a new 'CREATE TABLE'
finish() -- close files from previous table
cur_columns = {} -- reset column list
cur_table = tablename -- save new table name
return true
end
function column(columnname) -- handle the next column definition in the CREATE table
table_insert(cur_columns, columnname) -- record the next column
return true -- tell the lexer we want to process the column
end
function insert(tablename) -- handle the start of a new 'INSERT INTO'
if cur_files then return end -- nothing to do if we've already initialized files for this table
cur_files = {} -- otherwise this is the first call
local tablename = cur_table -- \
local tabledir = TARGETDIR .. '/' .. tablename -- > create the directory for the table
os.execute('mkdir -p '..tabledir) -- /
for i,cname in ipairs(cur_columns) do
local filename = tabledir .. '/' .. cname -- \ for each column, open a new column file and
local f = assert( io.open(filename, 'a+') ) -- > save the file handle so we can access it later
cur_files[i] = f -- /
end
end
function values(...) -- handle the next set of values
for i=1, select('#', ...) do -- for each value passed
local v = select(i,...) -- get the value
local f = cur_files[ i ] -- get the corresponding file
if v then assert(f:write(v)) end -- write the value if not NULL (mapped to nil by lexer)
assert(f:write('\n')) -- write a newline to separate values
end
end
mydp - project directory
|-- Makefile - build commands
|-- doc - sphinx documentation
|-- src - C and Lua sources
| |-- example.lua - simple example which prints all tables/columns
| |-- example2.lua - example which filters some values
| |-- example3.lua - example which splits data into separate column files
| `-- mydp.l - Lex source to mydp
|-- test - sample test files and outputs
`-- todo - known improvements
Additional directories are created by the Makefile directives:
mydp
|-- bin - compiled executables
`-- work - temporary files generated by tests