Quick and easy lexer

I like writing small parsers. As soon as I see a problem that could use parsing, I’ll write a small parser. My least favorite part of writing a parser is the lexer. The lexer splits the input into small tokens; it’s fairly straightforward to do, but I have to do it every time.

I’ve just discovered the shlex module in the Python standard library. shlex is a ready-to-use lexer for a shell-like syntax, and it’s easy to use. Consider the following input, saved as sample.txt:

# A first comment
x = 10

# Another comment
function f(x, y) {
    blah = 'foo bar 123'
    var foo = 0x12345
}

To “lex” it, just do something like this:

from shlex import shlex

# Iterating over the lexer yields one token at a time.
with open('sample.txt') as f:
    print(list(shlex(f)))

The result will be (note that the comments are skipped and the quoted string stays in one piece):

['x', '=', '10',
 'function', 'f', '(', 'x', ',', 'y', ')',
 '{',
 'blah', '=', "'foo bar 123'",
 'var', 'foo', '=', '0x12345',
 '}']
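
If you don’t feel like creating a file, shlex also accepts a plain string. Here is a minimal sketch for Python 3 that reproduces the list above; the names SAMPLE and tokens are mine, just for illustration:

```python
from shlex import shlex

# The same sample input as above, held in a string instead of a file.
SAMPLE = """\
# A first comment
x = 10

# Another comment
function f(x, y) {
    blah = 'foo bar 123'
    var foo = 0x12345
}
"""

# shlex takes a string as well as a file-like object.
# Iterating over the lexer yields tokens until the input is exhausted.
tokens = list(shlex(SAMPLE))
print(tokens)
```

This uses the shlex class in its default (non-POSIX) mode, which is why the quotes around 'foo bar 123' are kept in the token.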

Now that the lexing is done, you’re left with the fun part: the grammar and the parsing. To learn how to use shlex, read the chapter about shlex in the excellent Python Module of the Week.
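
To give an idea of what the parsing side can look like, here is a rough sketch of a parser built on top of the shlex token stream. The grammar (a list of `name = value` assignments) and the function name parse_assignments are made up for this example; only get_token, eof and lineno come from shlex itself:

```python
from shlex import shlex

def parse_assignments(text):
    """Parse a toy grammar of `name = value` assignments into a dict."""
    lexer = shlex(text)
    result = {}
    while True:
        name = lexer.get_token()
        if name == lexer.eof:  # '' in the default non-POSIX mode
            break
        equals = lexer.get_token()
        value = lexer.get_token()
        if equals != '=' or value == lexer.eof:
            raise SyntaxError('line %d: expected "name = value"' % lexer.lineno)
        result[name] = value
    return result

config = parse_assignments("""
# comments are skipped by shlex
x = 10
name = 'foo bar'
""")
print(config)
```

The values come back as raw token strings, so turning '10' into an integer or stripping the quotes is left to the parser.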