SQL parsing

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

Threaded View

Iíve written a simple results cache for an Oracle driven application.
When I run a query I need to know which tables it uses so I can find out
if they have been modified and thus the cached data must be discarded
(as I said, itís a very simple cache and the DB is basically static).

I didnít like the idea of passing a table list as argument for each
query because they are dynamically built and, anyway, I thought itíd
become unmaintainable sooner or later. So I wrote an even simpler SQL
parser that looks like this:

$regex = '/\W(FROM|JOIN)\s+(' . $tables . ')(?:\W|$)/i';
preg_match_all($regex, $sql, $matches);

So far, it works fine enough for my purposes, but I think itís not a
long term solution. Is it possible to build a generic reliable SQL
parser to fetch the table names of a query without diving in the regex
hell? Do you know about an existing library?* Can you think of any other

Thank you in advance,

[*] Iíve browsed some code out there that does what I need (typically
via regex soup), but it always fails with the queries I test.

-- http://alvaro.es - Ńlvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programaciůn web: http://bits.demogracia.com
-- Mi web de humor al baŮo MarŪa: http://www.demogracia.com

Re: SQL parsing

On May 20, 12:43 pm, "=C1lvaro G. Vicario"
Quoted text here. Click to load it

Certainly its **possible** to write an SQL parser - lots of other
people have managed - but stepping back a bit this seems like
premature optimization - even for an Oracle DBMS. Maybe you should
start by looking at the DBMS. Surprisingly it looks like Oracle didn't
introduce result caching until 11g but prior to that it would still be
possible to minimise the disk I/O with a large buffer cache.

For large result sets you may get better performance by going back to
the DBMS - this the sort of thing DBMS are designed to do well - more
so than PHP - which (IME) can be a bit slow handling large arrays.

For small result sets (except maybe aggregates) SQL tuning is the way
to go. While the CBO can be very, very smart - it can also be very,
very dumb at times.

As you have discovered, your regex fails to deal with a lot of cases,
but even a complete parser won't cope with stored procedures. Much
though I dislike SQL stored procedures this might be the most
convenient way to solve the problem - add a parameter for
last_cache_timestamp and pass in the value from the previous query -
in the PL/SQL, first check if any of the tables have been modified
since then - if not return a custom error code, otherwise redo the

But the simplest option might be to just apply a default timeout on
the local query cache.

Alternatively build your own abstraction in PHP whereby developers
never enter SQL commands directly, but it makes it easier for you to
extract the table names.


Re: SQL parsing

Ńlvaro G. Vicario wrote:
Quoted text here. Click to load it

Of course it is possible to build a PARSER. This is generally done
WITHOUT regexes. There are some examples available on the net, none that
I tested though. Google for "PHP SQL parser". If you want to right your
own, Matt Zandstra's book ("PHP5 objects and patterns" or something like
that) has a fine example & starting point for you.
Rik Wasmus
...spamrun finished

Site Timeline