Tokenizer Difficulties


I've delved into the usage of the PHP Tokenizer that directly
interfaces with the Zend engine.

So far, I have found it incredibly useful when it comes to editing a
PHP file.

What I am trying to do is make a PHP5 class compatible with PHP4 by
running it through a class I made.

To do this, I decided that I first needed the ability to remove
__construct() and the visibility declarations.  So I have a
function that runs a switch statement, and it removes all visibility
declarations.

Now here is my problem:
There are 2 visibility types, 1) for variables, 2) for functions.

If I replace every visibility declaration with "var", it will
obviously not work for functions, but functions are declared AFTER
visibility, and therefore I do not know how I can make sure that only
the visibility RIGHT before that function will be removed without
affecting ANYTHING else.

Does anyone know how I can assign a "var" replace for variable
visibility and just remove visibility for functions?  Many thanks in
advance, I will try to delve further into this if you guys need me to
for more support.

Here is my function:

// NOTE: the default value of $visibility_pointer was cut off in the post;
// null is only a placeholder. The parameter is not used in the body shown.
function TokenizedRetrograde($file_name, $visibility_pointer = null)
{
    $source = file_get_contents($file_name);
    $tokens = token_get_all($source);
    $data = '';
    $class_name = '';
    $class_declared = false;
    $function_declared = false;
    $x = 0;
    foreach ($tokens as $token) {
        if (is_string($token)) {
            // simple 1-character token
            $data .= $token;
            continue;
        }
        // token array: $text stores the data from a specific token.
        list($id, $text) = $token;
        switch ($id) {
            case T_PROTECTED:
            case T_PUBLIC:
            case T_PRIVATE:
                // Replace private, public, protected keyword with 'var'.
                // This is the problem: it also hits the keyword before a function.
                $text = 'var';
                $visibility_set = true;
                break;
            case T_CLASS:
                // T_CLASS occurs when a class is declared.
                $class_declared = true;
                break;
            case T_VARIABLE:
            case T_OBJECT_OPERATOR:
                $in_object_reference = true;
                break;
            case T_STRING:
                // If a class was just declared, the string is the class name.
                if ($class_declared === true) {
                    $class_declared = false;
                    $class_name = $text;
                }
                // If __construct is referenced within the file's code, replace
                // it with the class name captured above.
                if ('__construct' == $text) {
                    $text = $class_name;
                }
                break;
            case T_FUNCTION:
                $function_declared = true;
                break;
            case T_WHITESPACE:
                break;
        }
        $data .= $text; // Add text previously set and possibly modified.
    }
    return $data; // assumed: the end of the function was cut off in the post
}

Re: Tokenizer Difficulties


Most of your problem comes from trying to "edit" a stream of tokens
with no memory of the context in which an individual token is found.
Without the context, you simply can't do the job right.
Sure, you can build ad hoc machinery to try to remember it,
but such ad hocness generally turns into a baroque pile of code.
The general way to collect such context for structured texts is called
"parsing" (of which tokenizing is just the first step).

If you can parse the PHP, and build conventional compiler data structures
for this, then you could consider walking over the trees and using
the "parent context" (your visibility declarations are either in variable
declaration or function declaration context) to make this change
safely and reliably.
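
Short of a full parser, the smallest piece of context that separates the two cases is the token that follows the visibility keyword. The sketch below is only an illustration of that ad hoc route, under stated assumptions: downgradeVisibility() is a hypothetical name, it handles only a keyword directly followed by a variable or by "function", and it deliberately leaves everything else (static, abstract, final, and so on) untouched. Each of those extra cases is where the ad hoc machinery starts to grow.

```php
<?php
// Sketch only: downgradeVisibility() is a hypothetical helper, not part of PHP.
function downgradeVisibility($source)
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $out = '';
    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if (is_string($token)) {
            $out .= $token; // single-character token such as "{" or ";"
            continue;
        }
        list($id, $text) = $token;
        if ($id === T_PUBLIC || $id === T_PROTECTED || $id === T_PRIVATE) {
            // Look ahead past whitespace to see what this keyword modifies.
            $j = $i + 1;
            while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
                $j++;
            }
            $next = ($j < $count && is_array($tokens[$j])) ? $tokens[$j][0] : null;
            if ($next === T_VARIABLE) {
                $text = 'var'; // property: PHP4 spells this "var"
            } elseif ($next === T_FUNCTION) {
                $i = $j - 1;   // method: drop the keyword and the whitespace after it
                continue;
            }
            // Any other follower is left unchanged in this sketch.
        }
        $out .= $text;
    }
    return $out;
}

echo downgradeVisibility('<?php class A { public $x; private function f() {} }'), "\n";
// prints: <?php class A { var $x; function f() {} }
```

Looking ahead rather than remembering backwards keeps the decision in one place, but it still knows nothing about nesting or about modifiers that sit between the keyword and the declaration.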

Even cooler, you could write patterns that express both the change
you want to make *and* its context, so that each kind of change
is stated directly, e.g.,

   "public \x;" -> "var \x;".
   "private function \fnheader \fnbody" -> "function \fnheader \fnbody".

A tool that can do this exists, and it can already parse PHP4 and PHP5.
It would be ideal for implementing the specific task you described,
and the broader task you imply of converting PHP5 code back
into executable PHP4.

I'm not sure exactly why you want to do that, considering you already have
PHP5 :-}

Ira D. Baxter, Ph.D., CTO   512-250-1018
Semantic Designs, Inc.
