Simple question on string extraction



I'm having to modify a PHP script even though I have little knowledge of PHP
itself. The script extracts specific strings from an HTML file, and I need
it to extract some further information.

Specifically, each file represents an article written by an author. The
author's name is typically preceded by a 'By' or a 'by', then it goes on
till there's a carriage return.

So for example, the file might contain something like this:

The Need For Regeneration

by <b>John Smith</b>

We have seen the waste that has been produced....

(rest of article)


How To Make Lots and Lots of Money Writing PHP

by The Supreme Coder

The first thing you need to know about making money is...

(rest of article)

So I need code that will start searching the file from the beginning for the
words 'by ' or 'By ', then grab everything that follows that until it gets
to a new line and assign that to a variable. In the examples I have given
above, it would grab '<b>John Smith</b>' and 'The Supreme Coder'. I've seen
a function called preg_match which might do the job, but it uses regular
expressions which I have little knowledge of.

Would any person be so kind as to post what arguments I would need to call
this function with?
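
For reference, the search described above could be sketched with preg_match roughly as follows. This is only a minimal sketch: the function name extract_author is hypothetical, and it assumes the 'by' sits at the very start of a line, as in the two examples.

```php
<?php
// Hypothetical helper; extract_author is an illustrative name, not part of the original script.
// Assumption: "by" or "By" appears at the start of a line, as in the sample articles.
function extract_author(string $html): ?string
{
    // ^       start of a line (the m flag makes ^ and $ work per line)
    // by      the literal word; the i flag lets it match "By" as well
    // [ \t]+  one or more spaces or tabs (not newlines)
    // (.+)    capture the rest of that line
    if (preg_match('/^by[ \t]+(.+)$/mi', $html, $matches) === 1) {
        // trim() drops a trailing "\r" left over from Windows line endings
        return trim($matches[1]);
    }
    return null; // no author line found
}
```

The third argument receives the matches: $matches[0] is the whole matched line and $matches[1] is the captured part after 'by '.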



aknak at aksoto dot idps dot co dot uk

Re: Simple question on string extraction

I've been doing something similar myself, but wanted to avoid the chance of  
getting an accidental early string match.

The strpos() function will let you locate a string within another string  
(I'm assuming here that you've got the whole html page as a single string),  
and, if required, you can specify a starting position.

So something like

$p1 = strpos($rec,"</head>");

would let you get beyond the html header, then

$p2 = strpos($rec," by ",$p1);

would let you find the first occurrence of " by " beyond position $p1 (or  
maybe "by<", depending whether there's a space there or not)

then you can search for <b> and </b> in the same way, adjust your sums a  
bit, and get

$author = substr($rec,$start,$length);

where $start will probably be something like $p1+3 and $length something  
like $p2-$p1-2, or whatever it turns out to be, and whichever way round $p1  
and $p2 end up.
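
Put together, the steps above might look something like this. It is a rough sketch under a few assumptions: the function name is made up, it looks for "by " without a leading space (in the sample articles the word follows a newline, not a space), it uses stripos() so that 'By' matches too, and it takes the end of the line as the end of the author instead of hunting for <b> and </b>.

```php
<?php
// Hypothetical helper; the name and the use of "\n" as the end marker are assumptions.
function author_after_head(string $rec): ?string
{
    // Skip past the HTML head if there is one; otherwise search from the start.
    $p1 = stripos($rec, '</head>');
    if ($p1 === false) {
        $p1 = 0;
    }

    // Case-insensitive search for "by " from $p1 onward.
    // Caveat: this also matches inside words such as "nearby ".
    $p2 = stripos($rec, 'by ', $p1);
    if ($p2 === false) {
        return null; // no author marker found
    }

    // The author starts right after "by " and runs to the end of that line.
    $start = $p2 + strlen('by ');
    $end = strpos($rec, "\n", $start);
    $length = ($end === false) ? strlen($rec) - $start : $end - $start;

    return trim(substr($rec, $start, $length));
}
```

Always compare the result of strpos() with === false, since a match at position 0 is also "falsy". If only the bare name is wanted, running strip_tags() over the result would drop the <b> markup.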

Hope this helps. As an alternative you might try the explode function using  
" by " as the string to split $rec on, and then check each array element.
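
A minimal sketch of the explode idea, again with a made-up function name. It is simplified to split only at the first "by " instead of checking every element, and note that explode() is case-sensitive, so this version will not catch 'By '.

```php
<?php
// Hypothetical helper illustrating the explode() alternative.
// Assumption: the first "by " in $rec is the one that introduces the author.
function author_via_explode(string $rec): ?string
{
    // A limit of 2 splits at the first "by " only: [text before, text after].
    $parts = explode('by ', $rec, 2);
    if (count($parts) < 2) {
        return null; // "by " does not occur at all
    }

    // Keep only the remainder of that line.
    $line = explode("\n", $parts[1], 2)[0];

    return trim($line);
}
```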

