What chars to be escaped/not and when?

Hi all,

I am pretty new to PHP and am stuck on what I think is a generic
string handling problem.

I need to read and manipulate some HTML files, and some substring
searches fail even though the strings are clearly there (see the HTML
chunk I need to edit at the end of this email).

In particular, the following functions work only some of the time for me:

1) str_replace:
$chunk = str_replace('<!-- /templates/patternFinder/freePattern.txt --
[...]
the string)

$chunk = str_replace('"20"><!-- /templates/patternFinder/freePattern.txt --><form method="post" ', '', $chunk);
==> it doesn't work (it does not find and remove the substring)

2) preg_match_all:
$pattern = '~<h6>(.*?)</h6><img src=(.*?) ~si';
if (preg_match_all($pattern, $chunk, $matches)>0) { ...
==> it works

$pattern = '~<h6>(.*?)</h6><img src="(.*?)" ~si';
if (preg_match_all($pattern, $chunk, $matches)>0) { ...
==> it doesn't work (it does not return any match)

I believe this has to do with which characters need to be escaped, but
I still have not worked out which ones do and which do not.

I have also tried the addcslashes function, and switching between
single- and double-quoted string delimiters, without success.

I use PHP 5.2.3(?) with IIS locally.

I really appreciate any help or reply and can provide more
information, if needed.

Thanks a lot.


HTML file chunk
<table width="100%"><tbody><tr><td colspan="3" valign="top"
align="left"><h6>Roll-Down Wristers</h6><img src="http://
www.lionbrand.com/stores/lionbrand/thumbs/81000ada.jpg" alt="Image of
Roll-Down Wristers" width="150" border="0"><br></td><td></td><td
valign="top" align="right" height="20"><table width="400" border="0"
cellspacing="0"><tbody><tr><td valign="top" align="right"
height="20"><!-- /templates/patternFinder/freePattern.txt --><form
method="post" name="kitform1922242962" action="http://
www.lionbrand.com/cgi-bin/patternBuyer.cgi"><input name="qty"
value="1" type="hidden"><input name="itemKey" value="1922242962"
type="hidden"><input name="store" value="/stores/eyarn"
type="hidden"><input name="kit" value="1" type="hidden"><input
name="transNum" id="tn1922242962" value="" type="hidden"><input
name="sourceItem" value="" type="hidden"><input name="su"
id="su1922242962" value="" type="hidden"><table style="border-color:
rgb(217, 203, 194); border-collapse: collapse;"
border="1"><tbody><tr><td class="B1" id="b11922242962"
onmouseover="bgOn('b11922242962','T3b');" onmouseout="bgOff
('b11922242962');" width="100"><a class="B1a" href="http://
www.lionbrand.com/patterns/81000AD.html?noImages=">Free Pattern</a></
td><td class="B1" id="b21922242962" onmouseover="bgOn
('b21922242962','T3b');" onmouseout="bgOff('b21922242962');"
width="100"><a class="B1a" href="javascript:
document.kitform1922242962.submit();">Buy Materials</a></td></tr></

Re: What chars to be escaped/not and when?

stefcollect@googlemail.com wrote:

When using preg you need to escape  < > [ ] / " ' . *
(there may be more, but that's all I can think of at the moment),
so your pattern

'~<h6>(.*?)</h6><img src=(.*?) ~si';
should be

'~\<h6\>(.*?)\<\/h6\>\<img src=(.*?) ~si';

However, I'm not sure (.*?) would match correctly.

http://us2.php.net/manual/en/reference.pcre.pattern.syntax.php may help you

Re: What chars to be escaped/not and when?

trookat wrote:

Escaping looks fine.

Not necessary, here.

The angle brackets don't need to be escaped unless you're using them
as delimiters in the expression.  To escape PCRE metacharacters in a
literal string, use preg_quote().  The correct list of metacharacters
is contained in the resource you linked at the end of your post.
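
To illustrate the preg_quote() point, here is a minimal sketch. The variable names and the shortened $html string are made up for the example; only the comment text comes from the OP's markup. Passing the delimiter as the second argument makes preg_quote() escape it too, should it appear in the text.

```php
<?php
// Literal text to search for (taken from the markup in the original post).
$literal = '<!-- /templates/patternFinder/freePattern.txt -->';

// Escape every PCRE metacharacter in the literal. The second argument is
// the delimiter used for the pattern, so it is escaped as well if present.
$quoted = preg_quote($literal, '~');

// Build a pattern around the quoted literal: the comment, optional
// whitespace, then the start of a form tag.
$pattern = '~' . $quoted . '\s*<form~i';

// Hypothetical, shortened input for the example.
$html = 'height="20"><!-- /templates/patternFinder/freePattern.txt --><form method="post"';

// preg_match() returns 1 on a match, 0 on no match, false on error.
$found = preg_match($pattern, $html);
var_dump($found);
```

Hand-escaping is only needed for characters that are special in PCRE or that clash with the delimiter; letting preg_quote() do it avoids guessing.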

Depends, for the OP's first expression:

   $pattern = '~<h6>(.*?)</h6><img src=(.*?) ~si';

the second capture group will contain everything until the next space
(including any surrounding quotes), which may or may not be right,
depending on the path in the src attribute.

In the second expression:

   $pattern = '~<h6>(.*?)</h6><img src="(.*?)" ~si';

the second capture group will correctly contain everything up until
the next double quote.  Although, it would be more efficient to use:

   $pattern = '~<h6>(.*?)</h6><img src="([^"]+)" ~si';
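
A quick way to compare the patterns is to run them against a shortened copy of the posted markup. This is only a sketch: $chunk below is a hypothetical one-line sample in which I have assumed the line break inside the URL in the post is a wrapping artifact and joined it.

```php
<?php
// Hypothetical one-line sample based on the markup in the original post.
$chunk = '<td><h6>Roll-Down Wristers</h6><img src="http://www.lionbrand.com/stores/lionbrand/thumbs/81000ada.jpg" alt="Image of Roll-Down Wristers" width="150" border="0"><br></td>';

// Unquoted variant: group 2 runs up to the next space, so it keeps the quotes.
$loose = '~<h6>(.*?)</h6><img src=(.*?) ~si';
preg_match_all($loose, $chunk, $looseMatches, PREG_SET_ORDER);

// Negated character class: group 2 is everything between the double quotes.
$strict = '~<h6>(.*?)</h6><img src="([^"]+)" ~si';
$count = preg_match_all($strict, $chunk, $matches, PREG_SET_ORDER);

// Each entry holds the full match at index 0 and the groups at 1 and 2.
foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

If this matches on a clean sample but not on the real file, the pattern is not the problem and the input is.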

@OP:  again, at first glance, your escaping looks fine, so I'd check
your data.  Personally, I didn't want to read through your unreadable
chunk of whitespace-devoid markup, so you might want to double-check
it yourself.
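
One way to check the data is to make hidden whitespace visible and see whether the exact search string is really there. The sketch below assumes, purely for illustration, that the file has a line break between "<form" and "method" (where the posted chunk wraps); I can't tell from the post whether that is true of the real file. All variable names are made up.

```php
<?php
// Hypothetical data: markup from the thread, with a line break assumed
// between "<form" and "method" (where the posted chunk wraps).
$chunk = "height=\"20\"><!-- /templates/patternFinder/freePattern.txt --><form\n"
       . "method=\"post\" name=\"kitform1922242962\"";

// The search string from the original post (single space after "<form").
$needle = '"20"><!-- /templates/patternFinder/freePattern.txt --><form method="post" ';

// strpos() returns false when the exact byte sequence is not present.
$exact = strpos($chunk, $needle);

// Make control characters visible: a newline is shown as backslash-n.
$visible = addcslashes($chunk, "\0..\37");

// Whitespace-tolerant version: quote each word, then join with \s+ so any
// run of spaces, tabs, or line breaks in the data is accepted.
$quotedParts = array();
foreach (preg_split('~\s+~', trim($needle)) as $part) {
    $quotedParts[] = preg_quote($part, '~');
}
$pattern = '~' . implode('\s+', $quotedParts) . '\s*~';
$cleaned = preg_replace($pattern, '', $chunk);

var_dump($exact);
echo $visible, "\n", $cleaned, "\n";
```

str_replace() compares bytes exactly, so a single newline or tab where the search string has a space is enough to make it find nothing.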

$email = str_replace('sig.invalid', 'gmail.com', $from);
