I am trying to use regex to search a text file for certain word(s), and
if the word is found, extract every sentence which contains it as an
output text file. Any help would be appreciated.

Re: regex

I'd think the biggest problem here is defining what a sentence is.  It
can't just be a string of words that ends in '.', since abbreviations
end that way as well.

Re: regex

Mark Seger wrote:

Sentence delimiters and abbreviations are usually told apart with a list
of known abbreviations, but there are also approaches that test whether a
dot and its preceding word form a collocation, in which case they are
dependent and can be treated as an abbreviation.
Even so, an abbreviation can fall at the end of a sentence, which leaves
you with an ambiguous dot.
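
To make the abbreviation-list idea concrete, here is a minimal sketch in Python. The abbreviation set and the function name split_sentences are made up for illustration, and a real list would be much longer. It is not a full solution, only the list-lookup approach in its simplest form.

```python
# Illustrative only: a tiny, hypothetical abbreviation list.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."}

def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences, treating known abbreviations as non-terminal."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_with_terminator = token.endswith((".", "!", "?"))
        # A token closes the sentence only if it is not a known abbreviation.
        if ends_with_terminator and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that has no closing punctuation.
        sentences.append(" ".join(current))
    return sentences
```

Note that this sketch shows the ambiguity as well: a sentence that really does end in "etc." is glued to the one that follows it.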

If the OP does not need to do serious natural language processing, he/she
can nevertheless use a rather naive definition of "sentence".
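
A rough sketch of what such a naive approach could look like, written in Python for illustration. It assumes a sentence ends at '.', '!' or '?' followed by whitespace, and that the search terms are whole words matched case-insensitively. The function names and file paths are hypothetical.

```python
import re

def naive_sentences(text):
    """Naive definition: a sentence ends at . ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Collapse line breaks inside a sentence into single spaces.
    return [" ".join(p.split()) for p in parts if p]

def sentences_containing(text, words):
    """Return every sentence that contains at least one of the given words."""
    if not words:
        return []
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [s for s in naive_sentences(text) if pattern.search(s)]

def extract_to_file(in_path, out_path, words):
    """Read in_path and write matching sentences to out_path, one per line."""
    with open(in_path, encoding="utf-8") as src:
        matches = sentences_containing(src.read(), words)
    with open(out_path, "w", encoding="utf-8") as dst:
        for sentence in matches:
            dst.write(sentence + "\n")
```

Something like extract_to_file("input.txt", "output.txt", ["cat"]) would then cover the original request, with all the caveats about abbreviations still applying.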

Arne Ruhnau

PS: If that's not enough, you may start to wonder whether "The White House"
should not form a word...

Re: regex

In what way does your solution fail?  We can't help you if we don't know
what you have done.


Brian Wakem

Re: regex

The hardest part of that will be determining what constitutes a sentence.
Rather than trying to roll your own, take a look at Lingua::EN::Sentence.
It isn't perfect, but it is probably as good as or better than you could do
