SpellCheck in perl




We need to check the spelling of a word that is actually a domain
name. For example, we have to check the word "onlinetradeing".
When we run it through spell checkers, we get unrelated words such
as on, obliterating, incinerating, intruding, etc., but what we
actually want is "online trading". So we would like the word to be
split into its component words and spell-checked too. Normal spell
checkers just look words up in the dictionary; they do not split a
word into a phrase.



Re: SpellCheck in perl


I think this is the wrong newsgroup for this kind of request.  You
should probably take this kind of request to comp.lang.perl.misc.

That aside, it sounds like you need to write a little code that
tries to match the first N characters of your domain names against
words in your dictionary, and for each match try the same against
the remainder of the domain name (after the matched word), and so
on until some combination of matches matches the entire phrase.

Perhaps something like this (warning, untested code):

my %DICT = (); # initialize this with dictionary words (words are the keys)

sub matcher {
    my ( $phrase, @components ) = @_;
    my $plen = length ( $phrase );
    for ( my $i = $plen; $i > 0; $i-- ) { # try to match more, first
        my $frag = substr ( $phrase, 0, $i );
        next unless ( exists $DICT{$frag} ); # skip fragments that are not words
        return ( "MATCH FOUND", @components, $frag ) if ( $i == $plen );
        push ( @components, $frag );
        # the remainder starts at offset $i, right after the matched word
        return ( matcher ( substr ( $phrase, $i ), @components ) );
    }
    return ( "NO MATCH POSSIBLE", @components, $phrase );
}

my ( $result, @word_list ) = matcher ( "onlinetrading" );
# if %DICT contains "online" and "trading":
# $result should now be "MATCH FOUND"
# @word_list should now be ( "online", "trading" )

A problem with this "greedy" approach is that a subphrase might
match too much, rendering the remaining fragment unmatchable. For
instance, matcher("maileditorial") would fail to parse the entire
phrase if "mailed" were in the dictionary, because "itorial" is
not a word. The alternative would be to build up a list of
intermediate results, one for each substring that matched some
word in the dictionary, and call matcher() on each component of
that list iteratively. This would explore all possible matches.
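
One way to explore all possible matches is plain recursion with
backtracking instead of an explicit list of intermediate results.
The following is only a rough sketch under that assumption: the
name all_splits() and the toy dictionary are made up for
illustration, and the function has no memoization, so it can get
slow on long names with a large dictionary.

```perl
use strict;
use warnings;

# Hypothetical toy dictionary; a real one would be loaded from a word list.
my %DICT = map { $_ => 1 } qw(mail mailed editorial online on line trading);

# Return every way to split $phrase into dictionary words.
# Each result is an array reference holding one segmentation.
sub all_splits {
    my ($phrase) = @_;
    return ( [] ) if $phrase eq '';    # one way to split nothing: no words
    my @results;
    for my $i ( reverse 1 .. length $phrase ) {    # longest prefix first
        my $frag = substr( $phrase, 0, $i );
        next unless exists $DICT{$frag};
        # Keep looping after a hit, so a too-greedy prefix cannot block us.
        for my $rest ( all_splits( substr( $phrase, $i ) ) ) {
            push @results, [ $frag, @$rest ];
        }
    }
    return @results;
}

print join( ' ', @$_ ), "\n" for all_splits('maileditorial');
print join( ' ', @$_ ), "\n" for all_splits('onlinetrading');
```

With this toy dictionary, "maileditorial" comes back as "mail
editorial" even though "mailed" is tried first, and "onlinetrading"
comes back as both "online trading" and "on line trading". Picking
the best of several splits is a separate problem.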

Good luck!
-- TTK

Re: SpellCheck in perl

Thanks for the idea. I already tried this and, as you said, I got a
lot of suggested words. Handling all those word lists and finding the
best suggestion is a big problem. Google is one example of what we
want, but unfortunately its code is not available to us.

Re: SpellCheck in perl


I'm not sure what you mean.  Do you need a word list?  Word lists
are often called "lemmas".  I have a pretty good one left over from
an AI project which has 247266 words in it that you can use.  It is
available for download at:

This file is text, and has a number and a word on each line,
separated by a tab.  The number is the relative frequency of the
word in the domain of the original project.  If you can't use the
frequency, then it's pretty easy for you to strip it out.
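
A file in that layout could be read into a hash along these lines.
This is only a sketch: load_dict() is a made-up name, the sample
lines stand in for the real file, and lowercasing the words is my
own assumption (domain names are case-insensitive).

```perl
use strict;
use warnings;

# Read "frequency<TAB>word" lines from a filehandle into a hash
# that maps each word to its frequency.
sub load_dict {
    my ($fh) = @_;
    my %dict;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $freq, $word ) = split /\t/, $line, 2;
        next unless defined $word && length $word;    # skip malformed lines
        $dict{ lc $word } = $freq;    # assumption: fold words to lowercase
    }
    return %dict;
}

# In-memory sample standing in for the real file (contents are made up).
my $sample = "120\tonline\n95\ttrading\n300\ton\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
my %DICT = load_dict($fh);
close $fh;

print "$_ => $DICT{$_}\n" for sort keys %DICT;
```

To strip the frequency out, store 1 instead of $freq. The word
lookups only need the keys to exist.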


I have no idea what this means.  Can you try saying it in a
different way?

Good luck,
-- TTK

Re: SpellCheck in perl

I will explain my problem.
I am working on a spell checker that takes a misspelled keyword as
input (a single keyword only, not multiple keywords or text) and
suggests some correct words. For example, if I enter "tradeing", my
spell checker suggests that the keyword should be "trading". But if I
try to spell-check a misspelled compound word without a delimiter,
like "onlinetradeing", it suggests "unlaundered", which is irrelevant.
It does not recognize onlinetradeing as "online trading". Another
example of this kind is "virtaulflowers", which should be "Virtual
Flowers".
If you have any ideas, please let me know.

Thanks for replying...


Re: SpellCheck in perl

Srikanth wrote:


This same question was asked by you in news:comp.lang.perl.misc and has
already grown a thread there. You were already told that you shouldn't
multi-post. Now you do it again. Bye.

Affijn, Ruud

"Gewoon is een tijger."

Re: SpellCheck in perl

Thanks, Ruud. Everyone else is giving some help with this spell check,
but you have given me far better help. This is the way to help.
