
Spelling suggestions for common words - ispell, etc.

comp.lang.perl.misc
Posted by sftriman on April 3, 2008, 5:47 am
I am looking for a way, without defining a custom dictionary, to
get a list of suggested words for a misspelled word. Or better, "the"
most likely intended word for a misspelled word.

My base case to consider is:

dmr wjite saddle

which refers to a brand (DMR) and color (white) of a bike part
(saddle).

Ideally, dmr would return no suggestion, and wjite would return
"white", though I can certainly understand why "write" is an
equally good suggestion. I would be willing to define an add-on
dictionary of words to ignore, such as brands and abbreviations
I already know about (DMR, for example), so that part is manageable.

ispell -a yields:

    $ echo "dmr wjite saddle" | ispell -a
    @(#) International Ispell Version 3.1.20 (but really Aspell 0.50.5)
    & dmr 28 0: Dr, Mr, DMD, DMZ, Dar, Der, Dir, Dur, Dem, dorm, Dame, dame, demur, dime, dimer, dome, MDT, mfr, Dom, Dre, Dru, MRI, dam, dim, dry, MD, Md, rm
    & wjite 29 4: jute, kite, White, white, quite, Waite, write, Joete, jitter, kiter, jet, Kit, jot, jut, kit, quiet, whiter, whitey, quote, Whit, whit, Cote, Jude, Kate, cote, cute, quit, Wit, wit
    *

From this I could easily skip the dmr suggestions, but scoring
and evaluating the suggestions for wjite is harder: "White", "white"
and "write" are 'ranked' (I guess) 3rd, 4th, and 7th.
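A minimal Perl sketch of pulling the suggestions out of `ispell -a` output so they can be scored, assuming the standard pipe-mode line formats: `& word count offset: suggestions` for a misspelling with suggestions and `# word offset` for one without. The sub name `parse_ispell_output` is hypothetical; the sample data is the two `&` lines shown above with the wrapped lines rejoined, hard-coded instead of read from a pipe.

```perl
use strict;
use warnings;

# Parse ispell/aspell pipe-mode ("ispell -a") output into a hash that maps
# each misspelled word to its suggestions, in the order ispell gave them.
#   "& word count offset: s1, s2, ..."  misspelled, with suggestions
#   "# word offset"                     misspelled, no suggestions
sub parse_ispell_output {
    my @lines = @_;
    my %suggestions;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^& (\S+) \d+ \d+: (.*)$/) {
            $suggestions{$1} = [ split /, /, $2 ];
        }
        elsif ($line =~ /^# (\S+) \d+$/) {
            $suggestions{$1} = [];
        }
    }
    return \%suggestions;
}

# The two "&" lines from the output above, with the wrapped lines rejoined.
my @output = (
    '& dmr 28 0: Dr, Mr, DMD, DMZ, Dar, Der, Dir, Dur, Dem, dorm, Dame, dame, '
      . 'demur, dime, dimer, dome, MDT, mfr, Dom, Dre, Dru, MRI, dam, dim, dry, MD, Md, rm',
    '& wjite 29 4: jute, kite, White, white, quite, Waite, write, Joete, jitter, '
      . 'kiter, jet, Kit, jot, jut, kit, quiet, whiter, whitey, quote, Whit, whit, '
      . 'Cote, Jude, Kate, cote, cute, quit, Wit, wit',
);

my $s = parse_ispell_output(@output);

# Report where the plausible corrections sit in ispell's list (1-based).
my @list = @{ $s->{wjite} };
for my $i (0 .. $#list) {
    printf "%d. %s\n", $i + 1, $list[$i] if $list[$i] =~ /^(?:white|write)$/i;
}
```

Run against the lines above, this prints positions 3 (White), 4 (white) and 7 (write), which is the ordering the post describes.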

Does anyone know of an alternative that would favor basic, common
words as suggestions? ispell is certainly a good start and I might be
able to use it, but I was hoping there is something more intuitive
to a human out there.

Thanks!
David

Posted by David Filmer on April 3, 2008, 3:29 pm
sftriman wrote:
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

Ever notice that Google does a pretty good job of that? So consider
Net::Google::Spelling:
http://search.cpan.org/~bstilwell/Net-Google-1.0.1/lib/Net/Google/Spelling.pm

--
David Filmer (http://DavidFilmer.com)

Posted by Joost Diepenmaat on April 3, 2008, 3:34 pm

> I am looking for a way, without defining a custom dictionary, to
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

You may find this article interesting:
http://norvig.com/spell-correct.html

You still need a list of "good" words, of course.
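The approach in that article can be sketched in Perl roughly as follows: generate every string one or two edits away, keep the ones in a known-word list, and pick the most frequent. This is a simplified port, not the article's code; the frequency table `%WORD_COUNT` and its counts are made up for illustration (the article derives counts from a large text corpus), and the sub names only loosely mirror the article's.

```perl
use strict;
use warnings;

# Hypothetical word counts; in practice these come from a large text corpus.
my %WORD_COUNT = (
    white  => 120,
    write  => 80,
    quite  => 60,
    kite   => 5,
    saddle => 10,
);
my @LETTERS = ('a' .. 'z');

# Every string one edit away from $word: deletes, transposes, replaces, inserts.
sub edits1 {
    my ($word) = @_;
    my %seen;
    for my $i (0 .. length($word)) {
        my $left  = substr($word, 0, $i);
        my $right = substr($word, $i);
        if (length($right) > 0) {
            my $rest = substr($right, 1);
            $seen{ $left . $rest } = 1;                       # delete
            $seen{ $left . $_ . $rest } = 1 for @LETTERS;     # replace
        }
        if (length($right) > 1) {                             # transpose
            $seen{ $left . substr($right, 1, 1) . substr($right, 0, 1)
                   . substr($right, 2) } = 1;
        }
        $seen{ $left . $_ . $right } = 1 for @LETTERS;        # insert
    }
    return keys %seen;
}

# Keep only the words that appear in the frequency table.
sub known { return grep { exists $WORD_COUNT{$_} } @_ }

# The word itself if known, else known words 1 edit away, else 2 edits away;
# among those candidates, return the most frequent one.
sub correct {
    my ($word) = @_;
    my @candidates = known($word);
    @candidates = known(edits1($word)) unless @candidates;
    @candidates = known(map { edits1($_) } edits1($word)) unless @candidates;
    return $word unless @candidates;
    my ($best) = sort { $WORD_COUNT{$b} <=> $WORD_COUNT{$a} } @candidates;
    return $best;
}

print "$_ -> ", correct($_), "\n" for qw(dmr wjite saddle);
```

With these made-up counts, wjite comes back as white, because white and write are both one edit away and white is assumed to be more frequent. dmr has no known word within two edits, so it is returned unchanged, which matches the "no suggestion" behavior wanted for brand names.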

--
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/

Posted by Ben Bullock on April 4, 2008, 2:27 am
> I am looking for a way, without defining a custom dictionary, to
> get a list of suggested words for a misspelled word. Or better, "the"
> most likely intended word for a misspelled word.

> from which I could easily pass on the dmr suggestions, but scoring
> and evaluating the suggestions for wjite is harder. "white" and
> "write" are 'ranked' (I guess) 3rd, 4th, and 7th.

One thing which might help you rank the strings is the "Levenshtein
distance". This gives you the "difference" between two strings as a
number. I don't know whether it is on CPAN, but there is a module
here:

http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/index.html

The documentation is here:

http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/Levenshtein.html

Presumably the suggestion with the smallest Levenshtein distance from
the input string would be the most likely candidate, although some
very rare words might also have small distances.
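For illustration, here is a self-contained dynamic-programming Levenshtein distance in Perl, used to rank the first seven ispell suggestions for wjite. It does not use the String::Levenshtein module linked above, whose interface isn't assumed here; comparing case-insensitively is also an assumption.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance: the minimum number of
# single-character insertions, deletions and substitutions turning $s into $t.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            $cur[$j] = min($prev[$j] + 1,             # deletion
                           $cur[$j - 1] + 1,          # insertion
                           $prev[$j - 1] + $cost);    # substitution
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Rank the first seven ispell suggestions for "wjite", ignoring case.
my $input       = 'wjite';
my @suggestions = qw(jute kite White white quite Waite write);
my @ranked = sort { $a->[1] <=> $b->[1] }
             map  { [ $_, levenshtein(lc $input, lc $_) ] } @suggestions;
printf "%-6s %d\n", @$_ for @ranked;
```

White, white, Waite and write all tie at distance 1 (jute, kite and quite are at 2), so distance alone cannot separate white from write, and the proper noun Waite is exactly the kind of rare word mentioned above.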

Posted by Ted Zlatanov on April 4, 2008, 11:58 am
wrote:

>> I am looking for a way, without defining a custom dictionary, to
>> get a list of suggested words for a misspelled word. Or better, "the"
>> most likely intended word for a misspelled word.

>> from which I could easily pass on the dmr suggestions, but scoring
>> and evaluating the suggestions for wjite is harder. "white" and
>> "write" are 'ranked' (I guess) 3rd, 4th, and 7th.

BB> One thing which might help you rank the strings is the "Levenshtein
BB> distance". This gives you the "difference" between two strings as a
BB> number. I don't know if it is on CPAN but there is a module found
BB> here:

BB> http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/index.html

BB> The documentation is here:

BB> http://world.std.com/~swmcd/steven/perl/lib/String/Levenshtein/Levenshtein.html

BB> Presumably the string with the smallest Levenshtein distance from the
BB> input string would be the most likely candidate for the spelling
BB> checker, although some very rare words might have small distances.

It's useful to weight the distance by how close the keys are on the
keyboard. For example, h and j are adjacent, so they are more likely to
be swapped than h and r; that favors "white" over "write" for the wjite
example above. On CPAN, I found:

String::Similarity
String::KeyboardDistance (keyboard-based distance, as described above)
String::Approx (very comprehensive, probably the right choice for the OP)
Text::DoubleMetaphone
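One way to do the keyboard weighting described above is a Levenshtein variant in which substituting a neighboring QWERTY key is cheaper than other edits. This is a sketch, not String::KeyboardDistance's algorithm or API: the staggered layout model, the 1.5-key adjacency threshold and the 0.5 substitution cost are all assumptions chosen for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Rough QWERTY model: (x, y) centre of each letter key, with each row
# shifted right by a fraction of a key width as on a physical keyboard.
my @ROWS    = ('qwertyuiop', 'asdfghjkl', 'zxcvbnm');
my @OFFSETS = (0, 0.25, 0.75);
my %POS;
for my $r (0 .. $#ROWS) {
    my @keys = split //, $ROWS[$r];
    $POS{ $keys[$_] } = [ $_ + $OFFSETS[$r], $r ] for 0 .. $#keys;
}

# Two different letters are neighbors if their key centres are less than
# 1.5 key widths apart (e.g. h-j, h-y, h-n, but not h-r).
sub adjacent {
    my ($c1, $c2) = @_;
    return 0 if $c1 eq $c2 || !$POS{$c1} || !$POS{$c2};
    my $dx = $POS{$c1}[0] - $POS{$c2}[0];
    my $dy = $POS{$c1}[1] - $POS{$c2}[1];
    return sqrt($dx**2 + $dy**2) < 1.5 ? 1 : 0;
}

# Levenshtein distance in which substituting a neighboring key costs 0.5;
# other substitutions, insertions and deletions cost 1. Case-insensitive.
sub keyboard_distance {
    my ($s, $t) = (lc($_[0]), lc($_[1]));
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        my $cs  = substr($s, $i - 1, 1);
        for my $j (1 .. length($t)) {
            my $ct   = substr($t, $j - 1, 1);
            my $cost = $cs eq $ct ? 0 : adjacent($cs, $ct) ? 0.5 : 1;
            $cur[$j] = min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my @suggestions = qw(jute kite White white quite Waite write);
for my $w (sort { keyboard_distance('wjite', $a) <=> keyboard_distance('wjite', $b) }
           @suggestions) {
    printf "%-6s %.1f\n", $w, keyboard_distance('wjite', $w);
}
```

With these assumed weights, wjite scores 0.5 against white (j and h are neighbors) but 1 against write, so white and White rank first. The weights would need tuning on real typing errors before relying on them.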

Ted
