Word frequency analyser




Does anyone happen to know if there's a convenient module which will analyse
at least two XML files and list the most frequently-used words?

(It would have to be able to reject tags and certain words such as "the" and

Re: Word frequency analyser


XML::Parser, with a Char handler.


Split on non-word characters in the handler and use a hash for counting.
Afterwards, delete all occurrences of "the", "is", etc.

Note that this is a very simplistic approach: if words are hyphenated,
it counts the parts as two different words.
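
A minimal sketch of the approach described above, in case it helps. The stop
list, the sample XML strings and the count_words helper are all made up for
illustration. One deliberate deviation: instead of splitting inside the Char
handler, the text is buffered and split after parsing, because the parser may
deliver a single run of text in several Char calls and could otherwise cut a
word in half.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical stop list; replace with whatever list suits your text.
my %stop = map { $_ => 1 } qw(the is a an and of to in);

# Add the word counts of one XML string to the hash referenced by $count.
sub count_words {
    my ($xml, $count) = @_;
    my $text = '';
    my $parser = XML::Parser->new(
        Handlers => {
            # Only character data arrives here, so tags are rejected for free.
            Char  => sub { my ($expat, $string) = @_; $text .= $string },
            # Pad element boundaries so neighbouring elements do not fuse.
            Start => sub { $text .= ' ' },
            End   => sub { $text .= ' ' },
        },
    );
    $parser->parse($xml);

    # Split on runs of non-word characters and count in a hash.
    for my $word (split /\W+/, lc $text) {
        next if $word eq '';
        $count->{$word}++;
    }
    return $count;
}

# Two sample documents (assumed input); both feed the same hash.
my %count;
count_words($_, \%count) for (
    '<doc><p>The cat sat on the mat.</p></doc>',
    '<doc><p>The mat is red; the well-known cat is not.</p></doc>',
);

# Delete the stop words afterwards, as suggested.
delete @count{ keys %stop };

# Most frequent first, ties in alphabetical order.
for my $word (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$word: $count{$word}\n";
}
```

For real files, $parser->parsefile($filename) can be used in place of
parse(). As noted, "well-known" ends up counted as "well" and "known".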

John                   Small Perl scripts: http://johnbokma.com/perl/
               Perl programmer available:     http://castleamber.com/
                                        I ploink googlegroups.com :-)

Re: Word frequency analyser


Searching for "word frequency" on search.cpan.org turns up some modules
that are designed for this sort of thing, and may take some of the
trickier issues into account.


Re: Word frequency analyser

On Tue, 25 Oct 2005 01:30:58 +1000, Scott W Gifford wrote:

Hi Folks

A list of stop words, courtesy of MySQL, can be downloaded from:

