Simple Bayesian classifier?

Kimmo Laine wrote:

Unless next_page.php generates PHP, the script with this include will
only get HTML.


    // next_page.php has to emit PHP source, not just its output
    if (isset($_GET['foo'])) {
      echo '<?php echo $_GET[\'foo\']; ?>';
    } else {
      echo '<?php echo \'Not available\'; ?>';
    }

File not found: (R)esume, (R)etry, (R)erun, (R)eturn, (R)eboot

Re: Simple Bayesian classifier?

SpamAssassin's code is open source; have you checked that out?
AFAIK PHP offloads its maths to C libraries, so your problem is that
it can be much more computationally intensive to work by the book,
with no code optimisation techniques (hash tables and so on).
(A mathematician C programmer I know got their code to run in 2 days
rather than 2 weeks after some optimisation.)
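
To illustrate the hash-table point: below is a minimal sketch of a multi-category naive Bayes classifier, written in Python rather than PHP purely for brevity. Word counts live in dictionaries (hash tables), and the log-probabilities are precomputed once in finalize(), so classifying an article is only a few lookups and additions per category. The class name, method names, tokenizer and add-one smoothing are all assumptions made for illustration; none of this is taken from SpamAssassin or from the original poster's code.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of articles
        self.vocab = set()
        self.log_prior = {}
        self.log_likelihood = {}

    def train(self, category, text):
        """Add one labelled article to the counts."""
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def finalize(self):
        """Precompute every log-probability once, after training."""
        total_docs = sum(self.doc_counts.values())
        for category, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            self.log_prior[category] = math.log(
                self.doc_counts[category] / total_docs
            )
            # counts[word] is 0 for words this category never saw
            self.log_likelihood[category] = {
                word: math.log((counts[word] + 1) / denom)
                for word in self.vocab
            }

    def classify(self, text):
        """Return the category with the highest log-posterior."""
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best, best_score = None, -math.inf
        for category, prior in self.log_prior.items():
            table = self.log_likelihood[category]
            score = prior + sum(table[t] for t in tokens)
            if score > best_score:
                best, best_score = category, score
        return best
```

Usage would be one train(category, text) call per labelled article, a single finalize(), and then classify(new_text) per incoming article. The same structure should port to PHP associative arrays, which are hash tables as well.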

Re: Simple Bayesian classifier?

At Fri, 08 Jun 2007 20:52:39 +1000, Pavel Kalinov let h(is|er) monkeys

You may like
I am a bit surprised you have such a slow response; the typical algorithms
don't seem to be extremely taxing.

As part of an author authenticity scoring app, naive Bayesian filtering
proved quite useful; for spam filtering, its use (by itself) proves rather
limited. Quite a few spam creators (scripts) are well equipped these days
to lower scores substantially, allowing their messages to leak through.


Schraalhans Keukenmeester -
[Remove the lowercase part of Spamtrap to send me a message]

  "strcmp('apples','oranges') < 0"

Re: Simple Bayesian classifier?

Thanks, I didn't know this - will look into it.
BTW, I am not trying to make a spam filter, but to sort news articles into
a number of categories (16 at present, as a test). And I need
milliseconds, not days :-(


shimmyshack wrote:

Re: Simple Bayesian classifier?

Pavel Kalinov wrote:

Still, SpamAssassin might be what you're looking for.

Turn off all SA's non-Bayes scoring, and then feed SA a corpus of, say, 500
sports articles, telling it that they're "spam"; then 500 non-sports
articles, telling it that they're "ham". After this preparation, your SA
configuration should be primed to detect sports articles.

Another 15 SA configurations, and your setup should be complete.  

With SA, one user can have multiple configurations using the "--configpath"
command-line option.
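
The one-filter-per-category idea above can be sketched independently of SpamAssassin. The following is a minimal Python illustration, not SA's actual scoring: each category gets its own two-class filter, trained with that category's articles as the positive ("spam") side and everything else as the negative ("ham") side, and a new article goes to whichever filter reports the highest score. BinaryFilter, train_one_vs_rest and best_category are hypothetical names, and the smoothed log-odds score with no class prior is an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BinaryFilter:
    """One two-class filter: positive plays 'spam', negative plays 'ham'."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}

    def learn(self, text, positive):
        self.counts[positive].update(tokenize(text))

    def log_odds(self, text):
        """Sum of per-word log-odds with add-one smoothing; > 0 leans positive."""
        pos, neg = self.counts[True], self.counts[False]
        vocab_size = len(set(pos) | set(neg))
        pos_total = sum(pos.values()) + vocab_size
        neg_total = sum(neg.values()) + vocab_size
        score = 0.0
        for token in tokenize(text):
            if token in pos or token in neg:  # ignore never-seen words
                score += math.log((pos[token] + 1) / pos_total)
                score -= math.log((neg[token] + 1) / neg_total)
        return score


def train_one_vs_rest(labelled_articles):
    """Build one BinaryFilter per category from (category, text) pairs."""
    categories = {category for category, _ in labelled_articles}
    filters = {category: BinaryFilter() for category in categories}
    for category, text in labelled_articles:
        for name, flt in filters.items():
            flt.learn(text, positive=(name == category))
    return filters


def best_category(filters, text):
    """Pick the category whose filter gives the highest log-odds."""
    return max(filters, key=lambda name: filters[name].log_odds(text))
```

With 16 categories this means 16 filters and 16 scores per article. Note that log_odds() recomputes its totals on every call to keep the sketch short; for speed they would be cached after training.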

Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.12-12mdksmp, up 108 days, 16 min.]

                              URLs in demiblog
