Click fraud


Although there is no formal definition of click fraud, it is customary
to consider fraudulent any click not resulting from a user genuinely
interested in an ad found in a pay-per-click search engine network such
as Google or Yahoo. This definition encompasses competitor fraud
(depleting your competitor's budget), distribution partner fraud and
other types of fraud committed either with or without financial
incentives, as well as accidental fraud. Most, but not all, click fraud
cases are potentially subject to prosecution, e.g. under the unfair
business practice code.
(excerpt from )

New Patterns and Trends

There is increasing evidence that new patterns are emerging. While
Google has improved its detection of impression fraud (generating bogus
impressions to lower your competitors' ad relevancy and drive them out
of Google), this fraud has spread to Yahoo and MSN, and more
sophisticated bogus impression schemes are now taking place on Google.
Political activists and disgruntled employees, a new type of fraudster
not motivated by money, click on expensive paid ads from companies that
they hate. They know which keywords are

Traffic distribution partners willing to eliminate competing affiliates
on a search engine network are rumored to have used click fraud
warfare, or clickware. Other fraudsters, in an attempt to hide their
activity, generate bogus impressions, bogus clicks and also bogus
conversions. To go undetected, they keep their CTR and conversion
rates at more discreet (yet still too high) levels.

On the other side, many companies are changing their employee internet
usage policies for increased security. As a result, a single company
or government agency may use spoofed IP addresses, or one IP address
and one browser shared by 50,000 employees. This can cause fraud
detection systems to fail and generate many false positives, thus
inflating fraud numbers. As far as organic search is concerned, we are
worried about individuals banned by Google who use the same techniques
that got them banned to eliminate their competitors. This and other
schemes have the potential to reduce search result relevancy, which is
already low in some categories such as mortgages. However, search
engines will fight back with more advanced relevancy algorithms. This
is actually one of the priorities for MSN and many others.

On the positive side, we see that some search engines are taking the
click fraud issue seriously. Over the long term, we believe that the
concept of click fraud will be replaced by the much more meaningful
concept of click quality or click profiling, a concept that we are
currently implementing (see

True click fraud is illegal clicking worth investigating by the SEC or
FBI because of potential connections with international crime,
shareholder fraud or terrorism funding. It represents a small but
potentially fast-growing percentage of clicks, due to the technical
expertise of these groups. From a click scoring viewpoint, extremely
poor clicks account for 10%, very poor clicks for 10%, poor clicks for
10%, and below-average clicks for another 20% of all clicks. Correctly
identifying these click segments with an appropriate click scoring
system is critical to increasing ROI. Sophisticated keyword selection
systems should automatically buy tens of thousands of under-sold
keywords and automatically place ads on Google and Yahoo, ideally three
ads per keyword. eBay and Amazon, though, have yet to substantially
improve their automated bidding tools.
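The segment percentages above can be turned into labels once every click has a score. The sketch below is a hypothetical illustration, not the author's scoring system: it assumes a higher score means a better click, ranks all clicks, and assigns the bottom 10% to "extremely poor", the next 10% to "very poor", the next 10% to "poor" and the next 20% to "less than average". The function name `segment_clicks` and the label strings are made up for this example.

```python
# Cumulative upper bounds (in percent of clicks) for each segment, taken from
# the percentages above: 10% + 10% + 10% + 20%, with the rest "average or better".
SEGMENTS = [
    (10, "extremely poor"),
    (20, "very poor"),
    (30, "poor"),
    (50, "less than average"),
    (100, "average or better"),
]

def segment_clicks(scores):
    """Label each click by the percentile rank of its score (lower score = worse click)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # worst clicks first
    labels = [None] * n
    for rank, i in enumerate(order):
        for upper_pct, label in SEGMENTS:
            # Equivalent to (rank + 1) / n <= upper_pct / 100, kept in integers
            # so that clicks exactly on a boundary are not misclassified.
            if (rank + 1) * 100 <= upper_pct * n:
                labels[i] = label
                break
    return labels
```

Integer arithmetic is used for the percentile cut-offs to avoid floating-point rounding at segment boundaries; ties in score are broken by input order.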

In the long term, advertisers will get smarter. Rising PPC combined
with rising fraud, and thus lower or even negative ROI, cannot be
sustained indefinitely. We believe that the future will eventually
bring better fraud detection and increased ROI (possibly with higher
PPC), thanks in part to more knowledgeable advertisers and better
relevancy algorithms.

Case Studies

Examples of false positives that we were able to identify include a
large corporation, let's call it Acme, and the US Army. In the case of
Acme, an alarm was raised because of thousands of clicks per day, day
after day, from the same IP address and the same browser, all seemingly
coming from a single user. However, the keywords associated with the
clicks (both paid and unpaid), the velocity and timing, and the
proportion of paid clicks and referrals did not show unusual patterns.
It turned out that Acme uses one IP address and one browser for all its
employees. Similarly, after investigating a bucket of clicks with highly
suspicious spoofed IPs, we found that the addresses were used by the US
Army to hide its true origin. This prevents potential criminals from
being indirectly informed (by checking IP addresses in their server
logs) that they are being monitored by the Army. Again, the clicks were
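The Acme false positive suggests a two-stage check: a volume alarm on (IP, browser) pairs, followed by a look at secondary signals before declaring fraud. The following is a minimal sketch under assumed inputs (one day of click records as dicts with `ip`, `user_agent`, `keyword` and `paid` fields). The thresholds and the two secondary metrics, keyword diversity and paid-click share, are hypothetical stand-ins for the richer velocity, timing and referral analysis described above.

```python
from collections import defaultdict

# Hypothetical thresholds, chosen only for illustration.
VOLUME_ALARM = 1000          # clicks per day from one (IP, browser) pair
MIN_KEYWORD_DIVERSITY = 0.2  # distinct keywords / clicks
MAX_PAID_SHARE = 0.5         # paid clicks / all clicks

def review_high_volume(clicks):
    """Review one day of clicks; return {(ip, user_agent): verdict} for
    every pair whose volume trips the alarm."""
    groups = defaultdict(list)
    for c in clicks:
        groups[(c["ip"], c["user_agent"])].append(c)

    verdicts = {}
    for key, group in groups.items():
        if len(group) < VOLUME_ALARM:
            continue  # normal volume, nothing to review
        diversity = len({c["keyword"] for c in group}) / len(group)
        paid_share = sum(c["paid"] for c in group) / len(group)
        # High volume alone is not proof of fraud: a shared corporate gateway
        # produces many varied keywords and a normal mix of paid/unpaid clicks.
        if diversity >= MIN_KEYWORD_DIVERSITY and paid_share <= MAX_PAID_SHARE:
            verdicts[key] = "shared gateway (likely false positive)"
        else:
            verdicts[key] = "suspicious"
    return verdicts
```

A robot hammering one expensive keyword with paid clicks fails both secondary checks, while a corporate gateway with thousands of employees searching for varied topics passes them despite its volume.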

Conversely, we correctly identified another set of spoofed IP addresses
as fraudulent with our metric mix, which incorporates proprietary
keyword categorizations and multivariate statistical distributions.
Email spammers, accidentally clicking on paid ads with web robots while
harvesting email addresses, made a few mistakes: they generated the
same number of clicks per IP per day, at least on the IP addresses that
they did not share with legitimate users. In another case, our linkage
analysis revealed that thousands of IP addresses were switched off by
one distribution partner caught in click fraud. When they reappeared,
they were attached to a new partner, clearly showing that the fraud
involved clickware or adware. The fraudster knew which computers were
infected and possibly sold this information to another
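Both patterns in the paragraph above can be expressed as simple count and set operations. This sketch assumes clicks are available as (ip, day) pairs and that each partner's traffic can be reduced to a set of IP addresses; `constant_rate_ips`, `migrated_ips` and the `min_days` parameter are hypothetical names, and a real linkage analysis would combine these signals with many others.

```python
from collections import Counter, defaultdict

def constant_rate_ips(clicks, min_days=5):
    """IPs whose number of clicks per day is identical on every observed day.

    clicks: iterable of (ip, day) pairs, one per click. min_days is a
    hypothetical minimum history before the pattern is considered meaningful.
    """
    per_ip_day = Counter(clicks)          # (ip, day) -> clicks that day
    daily_counts = defaultdict(list)      # ip -> list of daily click counts
    for (ip, _day), n in per_ip_day.items():
        daily_counts[ip].append(n)
    return {
        ip for ip, counts in daily_counts.items()
        if len(counts) >= min_days and len(set(counts)) == 1
    }

def migrated_ips(old_partner_before, old_partner_after, new_partner_after):
    """IPs that vanished from one partner's traffic and reappeared under another."""
    dropped = set(old_partner_before) - set(old_partner_after)
    return dropped & set(new_partner_after)
```

For example, an IP that produces exactly three clicks every day for a week is flagged, while one whose daily counts vary is not. Days with no clicks at all are simply not observed in this sketch.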

Finally, we are dealing not only with counterfeit clicks, but also with
fake impressions and bogus conversions. This makes click scoring a
complex problem: bogus conversions involve purchases made with stolen
credit cards, or users paid to fill in forms with fake information, and
if undetected they can make poor clicks look good. However, we have
developed a methodology that preserves the quality of our click scoring
system. Interestingly, one of our clients had been using a click fraud
detection system that failed to capture these bogus conversions in a
fraud scheme, because that system relied on JavaScript and clear GIFs.

Fraud Schemes, Clickware

Different types of undetectable attacks can be carried out against
internet companies that bill advertising clients using logfile
statistics. These attacks usually rely on IP masking, IP masquerading
and fake referrals. IP masking is accomplished by having a web robot
access web pages through several hundred anonymous proxy

In another scenario, trojans are uploaded to popular shareware sites.
Once downloaded by a user, these trojans perform the useful tasks they
are supposed to do (e.g. hard drive cleaning, virus scanning, etc.) but
in addition, they randomly "click" on target links, writing fake
information in target logfiles using web robot technology.

Competing advertisers, affiliates or partners in a pay-per-click
program might want to eliminate each other to gain market share, using
click spam. Target links could consist of paid links associated with
selected advertising clients (e.g. the perpetrator's competitors) or
expensive paid keywords (e.g. "bulk email" or "online casino") on
pay-per-click search engines. Another version of this attack could rely
on a virus with an embedded web robot instead of a trojan. The resulting
fake information in the target logfiles cannot be distinguished from
legitimate clicks by real users. The fake clicks have a 0%
click-to-sale ratio, driving the advertiser's ROI into negative
territory. We have computed that it is possible to generate $200
million in illegitimate charges with a click spam program running
non-stop over a 12-month period on one
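The effect of zero-conversion clicks on ROI is simple arithmetic. In this sketch (hypothetical function names and illustrative figures), a fraction `fraud_share` of billed clicks never converts while genuine clicks convert at `conv_rate`; ROI is taken as (revenue - cost) / cost per billed click.

```python
def roi(cpc, conv_rate, value_per_sale, fraud_share):
    """ROI per billed click when a fraction of clicks is fake.

    Fake clicks are billed at `cpc` but, as in the text, never convert;
    genuine clicks convert at `conv_rate` and earn `value_per_sale`.
    """
    revenue_per_click = (1 - fraud_share) * conv_rate * value_per_sale
    return (revenue_per_click - cpc) / cpc

def break_even_fraud_share(cpc, conv_rate, value_per_sale):
    """Fraud share at which ROI drops to exactly zero."""
    return 1 - cpc / (conv_rate * value_per_sale)
```

With a $1.00 CPC, a 5% conversion rate on genuine clicks and $30 per sale (illustrative numbers only), ROI is +50% with no fraud, zero when one third of clicks are fake, and -10% when 40% are fake. For scale, the $200 million figure quoted above, spread evenly over 12 months, corresponds to roughly $6.34 of charges per second, or about $548,000 per day.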

More recent cases involve ad relevancy fraud. It is possible to
eliminate advertisers on AdSense for popular keywords with a
combination of bogus impressions and self-clicks, without ever
clicking on the targeted ads.
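Ad relevancy fraud works on the CTR ratio from both sides: bogus impressions inflate the victim's denominator, while occasional self-clicks inflate the attacker's numerator. The sketch below illustrates only this arithmetic, with hypothetical function names and numbers; how a search engine actually turns CTR into ad relevancy is not modeled.

```python
def ctr(clicks, impressions):
    """Click-through rate; 0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0

def victim_ctr(clicks, impressions, bogus_impressions):
    """Competitor's CTR after the attacker adds impressions that are never clicked."""
    return ctr(clicks, impressions + bogus_impressions)

def attacker_ctr(clicks, impressions, self_clicks):
    """Attacker's own CTR after occasionally clicking on his own ads."""
    return ctr(clicks + self_clicks, impressions)
```

For instance, a competitor with 50 clicks on 1,000 impressions (5% CTR) drops to 1% after 4,000 bogus impressions, while an attacker with 20 clicks on 1,000 impressions rises from 2% to 5% with 30 self-clicks.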

Another scenario consists of a shareholder using AOL IP addresses and
other non-anonymous proxies to commit large-scale fraud on high-dollar
keywords on a third-tier search engine, in order to manipulate the
stock price. Once caught, the shareholder would claim to be the victim
of very sophisticated criminals who spoofed his IP address in an
attempt to hurt the company he was targeting with click fraud. Such a
bogus claim is almost impossible to defeat in court, because genuine
IP spoofing does exist and makes the real "spoofer" (non-existent, in
this case) essentially indistinguishable from the "spoofee"
(self-proclaimed, in this case).

A final example would be an advertiser who was banned from Google
organic search through nefarious actions committed by one of his
competitors, was unable to get back into Google's unpaid search
results, and then sought revenge by retaliating against all his
competitors. He would use an expert scheme combining trending,
impression and click fraud, spread over many months. The fraud would
increase very slowly over time, making competitors' CTRs a little worse
each month and his own CTR better (by clicking on his own ads once in a
while). Along the same lines, one can imagine a distribution partner
artificially inflating his revenues by 1% the first month, 2% the
second month, and so on, with a cap at 5%.
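Slow schemes like these are designed to stay below month-over-month alert thresholds. The sketch below generates the 1%, 2%, ... capped-at-5% schedule from the text and shows one possible countermeasure, which is our assumption rather than a documented method: fitting a least-squares slope over several months, which exposes a steady drift even when no single month looks abnormal. All function names are hypothetical.

```python
def inflation_schedule(months, step_pct=1, cap_pct=5):
    """Monthly inflation in whole percent: 1, 2, 3, ... capped at 5."""
    return [min(step_pct * m, cap_pct) for m in range(1, months + 1)]

def slope(values):
    """Ordinary least-squares slope of values against month index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var
```

The schedule never moves by more than one percentage point from one month to the next, yet the fitted slope over the first five months is exactly one point per month. On real data, the slope would be compared against its historical variability rather than against zero.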

Our Approach: Click Scoring

While we have considerable experience both with advertiser and search
engine data, this section focuses on advertiser data.

One critical issue is how to attach a conversion to a click. We have
developed patent-pending technology that enables us to correctly
identify a unique AOL user, whether genuine, bogus or spoofed. The
algorithm even recognizes when a sale recorded from one IP address
originates from a user whose click came from a totally different IP
address. It also detects when a sale and a click from the same IP
address were actually generated by unrelated users who share that
address, whether permanently or only temporarily. In most cases, we
are also able to explain the missing clicks: clicks listed in Google
reports but not seen in server logs, which amount to 50% of billed
clicks in some cases. In one severe case of missing clicks, we were
able to reduce the discrepancy from 50% to 0% and maximize savings for
the client.
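The patent-pending user identification method is not described here, so the following is only a naive baseline to make the problem concrete. Without cookies, it assumes a user can be approximated by an (IP address, browser) pair, starts a new session whenever the gap between consecutive events exceeds a timeout, and attaches each sale to the most recent click in the same session. The function name, event format and default 30-minute timeout are hypothetical. This baseline is exactly what breaks in the cases described above (shared or changing IP addresses), which is why a more sophisticated algorithm is needed.

```python
from collections import defaultdict

def attribute_sales(events, timeout=1800):
    """Attach each sale to the latest earlier click by the same (IP, browser)
    within one session; a gap longer than `timeout` seconds starts a new session.

    events: iterable of (timestamp, ip, user_agent, kind), kind in {"click", "sale"}.
    Returns a list of (sale_timestamp, click_timestamp or None), sorted by sale time.
    """
    by_user = defaultdict(list)
    for ts, ip, ua, kind in events:
        by_user[(ip, ua)].append((ts, kind))

    matches = []
    for user_events in by_user.values():
        user_events.sort()  # by time; at equal times "click" sorts before "sale"
        last_ts = None
        last_click = None
        for ts, kind in user_events:
            if last_ts is not None and ts - last_ts > timeout:
                last_click = None  # new session: forget earlier clicks
            if kind == "click":
                last_click = ts
            elif kind == "sale":
                matches.append((ts, last_click))
            last_ts = ts
    return sorted(matches, key=lambda m: m[0])
```

For example, take a click at t = 0 followed by sales at t = 600 s and t = 10,000 s from the same (IP, browser) pair. With the default 30-minute timeout only the first sale is attributed to the click; with `timeout=10000` both are. This sensitivity is one reason the optimum timeout may need to be tuned per site.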

Click scoring can be viewed as a general scoring technology. The
scoring system is designed so that the score distribution matches
conversion rates. Critical issues include the use of universal
conversions (with detection of bogus conversions) and standardized
scores, the selection of an efficient metric mix, and optimized robust
metric weights, generally obtained as the solution of a ridge
regression problem involving combinatorial optimization (e.g.
meta-feature optimization), optimum metric binning, tree forests or
contrarian scoring technology. It is also important to detect the
(possibly site-dependent) optimum timeout parameter in the user
identification algorithm, as we cannot rely on cookies to identify
users.
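The ridge regression step can be sketched with the closed-form solution w = (Z^T Z + lam * I)^(-1) Z^T y on standardized metrics Z. This is a minimal illustration under stated assumptions: the target is 1 for a validated conversion and 0 otherwise, the intercept is handled by centering y, and `fit_click_scorer` and `lam` are hypothetical names. The combinatorial meta-feature optimization, metric binning, tree forests and contrarian scoring mentioned above are not shown.

```python
import numpy as np

def fit_click_scorer(X, y, lam=1.0):
    """Ridge regression of conversions on standardized click metrics.

    X: (n_clicks, n_metrics) array of raw metric values.
    y: (n_clicks,) array, 1 if the click led to a validated conversion else 0.
    Returns the weights and a function that scores new clicks.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0            # constant metrics: avoid division by zero
    Z = (X - mu) / sigma               # standardized metrics
    y_mean = y.mean()
    # Closed form: w = (Z^T Z + lam * I)^(-1) Z^T (y - mean(y)).
    # Centering y plays the role of an unpenalized intercept.
    A = Z.T @ Z + lam * np.eye(Z.shape[1])
    w = np.linalg.solve(A, Z.T @ (y - y_mean))

    def score(X_new):
        Z_new = (np.asarray(X_new, dtype=float) - mu) / sigma
        return y_mean + Z_new @ w      # roughly on the conversion-rate scale

    return w, score
```

Because the target is a 0/1 conversion flag, scores land roughly on the conversion-rate scale, in the spirit of matching the score distribution to conversion rates. A linear model can, however, produce scores slightly below 0 or above 1, so a calibration step (for example, binning scores and replacing each bin by its observed conversion rate) would typically follow. Larger `lam` shrinks the weights, trading a little bias for robustness when metrics are correlated.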
