Google stopped indexing my wikipedia mirror

I recently integrated Wikipedia with my site, using two
approaches. The first is linking individual wiki pages into my algebra
modules. The links in those pages point to the real Wikipedia, but
JavaScript in them redirects a reader who clicks on them to my
site. This lets users who click on these links stay within one
algebra module. I am not concerned about that case.

The second is that I have a full crosslinked Wikipedia mirror under
one particular directory. I already get quite a few Google-directed
hits to various pages there. However, I keep track of how many
Wikipedia pages Googlebot is visiting, and it has visited only a
small fraction of what is there.
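
A minimal sketch of how this kind of tracking could be done from a web server access log. It assumes the common "combined" log format, a user agent string containing "Googlebot", and a hypothetical `/wiki/` prefix for the mirror directory; none of these details come from the post itself.

```python
import re

# Pulls the request path and the user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_pages(lines, prefix="/wiki/"):
    """Return the set of distinct paths under `prefix` requested by Googlebot.

    `prefix` is a hypothetical location for the mirror directory.
    """
    seen = set()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        if "Googlebot" in m.group("agent") and m.group("path").startswith(prefix):
            seen.add(m.group("path"))
    return seen
```

Comparing `len(googlebot_pages(open_log_lines))` against the total article count gives the crawled fraction. Matching on the user agent alone trusts whatever the client claims to be, so the count is only an estimate.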

At some point I fed Google several big files with links to all
articles, which it promptly read, and it even followed some of the
links (I think). Some time later, the visits stopped. The pages that
Google did read can still be found through search engines.

I am talking about tens or hundreds of thousands of articles. Google
indexed mere thousands.

Full credit is given to Wikipedia and I fully follow the GFDL.

My question is: is there something that prevents Google from following
up on this? Any ideas will be appreciated. The pages with links
contain 5,000 links each; there are 289 such pages plus the master list.
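
For illustration, link pages like the ones described above could be produced by splitting the full article list into fixed-size groups. This is a minimal sketch under assumptions: the function names, the URL strings, and the bare HTML layout are hypothetical, not the actual setup.

```python
from html import escape

def chunk_links(urls, per_page=5000):
    """Split the URL list into consecutive groups of at most `per_page`."""
    return [urls[i:i + per_page] for i in range(0, len(urls), per_page)]

def render_index_page(urls):
    """Render one plain HTML page with one anchor per URL (hypothetical layout)."""
    items = "\n".join(
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>' for u in urls
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"
```

With `per_page=5`, a list of 12 URLs yields three pages holding 5, 5, and 2 links. A master list is then just one more page linking to each of the generated pages.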



Re: Google stopped indexing my wikipedia mirror

Ignoramus29781 wrote:

Uh huh, along with a bazillion other folk. The problem is that Google
does not really want to bother with all these Wikipedia mirrors, so it
runs duplicate-page algorithms. Maybe the indexer decided that your pages
were just duplicates of other content and told Googlebot not to
spider those links anymore. That would make sense to me, as spidering and
indexing pages that are of no benefit to searchers just wastes Google's
Re: Google stopped indexing my wikipedia mirror


Sure, that makes sense. Possibly it will happen sooner or later.

I began by mirroring individual Wikipedia pages for math-related content,
to complement the math pages that I already had. Then I decided to
mirror Wikipedia in a search-engine-friendly way, since all the pieces
were already in place.

In any case, Googlebot is back and is busy indexing my pages. The
amount of crawling varies by day.

I fully comply with the Wikipedia license, giving credit, referring to
the GFDL, etc.

