Posted by The Bicycling Guitarist on October 28, 2004, 7:26 am
My web site has not been spidered by Googlebot since April 2003. The site in
question is at www.TheBicyclingGuitarist.net/. I received much help from this
NG and the stylesheets NG when updating the code before then.
My host's tech guy just sent me the following. Isn't it okay to specify
UTF-8 as the charset in the HTTP headers at the server level? Isn't it okay
to have validated XHTML 1.0 strict code?
*************************************************************
If it were a misconfiguration with IIS, the problem would be presenting
itself for every site that is hosted on that server under that instance of
IIS, which isn't the case here. I have only been able to find two
differences between your site, which Google isn't updating, and the sites that
are:
1) You have a custom charset also specified in the HTTP headers at the
server level
2) You are using XHTML strict.
I am curious why you chose XHTML Strict rather than Transitional. Here's a
quote from broadbandreports.com, with the full link at
http://www.broadbandreports.com/faq/webmonks?text=1
"If you are using XHTML you should strive to make your pages validate as
XHTML 1.0 Transitional. The XHTML 1.0 Strict standard is a bit too confining
for real world web sites."
My suggestion is still that you talk to Google to find out why their bot
is getting a 406 error, and why it isn't updating the content it doesn't get
an error on. If you would like, I would be happy to reset the HTTP
headers to the default setting so your site identically matches every other
site hosted on this server as far as IIS goes.
----- Original Message -----
From: Chris Watson
To: XXXXXXXXXXXXX
Sent: Tuesday, October 26, 2004 5:16 PM
Subject: RE: Jesse, you really need to see this.
I posted the information from your last two emails in one message in a
search engines newsgroup. What about this guy's answer? It's short and
sweet.
The Bicycling Guitarist wrote:
> The following are two messages from the tech guy at my host concerning my
> problems with Googlebot or vice versa.
The problem seems to be with your IIS configuration. Google sends
"Accept: text/html,text/plain", which of course makes good sense for a
robot, as it doesn't want anything else. Your IIS appears to be
incorrectly configured to send a 406 Not Acceptable message when it sees
this.
If you accept text/* you get your page. It doesn't seem to be linked to
the charset.
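The reply above turns on how a server matches its response type against the request's Accept list. Below is a minimal Python sketch of that matching rule (RFC 2616, section 14.1), assuming a simplified matcher: parameters and q-values are ignored, so a "q=0" exclusion is not honoured, and the function names are hypothetical. It is not how IIS 5.0 actually decides to send a 406; it only shows that a text/html page falls inside Googlebot's reported "text/html,text/plain", which is consistent with the diagnosis that the server configuration, not the page, is at fault.

```python
# Simplified Accept matching in the spirit of RFC 2616, section 14.1.
# Assumptions: parameters and q-values are ignored (so "q=0" exclusions
# are NOT honoured); this is an illustration, not IIS's algorithm.


def parse_media_range(item: str) -> tuple[str, str]:
    """Split 'text/html;q=0.9' into ('text', 'html'), dropping parameters."""
    media = item.split(";", 1)[0].strip().lower()
    if media.count("/") != 1:
        raise ValueError(f"not a media type: {item!r}")
    type_, subtype = media.split("/")
    return type_.strip(), subtype.strip()


def is_acceptable(accept_header: str, content_type: str) -> bool:
    """True if content_type falls inside any media range of accept_header."""
    ctype, csub = parse_media_range(content_type)
    for item in accept_header.split(","):
        if not item.strip():
            continue  # tolerate stray commas
        rtype, rsub = parse_media_range(item)
        if rtype == "*" and rsub == "*":
            return True  # */* accepts anything
        if rtype == ctype and rsub in ("*", csub):
            return True  # exact match or a type/* wildcard
    return False


googlebot_accept = "text/html,text/plain"  # as reported in the reply above
print(is_acceptable(googlebot_accept, "text/html; charset=utf-8"))  # True
print(is_acceptable(googlebot_accept, "application/xhtml+xml"))     # False
```

Per RFC 2616, a server that finds no acceptable representation SHOULD answer 406 Not Acceptable; one whose page is text/html has no such reason when the request lists text/html.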
Posted by Neal on October 28, 2004, 3:31 am
On Thu, 28 Oct 2004 06:26:27 GMT, The Bicycling Guitarist wrote:
> My web site has not been spidered by Googlebot since April 2003. The
> site in question is at www.TheBicyclingGuitarist.net/ I received much
> help from this NG and the stylesheets NG when updating the code before
> then.
>
> My host's tech guy just sent me the following. Isn't it okay to specify
> UTF-8 as the charset in the HTTP headers at the server level? Isn't it
> okay to have validated XHTML 1.0 strict code?
I replied in alt.html: yep, my site uses UTF-8 and XHTML (served as
text/html), is at PR4, and is on top for my keywords. The problem must lie
elsewhere.
Posted by Leif K-Brooks on October 28, 2004, 3:29 pm
The Bicycling Guitarist wrote:
> My web site has not been spidered by Googlebot since April 2003. The site in
> question is at www.TheBicyclingGuitarist.net/ I received much help from this
> NG and the stylesheets NG when updating the code before then.
Seems to be sending the incorrect MIME type "text/*" instead of text/html:
[leif@localhost leif]$ telnet TheBicyclingGuitarist.net 80
Trying 216.229.101.149...
Connected to TheBicyclingGuitarist.net (216.229.101.149).
Escape character is '^]'.
GET / HTTP/1.1
Host: TheBicyclingGuitarist.net

HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Content-Location: http://TheBicyclingGuitarist.net/index.htm
Date: Thu, 28 Oct 2004 18:24:11 GMT
Content-Type: text/*;charset=utf-8
Accept-Ranges: bytes
Last-Modified: Wed, 27 Oct 2004 19:48:04 GMT
ETag: "fab99adf5dbcc41:9f9"
Content-Length: 5169
<snip>
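Leif's telnet session can be repeated with Python's standard library while sending the Accept header Googlebot reportedly uses. This is a sketch under stated assumptions: `StandInHandler` and `check_headers` are hypothetical names, and the stand-in server on 127.0.0.1 simply replays the Content-Type the real site returned, so the example runs offline. Passing the real host name and port 80 to `check_headers` would issue the same request as the telnet session, if the site still answers.

```python
# Sketch: repeat the telnet check with http.client, using the Accept header
# Googlebot reportedly sends. The local server is a hypothetical stand-in
# that replays the misconfigured Content-Type, so this runs offline.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandInHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in that mimics the response seen in the transcript."""

    def do_GET(self):
        body = b"<html><body>stand-in page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/*;charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def check_headers(host: str, port: int, accept: str) -> tuple[int, str]:
    """Send GET / with the given Accept header; return (status, Content-Type)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", "/", headers={"Accept": accept})  # Host is added automatically
        resp = conn.getresponse()
        resp.read()  # drain the body before closing
        return resp.status, resp.getheader("Content-Type", "")
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), StandInHandler)  # port 0: any free port
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    status, ctype = check_headers("127.0.0.1", server.server_address[1],
                                  "text/html,text/plain")
    print(status, ctype)  # 200 text/*;charset=utf-8
finally:
    server.shutdown()
    server.server_close()
```

Printing the status next to the Content-Type makes a wildcard type stand out at a glance, which is exactly what the raw telnet output revealed.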
Posted by The Bicycling Guitarist on October 28, 2004, 9:48 pm
> The Bicycling Guitarist wrote:
>> My web site has not been spidered by Googlebot since April 2003. The site
>> in question is at www.TheBicyclingGuitarist.net/ I
>
> Seems to be sending the incorrect MIME type "text/*" instead of text/html:
>
> [leif@localhost leif]$ telnet TheBicyclingGuitarist.net 80
> Trying 216.229.101.149...
> Connected to TheBicyclingGuitarist.net (216.229.101.149).
> Escape character is '^]'.
> GET / HTTP/1.1
> Host: TheBicyclingGuitarist.net
>
> HTTP/1.1 200 OK
> Server: Microsoft-IIS/5.0
> Content-Location: http://TheBicyclingGuitarist.net/index.htm
> Date: Thu, 28 Oct 2004 18:24:11 GMT
> Content-Type: text/*;charset=utf-8
> Accept-Ranges: bytes
> Last-Modified: Wed, 27 Oct 2004 19:48:04 GMT
> ETag: "fab99adf5dbcc41:9f9"
> Content-Length: 5169
>
Oh no. I just asked the tech to change from text/html to text/* on the
advice of someone in another NG. I am sorry about posting the same question
to two similar NGs. I was told "If you accept text/* you get your page. It
doesn't seem to be linked to the charset."
The problem existed for a year and a half while the server sent "text/html".
The change to "text/*" just happened today or yesterday. Should the tech
change it back?
How do I get these two NG threads back together?
Chris Watson a.k.a. "The Bicycling Guitarist"
Posted by Leif K-Brooks on October 28, 2004, 6:26 pm
The Bicycling Guitarist wrote:
> Oh no. I just asked the tech to change from text/html to text/* on the
> advice of someone in another NG. I am sorry about posting the same question
> to two similar NG's. I was told "If you accept text/* you get your page. It
> doesn't seem to be linked to the charset."
He was talking about the Accept header that the user agent (e.g. a
browser) sends to the server, not the Content-Type header that your server
sends to the browser. Wildcards are acceptable in Accept headers, but in a
Content-Type header they're ridiculous.
By the way, your broken Content-Type header stops the site from working in
standards-compliant browsers (like Mozilla).
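Leif's distinction can be checked mechanically: a Content-Type value must name one concrete type/subtype (RFC 2616, section 14.17), while wildcards such as text/* are defined only for the media ranges used in Accept (section 14.1). The function below is a hypothetical, simplified checker written for illustration, not a full media-type parser; it ignores parameters such as charset.

```python
# Hypothetical checker: a Content-Type must be one concrete type/subtype.
# Simplified: only the media type before ';' is examined; parameters such
# as charset are ignored rather than validated.


def check_content_type(value: str) -> list[str]:
    """Return a list of problems found in a Content-Type header value."""
    media = value.split(";", 1)[0].strip().lower()
    if media.count("/") != 1:
        return [f"not a type/subtype pair: {media!r}"]
    type_, subtype = (part.strip() for part in media.split("/"))
    problems = []
    if not type_ or not subtype:
        problems.append("empty type or subtype")
    if "*" in (type_, subtype):
        # Wildcards are media *ranges*: valid in Accept, not in Content-Type.
        problems.append(f"wildcard in Content-Type: {media}")
    return problems


for header in ("text/*;charset=utf-8", "text/html; charset=utf-8"):
    print(header, "->", check_content_type(header) or "OK")
```

Run against the two values discussed in this thread, it flags the current "text/*;charset=utf-8" and passes "text/html; charset=utf-8".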
| Similar Threads | Posted |
| Validating UTF8 encoding ... | November 4, 2005, 8:11 am |
| xhtml vs html 4 strict | May 21, 2005, 5:23 pm |
| Strict XHTML and div question | July 14, 2005, 7:07 pm |
| XHTML 1.0 Strict and the Apostrophe | February 15, 2008, 11:12 am |
| What to use intead of taget_new in XHTML/Strict | July 19, 2004, 7:41 am |
| image maps + xhtml strict | October 23, 2004, 9:16 pm |
| Valid XHTML strict messed up in IE Mac | December 20, 2004, 7:13 pm |
| XHTML 1.0 Strict validation problem | November 11, 2005, 10:30 am |
| HTML 4.01 strict / transitional vs. XHTML 1.0 | September 18, 2005, 3:10 pm |
| ampersand in urls when using xhtml 1.0 strict | December 17, 2007, 8:30 am |