funny (strange behavior)link

I'm using a script to download intel architecture specs every couple of months
so I'll always have the current docs.
Go here: /
scroll down to the bottom of the page, the link for
"Intel® 64 and IA-32 Architectures Optimization Reference Manual"
points to " "
which I can download by clicking on the link, but in my script using wget,
I always get a 403 error - why?

wget -O
Connecting to||:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
10:10:04 ERROR 403: Forbidden.

Re: funny (strange behavior)link

Eric wrote:

I'm guessing that they filter out requests from automated or automatable
tools like wget, googlebot, linkchecker, and so on to reduce server
load and save bandwidth.

Re: funny (strange behavior)link

Harlan Messinger wrote:


I wonder how they distinguish wget from a real browser, or if I can get around it.

Re: funny (strange behavior)link

On 2008-07-15, Eric wrote:

   You can change the user agent string:

       -U agent-string
           Identify as agent-string to the HTTP server.

           The HTTP protocol allows the clients to identify themselves using a
           "User-Agent" header field.  This enables distinguishing the WWW
           software, usually for statistical purposes or for tracing of
           protocol violations.  Wget normally identifies as Wget/version,
           version being the current version number of Wget.

           However, some sites have been known to impose the policy of
           tailoring the output according to the "User-Agent"-supplied
           information.  While this is not such a bad idea in theory, it has
           been abused by servers denying information to clients other than
           (historically) Netscape or, more frequently, Microsoft Internet
           Explorer.  This option allows you to change the "User-Agent" line
           issued by Wget.  Use of this option is discouraged, unless you
           really know what you are doing.

           Specifying empty user agent with --user-agent="" instructs Wget not
           to send the "User-Agent" header in HTTP requests.

   Chris F.A. Johnson, webmaster         <
   Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
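
For a script, the -U / --user-agent option quoted above might be wrapped
roughly like this. It is only a sketch: the agent string and the function
name fetch_with_ua are made up for illustration, the URL is whatever the
Intel page links to, and there is no guarantee the server checks only the
User-Agent header.

```shell
#!/bin/sh
# Hypothetical browser-like agent string; any string a browser might send
# can be substituted here.
UA='Mozilla/5.0 (X11; Linux x86_64)'

# fetch_with_ua URL OUTFILE
# Download URL to OUTFILE while identifying as $UA instead of Wget/version.
fetch_with_ua() {
    wget --user-agent="$UA" -O "$2" "$1"
}

# Example call (placeholder URL, not the real document link):
# fetch_with_ua "http://example.invalid/manual.pdf" manual.pdf
```

If the 403 persists with a changed agent string, the server is probably
looking at something else as well, such as the Referer header or cookies.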

Re: funny (strange behavior)link


Yes, they can see from the user-agent that it is not a browser and
block wget requests or other types of HTTP requests that might come
from bots. Did you try to contact the website, in case they have APIs
or RSS feeds that can be downloaded with wget or by other automated
HTTP requests?
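
One way to check whether the user-agent really is what triggers the block
is to compare the status code the server returns for different agent
strings. A rough sketch, assuming wget is available; the function name
status_for_ua and the example URL are made up. Note that --spider may send
a HEAD request, which some servers answer differently from a normal
download.

```shell
#!/bin/sh
# status_for_ua URL AGENT
# Print the last HTTP status code the server sends back when wget
# identifies itself as AGENT. Nothing is saved to disk (--spider).
status_for_ua() {
    wget --spider --server-response --user-agent="$2" "$1" 2>&1 |
        awk '/^ *HTTP\//{code=$2} END{print code}'
}

# Example comparison (placeholder URL):
# status_for_ua "http://example.invalid/manual.pdf" "Wget/1.11"
# status_for_ua "http://example.invalid/manual.pdf" "Mozilla/5.0 (X11; Linux x86_64)"
```

If the first call reports 403 and the second reports 200, the filtering is
most likely based on the User-Agent header.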
