[EP-tech] Linkcheck: HEAD method ends up in 404

From: Martin Braendle via Eprints-tech <eprints-tech AT ecs.soton.ac.uk>
Date: Tue, 14 Jul 2020 14:08:40 +0200





Hi out there,

We're working on a link checker to remove all dead official and related
links in our repository. Some of the URLs point back to our own repository,
and the link checker gets an ugly 404 for them although the publications
exist.

What we do is some LWP::UserAgent stuff: a simple HEAD request for the URL,
then we analyze the response. If '$status_code ==
HTTP_METHOD_NOT_ALLOWED', we try a GET, and around all of this we do
some delay/retry/timeout handling. But in the end we always catch a
404 :-(
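
As a rough illustration of the logic described above (a Python sketch, not the actual LWP::UserAgent code; the names check_link and fetch, the retry count, and the delay are assumptions made for this example):

```python
import time
from typing import Callable, Optional

HTTP_NOT_FOUND = 404
HTTP_METHOD_NOT_ALLOWED = 405

# Hypothetical transport hook: fetch(method, url) returns the HTTP status
# code, or None if the request timed out or the connection failed.
Fetch = Callable[[str, str], Optional[int]]


def check_link(url: str, fetch: Fetch, retries: int = 3,
               delay: float = 0.0,
               sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Return the final status code for url, or None if every attempt failed."""
    for attempt in range(retries):
        status = fetch("HEAD", url)
        # Fall back to GET only when the server rejects HEAD with a 405.
        if status == HTTP_METHOD_NOT_ALLOWED:
            status = fetch("GET", url)
        if status is not None:
            return status
        # No response at all: wait and retry, unless this was the last attempt.
        if attempt < retries - 1:
            sleep(delay)
    return None


def fake_fetch(method: str, url: str) -> int:
    # Mimics the behaviour reported in this post: HEAD yields 404,
    # while other methods such as GET are assumed to succeed.
    return HTTP_NOT_FOUND if method == "HEAD" else 200


print(check_link("https://www.zora.uzh.ch/1", fake_fetch))  # prints 404
```

With this logic a 404 on HEAD is taken at face value and never retried with GET, which is why such links end up being reported as gone.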

Additional information:
- We use a 404 handler.
- We're allowed to use GET, PUT, TRACE, OPTIONS - all fine; only the HEAD
method results in a 404 ?!?
- We use the redirect from https://www.zora.uzh.ch/1 =>
https://www.zora.uzh.ch/id/eprint/1/, and it only seems to concern this
dynamic type of content; static pages work fine.

Let's show some examples via CURL:

[zora]$ curl -i -X HEAD -L "https://www.zora.uzh.ch/1"
HTTP/1.1 303 See Other
Date: Tue, 14 Jul 2020 11:49:08 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Location: /id/eprint/1

HTTP/1.1 303 See Other
Date: Tue, 14 Jul 2020 11:49:13 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Allow: GET,HEAD,PUT,OPTIONS
Location: https://www.zora.uzh.ch/id/eprint/1/
Strict-Transport-Security: max-age=15780000

HTTP/1.1 404 Not Found
Date: Tue, 14 Jul 2020 11:49:18 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Cache-Control: no-store, no-cache, must-revalidate
Strict-Transport-Security: max-age=15780000
Content-Type: text/html; charset=utf-8



[zora]$ curl -i -X HEAD -L "https://www.zora.uzh.ch/"
HTTP/1.1 200 OK
Date: Tue, 14 Jul 2020 11:49:31 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Expires: Thu, 13 Aug 2020 11:49:31 GMT
Cache-Control: no-store, no-cache, must-revalidate
Vary: Accept-Encoding
Strict-Transport-Security: max-age=15780000
Content-Type: text/html; charset=utf-8



[zora]$ curl -i -X HEAD -L "https://www.zora.uzh.ch/help/"
HTTP/1.1 200 OK
Date: Tue, 14 Jul 2020 11:49:53 GMT
Server: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips mod_perl/2.0.11 Perl/v5.16.3
Expires: Thu, 13 Aug 2020 11:49:53 GMT
Cache-Control: no-store, no-cache, must-revalidate
Vary: Accept-Encoding
Strict-Transport-Security: max-age=15780000
Content-Type: text/html; charset=utf-8
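
To see at a glance which hop of a redirect chain fails, a header dump like the ones above can be reduced to its status codes and Location values. The following is only a sketch under assumptions: the function name status_chain is made up for this example, and the sample text is an abbreviated copy of the first transcript.

```python
from typing import List, Optional, Tuple


def status_chain(raw_headers: str) -> List[Tuple[int, Optional[str]]]:
    """Return (status_code, location) for each response in a `curl -i -L` header dump."""
    chain: List[list] = []
    for line in raw_headers.splitlines():
        line = line.strip()
        if line.startswith("HTTP/"):
            # Status line, e.g. "HTTP/1.1 303 See Other": starts a new response.
            chain.append([int(line.split()[1]), None])
        elif line.lower().startswith("location:") and chain:
            # Attach the redirect target to the response currently being read.
            chain[-1][1] = line.split(":", 1)[1].strip() or None
    return [(status, location) for status, location in chain]


# Abbreviated copy of the first transcript (only the relevant headers).
sample = """HTTP/1.1 303 See Other
Location: /id/eprint/1

HTTP/1.1 303 See Other
Location: https://www.zora.uzh.ch/id/eprint/1/

HTTP/1.1 404 Not Found
Content-Type: text/html; charset=utf-8
"""

for status, location in status_chain(sample):
    print(status, location)
```

For the first transcript this shows both 303 redirects resolving and only the final request to /id/eprint/1/ returning the 404.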


Does anybody have any suggestion, solution, or hint?

Kind regards from Zürich
 Martin & Jens


