How to determine the HTTP status without downloading the complete page?

Question

I want to know the HTTP status of websites using Ubuntu. I have used the curl and wget commands for that purpose, but the problem is that these commands download the complete page of the website, then look for the header and display it on the screen. For example:

    $ curl -I trafficinviter.com
    HTTP/1.1 200 OK
    Date: Mon, 02 Jan 2017 14:13:14 GMT
    Server: Apache
    X-Pingback: http://trafficinviter.com/xmlrpc.php
    Link: <http://trafficinviter.com/>; rel=shortlink
    Set-Cookie: wpfront-notification-bar-landingpage=1
    Content-Type: text/html; charset=UTF-8

The same thing happens with the wget command: the complete page is downloaded, which needlessly uses up my bandwidth.

What I am looking for is how to get the HTTP status code without downloading any page, so that I can save bandwidth. I tried using curl, but I am not sure whether it downloads the complete page or just the header to get the status code.

AlexP · Accepted Answer

curl -I fetches only the HTTP headers; it does not download the whole page. From man curl:

-I, --head (HTTP/FTP/FILE) Fetch the HTTP-header only! HTTP-servers feature the command HEAD which this uses to get nothing but the header of a document. When used on an FTP or FILE file, curl displays the file size and last modification time only.
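If only the status code itself is needed, curl's -w (--write-out) option can print it without the rest of the headers. The following is a minimal sketch; http_status is a hypothetical helper name, not part of curl. It still sends a HEAD request (-I), so no message body is transferred.

```shell
#!/usr/bin/env bash
# http_status URL: print only the HTTP status code of URL.
# Hypothetical helper name; the curl options are standard:
#   -s              silent, no progress meter
#   -o /dev/null    discard the response headers instead of printing them
#   -I              send a HEAD request, so the server returns headers only
#   -w '%{http_code}\n'  after the transfer, print the numeric status code
http_status() {
    curl -s -o /dev/null -I -w '%{http_code}\n' "$1"
}
```

For example, http_status http://trafficinviter.com prints just the three-digit code. Adding -L makes curl follow redirects, in which case %{http_code} reports the code of the last response. Be aware that a few servers answer HEAD differently from GET (some reject it with 405), so a HEAD-based check may not always match what a browser sees.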

Another option is to install lynx and use lynx -head -dump.

The HEAD request is specified by the HTTP/1.1 protocol (RFC 2616, since superseded by RFC 7231 and, more recently, RFC 9110):

9.4 HEAD The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
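To see what such a request looks like on the wire, the sketch below sends a HEAD request by hand through bash's /dev/tcp pseudo-device and prints only the status line. It assumes bash (not plain sh), plain HTTP rather than HTTPS, and a hypothetical helper name, head_status_line; it illustrates the protocol and is not meant to replace curl.

```shell
#!/usr/bin/env bash
# head_status_line HOST [PORT] [PATH]  (hypothetical helper name)
# Sends a raw HTTP/1.1 HEAD request over plain TCP (no TLS) using bash's
# /dev/tcp pseudo-device and prints the response's status line.
head_status_line() {
    local host=$1 port=${2:-80} path=${3:-/}
    local line
    # Open a bidirectional TCP connection on file descriptor 3.
    exec 3<>"/dev/tcp/$host/$port" || return 1
    # HTTP/1.1 requires a Host header; "Connection: close" asks the server
    # to close the connection after responding. Lines end with CRLF.
    printf 'HEAD %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' \
        "$path" "$host" >&3
    # The first line of the response is the status line, e.g. "HTTP/1.1 200 OK".
    IFS= read -r line <&3
    # Close the connection; the remaining header lines are never read.
    exec 3<&-
    # Strip the trailing carriage return before printing.
    printf '%s\n' "${line%$'\r'}"
}
```

For example, head_status_line example.com 80 / prints the server's status line. Because the method is HEAD, the server sends only the status line and headers, with no message body after them.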

muru · Answer

With wget, you need to use the --spider option to send a HEAD request, as curl does:

    $ wget -S --spider https://google.com
    Spider mode enabled. Check if remote file exists.
    --2017-01-03 00:08:38--  https://google.com/
    Resolving google.com (google.com)... 216.58.197.174
    Connecting to google.com (google.com)|216.58.197.174|:443... connected.
    HTTP request sent, awaiting response...
      HTTP/1.1 302 Found
      Cache-Control: private
      Content-Type: text/html; charset=UTF-8
      Location: https://www.google.co.jp/?gfe_rd=cr&ei=...
      Content-Length: 262
      Date: Mon, 02 Jan 2017 15:08:38 GMT
      Alt-Svc: quic=":443"; ma=2592000; v="35,34"
    Location: https://www.google.co.jp/?gfe_rd=cr&ei=... [following]
    Spider mode enabled. Check if remote file exists.
    --2017-01-03 00:08:38--  https://www.google.co.jp/?gfe_rd=cr&ei=...
    Resolving www.google.co.jp (www.google.co.jp)... 210.139.253.109, 210.139.253.93, 210.139.253.123, ...
    Connecting to www.google.co.jp (www.google.co.jp)|210.139.253.109|:443... connected.
    HTTP request sent, awaiting response...
      HTTP/1.1 200 OK
      Date: Mon, 02 Jan 2017 15:08:38 GMT
      Expires: -1
      Cache-Control: private, max-age=0
      Content-Type: text/html; charset=Shift_JIS
      P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
      Server: gws
      X-XSS-Protection: 1; mode=block
      X-Frame-Options: SAMEORIGIN
      Set-Cookie: NID=...; expires=Tue, 04-Jul-2017 15:08:38 GMT; path=/; domain=.google.co.jp; HttpOnly
      Alt-Svc: quic=":443"; ma=2592000; v="35,34"
      Transfer-Encoding: chunked
      Accept-Ranges: none
      Vary: Accept-Encoding
    Length: unspecified [text/html]
    Remote file exists and could contain further links, but recursion is disabled -- not retrieving.
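To reduce wget's output to just the code, one option is to filter the -S output. This sketch assumes wget's usual -S format, in which the server's response headers are written to stderr, each indented by two spaces; wget_status is a hypothetical helper name. When redirects are followed, as in the google.com example, several status lines appear and the sketch keeps the last one.

```shell
#!/usr/bin/env bash
# wget_status URL: print the HTTP status code wget reports in spider mode.
# Hypothetical helper name. --spider checks the URL without saving a body,
# -S prints the server response headers (to stderr, two-space indented).
# 2>&1 merges stderr into the pipe so awk can read the headers; awk keeps
# the code from the last "  HTTP/..." status line, i.e. the final response
# after any redirects.
wget_status() {
    wget -S --spider "$1" 2>&1 | awk '/^  HTTP\//{code=$2} END{print code}'
}
```

Applied to the google.com output above, the filter would see 302 and then 200, and print 200.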