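A small Host header puzzle with wget and curl

Background:

While testing a Sina video resource, I found that wget and curl returned different results for the same request, as shown below: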

wget -S --header="Host: edge.ivideo.sina.com.cn" "http://edge.ivideo.sina.com.cn.wscdns.com/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2" -O /dev/null
--2016-04-05 11:11:26--  http://edge.ivideo.sina.com.cn.wscdns.com/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2
Resolving edge.ivideo.sina.com.cn.wscdns.com... 124.165.216.169
Connecting to edge.ivideo.sina.com.cn.wscdns.com|124.165.216.169|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 302 Found
  Cache-Control: no-cache
  Connection: close
  Location: http://61.163.117.81/edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032cde&wsid_tag=3d8798cb&wsiphost=ipdbm
Location: http://61.163.117.81/edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032cde&wsid_tag=3d8798cb&wsiphost=ipdbm [following]
--2016-04-05 11:11:26--  http://61.163.117.81/edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032cde&wsid_tag=3d8798cb&wsiphost=ipdbm
Connecting to 61.163.117.81:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 403 Forbidden
  Server: Cdn Cache Server V2.0
  Date: Tue, 05 Apr 2016 03:11:27 GMT
  Content-Type: text/html
  Content-Length: 1432
  Expires: Tue, 05 Apr 2016 03:11:27 GMT
  X-Cache-Error: ERR_ACCESS_DENIED 0
  Via: 1.0 yuanwangtong81:5706 (Cdn Cache Server V2.0)
  Connection: close
2016-04-05 11:11:26 ERROR 403: Forbidden.

curl -sv -L -H "Host: edge.ivideo.sina.com.cn"  "http://edge.ivideo.sina.com.cn.wscdns.com/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2" -o /dev/null
*   Trying 124.165.216.169...
* Connected to edge.ivideo.sina.com.cn.wscdns.com (124.165.216.169) port 80 (#0)
> GET /10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2 HTTP/1.1
> Host: edge.ivideo.sina.com.cn
> User-Agent: curl/7.43.0
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 302 Found
< Cache-Control: no-cache
< Connection: close
< Location: http://61.163.117.81/edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032d2d&wsid_tag=3d8798cb&wsiphost=ipdbm
<
{ [0 bytes data]
* Closing connection 0
* Issue another request to this URL: 'http://61.163.117.81/edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032d2d&wsid_tag=3d8798cb&wsiphost=ipdbm'
*   Trying 61.163.117.81...
* Connected to 61.163.117.81 (61.163.117.81) port 80 (#1)
> GET /edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032d2d&wsid_tag=3d8798cb&wsiphost=ipdbm HTTP/1.0
> Host: 61.163.117.81
> User-Agent: curl/7.43.0
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Tue, 05 Apr 2016 02:53:35 GMT
< Server: nginx/1.6.0 r16033116-31dfdda
< X-RequestId: 00b15cd7-1604-0200-0335-d4ae52a774d9
< X-Requester: SINA00000000000VIASK
< Last-Modified: Wed, 15 Aug 2012 19:03:46 GMT
< X-Filesize: 6228047
< ETag: "5f175786197b511a4ba8f6167ad19632"
< Cache-Control: max-age=31536000
< Access-Control-Allow-Headers: Origin, Content-Type, Accept, Content-Length
< Access-Control-Allow-Methods: GET, PUT, POST, DELETE, OPTIONS, HEAD
< Access-Control-Max-Age: 31536000
< Access-Control-Allow-Origin: *
< Content-Length: 6228047
< Age: 1150
< Via: 1.0 dongwangtong61:88 (Cdn Cache Server V2.0), 1.0 yuanwangtong81:8500 (Cdn Cache Server V2.0)
< Connection: close
<
{ [758 bytes data]
* Closing connection 1

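As shown above, both wget and curl follow the 302 and then request the redirected resource. So why does wget -S --header="Host: edge.ivideo.sina.com.cn" end in a 403, while curl -sv -L -H "Host: edge.ivideo.sina.com.cn" gets a normal 200? My guess was that the two tools send some header to the server differently, so I checked the manuals hoping to find a clue: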

curl:
-L, --location
       (HTTP/HTTPS) If the server reports that the requested page  has  moved  to  a  different  location
       (indicated  with  a Location: header and a 3XX response code), this option will make curl redo the
       request on the new place. If used together with -i, --include or  -I,  --head,  headers  from  all
       requested pages will be shown. When authentication is used, curl only sends its credentials to the
       initial host. If a redirect takes curl to a different host, it won't  be  able  to  intercept  the
       user+password.  See  also  --location-trusted  on  how to change this. You can limit the amount of
       redirects to follow by using the --max-redirs option.

       When curl follows a redirect and the request is not a plain GET (for example POST or PUT), it will
       do  the  following  request  with a GET if the HTTP response was 301, 302, or 303. If the response
       code was any other 3xx code, curl will re-send the following request  using  the  same  unmodified
       method.

wget:
-S
--server-response
    Print the headers sent by HTTP servers and responses sent by FTP servers.

Unfortunately, after reading the man pages for a while I found nothing relevant, so the only option left was to capture the traffic.

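Wireshark shows clearly that with wget -S --header="Host: edge.ivideo.sina.com.cn", the first request to the scheduler 124.165.216.169 carries Host: edge.ivideo.sina.com.cn, and after the 302 the follow-up request to the redirect target (124.165.204.34 in this capture) still carries Host: edge.ivideo.sina.com.cn. The capture was taken with: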

sudo tcpdump -s 0 'host 10.209.70.69 and (124.165.216.169 or 124.165.204.34)' -w ~/Documents/wget.pcap

With curl -sv -L -H "Host: edge.ivideo.sina.com.cn", the first request to the scheduler 124.165.216.169 also carries Host: edge.ivideo.sina.com.cn, but after the 302 the Host sent to 61.163.117.81 changes to 61.163.117.81. The capture was taken with:

sudo tcpdump -s 0 'host 10.209.70.69 and (124.165.216.169 or 61.163.117.81)' -w ~/Documents/curl.pcap
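The same comparison can probably be made without opening Wireshark. The following is only a sketch: it assumes the two capture files written by the tcpdump commands above, and host_lines is a hypothetical helper name, not something from the original session. It prints every Host header found in the ASCII dump of a capture.

```shell
#!/bin/sh
# Hypothetical helper: print every HTTP Host header found in an ASCII
# packet dump read from stdin.
# tr strips the carriage return that terminates each HTTP header line;
# grep -a treats the dump as text even if it contains binary bytes.
host_lines() {
    tr -d '\r' | grep -a -i '^Host:'
}

# Usage (assumes the capture files from the tcpdump commands above exist):
#   tcpdump -A -r ~/Documents/wget.pcap | host_lines
#   tcpdump -A -r ~/Documents/curl.pcap | host_lines
```

If the behaviour described above holds, the wget capture should list the custom Host twice, while the curl capture should list the custom Host once followed by the IP address of the redirect target.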

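So I guessed that the resource the Wangsu (网宿) scheduler redirects to with the 302 cannot be requested with the original Host header. Testing confirmed the guess: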

wget -S "http://61.163.117.81/edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032929&wsid_tag=3d8798cb&wsiphost=ipdbm" -O /dev/null
--2016-04-05 11:00:01--  http://61.163.117.81/edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032929&wsid_tag=3d8798cb&wsiphost=ipdbm
Connecting to 61.163.117.81:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 200 OK
  Date: Tue, 05 Apr 2016 02:53:35 GMT
  Server: nginx/1.6.0 r16033116-31dfdda
  X-RequestId: 00b15cd7-1604-0200-0335-d4ae52a774d9
  X-Requester: SINA00000000000VIASK
  Last-Modified: Wed, 15 Aug 2012 19:03:46 GMT
  X-Filesize: 6228047
  ETag: "5f175786197b511a4ba8f6167ad19632"
  Cache-Control: max-age=31536000
  Access-Control-Allow-Headers: Origin, Content-Type, Accept, Content-Length
  Access-Control-Allow-Methods: GET, PUT, POST, DELETE, OPTIONS, HEAD
  Access-Control-Max-Age: 31536000
  Access-Control-Allow-Origin: *
  Content-Length: 6228047
  Age: 386
  Via: 1.0 dongwangtong61:88 (Cdn Cache Server V2.0), 1.0 yuanwangtong81:8500 (Cdn Cache Server V2.0)
  Connection: keep-alive
Length: 6228047 (5.9M)
Saving to: '/dev/null'

100%[==================================================================================================================================================================================================>] 6,228,047   1.13MB/s   in 5.1s

2016-04-05 11:00:06 (1.17 MB/s) - '/dev/null' saved [6228047/6228047]

wget -S --header="Host: edge.ivideo.sina.com.cn" "http://61.163.117.81/edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032929&wsid_tag=3d8798cb&wsiphost=ipdbm" -O /dev/null
--2016-04-05 11:00:36--  http://61.163.117.81/edge.ivideo.sina.com.cn/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2&wshc_tag=1&wsts_tag=57032929&wsid_tag=3d8798cb&wsiphost=ipdbm
Connecting to 61.163.117.81:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 403 Forbidden
  Server: Cdn Cache Server V2.0
  Date: Tue, 05 Apr 2016 03:00:41 GMT
  Content-Type: text/html
  Content-Length: 1432
  Expires: Tue, 05 Apr 2016 03:00:41 GMT
  X-Cache-Error: ERR_ACCESS_DENIED 0
  Via: 1.0 yuanwangtong81:5706 (Cdn Cache Server V2.0)
  Connection: close
2016-04-05 11:00:41 ERROR 403: Forbidden.
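
Given this result, one possible way to keep using wget is to send the custom Host only to the scheduler and then fetch the redirect target with wget's default Host. The sketch below is an assumption about how that could be scripted, not something tested in the original session; location_of and fetch_via_redirect are hypothetical helper names.

```shell
#!/bin/sh
# Read raw HTTP response headers on stdin and print the first Location value.
# Only the spellings "Location:" and "location:" are matched here.
location_of() {
    tr -d '\r' | sed -n 's/^[Ll]ocation: *//p' | head -n 1
}

# Hypothetical helper: send the custom Host header to the scheduler only,
# then let wget request the redirect target with its default Host header.
#   $1 = Host header value
#   $2 = scheduler URL
fetch_via_redirect() {
    # curl is run without -L, so it stops at the 302; -D - dumps the
    # response headers to stdout and -o /dev/null discards the body.
    target=$(curl -s -D - -o /dev/null -H "Host: $1" "$2" | location_of)
    [ -n "$target" ] || return 1
    wget -S "$target" -O /dev/null
}

# Usage (URL taken from the session above):
#   fetch_via_redirect "edge.ivideo.sina.com.cn" \
#     "http://edge.ivideo.sina.com.cn.wscdns.com/10000.flv?KID=sina,viask&Expires=1459958400&ssig=KPhNWs8B7m&corp=2"
```

Splitting the request in two mirrors what curl -L did in the trace above: the custom Host goes to the first hop only, and the second hop gets a Host derived from the redirect URL.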