How to parse HTTP headers using Bash?


Question

I need to get two values from the response headers of a web page that I am fetching with curl. I have been able to get the values individually using:

response1=$(curl -I -s http://www.example.com | grep HTTP/1.1 | awk {'print $2'})
response2=$(curl -I -s http://www.example.com | grep Server: | awk {'print $2'})

But I cannot figure out how to grep the values separately using a single curl request, like:

response=$(curl -I -s http://www.example.com)
http_status=$response | grep HTTP/1.1 | awk {'print $2'}
server=$response | grep Server: | awk {'print $2'}

Every attempt leads to either an error message or empty values. I am sure it is just a syntax issue.

Recommended answer

A full bash solution. It demonstrates how to easily parse other headers without requiring awk:

shopt -s extglob # Required to trim whitespace; see below

while IFS=':' read -r key value; do
    # trim whitespace in "value" (this also strips the trailing CR)
    value=${value##+([[:space:]])}; value=${value%%+([[:space:]])}

    case "$key" in
        Server) SERVER="$value"
                ;;
        Content-Type) CT="$value"
                ;;
        # the status line is not "name: value"; rejoin it and split on spaces
        HTTP*) read -r PROTO STATUS MSG <<< "$key${value:+:$value}"
                ;;
    esac
done < <(curl -sI http://www.google.com)
echo "$STATUS"
echo "$SERVER"
echo "$CT"

Producing:

302
GFE/2.0
text/html; charset=UTF-8
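
For comparison, the attempt in the question fails because `http_status=$response | grep ...` runs the assignment as the first stage of a pipeline: it prints nothing, so grep receives no input, and the variable is never set in the calling shell. A minimal sketch of a fix that keeps grep and awk is to print the saved response into the pipeline inside a command substitution. The sample response below is a made-up stand-in for the output of `curl -sI`, so the sketch runs without network access:

```shell
#!/usr/bin/env bash
# Hypothetical sample standing in for: response=$(curl -sI http://www.example.com)
response=$'HTTP/1.1 200 OK\r\nServer: nginx\r\nContent-Type: text/html\r\n\r\n'

# Print the saved text into the pipeline and capture the result.
http_status=$(printf '%s\n' "$response" | grep '^HTTP/' | awk '{print $2}')

# Header names are case-insensitive, hence grep -i; tr removes the CR
# that ends every header line on the wire.
server=$(printf '%s\n' "$response" | grep -i '^Server:' | awk '{print $2}' | tr -d '\r')

echo "$http_status"
echo "$server"
```

Note that `awk '{print $2}'` keeps only the first word of the header value, as in the question; the pure-bash loop above keeps the whole value.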


According to RFC 2616, HTTP headers are modeled as described in "Standard for the Format of ARPA Internet Text Messages" (RFC 822), which states clearly in section 3.1.2:

The field-name must be composed of printable ASCII characters (i.e., characters that have values between 33. and 126., decimal, except colon). The field-body may be composed of any ASCII characters, except CR or LF. (While CR and/or LF may be present in the actual text, they are removed by the action of unfolding the field.)

So the above script should catch any RFC-[2]822 compliant header, with the notable exception of folded headers.
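
If folded headers do need to be handled, one possible approach is to unfold them before parsing. The sketch below is an illustration rather than part of the original script: `unfold_headers` is a hypothetical helper name, and it assumes a continuation line is any line that starts with a space or tab. It joins each continuation onto the preceding header with a single space and strips the trailing CR from every line. The sample input is made up for the demonstration.

```shell
#!/usr/bin/env bash
# unfold_headers (hypothetical helper): reads raw headers on stdin and
# prints one logical header per line, with continuation lines joined.
unfold_headers() {
    local line current=''
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                        # drop the trailing CR
        if [[ $line == [[:blank:]]* && -n $current ]]; then
            # Continuation: strip leading blanks, append with one space.
            line=${line#"${line%%[![:blank:]]*}"}
            current+=" $line"
        else
            # A new header starts: flush the previous one, if any.
            [[ -n $current ]] && printf '%s\n' "$current"
            current=$line
        fi
    done
    [[ -n $current ]] && printf '%s\n' "$current"
    return 0
}

# Made-up sample response containing one folded header.
printf 'HTTP/1.1 200 OK\r\nX-Long: first part,\r\n  second part\r\nServer: demo\r\n\r\n' |
    unfold_headers
```

The output of `unfold_headers` could then replace the direct curl process substitution as input to the `while` loop, for example `done < <(curl -sI "$url" | unfold_headers)`, where `$url` is a placeholder.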
