Bash: Remove headers from HTTP response


Problem description


If I have some text containing HTTP headers and a body, e.g.:

HTTP/1.1 200 OK
Cache-Control: public, max-age=38
Content-Type: text/html; charset=utf-8
Expires: Fri, 22 Nov 2013 06:15:01 GMT
Last-Modified: Fri, 22 Nov 2013 06:14:01 GMT
Vary: *
X-Frame-Options: SAMEORIGIN
Date: Fri, 22 Nov 2013 06:14:22 GMT

<!DOCTYPE html>
<html>
<head>
    <title>My website</title>
</head>
<body>

Hello world!

</body>
</html>

and this text is being piped in from a command, how can I remove the headers to leave just the body?

(Within the headers, \r\n is used as the line break.  \r\n\r\n marks the end of the headers and the start of the body.)

Here's what I've tried (... indicates any command such as cat or curl which will output some HTTP headers and body to stdout):

sed

My first idea was to do substitution with sed, to remove everything before the first occurrence of \r\n\r\n:

... | sed 's|^.*?\r\n\r\n||'

But this doesn't work, mainly because sed processes its input one line at a time, so a single pattern can't match across the line breaks in \r\n\r\n.  (In addition, sed doesn't support the non-greedy *? quantifier.)

grep

I also thought of using grep with a positive lookbehind for \r\n\r\n:

... | grep -oP '(?<=\r\n\r\n).*'

But this doesn't work either (mainly because grep only operates on individual lines).

pcregrep has a multiline mode (-M), but pcregrep is often not available (it's not installed by default in Ubuntu 12.04, Mac OS X 10.7, etc), and I'd like a solution which doesn't require any non-standard tools.

perl

I then thought of doing substitution with perl, using the /s modifier so that . matches line breaks:

... | perl -pe 's/^.*?\r\n\r\n//s'

I think this is closer to a working solution.  However, I think Perl's Input Record Separator ($/) is \n by default, and needs to be changed to \r\n, so that . can match \r\n.  The -0 option can be used to set $/ to a single character, but not multiple characters.  I've tried this, but I don't think it's correct:

... | perl -pe '$/ = "\r\n"; s/^.*?\r\n\r\n//s'

Also, I think ^ is matching "start of line", but needs to match "start of file".

Offset and substring

I had an idea of getting the offset of \r\n\r\n using:

BodyOffset=$(expr index "$MyHttpText" "\r\n\r\n")

and then extracting the body as a substring using:

HttpBody=${MyHttpText:BodyOffset}

Unfortunately, the Mac OS X version of expr doesn't support index.  Also, if possible, I'd like a solution which doesn't require the creation of variables.

Parameter substitution

One other idea I had was to use parameter substitution, where # means "Remove from the front of $MyHttpText the shortest part that matches the pattern *\r\n\r\n":

HttpBody=${MyHttpText#*\r\n\r\n}

But I'm not sure how to use this in a piped sequence of commands, and again I'd prefer a solution which doesn't require variables.
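
One way this idea might be made to work in a pipeline is to wrap it in a small function that reads stdin. Note that inside ${...} the shell does not interpret \r or \n as control characters, so the separator has to be built from real CR and LF characters. The sketch below is only an illustration: the names body_of, cr, nl and sep are invented here, and because the response passes through a shell variable it will not preserve NUL bytes in binary bodies.

```shell
# Build a real CR LF CR LF separator without relying on $'...' quoting.
cr=$(printf '\r')
nl='
'
sep="$cr$nl$cr$nl"

# Hypothetical helper: read the whole response from stdin, then strip the
# shortest prefix that ends in CR LF CR LF. It runs in a subshell "( ... )"
# so the temporary variable does not leak into the caller.
body_of() (
  response=$(cat; printf x)   # the trailing x protects trailing newlines
  response=${response%x}
  printf '%s' "${response#*"$sep"}"
)

# Example usage in a pipeline:
printf 'HTTP/1.1 200 OK\r\nVary: *\r\n\r\nHello world!\n' | body_of
```

If the separator never occurs, the input is passed through unchanged.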

Solution

sed can do this:

sed '1,/^$/d' data.txt

This command deletes everything starting from line 1, and ending at the first occurrence of an empty line (^$). This works if you have \n as a newline character. If you have \r\n as a newline character, you can use dos2unix and unix2dos to convert them back and forth or you can add the \r character to the regex:

sed '1,/^\r$/d' data.txt

However, that command only works when the line breaks are \r\n. To make it work with both kinds of line break, you can use:

sed '1,/^\r\{0,1\}$/d' data.txt

Here the pattern matches the first line that is either empty or contains nothing but a single \r character.
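
Since the question pipes the text in from another command, the same range delete can be used on stdin. The sketch below wraps it in a function; strip_headers is a made-up name. It also passes a literal CR character into the pattern instead of the \r escape, on the assumption that not every sed implementation recognises \r inside a regular expression.

```shell
# Hypothetical helper: delete from line 1 through the first line that is
# empty or holds only a carriage return. A literal CR from printf is spliced
# into the pattern, so the \r escape is not needed.
# Any arguments (e.g. a file name) are passed on to sed; with none, sed
# reads stdin.
strip_headers() {
  sed "1,/^$(printf '\r')\{0,1\}\$/d" "$@"
}

# Example usage with CRLF headers:
printf 'HTTP/1.1 200 OK\r\nVary: *\r\n\r\nHello world!\n' | strip_headers
```

Blank lines inside the body are left alone, because the range ends at the first match and is not started again.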
