How to parse multi-line headers of SIP message using regex?

Views: 283  Posted: 2016/8/12 18:42:36  Tags: c++ regex boost sip

Problem description

I'm trying to extract the tag from the From: header of a SIP message.

My regex: ^(From:|f:)((?!\\n\\w).)*;[ ]*tag[ ]*=[ ]*([[:alnum:]]*)

RFC 3261 allows multi-line headers; each continuation line must start with whitespace. I have a problem with such headers: if the tag is on a continuation line, my regex does not match.

Example of a valid SIP message:

INVITE sip:13@10.10.1.13 SIP/2.0
Via: SIP/2.0/UDP 10.10.1.99:5060;branch=z9hG4bK343bf628;rport
Contact: <sip:15@10.10.1.99>
Call-ID: 326371826c80e17e6cf6c29861eb2933@10.10.1.99
CSeq: 102 INVITE
User-Agent: Asterisk PBX
Max-Forwards: 70
Date: Wed, 06 Dec 2009 14:12:45 GMT
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY
Supported: replaces
Content-Type: application/sdp
Content-Length: 258
From: "Test 15" <sip:15@10.10.1.99>
 ; tag   =    fromtag
To: <sip:13@10.10.1.13>;tag=totag

v=0
o=root 1821 1821 IN IP4 10.10.1.99
s=session
c=IN IP4 10.10.1.99
t=0 0
m=audio 11424 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=silenceSupp:off - - - -
a=ptime:20
a=sendrecv

How can I properly parse the multi-line headers? Thanks in advance.

Solution

I'd second the motion to use/generate a proper parser.

There's nothing stopping you from parsing the headers in a separate step, but you can still specify the grammar declaratively, which is the main point.

The best parts here are indeed:

- the declarative style, which makes it easier to extend with more grammar (the surrounding bits, or more details like disallowing CTL characters)
- the "free" debugging tools (#define BOOST_SPIRIT_DEBUG, done)

Here's a simple take on the multi-line header syntax:

RFC 2616:

Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT.

RFC 822:

 field       =  field-name ":" [ field-body ] CRLF

 field-name  =  1*<any CHAR, excluding CTLs, SPACE, and ":">

 field-body  =  field-body-contents
                [CRLF LWSP-char field-body]

 field-body-contents =
               <the ASCII characters making up the field-body, as
                defined in the following sections, and consisting
                of combinations of atom, quoted-string, and
                specials tokens, or else consisting of texts>
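
The folding rule quoted above can also be undone as a plain preprocessing step, before any header-specific matching. The following is a minimal standard-library sketch, separate from the Spirit grammar in this answer. The function name `unfold` is made up for illustration, and the sketch assumes that lines end in CRLF and that only the header section (everything before the first empty line) is passed in.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not taken from the original answer.
// Replaces every CRLF that is followed by SP or HT, together with the
// whitespace run that follows it, by a single space. A CRLF that is not
// followed by SP/HT ends a header and is copied unchanged.
std::string unfold(const std::string& raw) {
    auto is_lws = [](char c) { return c == ' ' || c == '\t'; };

    std::string out;
    out.reserve(raw.size());

    std::size_t i = 0;
    while (i < raw.size()) {
        const bool folded = raw.compare(i, 2, "\r\n") == 0
                         && i + 2 < raw.size()
                         && is_lws(raw[i + 2]);
        if (folded) {
            i += 2;                                   // skip CR LF
            while (i < raw.size() && is_lws(raw[i]))  // skip leading SP/HT
                ++i;
            out += ' ';                               // one space replaces the fold
        } else {
            out += raw[i++];
        }
    }
    return out;
}
```

After this step every header occupies exactly one line, so a line-oriented regex or a simple split on CRLF no longer has to know about continuation lines.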

So without further ado, here's a simple grammar for roughly that, parsing from any range of input iterators into a std::map:

using Headers = std::map<std::string, std::string>;

Here's the core of the parser:

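    // Fragment only: assumes <boost/spirit/include/qi.hpp> and
    // <boost/fusion/adapted/std_pair.hpp> are included, that
    // `using namespace boost::spirit::qi;` is in effect, and that `It` is the
    // type of the input iterators `first` and `last`.
    auto& crlf       = "\r\n";
    auto& tspecials = " \t><@,;:\\\"/][?=}{:";

    rule<It, std::string()> token, value;

    // A header name is one or more characters that are not separators.
    token = +~char_(tspecials); // FIXME? should filter CTLs
    // A value runs until a CRLF that is NOT followed by SP/HT (or until end of
    // input), so folded continuation lines stay part of the value.
    value = *(char_ - (crlf >> &(~blank | eoi)));

    Headers headers;
    // Headers are separated by CRLF; trailing CRLFs are consumed and discarded.
    bool ok = phrase_parse(first, last, (token >> ':' >> value) % crlf >> omit[*lit(crlf)], blank, headers);

#ifdef DEBUG
    if (ok)          std::cerr << "DEBUG: Parse success\n";
    else             std::cerr << "DEBUG: Parse failed\n";
    if (first!=last) std::cerr << "DEBUG: Remaining unparsed input: '" << std::string(first,last) << "'\n";
#endif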

You can see a live demo parsing the sample headers from your question:

Live On Coliru

Printing:

Key: 'Allow', Value: 'INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY'
Key: 'CSeq', Value: '102 INVITE'
Key: 'Call-ID', Value: '326371826c80e17e6cf6c29861eb2933@10.10.1.99'
Key: 'Contact', Value: '<sip:15@10.10.1.99>'
Key: 'Content-Length', Value: '258'
Key: 'Content-Type', Value: 'application/sdp'
Key: 'Date', Value: 'Wed, 06 Dec 2009 14:12:45 GMT'
Key: 'From', Value: '"Test 15" <sip:15@10.10.1.99>
; tag   =    fromtag'
Key: 'Max-Forwards', Value: '70'
Key: 'Supported', Value: 'replaces'
Key: 'To', Value: '<sip:13@10.10.1.13>;tag=totag'
Key: 'User-Agent', Value: 'Asterisk PBX'
Key: 'Via', Value: 'SIP/2.0/UDP 10.10.1.99:5060;branch=z9hG4bK343bf628;rport'

Note that the \r\n combo is kept as-is in the value for the From header. If you want to normalize that to some other LWS character, such as a simple ' ', use e.g.

value = *(omit[ crlf >> !(~blank | eoi) ] >> attr(' ') | (char_ - crlf));
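
To tie this back to the question: once folded lines are joined, a regex close to the original one can find the tag. Below is a hedged sketch that uses only std::regex, not Spirit. `from_tag` is a hypothetical name, and the `[[:alnum:]]` class for the tag value is kept from the question (real tags may contain other token characters). The function returns an empty string when nothing matches, and it does not try to handle a display name that itself contains ";tag=".

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of the original answer.
// Returns the tag parameter of the From header (or its compact form "f"),
// or an empty string if no such parameter is found.
std::string from_tag(const std::string& message) {
    // Step 1: unfold. A CRLF followed by SP/HT continues the previous line,
    // so replace the CRLF and the whitespace after it with one space.
    static const std::regex fold("\r\n[ \t]+");
    const std::string unfolded = std::regex_replace(message, fold, " ");

    // Step 2: match the tag on the now single-line From header.
    // The header must start at the beginning of the input or right after a
    // line break; icase is set because header names are case-insensitive.
    static const std::regex tag_re(
        "(?:^|\n)(?:From|f)[ \t]*:[^\r\n]*;[ \t]*tag[ \t]*=[ \t]*([[:alnum:]]+)",
        std::regex::icase);

    std::smatch m;
    if (std::regex_search(unfolded, m, tag_re))
        return m[1].str();
    return std::string();
}
```

Called on the sample message from the question, `from_tag(message)` should yield `fromtag`, because the continuation line ` ; tag   =    fromtag` is joined to the From line before matching.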
