Regex to parse a multiline HTML
I am trying to parse a multi-line HTML file using regex.
HTML code:
<td>Details</td></tr>
<tr class=d1>
<td>uss_vod_translator</td>
Regex Expression:
if ($line =~ m/Details<\/td>\s*<\/tr>\s*<tr\s*class=d1>\s*<td>(\w*)<\/td>/)
{
print "$1";
}
I am using \s* (whitespace) to span the line breaks, but it is not working. I searched for a fix and even tried /\? for multi-line matching, but that did not work either.
I know regex is a poor way to parse HTML, but I have legacy HTML that I need to parse and no other choice.
Can anyone suggest how to parse multi-line HTML?
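One likely cause, assuming $line is filled one line at a time (for example by a while (<$fh>) loop, which the question does not show), is that the pattern never sees more than one line, so it cannot match across the line breaks. \s* already matches newlines, so no extra modifier is needed once the whole file is in a single string. A minimal sketch under that assumption, using an in-memory filehandle and the variable names $html, $content and $name purely for illustration:

```perl
use strict;
use warnings;

# Sample input standing in for the legacy file (fragment from the question).
my $html = <<'HTML';
<td>Details</td></tr>
<tr class=d1>
<td>uss_vod_translator</td>
HTML

# Open the string as an in-memory file so the sketch is self-contained;
# with a real file, open it by path instead.
open my $fh, '<', \$html or die "Cannot open in-memory file: $!";

# Slurp mode: undefine the input record separator locally so that a
# single read returns the whole file, newlines included.
my $content = do { local $/; <$fh> };
close $fh;

# \s* matches newlines too, so the original pattern works once the target
# string actually spans the lines. m{...} avoids escaping every "/".
my $name;
if ($content =~ m{Details</td>\s*</tr>\s*<tr\s*class=d1>\s*<td>(\w*)</td>}) {
    $name = $1;
    print "$name\n";
}
```

The pattern is the one from the question with only the delimiters changed; the difference is that it is applied to the whole file rather than to a single line.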
Stop trying to use regular expressions and use a module that will parse it for you.
HTML::TreeBuilder is a good solution.
HTML::TreeBuilder::LibXML gives you the same API but backed by a fast parser.
HTML::TreeBuilder::XPath adds XPath support as well as a fast parser.
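As a rough illustration of the module-based approach, the sketch below uses HTML::TreeBuilder (a CPAN module, not part of core Perl) to pull the same cell out of the markup. The surrounding <table> wrapper and the variable names are assumptions added so the fragment from the question forms a complete table; real legacy pages may need a different lookup.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN module; install it separately

# Fragment from the question, wrapped in a table (assumed) so it is complete.
my $html = <<'HTML';
<table>
<tr><td>Details</td></tr>
<tr class=d1>
<td>uss_vod_translator</td>
</tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# In scalar context look_down returns the first matching element:
# first the <tr class="d1"> row, then the first <td> inside it.
my $row  = $tree->look_down(_tag => 'tr', class => 'd1');
my $cell = $row ? $row->look_down(_tag => 'td') : undef;

my $name = $cell ? $cell->as_text : undef;
print "$name\n" if defined $name;

$tree->delete;    # free the parse tree when done
```

Because the parser builds a tree, line breaks and spacing between the tags do not affect the lookup, which is the main advantage over the regex approach.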