Perl: extract text between HTML tags using regex

Views: 166 | Posted: 2018/6/21 17:26:04 | Tags: html, regex, perl, tags


Question

 
I'm new to Perl and I'm trying to extract the text between all <li> </li> tags in a string and assign the results to an array using regex or split/join.

e.g.
my $string = "<ul>
                  <li>hello</li>
                  <li>there</li>
                  <li>everyone</li>
              </ul>";
So that this code...
foreach my $value (@array) {
    print "$value\n";
}
...results in this output:
hello
there
everyone

Solution
Note: Do not use regular expressions to parse HTML.
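
To illustrate the warning, here is a minimal sketch using only core Perl. The sample markup is made up for illustration and is not taken from the question. It shows how a pattern anchored on the literal text `<li>` silently skips an item as soon as the tag carries an attribute or uses different letter case:

```perl
use strict;
use warnings;

# Hypothetical input: valid HTML, but not every tag is a bare lowercase <li>.
my $html = '<ul><li>hello</li><li class="x">there</li><LI>everyone</LI></ul>';

# This pattern only recognises the exact strings "<li>" and "</li>".
my @found = $html =~ m{<li>(.*?)</li>}g;

# Only the first item is found; the other two are skipped without any error.
print scalar(@found), " item(s) matched: @found\n";
```

This prints `1 item(s) matched: hello`. A real HTML parser treats all three elements as `li` elements, which is the main reason to prefer one.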

The first option uses HTML::TreeBuilder, one of the many HTML parsers available. You can read the module's documentation and look at the examples given there.
use strict;
use warnings;
use HTML::TreeBuilder;

my $str 
   = "<ul>"
   . "<li>hello</li>"
   . "<li>there</li>"
   . "<li>everyone</li>"
   . "</ul>"
   ;

# Now create a new tree to parse the HTML from String $str
my $tr = HTML::TreeBuilder->new_from_content($str);

# And now find all <li> tags and create an array with the values.
my @lists = 
      map { $_->content_list } 
      $tr->find_by_tag_name('li');

# And loop through the array returning our values.
foreach my $val (@lists) {
   print $val, "\n";
}
If you decide to use a regular expression here anyway (which I don't recommend), you could do something like this:
my $str
   = "<ul>"
   . "<li>hello</li>"
   . "<li>there</li>"
   . "<li>everyone</li>"
   . "</ul>"
   ;

my @matches;
while ($str =~/(?<=<li>)(.*?)(?=<\/li>)/g) {
  push @matches, $1;
}

foreach my $m (@matches) {
   print $m, "\n";
}
Output:
hello
there
everyone
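
As a variation on the loop above, a global match in list context returns every capture at once, so the array can be filled in a single statement. The sketch below applies that idea to the multi-line string from the question. The `/s` and `/i` modifiers and the whitespace trimming are assumptions added here, on the guess that items may span several lines or use upper-case tags. It is still a regex, so the warning about parsing HTML this way applies to it as well.

```perl
use strict;
use warnings;

# The string from the question, including newlines and indentation.
my $string = "<ul>
                  <li>hello</li>
                  <li>there</li>
                  <li>everyone</li>
              </ul>";

# In list context, a /g match returns all captured groups.
# /s lets "." match newlines; /i accepts <LI> as well as <li>.
# The \s* on each side trims whitespace around the item text.
my @array = $string =~ m{<li>\s*(.*?)\s*</li>}gis;

foreach my $value (@array) {
    print "$value\n";
}
```

With the question's input this prints hello, there and everyone on separate lines, which is the output the question asks for.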
