How can I find the contents of a div using Perl's HTML modules, if I know a tag inside of it?

Published: 2018/6/20 15:29:45. Tags: html, perl, html-parsing


Problem description

Ever since I asked how to parse HTML with regex and got bashed a bit (rightfully so), I've been studying the HTML::TreeBuilder, HTML::Parser, HTML::TokeParser, and HTML::Element Perl modules.

I have HTML like this:
```html
<div id="listSubtitlesFilm">
  <dt id="a1">
    <a href="/45/subtitles-67624.aspx">
      .45 (2006)
    </a>
  </dt>
</div>
```
I want to parse out `/45/subtitles-67624.aspx`, but more importantly, I want to know how to parse out the contents of the div.
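A minimal sketch of the HTML::TreeBuilder approach (one of the modules mentioned above), assuming the HTML is available as a string. `look_down` finds the div by its `id`. A second `look_down` called on that div element returns only the anchors nested inside it. The helper name `links_in_div` is hypothetical and is not part of any module.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): returns
# [anchor text, href] for every <a> inside the div with id $div_id.
sub links_in_div {
    my ($html, $div_id) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my @links;
    # In scalar context, look_down returns the first matching element
    if ( my $div = $tree->look_down(_tag => 'div', id => $div_id) ) {
        # Searching from $div restricts the match to its descendants
        for my $link ( $div->look_down(_tag => 'a') ) {
            push @links, [ $link->as_trimmed_text, $link->attr('href') ];
        }
    }

    $tree->delete;    # release the parse tree
    return @links;
}

my $html = <<'HTML';
<div id="listSubtitlesFilm"> <dt id="a1"> <a href="/45/subtitles-67624.aspx"> .45 (2006) </a> </dt> </div>
HTML

for my $pair ( links_in_div($html, 'listSubtitlesFilm') ) {
    print "$pair->[0] => $pair->[1]\n";
}
```

If you want the div's contents as a whole rather than individual links, calling `as_trimmed_text` or `as_HTML` on the same `$div` element should give you its text or markup.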

I was given this example on a previous question:
```perl
while ( my $anchor = $parser->get_tag('a') ) {
    if ( my $href = $anchor->get_attr('href') ) {
        # http://subscene.com/english/Sit-Down-Shut-Up-First-Season/subtitles-272112.aspx
        push @dnldLinks, $1 if $href =~ m!/subtitle-(\d{2,8})\.aspx!;
    }
}
```
This worked perfectly for that, but when I tried to edit it a bit and use it on a `div`, it didn't work. Here is the code I tried:

```perl
while (my $anchor = $p->get_tag("dt")) {
    if ($stuff = $anchor->get_attr('a1')) {
        print $stuff."\n";
    }
}
```
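The attempt above asks for an attribute *named* `a1`. In the markup, `a1` is the *value* of the `id` attribute, so `get_attr('a1')` returns undef. Below is a minimal sketch of the corrected check with HTML::TokeParser::Simple. It assumes the goal is the `href` of the first link after `<dt id="a1">`, and `href_after_dt` is a hypothetical helper name.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: returns the href of the first <a> tag that
# follows the <dt> whose id attribute equals $dt_id, or undef.
sub href_after_dt {
    my ($html, $dt_id) = @_;
    my $p = HTML::TokeParser::Simple->new(string => $html);

    while ( my $dt = $p->get_tag('dt') ) {
        # get_attr takes an attribute NAME ('id'); 'a1' is its value
        my $id = $dt->get_attr('id');
        next unless defined $id and $id eq $dt_id;

        # Next <a> in the token stream after this <dt>, if any
        my $anchor = $p->get_tag('a') or return;
        return $anchor->get_attr('href');
    }
    return;
}

my $html = '<div id="listSubtitlesFilm"> <dt id="a1"> <a href="/45/subtitles-67624.aspx"> .45 (2006) </a> </dt> </div>';
my $href = href_after_dt($html, 'a1');
print defined $href ? "$href\n" : "not found\n";
```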

Solution
To address your specific question, given the HTML:

```html
<div id="listSubtitlesFilm">
  <dt id="a1">
    <a href="/45/subtitles-67624.aspx">
      .45 (2006)
    </a>
  </dt>
</div>
```
I am assuming you are interested in the anchor text (in this case, ".45 (2006)"), but only if the anchor occurs in a div with the id listSubtitlesFilm.
```perl
#!/usr/bin/perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

my $parser = HTML::TokeParser::Simple->new(handle => \*DATA);

my @dnldLinks;

while ( my $div = $parser->get_tag('div') ) {
    # Only continue for the div whose id is listSubtitlesFilm
    my $id = $div->get_attr('id');
    next unless defined($id) and $id eq 'listSubtitlesFilm';

    # Next <a> tag in the stream after the div's start tag; stop if none
    my $anchor = $parser->get_tag('a') or last;
    my $href = $anchor->get_attr('href');
    next unless defined($href)
        and $href =~ m!/subtitles-(\d{2,8})\.aspx\z!;

    # Store [trimmed anchor text, numeric subtitle id]
    push @dnldLinks, [$parser->get_trimmed_text('/a'), $1];
}

use Data::Dumper;
print Dumper \@dnldLinks;

__DATA__
<div id="listSubtitlesFilm">
  <dt id="a1">
    <a href="/45/subtitles-67624.aspx">
      .45 (2006)
    </a>
  </dt>
</div>
```
Output:

```
$VAR1 = [
          [
            '.45 (2006)',
            '67624'
          ]
        ];
```
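The solution above takes the first `<a>` that follows the div's opening tag. That anchor may lie outside the div if the div itself contains no links. If you need everything between `<div id="listSubtitlesFilm">` and its matching `</div>`, one option is to walk the raw token stream with HTML::TokeParser and count nested `div` tags. The following is a minimal sketch, with `div_inner_html` as a hypothetical helper name.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Index of the original source text in each HTML::TokeParser token type
my %TEXT_AT = (S => 4, E => 2, T => 1, C => 1, D => 1, PI => 2);

# Hypothetical helper: returns the raw inner HTML of the first <div>
# whose id equals $want_id, or undef if there is no such div.
sub div_inner_html {
    my ($html, $want_id) = @_;
    my $p = HTML::TokeParser->new(\$html);

    # get_tag('div') returns [$tag, \%attr, \@attrseq, $text]
    while ( my $tag = $p->get_tag('div') ) {
        my $id = $tag->[1]{id};
        next unless defined $id and $id eq $want_id;

        # Collect tokens until the </div> that closes this div
        my ($depth, $inner) = (1, '');
        while ( my $tok = $p->get_token ) {
            my $type = $tok->[0];
            if ( $type eq 'S' and $tok->[1] eq 'div' ) {
                $depth++;
            }
            elsif ( $type eq 'E' and $tok->[1] eq 'div' ) {
                last if --$depth == 0;    # matching </div> reached
            }
            $inner .= $tok->[ $TEXT_AT{$type} ];
        }
        return $inner;
    }
    return;
}

my $html = '<div id="listSubtitlesFilm"> <dt id="a1"> <a href="/45/subtitles-67624.aspx"> .45 (2006) </a> </dt> </div>';
print div_inner_html($html, 'listSubtitlesFilm'), "\n";
```

Returning the raw markup keeps the div's contents intact, so the result can be passed to any of the parsers above for further processing.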
