HTML parsing in Perl


Problem description

I'm trying to parse the following HTML structure in Perl. I need to select all of the dd elements that have the class message and also an id. All the script needs to do is loop through those dd elements and print each one's id, but it must ignore the first dd element, as that one is static and will not change.

Any Perl module is fine as long as it can be installed from CPAN. I don't have much experience with Perl or with parsing HTML, so any pointers would be very helpful.

Thanks :)

HTML Structure:

<pre><code>
<html>
<head>
</head>
<body>
 .....other elements
    <div id="messages">
        <div class="header"></div>
        <dl>
            <dd class="message unread mc-friend mc-message">This is just a random message, do not parse</dd>
            <dd id="msg2" class="message unread mc-message">
                Hello
            </dd>
            <dd id="msg3" class="message unread mc-message">
                Hello
            </dd>
        </dl>
    </div>
</body>
</html>
</code></pre>

Solution

Something like this, quick and easy:

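#!/usr/bin/perl
use strict;
use warnings;

use Mojo::DOM;

my $html = "Your HTML goes here";

# Parse the markup into a DOM tree.
my $dom = Mojo::DOM->new;
$dom->parse($html);

# Skip the first matching dd (the static one), then print each id.
# attr('id') is the current Mojo::DOM accessor; older releases called it attrs.
my $skip;
for my $dd ($dom->find('dd[class*="message"]')->each) {
    print $dd->attr('id'), "\n" if $skip++;
}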
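
The question asks for dd elements that carry both the class message and an id, so a narrower selector may make the skip counter unnecessary. The following is only a sketch under a few assumptions: the sample markup is trimmed and every dd is closed properly, and `message_ids` and `$sample` are hypothetical names chosen for illustration, not part of Mojo::DOM. It relies on the CSS class selector `dd.message` together with the attribute-presence selector `[id]`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Mojo::DOM;

# Trimmed copy of the sample markup (assumption: each dd is closed properly).
my $sample = <<'HTML';
<div id="messages">
  <dl>
    <dd class="message unread mc-friend mc-message">Static message</dd>
    <dd id="msg2" class="message unread mc-message">Hello</dd>
    <dd id="msg3" class="message unread mc-message">Hello</dd>
  </dl>
</div>
HTML

# message_ids is a hypothetical helper name, not part of Mojo::DOM.
sub message_ids {
    my ($html) = @_;
    my $dom = Mojo::DOM->new($html);
    # dd.message matches the class token "message"; [id] requires an id attribute.
    return map { $_->attr('id') } $dom->find('dd.message[id]')->each;
}

print "$_\n" for message_ids($sample);
```

Because the static first dd has no id attribute, it never matches `[id]`, so the result does not depend on the element order. Note also that `dd[class*="message"]` is a substring match and would also pick up a dd whose only class is, say, mc-message, whereas `dd.message` matches the exact class token.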
