WWW::Mechanize Extraction Help - PERL

Views: 94 Published: 2020/5/25 1:42:36 Tags: perl parsing screen-scraping www-mechanize html-treebuilder

Problem description

I'm trying to automate the extraction of a transcript found on a website. The site formats the interview as a description list, so the entire transcript sits between dl tags. The script below lets me fetch the page and extract the transcript as plain text, but I actually want everything between the dl tags, markup included (the dd's, dt's, etc.). This will allow us to develop our own CSS for the interview.

One thing to note about the page is that break statements are inserted at various points during the interview. Some tools we've tried that extract information from webpages by matching tag pairs have trouble with this, since they only grab the information up to the break statement. Just something to keep in mind if you point me in a different direction. Here's what I have so far.

#!/usr/bin/perl -w

use strict;
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;

my $mech = WWW::Mechanize->new();
WWW::Mechanize::TreeBuilder->meta->apply($mech);
$mech->get("http://millercenter.org/president/clinton/oralhistory/madeleine-k-albright");

# find all <dl> elements on the fetched page
my @list = $mech->find('dl');

# print the text content of each <dl>, with all markup stripped
foreach ( @list ) {
    print $_->as_text();
}

If there is a tool that prints essentially what I have now, only as HTML instead of plain text, please let me know!
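One possible direction, offered as a minimal sketch rather than a confirmed answer: the objects returned by `find` are HTML::Element nodes, and HTML::Element provides an `as_HTML` method alongside `as_text`. The example below runs offline on a hypothetical sample string (the `$html` markup and the speaker text are invented for illustration, not taken from the transcript page) and serializes each dl with its dt, dd and br children intact. Passing an empty hash reference as the third argument asks `as_HTML` to emit end tags it would otherwise treat as optional, such as `</dt>` and `</dd>`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup standing in for the transcript page.
my $html = <<'HTML';
<html><body>
<dl>
<dt>Interviewer</dt>
<dd>First line.<br>Second line after a break.</dd>
</dl>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Serialize every <dl> element, keeping its child tags.
# Arguments: characters to entity-encode, indent string (undef = none),
# and an empty hash ref so that no end tags are omitted.
my @fragments = map { $_->as_HTML('<>&', undef, {}) }
                $tree->look_down(_tag => 'dl');

print "$_\n" for @fragments;

# Free the parse tree once the strings have been extracted.
$tree->delete;
```

In the script from the question, the equivalent change would presumably be to call `$_->as_HTML('<>&', undef, {})` instead of `$_->as_text()` inside the `foreach` loop. That assumes `find` hands back HTML::Element objects, as the existing `as_text` call suggests.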
