How can I get started with web page scraping using Perl?
Question
I am interested in learning Perl. I am using the Learning Perl book and the CPAN website for reference.
I would like to build a web/text scraping application in Perl to apply what I have learned.
Please suggest some good options to begin with.
(This is not homework. I want to do something in Perl that would help me exercise basic Perl features.)
Recommended answer
If the web pages you want to scrape require JavaScript to function properly, you are going to need more than WWW::Mechanize can provide. You might even have to resort to controlling a specific browser from Perl (e.g. using Win32::IE::Mechanize or WWW::Mechanize::Firefox).
I haven't tried it, but there is also WWW::Scripter with the WWW::Scripter::Plugin::JavaScript plugin.
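For pages that do not depend on JavaScript, plain WWW::Mechanize is often enough to start with. The following is a minimal sketch, not a definitive implementation: it assumes WWW::Mechanize is installed from CPAN, and the sub name page_links is hypothetical, chosen only for this illustration. It fetches one page and lists the links found on it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;

# page_links is a hypothetical helper name used for this sketch.
# It fetches $url and returns one [link text, absolute URL] pair per link.
sub page_links {
    my ($url) = @_;

    # autocheck => 1 makes failed requests die instead of failing silently.
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);

    # links() returns WWW::Mechanize::Link objects; text() can be undef
    # for links without text (for example image links).
    return map { [ $_->text // '', $_->url_abs->as_string ] } $mech->links;
}

# Only fetch something when a URL is passed on the command line.
if (@ARGV) {
    for my $link ( page_links( $ARGV[0] ) ) {
        printf "%s -> %s\n", @$link;
    }
}
```

Saved as, say, links.pl, it would be run as `perl links.pl URL`. It only sees links present in the HTML the server returns; anything a page adds later via JavaScript will be missing, which is the case where the browser-driving modules mentioned above come in.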