How do I extract links from HTML with a Perl regex?

Views: 130 | Posted: 2018/6/15 13:49:14 | Tags: html, regex, perl

Question

I have a huge HTML file with a lot of content I don't need, but inside it are URLs in the following format:

<a href="http://www.retailmenot.com/" class=l

I'm trying to extract the URLs. I tried the following, to no avail:

open(FILE, "<", "HTML.htm") or die "$!";
my @str = <FILE>;
my @matches = grep { m/a href="(.+?") class=l/ } @str;

Any idea how to match this?
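For reference, the attempt above does find the right lines, but grep returns each matching line whole rather than the captured group, the closing quote sits inside the capture group, and reading the file line by line misses any tag split across lines. Below is a minimal corrected sketch, for contrast only. It uses made-up sample lines (the example.com URLs are hypothetical) instead of HTML.htm, and it still breaks as soon as the attributes are reordered or quoted differently, which is why the answer recommends a parser instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input in the question's format (stands in for HTML.htm).
my @lines = (
    '<p><a href="http://www.retailmenot.com/" class=l>RetailMeNot</a></p>',
    '<a href="http://example.com/other" class=l>x</a> <a href="http://example.com/skip">y</a>',
);

# map (not grep) returns what each match captures; /g collects every match
# on a line; [^"]+ keeps the closing quote out of the capture group;
# \b stops class=l from also matching class=link.
my @urls = map { /<a href="([^"]+)" class=l\b/g } @lines;

print "$_\n" for @urls;
```

The <a> without class=l on the second line is skipped, because [^"]+ cannot run past its closing quote to reach a later class=l.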

Recommended answer

Use HTML::SimpleLinkExtor, HTML::LinkExtor, or one of the other link-extracting Perl modules. You don't need a regex at all.
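As a minimal sketch of the simplest route, assuming HTML::SimpleLinkExtor is installed from CPAN: its a method returns the href of every <a> tag. It does no filtering by class, so it fits when you want all anchor links. The inline HTML is a hypothetical stand-in for the asker's file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::SimpleLinkExtor;    # CPAN module, not part of core Perl

# Hypothetical input; a real script would read HTML.htm instead.
my $html = <<'HTML';
<a href="http://www.retailmenot.com/" class=l>RetailMeNot</a>
<img src="logo.png">
HTML

my $extor = HTML::SimpleLinkExtor->new;
$extor->parse($html);
$extor->eof;    # flush anything the parser is still buffering

# a() gives only the href values of <a> tags; the img src is not included.
my @a_hrefs = $extor->a;
print "$_\n" for @a_hrefs;
```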

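Here's a short example with HTML::LinkExtor that also filters on the class attribute. You don't have to subclass; you just have to tell %HTML::Tagset::linkElements which attributes to collect. Make that change before calling new, because HTML::LinkExtor->new uses that hash to decide which tags to report: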

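#!perl
use strict;
use warnings;
use HTML::LinkExtor;

# Report both href and class for <a> tags; set this before calling new().
$HTML::Tagset::linkElements{'a'} = [ qw( href class ) ];

my $p = HTML::LinkExtor->new;
$p->parse( do { local $/; <> } );    # slurp the files named on the command line (or STDIN)
$p->eof;                              # flush anything still buffered

# Each link is [ $tag, attr => value, ... ]; keep those with class "l" (the question's class).
my @links = grep {
    my( $tag, %hash ) = @$_;
    no warnings 'uninitialized';
    $hash{class} eq 'l';
} $p->links;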

If you need to collect URLs for other tags, make similar adjustments to their entries in %HTML::Tagset::linkElements.
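For example, here is a sketch of the same adjustment for <link> tags, assuming you want only stylesheet URLs. The tag, the rel filter, and the sample HTML are illustrative choices, not from the question. As with <a>, the hash is changed before new is called.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;    # ships with the HTML::Parser distribution on CPAN

# Illustrative adjustment: also report rel for <link>, so it can be filtered on.
$HTML::Tagset::linkElements{'link'} = [ qw( href rel ) ];

# Hypothetical input.
my $html = <<'HTML';
<link rel="stylesheet" href="/site.css">
<link rel="icon" href="/favicon.ico">
HTML

my $p = HTML::LinkExtor->new;
$p->parse($html);
$p->eof;

# Each link is [ $tag, attr => value, ... ]; keep stylesheet hrefs only.
my @stylesheets;
for my $link ( $p->links ) {
    my ( $tag, %attr ) = @$link;
    push @stylesheets, $attr{href}
        if $tag eq 'link' && defined $attr{rel} && $attr{rel} eq 'stylesheet';
}

print "$_\n" for @stylesheets;
```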

If you'd rather use a callback, that's not hard either. The parser calls it for each link as it runs into it (when you pass a callback, HTML::LinkExtor does not also store the links for the links method):

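use strict;
use warnings;
use HTML::LinkExtor;

$HTML::Tagset::linkElements{'a'} = [ qw( href class ) ];

my @links;
# Called with ( $tag, attr => value, ... ) for each link as it is parsed.
my $callback = sub {
    my( $tag, %hash ) = @_;
    no warnings 'uninitialized';
    push @links, $hash{href} if $hash{class} eq 'l';    # the question's class
};

my $p = HTML::LinkExtor->new( $callback );
$p->parse( do { local $/; <DATA> } );
$p->eof;

print "$_\n" for @links;

__DATA__
<a href="http://www.retailmenot.com/" class=l>RetailMeNot</a>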
