Perl regular expression for HTML


Problem description

I need to extract the IMDb ID for a movie specified by the variable $url (for example, the ID for the movie 300 is tt0416449). I looked at the page source for this page and came up with the following regex:

use LWP::Simple;
$url = "http://www.imdb.com/search/title?title=$FORM{'title'}";

if (is_success( $content = LWP::Simple::get($url) ) ) {
    print "$url is alive!\n";
} else {
    print "No movies found";
}

$code = "";

if ($content=~/<td class="number">1\.</td><td class="image"><a href="\/title\/tt[\d]{1,7}"/s) {
    $code = $1;
}

I am getting an internal server error at this line:

$content=~/<td class="number">1\.</td><td class="image"><a href="\/title\/tt[\d]{1,7}"/s

I am very new to Perl and would be grateful if anyone could point out my mistakes.

Solution

Use an HTML parser. Regular expressions cannot parse HTML.
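
As a rough illustration of the parser route, here is a minimal sketch using the CPAN module HTML::Parser (assumed to be installed; it normally is wherever LWP is available). The sample markup and the helper name first_title_id are hypothetical, chosen only to mirror the pattern in the question. The real IMDb page may be structured differently.

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module; assumed to be installed

# Hypothetical sample markup, modelled on the pattern in the question.
my $html = '<td class="number">1.</td><td class="image">'
         . '<a href="/title/tt0416449/">300</a></td>';

# Hypothetical helper: return the ID from the first <a> tag whose href
# starts with /title/tt..., or undef if there is no such link.
sub first_title_id {
    my ($markup) = @_;
    my $id;
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                # Skip once an ID has been found, and skip non-link tags.
                return if defined $id or $tag ne 'a';
                my $href = $attr->{href} // '';
                $id = $1 if $href =~ m{^/title/(tt\d+)};
            },
            'tagname, attr',   # arguments passed to the handler
        ],
    );
    $parser->parse($markup);
    $parser->eof;
    return $id;
}

my $code = first_title_id($html);
print defined $code ? "$code\n" : "No ID found\n";
```

The handler only looks at start tags and their attributes, so it does not depend on whitespace, attribute order, or the exact sequence of table cells the way a single long regex does.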

Anyway, the error is probably caused by an unescaped forward slash in your regex (the one in </td>), which ends the pattern early. It should look like this:

/<td class="number">1\.<\/td><td class="image"><a href="\/title\/tt[\d]{1,7}"/s
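
If the regex is kept, one way to sidestep the escaping problem is to use a different delimiter. The sketch below is only an illustration under assumptions: the sample string is hypothetical, the closing quote after the digits is dropped so that a trailing slash in the href does not prevent a match, and a capture group is added because $1 is only set when the pattern contains parentheses.

```perl
use strict;
use warnings;

# Hypothetical sample of the markup the pattern is meant to match.
my $content = '<td class="number">1.</td><td class="image">'
            . '<a href="/title/tt0416449/">';

my $code = '';

# With m{...} as the delimiter, "/" needs no escaping inside the pattern.
# (tt\d{1,7}) is a capture group; its match is what ends up in $1.
if ($content =~ m{<td class="number">1\.</td><td class="image"><a href="/title/(tt\d{1,7})}s) {
    $code = $1;
}

print "$code\n";
```

Separately, LWP::Simple::get returns the page content, or undef on failure, while is_success expects an HTTP status code, so testing defined $content is the more usual check after get.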
