Perl regular expression for HTML
Problem description
I need to extract the IMDb ID (for example, for the movie 300 it is tt0416449) of a movie specified by the variable URL. I have looked at the page source for this page and come up with the following regex:
use LWP::Simple;
$url = "http://www.imdb.com/search/title?title=$FORM{'title'}";
if (is_success( $content = LWP::Simple::get($url) ) ) {
    print "$url is alive!\n";
} else {
    print "No movies found";
}
$code = "";
if ($content=~/<td class="number">1\.</td><td class="image"><a href="\/title\/tt[\d]{1,7}"/s) {
    $code = $1;
}
I am getting an internal server error at this line:
$content=~/<td class="number">1\.</td><td class="image"><a href="\/title\/tt[\d]{1,7}"/s
I am very new to Perl, and would be grateful if anyone could point out my mistake(s).
Use an HTML parser. Regular expressions cannot parse HTML.
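As a rough illustration of the parser route, here is a minimal sketch using HTML::LinkExtor from the HTML-Parser distribution on CPAN. The sample markup in `$html` is made up for the example and is only an assumption about what the IMDb results page looks like; the real page may be structured differently.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample markup; the real search results page may differ.
my $html = <<'HTML';
<table><tr>
<td class="number">1.</td>
<td class="image"><a href="/title/tt0416449/"><img src="/poster.jpg"></a></td>
<td><a href="/help/">Help</a></td>
</tr></table>
HTML

my @ids;

# The callback receives the tag name followed by attribute/value pairs
# for every link-carrying tag the parser encounters.
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' && defined $attr{href};
    # Keep only links that point at a title page and capture the ID.
    push @ids, $1 if $attr{href} =~ m{^/title/(tt\d+)};
});

$parser->parse($html);
$parser->eof;

# The first title link found is taken to be the top search result.
print "$ids[0]\n" if @ids;
```

Because the parser hands over each `href` value on its own, the pattern no longer depends on the exact layout of the surrounding table cells.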
Anyway, the reason for the error is probably that you forgot to escape a forward slash in your regex: the unescaped / in </td> ends the pattern early, so the rest of the line no longer parses as Perl. It should look like this:
/<td class="number">1\.<\/td><td class="image"><a href="\/title\/tt[\d]{1,7}"/s
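If you stay with a regex, two further adjustments may help. The sketch below is only an illustration: `$content` is a made-up sample string rather than the real page, and the closing quote after the digits has been dropped from the pattern on the assumption that the link may end in a trailing slash. It uses `m{...}` as the delimiter so that forward slashes need no escaping, and it wraps the ID in parentheses, because `$1` is only set when the pattern contains a capture group.

```perl
use strict;
use warnings;

# Hypothetical sample of the markup the pattern is aimed at.
my $content = '<td class="number">1.</td><td class="image"><a href="/title/tt0416449/">';

my $code = '';

# m{...} avoids escaping every "/"; the parentheses capture the ID into $1.
if ($content =~ m{<td class="number">1\.</td><td class="image"><a href="/title/(tt\d{1,7})}s) {
    $code = $1;
}

print "$code\n";    # prints tt0416449 for the sample string above
```

With the pattern from the question there are no parentheses at all, so even after the slash is escaped `$code` would not receive the ID.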