How to match Chinese characters using Perl's regex

Question

I need to match some Chinese characters in UTF-8 encoded HTML, and I wrote some test code as below:

#! /usr/bin/perl

use strict;
use LWP::UserAgent;
use Encode;

my $ua = new LWP::UserAgent;

my $request = HTTP::Request->new('GET');
my $url = 'http://www.boc.cn/sourcedb/whpj/';
$request->url($url);

my $res = $ua->request($request) ;

my $str_chinese =   encode("utf8" ,"英磅" ) ;  
# my $str_chinese = "英磅" ;


my $str_english = "English" ;
#my $html = decode("utf8" , $res->content) ;
my $html = $res->content ; 

if ( $html =~ /$str_chinese/ ) {
     print "chinese word matched\n" ;
}else {
     print "chinese word unmatched\n" ;
}

if ( $html =~ /$str_english/i ) {
    print "english word matched\n" ;
}else {
    print "english word unmatched\n" ;
}

The output shows that the script fails to match the Chinese characters embedded in the HTML. Could you give me some hints on how to solve my problem?
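The failure can be reproduced offline. The sketch below assumes the page body really is UTF-8 and that the script file itself is saved as UTF-8; the <td> wrapper is a made-up stand-in for the page, not its real markup. Because the question's script has no use utf8 pragma, the literal '英磅' is already six UTF-8 octets, and passing it through encode() encodes it a second time, so it can no longer occur in the raw bytes returned by $res->content. This is one likely cause, not necessarily the only one: the word also has to appear on the page at all.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);

# Deliberately no "use utf8", as in the question's script: the literal below
# is therefore 6 raw octets (2 CJK characters x 3 bytes each), assuming this
# file is saved as UTF-8.
my $literal = '英磅';

# encode() expects a character string, so each octet is treated as a separate
# Latin-1 character and re-encoded: 6 octets become 12 (double encoding).
my $double_encoded = encode('utf8', $literal);

# Hypothetical stand-in for $res->content of a UTF-8 page: raw, undecoded octets.
my $raw_html = "<td>$literal</td>";

# \Q...\E makes the search term match literally.
my $plain_match  = $raw_html =~ /\Q$literal\E/        ? 1 : 0;
my $double_match = $raw_html =~ /\Q$double_encoded\E/ ? 1 : 0;

printf "literal: %d octets, after encode(): %d octets\n",
    length $literal, length $double_encoded;
print "undecoded literal matches raw HTML: $plain_match\n";
print "encode()d literal matches raw HTML: $double_match\n";
```

Under those assumptions it reports 6 octets versus 12, with the plain literal matching the raw bytes and the encode()d one not. Mixing octets and characters like this is fragile, which is why the answer below decodes the response instead.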

Recommended answer

You should use the decoded_content method from HTTP::Message (which HTTP::Response inherits) instead. Manual decoding is not necessary.

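#!/usr/bin/env perl
use utf8;     # string literals in this file are decoded to characters
use strict;
use warnings;
use LWP::UserAgent;

# decoded_content undoes any Content-Encoding and decodes the body using
# the response's charset, so $html holds characters, not raw octets.
my $html = LWP::UserAgent->new
    ->get('http://www.boc.cn/sourcedb/whpj/')
    ->decoded_content;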

my $str_chinese = '首页';
my $str_english = 'English';

if ($html =~ /$str_chinese/) {
    print "chinese word matched\n";
} else {
    print "chinese word unmatched\n";
}

if ($html =~ /$str_english/i) {
    print "english word matched\n";
} else {
    print "english word unmatched\n";
}

Output:

chinese word matched
english word matched
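The answer's fix can be illustrated without network access. The sketch below substitutes a hand-built UTF-8 byte string for the response body (an assumption; the real page may differ) and uses Encode's decode('UTF-8', ...) for the step that decoded_content performs automatically from the response's charset. It does not model Content-Encoding such as gzip, which decoded_content also handles. The \Q...\E quoting around the search term is an addition not in the answer; it keeps any regex metacharacters in the term from being interpreted.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                 # literals in this file are decoded to characters
use Encode qw(encode decode);

my $word = '首页';        # 2 characters, the term used in the answer

# Hypothetical stand-in for a UTF-8 response body: raw octets on the wire.
my $raw_html   = encode('UTF-8', "<a>$word</a>");
my $raw_length = length $raw_html;   # measured before decoding

# Matching a character string against undecoded octets fails...
my $raw_match = $raw_html =~ /\Q$word\E/ ? 1 : 0;

# ...while decoding first (what decoded_content does for a UTF-8 response)
# puts both sides of the match in characters.
my $html          = decode('UTF-8', $raw_html);
my $decoded_match = $html =~ /\Q$word\E/ ? 1 : 0;

printf "octets: %d, characters: %d\n", $raw_length, length $html;
print "match before decoding: $raw_match\n";
print "match after decoding:  $decoded_match\n";
```

Run as is, it reports 13 octets versus 9 characters, no match before decoding and a match after, mirroring the question's failure and the answer's fix.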
