How do I split Chinese characters one by one?

Question

If there is no special character (such as a space or ":") between the first name and the last name, how do I split the Chinese characters below?

use strict; 
use warnings; 
use Data::Dumper;  

my $fh = \*DATA;  
my $fname; # 小三; 
my $lname; # 张 ;
while(my $name = <$fh>)
{

    $name =~ ??? ;    # what goes here to get $lname = 张 and $fname = 小三?
    print "$fname\n";
    print "$lname\n";

}

__DATA__  
张小三

Expected output:

小三
张

[Update]

Environment: Windows XP, ActivePerl 5.10.1.

Solution

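You have problems because you neglect to decode the binary input data into Perl character strings when reading, and to encode Perl strings back into binary data when printing. This matters because regular expressions, and their friend split, only work properly on decoded Perl strings. On undecoded bytes, "." matches a single byte, while each Chinese character occupies several bytes (three in UTF-8, two in GBK/GB18030 for common characters).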
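To make the byte/character difference concrete, here is a minimal sketch that compares 张小三 as raw UTF-8 bytes with the same name as a decoded string, and shows what the answer's split returns in each case. It assumes the script file itself is saved as UTF-8 (hence use utf8); the variable names are only illustrative.

```perl
use strict;
use warnings;
use utf8;                      # assumption: this source file is saved as UTF-8
use Encode qw(decode encode);

# 张小三 as a decoded Perl character string: 3 characters.
my $chars = '张小三';

# The same name as raw UTF-8 bytes, which is what reading a file without
# decoding gives you: 3 bytes per character, 9 bytes in total.
my $bytes = encode('UTF-8', $chars);

# Split the raw bytes: the "first character" is really just the first byte.
my ($first_of_bytes) = split /(?<=.)/, $bytes, 2;

# Split the decoded string: the first piece is the whole character 张.
my ($first_of_chars) = split /(?<=.)/, decode('UTF-8', $bytes), 2;

printf "length in bytes: %d, length in characters: %d\n",
    length($bytes), length($chars);
printf "first piece from bytes: 0x%02X, from characters: U+%04X\n",
    ord($first_of_bytes), ord($first_of_chars);
```

It should report 9 bytes versus 3 characters, and a first piece of 0xE5 (the lead byte of 张 in UTF-8, not a character on its own) versus U+5F20 (张).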

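(?<=.) is a zero-width lookbehind that matches any position preceded by a character; with a limit of 2, split cuts the string only at the first such position, that is, right after the first character. As such, this program will not work correctly on 复姓 (compound family names such as 欧阳 or 司马); keep in mind that they are rare, but do exist. In order to always correctly split a name into its family-name and given-name parts, you need to use a dictionary of family names.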
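A dictionary lookup for compound surnames could be sketched roughly as follows. The surname list is hypothetical and deliberately tiny (a few well-known compound surnames), and split_name is an illustrative helper name, not part of the answer; a real program would load a complete family-name list. It expects an already decoded string.

```perl
use strict;
use warnings;
use utf8;   # assumption: this source file is saved as UTF-8

# HYPOTHETICAL, deliberately incomplete list of compound (two-character)
# surnames; a real program would load a full family-name dictionary.
my %compound_surname = map { $_ => 1 } qw(欧阳 司马 诸葛 上官 司徒);

# Hypothetical helper: split a decoded name into (family name, given name).
# A known compound surname wins; otherwise fall back to splitting after
# the first character, exactly like split(/(?<=.)/, $name, 2).
sub split_name {
    my ($full_name) = @_;
    if (length($full_name) > 2
        && $compound_surname{ substr($full_name, 0, 2) }) {
        return (substr($full_name, 0, 2), substr($full_name, 2));
    }
    return split /(?<=.)/, $full_name, 2;
}
```

With this sketch, split_name('欧阳修') returns ('欧阳', '修') and split_name('张小三') falls back to ('张', '小三'). The length check means a two-character name is always treated as a one-character surname plus a one-character given name.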

Linux version:

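use strict;
use warnings;
use Encode qw(decode encode);

while (my $full_name = <DATA>) {
    # Bytes in: decode the UTF-8 line into a Perl character string.
    $full_name = decode('UTF-8', $full_name);
    chomp $full_name;
    # Split once, right after the first character: family name, given name.
    my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
    # Bytes out: encode the result back to UTF-8 before printing.
    print encode('UTF-8',
        sprintf("The full name is %s, the family name is %s, the given name is %s.\n", $full_name, $family_name, $given_name)
    );
}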

__DATA__
张小三

Output:

The full name is 张小三, the family name is 张, the given name is 小三.
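The same decode-on-input, encode-on-output idea can also be delegated to PerlIO :encoding layers instead of explicit decode()/encode() calls. This is an alternative technique, not the answer's code; in the minimal sketch below, in-memory filehandles stand in for the input file and the terminal so that it runs without a __DATA__ section.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for the input file: raw UTF-8 bytes for "张小三\n", as on disk.
# (Built with encode() so this example needs no `use utf8`.)
my $input_bytes = encode('UTF-8', "\x{5F20}\x{5C0F}\x{4E09}\n");

# The :encoding(UTF-8) layer decodes while reading, so each line arrives
# as a Perl character string without an explicit decode() call.
open my $in, '<:encoding(UTF-8)', \$input_bytes or die "open input: $!";

# Collect output in memory; the layer encodes to UTF-8 bytes on print.
my $output_bytes = '';
open my $out, '>:encoding(UTF-8)', \$output_bytes or die "open output: $!";

while (my $full_name = <$in>) {
    chomp $full_name;
    my ($family_name, $given_name) = split /(?<=.)/, $full_name, 2;
    printf {$out} "The full name is %s, the family name is %s, the given name is %s.\n",
        $full_name, $family_name, $given_name;
}
close $out or die "close output: $!";
close $in;

print $output_bytes;   # already UTF-8 bytes, safe to print as-is
```

In the real script, calling binmode(DATA, ':encoding(UTF-8)') and binmode(STDOUT, ':encoding(UTF-8)') before the loop has the same effect, and the explicit decode()/encode() calls can then be dropped.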

Windows version:

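use strict;
use warnings;
use Encode qw(decode encode);
use Encode::HanExtra qw();    # CPAN module that provides the GB18030 encoding

while (my $full_name = <DATA>) {
    # Bytes in: decode the GB18030 line into a Perl character string.
    $full_name = decode('GB18030', $full_name);
    chomp $full_name;
    # Split once, right after the first character: family name, given name.
    my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
    # Bytes out: encode the result back to GB18030 for the console.
    print encode('GB18030',
        sprintf("The full name is %s, the family name is %s, the given name is %s.\n", $full_name, $family_name, $given_name)
    );
}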

__DATA__
张小三

Output:

The full name is 张小三, the family name is 张, the given name is 小三.
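Encode::HanExtra is not part of the core Encode distribution. If it is not installed, the core Encode::CN module provides cp936 (Microsoft's GBK code page, the default console code page on Simplified Chinese Windows), which GB18030 extends and which covers common simplified characters. Swapping GB18030 for cp936 is an assumption of this sketch, not the answer's choice; describe_gbk_line is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Encode::CN ();   # core module that provides cp936 (GBK)

# Hypothetical helper: decode one GBK-encoded line, split off the first
# character, and return the description re-encoded as GBK bytes.
sub describe_gbk_line {
    my ($line_bytes) = @_;
    my $full_name = decode('cp936', $line_bytes);
    chomp $full_name;
    my ($family_name, $given_name) = split /(?<=.)/, $full_name, 2;
    return encode('cp936', sprintf(
        "The full name is %s, the family name is %s, the given name is %s.\n",
        $full_name, $family_name, $given_name));
}

# Stand-in for a line read from a GBK file: 张小三 plus a newline,
# two bytes per character in GBK.
my $gbk_line = encode('cp936', "\x{5F20}\x{5C0F}\x{4E09}\n");
print describe_gbk_line($gbk_line);
```

The printed bytes are GBK, so they display correctly in a code page 936 console and as mojibake in a UTF-8 terminal.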
