Length of string in Perl independent of character encoding

Question

The length function assumes that Chinese characters are more than one character. How do I determine length of a string in Perl independent of character encoding (treat Chinese characters as one character)?

Answer

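The length function (http://perldoc.perl.org/functions/length.html) operates on characters, not octets (AKA bytes). What counts as a character depends on the encoding Perl believes the string is in: Chinese characters are still single characters (if the encoding is correctly set!), but they take up more than one octet of space. If Perl has not been told the encoding, it treats each byte as a separate character, which is why length reports more than one character per Chinese character. So the length of a string in Perl depends on the character encoding that Perl thinks the string is in; the only string length that is independent of the character encoding is the simple byte length.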
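To make the character/byte distinction concrete, here is a minimal sketch using only the core Encode module. The string "长城" is written as code point escapes (an illustrative choice, not from the answer) so the result does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "长城" written as code points, so it is a character string regardless of
# how this source file is saved.
my $text = "\x{957F}\x{57CE}";

# length() on a character string counts characters.
my $chars = length($text);

# encode() turns the characters back into UTF-8 octets; length() on that
# octet string therefore counts bytes (3 per character for these two).
my $bytes = length(encode('UTF-8', $text));

print "characters: $chars, bytes: $bytes\n";    # characters: 2, bytes: 6
```

The byte count is the encoding-dependent figure: the same two characters would occupy a different number of bytes in, say, UTF-16 or GBK.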

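Make sure that the string in question is flagged as UTF-8 and encoded in UTF-8. For example, in a UTF-8 terminal this yields 3, because the literal 长 arrives as the three UTF-8 bytes E9 95 BF and Perl counts each byte as a character: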

$ perl -e 'print length("长")'

but each of these yields 1:

$ perl -e 'use utf8; print length("长")'

$ perl -e 'use Encode; print length(Encode::decode("utf-8", "长"))'
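The "flagged as UTF-8" part can be observed with the core function utf8::is_utf8. The sketch below decodes the raw bytes of 长 (written as escapes, an illustrative choice) and reports the length and flag before and after; the flag is shown only for illustration, since it is an internal representation detail and the dependable rule is to decode input rather than inspect the flag.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes of "长", written as escapes so the source encoding does not matter.
my $raw = "\xE9\x95\xBF";

# Decoding turns the three bytes into one character (U+957F).
my $decoded = decode('UTF-8', $raw);

# Report length and internal UTF-8 flag for each version of the string.
for my $pair (['raw', $raw], ['decoded', $decoded]) {
    my ($label, $s) = @$pair;
    printf "%-8s length=%d utf8-flag=%s\n",
        $label, length($s), utf8::is_utf8($s) ? 'on' : 'off';
}
```

This should report length 3 with the flag off for the raw bytes, and length 1 with the flag on for the decoded string.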

If you're getting your Chinese characters from a file, make sure that you call binmode $fh, ':utf8' on the filehandle before reading from or writing to it (':encoding(UTF-8)' is the stricter alternative, which also checks that the data is well-formed UTF-8); if you're getting your data from a database, make sure the database is returning strings in UTF-8 format (or use Encode to decode them yourself).
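A minimal sketch of the file case, assuming only core modules: it writes the UTF-8 bytes of "长城" to a temporary file (File::Temp is used purely to keep the example self-contained), then reads the file back after setting an encoding layer with binmode, as suggested above. It uses ':encoding(UTF-8)' rather than ':utf8'; both make length count characters here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file holding the UTF-8 bytes of "长城" plus a newline.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "\xE9\x95\xBF\xE5\x9F\x8E\n";
close $out or die "close $path: $!";

# Open it, then set the decoding layer before reading anything.
open my $in, '<', $path or die "open $path: $!";
binmode $in, ':encoding(UTF-8)';
my $line = <$in>;
close $in;
chomp $line;

print length($line), "\n";    # 2 characters, not 6 bytes
```

Without the binmode line, the same read would give length 6.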

I don't think you have to have everything in UTF-8; you really only need to ensure that the string is flagged as having the correct encoding. I'd go with UTF-8 front to back (and even sideways), though, as that's the lingua franca for Unicode and it will make things easier if you use it everywhere.
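One common way to apply "UTF-8 front to back" is to decode once where bytes enter the program, work only with characters inside, and encode once where bytes leave. The sketch below simulates that with an in-memory byte string standing in for input from a file, socket or database (an assumption for illustration); the reversal step is just an arbitrary character-level operation.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from outside (UTF-8 encoded "长城").
my $incoming = "\xE9\x95\xBF\xE5\x9F\x8E";

# 1. Decode once at the input boundary; from here on, work with characters.
my $text = decode('UTF-8', $incoming);

# 2. Character-level processing: length and reverse operate per character.
my $count    = length($text);    # 2
my $reversed = reverse $text;    # scalar reverse: "城长"

# 3. Encode once at the output boundary before writing bytes anywhere.
my $outgoing = encode('UTF-8', $reversed);

print "count=$count, bytes out=", length($outgoing), "\n";
```

Had the bytes not been decoded, reverse would have shuffled individual UTF-8 bytes and produced invalid output.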

You might want to spend some time reading the perlunicode man page if you're going to be dealing with non-ASCII data.
