Returning the first N characters of a unicode string

Question

I have a Unicode string and I need to return the first N characters. I am doing this:

result = unistring[:5]

But of course the length of a Unicode string != the number of characters. Any ideas? Is the only solution to use re?

More info

unistring = "Μεταλλικα" #Metallica written in Greek letters
result = unistring[:1]

returns -> ?

I think that Unicode strings use two bytes per character, and that is why this happens. If I do:

result = unistring[:2]

I get

M

which is correct. So should I always slice *2, or should I convert to something?

Recommended answer

Unfortunately, for historical reasons, prior to Python 3.0 there are two string types: byte strings (str) and Unicode strings (unicode).

Prior to the unification in Python 3.0 there are two ways to declare a string literal: unistring = "Μεταλλικα", which is a byte string, and unistring = u"Μεταλλικα", which is a unicode string.

The reason you see ? when you do result = unistring[:1] is that some of the characters in your Unicode text cannot be correctly represented in the non-unicode string. In a byte string each Greek letter occupies more than one byte (two in UTF-8), so [:1] keeps only part of the first character, and that fragment cannot be displayed on its own. You have probably seen this kind of problem if you ever used a really old email client and received emails from friends in countries like Greece.
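To make the byte-versus-character difference concrete, here is a minimal sketch. It is written in Python 3 syntax so it can be run today, and it assumes the text is encoded as UTF-8 (the question does not say which encoding was in use). In Python 2 the byte string corresponds to a plain "..." literal and the text string to a u"..." literal.

```python
# Sketch in Python 3 syntax; UTF-8 is an assumed encoding.
text = "Μεταλλικα"           # text (Unicode) string: 9 characters
raw = text.encode("utf-8")   # byte string: each Greek letter takes 2 bytes

print(len(text))  # 9
print(len(raw))   # 18

# Slicing the byte string can cut a character in half.
# errors="replace" substitutes U+FFFD for the incomplete byte sequence.
print(raw[:1].decode("utf-8", errors="replace"))
print(raw[:2].decode("utf-8"))  # Μ

# Slicing the text string counts characters, not bytes.
print(text[:1])  # Μ
print(text[:5])  # Μεταλ
```

This is why [:2] appeared to work in the question: two bytes happened to cover exactly one Greek letter. Multiplying by two is not reliable in general, because UTF-8 characters take between one and four bytes.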

So in Python 2.x, if you need to handle Unicode you have to do it explicitly. Take a look at this introduction to dealing with Unicode in Python: Unicode HOWTO
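One way to read "do it explicitly" is to decode any byte string to text before slicing. The helper below is a hypothetical illustration, not part of any library: the name first_chars and the UTF-8 default are assumptions. It uses Python 3 syntax. In Python 2 the same idea applies, with str.decode producing a unicode object.

```python
def first_chars(value, n, encoding="utf-8"):
    """Return the first n characters of value as a text string.

    Hypothetical helper for illustration only.
    Byte strings are decoded first; the UTF-8 default is an assumption.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return value[:n]


# Works the same whether the input is text or encoded bytes.
print(first_chars("Μεταλλικα", 5))
print(first_chars("Μεταλλικα".encode("utf-8"), 5))
```

Decoding at the boundary and slicing only text strings keeps byte counts out of the slicing logic entirely.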
