How to make the Python interpreter correctly handle non-ASCII characters in string operations?

Views: 39 · Published: 2021/6/25 19:21:56 · Tags: python, unicode


Problem description

I have a string that looks like this:

6Â 918Â 417Â 712

The clear-cut way to trim this string (as I understand Python), assuming it is in a variable called s, is:

s.replace('Â ', '')

That should do the trick. But of course Python complains that the non-ASCII character '\xc2' in file blabla.py has no declared encoding.

I have never quite understood how to switch between different encodings.

Here is the code. It really is the same as above, but now in context. The file is saved as UTF-8 in Notepad and has the following header:

#!/usr/bin/python2.4
# -*- coding: utf-8 -*-

Code:

f = urllib.urlopen(url)

soup = BeautifulSoup(f)

s = soup.find('div', {'id':'main_count'})

# Printing s here works fine; it shows 6Â 918Â 417Â 712

s.replace('Â ','')

save_main_count(s)

It goes no further than s.replace...

Recommended answer

Python 2 uses ASCII as the default encoding for source files, which means you must declare another encoding at the top of the file to use non-ASCII Unicode characters in literals. Python 3 uses UTF-8 as the default encoding for source files, so this is less of an issue.

See: http://docs.python.org/tutorial/interpreter.html#source-code-encoding

To enable UTF-8 source encoding, put this on one of the first two lines of the file:

# -*- coding: utf-8 -*-

The form above is the one in the docs, but this also works:

# coding: utf-8
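As a rough illustration of what the declaration changes, the sketch below compiles the same source bytes under two different coding lines. It assumes Python 3, where compile() accepts bytes and honors the coding line; run_source and literal_bytes are names made up for this example.

```python
# Sketch (Python 3): the same source bytes, compiled under two different
# encoding declarations. The byte pair 0xC2 0xA0 is UTF-8 for U+00A0
# (no-break space); read as Latin-1 it is the two characters "Â" + U+00A0.
literal_bytes = b's = "\xc2\xa0"\n'


def run_source(declared_encoding):
    """Prepend a coding line, compile the bytes, and return the string s."""
    header = ("# -*- coding: %s -*-\n" % declared_encoding).encode("ascii")
    namespace = {}
    exec(compile(header + literal_bytes, "<demo>", "exec"), namespace)
    return namespace["s"]


as_utf8 = run_source("utf-8")      # one character: U+00A0
as_latin1 = run_source("latin-1")  # two characters: U+00C2, U+00A0

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
```

The bytes on disk are identical in both runs; only the declared encoding decides which characters the literal contains.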

Additional considerations:

The source file must also be saved in your text editor using the encoding you declared.

In Python 2, a unicode literal must have a u prefix, as in s.replace(u"Â ", u""), whereas in Python 3 plain quotes are enough. In Python 2 you can use from __future__ import unicode_literals to get the Python 3 behavior, but be aware that this affects the entire current module.

s.replace(u"Â ", u"") will also fail if s is not a unicode string.
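One way to make sure s is text before calling replace is to decode the raw bytes explicitly. The sketch below is written for Python 3 and rests on an assumption the question does not confirm: that the page separates digit groups with U+00A0 (no-break space) encoded as UTF-8, which would explain the stray "Â" when the bytes are read as Latin-1. The names raw, garbled, text and cleaned are illustrative only.

```python
# Assumption (not confirmed by the question): the page sends the number with
# U+00A0 (no-break space) separators, encoded as UTF-8.
raw = "6\u00a0918\u00a0417\u00a0712".encode("utf-8")

# Decoding with the wrong codec reproduces the stray "Â" characters.
garbled = raw.decode("latin-1")

# Decoding with the right codec yields text, and replace() then works on it.
text = raw.decode("utf-8")
cleaned = text.replace("\u00a0", "")

# Mixing bytes and text raises an error in Python 3.
try:
    raw.replace("\u00a0", "")
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(repr(garbled))
print(cleaned)
print(mixed_ok)
```

In Python 2 the same mix-up typically shows up as a UnicodeDecodeError, because the byte string is implicitly decoded as ASCII first.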

string.replace returns a new string and does not edit in place, so make sure you use the return value as well.
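A minimal sketch of that last point, using a made-up sample value with U+00A0 separators (Python 3 syntax):

```python
s = "6\u00a0918\u00a0417\u00a0712"

# The result is discarded here, so s is left exactly as it was.
s.replace("\u00a0", "")
unchanged = s

# Rebinding the name to the return value is what actually keeps the change.
s = s.replace("\u00a0", "")

print(repr(unchanged))
print(repr(s))
```

Applied to the question's code, this means the result of the replace call has to be assigned before it is passed on to save_main_count.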
