How to check if a string contains only UTF-8 characters
Question
So far I am doing something like this:
def is_utf8(s):
    try:
        x = bytes(s, 'utf-8').decode('utf-8', 'strict')
        print(x)
        return 1
    # NOTE: a bare except also catches any exception raised by print(x)
    except:
        return 0
The only problem is that I don't want it to print anything. I want to delete the print(x), but when I do that, the function stops working correctly.
For example, if I do print(is_utf8("H�tst")), it returns 0 while the print is in the function; otherwise it prints 1. Am I approaching the problem in the wrong way?
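A likely explanation, inferred here rather than stated in the question: in Python 3 a str is already a sequence of Unicode code points, so bytes(s, 'utf-8') fails only for lone surrogate code points (U+D800 to U+DFFF), and decoding the resulting bytes back never fails. If the "�" in "H�tst" is U+FFFD REPLACEMENT CHARACTER (or any other non-surrogate character), it encodes without error. The 0 therefore most likely came from print(x) raising UnicodeEncodeError on a console whose encoding cannot represent that character, which the bare except: turned into 0. The cp1252 console in the sketch below is an assumed example, not something the question confirms. A minimal sketch of a check that prints nothing and catches only the relevant exception:

```python
def is_utf8(s: str) -> bool:
    """Return True if every code point in s can be encoded as UTF-8.

    For a Python 3 str this fails only for lone surrogates (U+D800..U+DFFF).
    """
    try:
        s.encode("utf-8")  # "strict" error handling is the default
    except UnicodeEncodeError:
        return False
    return True


print(is_utf8("H\ufffdtst"))  # True
print(is_utf8("H\ud800tst"))  # False: lone surrogate

# Assumed example of what print(x) may have hit: a console using cp1252
# cannot represent U+FFFD, so encoding text for it raises UnicodeEncodeError.
try:
    "H\ufffdtst".encode("cp1252")
except UnicodeEncodeError:
    print("cp1252 cannot encode U+FFFD")
```

Catching UnicodeEncodeError instead of using a bare except: means an unrelated failure, such as a console that cannot display the text, surfaces as an error instead of being reported as "not UTF-8".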
Answer
You could use the chardet module to detect an unknown encoding. For example, if a is a byte array, you could determine its encoding like this:
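import chardet
# a must be a bytes object, e.g. data read from a file opened in "rb" mode
b = chardet.detect(a)
# detect() returns a dict; "encoding" is its best guess (None if it cannot tell)
print(b["encoding"])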
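chardet.detect() returns a statistical guess together with a confidence score, so it answers "which encoding is this probably in?" rather than "is this valid UTF-8?". For the yes/no question, a strict decode of the raw bytes gives an exact answer. A minimal standard-library sketch (the name is_valid_utf8 is illustrative, not taken from either post):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        # "strict" is the default error handler: malformed or truncated
        # sequences, overlong forms and encoded surrogates all raise.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("H\u00e9tst".encode("utf-8")))  # True
print(is_valid_utf8(b"H\xe9tst"))  # False: a lone 0xE9 is Latin-1, not UTF-8
```

chardet remains useful when the data is not UTF-8 and you need a guess at what it is instead.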