How to check if a string contains only UTF-8 characters


Question

So far I am doing something like this:

def is_utf8(s):
    try:
        x=bytes(s,'utf-8').decode('utf-8', 'strict')
        print(x)
        return 1
    except:
        return 0

The only problem is that I don't want it to print anything. I want to delete the print(x), but when I do that, the function stops working correctly. For example, if I run print(is_utf8("H�tst")) while the print is in the function, it prints 0; without the print, it prints 1. Am I approaching the problem in the wrong way?

Recommended answer

You could use the chardet module to detect an unknown encoding. For example, if a is a byte array, you could determine the encoding like this:

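import chardet

# a is the byte string whose encoding is unknown
b = chardet.detect(a)  # returns a dict with keys such as "encoding" and "confidence"
print(b["encoding"])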
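
If the goal is only a yes/no check rather than guessing an unknown encoding, a strict decode may be enough. The sketch below is not part of the original answer. The helper names is_valid_utf8 and can_encode_utf8 are hypothetical, and it assumes Python 3, where a str is already Unicode and only raw bytes can be invalid UTF-8. It also catches the specific Unicode exceptions instead of using a bare except, so an unrelated failure (for example, a print that cannot write a character to the console) is not mistaken for invalid input.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def can_encode_utf8(text: str) -> bool:
    """Return True if the str can be encoded as UTF-8.

    In Python 3 this fails only for unusual input such as lone surrogates.
    """
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```

For example, is_valid_utf8("Héllo".encode("utf-8")) is True, while is_valid_utf8("Héllo".encode("latin-1")) is False because the Latin-1 byte for "é" is not a valid UTF-8 sequence on its own.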
