Detect strings with non-English characters in Python


Question

I have some strings that contain a mix of English and non-English letters. For example:

w='_1991_اف_جي2'

How can I recognize these kinds of strings using a regex or any other fast method in Python?

I would prefer not to compare the letters of the string one by one against a list of letters, but to do the check quickly in a single step.

Recommended answer

You can simply check whether the string can be represented using only ASCII characters (the unaccented Latin alphabet plus digits, punctuation, and control characters). If it cannot, it contains at least one character outside ASCII, such as a letter from another alphabet or an accented Latin letter.

Note the comment # -*- coding: utf-8 -*- at the top of the Python file. Python 2 needs it to accept non-ASCII characters in the source (otherwise it raises an encoding error). Python 3 reads source files as UTF-8 by default, so the comment is optional there.

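# -*- coding: utf-8 -*-
def isEnglish(s):
    """Return True if s contains only ASCII characters."""
    try:
        # Encode to UTF-8 bytes, then decode them as ASCII.
        # Any character outside ASCII yields bytes >= 0x80, so decoding fails.
        s.encode(encoding='utf-8').decode('ascii')
    except UnicodeDecodeError:
        return False
    else:
        return True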

assert not isEnglish('slabiky, ale liší se podle významu')
assert isEnglish('English')
assert not isEnglish('ގެ ފުރަތަމަ ދެ އަކުރު ކަ')
assert not isEnglish('how about this one : 通 asfަ')
assert isEnglish('?fd4))45s&')
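
Since the question asks specifically about a regex, the same ASCII-only test can also be sketched in two other ways. This is a minimal illustration rather than part of the original answer: the helper names is_ascii_regex and is_ascii_builtin are hypothetical, and str.isascii() assumes Python 3.7 or later.

```python
import re

# Matches any single character outside the 7-bit ASCII range (U+0000 to U+007F).
_NON_ASCII = re.compile(r'[^\x00-\x7F]')


def is_ascii_regex(s):
    """Return True if no character in s falls outside ASCII (regex version)."""
    return _NON_ASCII.search(s) is None


def is_ascii_builtin(s):
    """Same check using the built-in str.isascii() (Python 3.7+)."""
    return s.isascii()


w = '_1991_اف_جي2'
print(is_ascii_regex(w), is_ascii_builtin(w))                  # False False
print(is_ascii_regex('English'), is_ascii_builtin('English'))  # True True
```

Both helpers stop at the first non-ASCII character and, like isEnglish, treat accented Latin letters as non-English.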
