Removing non-English words from a sentence in Python


Question

I have written code that sends queries to Google and returns the results. I extract the snippets (summaries) from these results for further processing. However, sometimes these snippets contain non-English words, which I don't want. For example:

/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/ 

I only want the word "unstressed" from this sentence. How can I do that? Thanks.

Recommended answer

PyEnchant might be a simple option for you. I do not know how fast it is, but you can do things like:

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>>
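
Building on the check() call above, here is a minimal sketch of how a whole snippet might be filtered. The names keep_english_words, is_word and STAND_IN_WORDS are hypothetical and not part of PyEnchant; the tiny stand-in word set is only there so the example runs without a dictionary installed.

```python
import re


def keep_english_words(sentence, is_word):
    """Return the tokens of `sentence` that `is_word` accepts.

    `is_word` is any callable that takes a string and returns True or False,
    for example the `check` method of an `enchant.Dict("en_US")` object.
    """
    # Pull out runs of Unicode letters; slashes, spaces and digits are dropped,
    # while phonetic letters stay attached to their token.
    tokens = re.findall(r"[^\W\d_]+", sentence)
    return [token for token in tokens if is_word(token)]


# Hypothetical stand-in for a real dictionary, used only for this demo.
STAND_IN_WORDS = {"unstressed"}

snippet = "/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/"
english_only = keep_english_words(snippet, lambda w: w in STAND_IN_WORDS)
print(english_only)  # ['unstressed']
```

With PyEnchant installed, the same function could be called as keep_english_words(snippet, enchant.Dict("en_US").check). Which tokens survive then depends on the installed dictionary.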

PyEnchant has a tutorial, and it also has options to return suggestions, which you can use again for another query or something. In addition, you can check whether your result is in Latin-1 (is_utf8() exists; I do not know whether is_latin-1() does too), or maybe use something like Enca, which detects the encoding of text files on the basis of knowledge of their language.
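
The Latin-1 idea can be sketched in plain Python without any extra library. This is only a rough illustration under the assumption that the snippet is already a decoded str; is_latin1 is a hypothetical helper name, not an existing function. It is a coarse filter: words from other languages written in Latin-1 characters would also pass, so it complements a dictionary check rather than replacing it.

```python
def is_latin1(word):
    """Return True if every character of `word` fits in Latin-1 (ISO-8859-1)."""
    try:
        word.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


snippet = "/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/"

# Drop the surrounding slashes, split on whitespace, and keep only the
# tokens whose characters can all be encoded as Latin-1.
latin1_words = [w for w in snippet.strip("/").split() if is_latin1(w)]
print(latin1_words)  # ['unstressed']
```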
