python nltk.sent_tokenize error ascii codec can't decode


Problem description

I could successfully read text into a variable, but while trying to tokenize the text I'm getting this strange error:

sentences=nltk.sent_tokenize(sample)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11: ordinal not in range(128)

I do know the cause of the error is some special string/char which the tokenizer isn't able to read/decode, but how do I bypass this? Thanks
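
A common workaround in Python 2 (my own sketch, not part of the recommended answer below) is to decode the raw bytes into Unicode before handing them to the tokenizer; the file name sample.txt and the UTF-8 encoding are assumptions for illustration:

# Read the file as Unicode (assuming UTF-8) so sent_tokenize
# never receives raw bytes it would try to decode as ASCII.
import io
import nltk

with io.open('sample.txt', encoding='utf-8') as f:   # hypothetical file name
    sample = f.read()            # `sample` is now a unicode string

sentences = nltk.sent_tokenize(sample)
print(len(sentences))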

Recommended answer

In a nutshell, NLTK3's pos_tag function doesn't work.

The NLTK2 function works fine, however.

pip uninstall nltk

pip install http://pypi.python.org/packages/source/n/nltk/nltk-2.0.4.tar.gz

On the other hand, the tagger is pretty bad (apparently 'conservatory' is a verb). I wish SpaCy worked on Windows.
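
As a quick sanity check after the downgrade (my own illustration, not part of the original answer), something like the following confirms the installed version and exercises the tagger the answer complains about; the tokenizer/tagger data may need to be downloaded first:

import nltk

print(nltk.__version__)   # should report 2.0.4 after the reinstall above

# nltk.download('punkt') and nltk.download('maxent_treebank_pos_tagger')
# may be required on a fresh install before the calls below will work.
tokens = nltk.word_tokenize("I toured the conservatory yesterday.")
print(nltk.pos_tag(tokens))   # per the answer, 'conservatory' may be mis-tagged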
