Python and string accents
Question
I am making a web scraper.
I access Google search, get the link of a web page, and then get the contents of its <title> tag.
The problem is that, for example, the string "P\xe1gina N\xe3o Encontrada!" should be "Página Não Encontrada!".
I tried to decode it as latin-1 and then encode it as utf-8, but it did not work.
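import requests
from bs4 import BeautifulSoup

r2 = requests.get(item_str)  # item_str: URL taken from the search results
texto_pagina = r2.text  # response body, decoded to text by requests
soup_item = BeautifulSoup(texto_pagina, "html.parser")
empresa = soup_item.find_all("title")  # list of every <title> tag found
# Note: empresa_str is not defined in the snippet as posted.
print(empresa_str.decode('latin1').encode('utf8'))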
Can you help me? Thanks!
Recommended answer
You can change the retrieved text variable to something like:
string = u'P\xe1gina N\xe3o Encontrada!'.encode('utf-8')
After printing string, it seemed to work just fine for me.
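To illustrate why this can work, here is a minimal sketch, assuming Python 3 (the snippets in this thread look like Python 2, so this is a translation of the idea rather than the original code). It shows that "\xe1" in a text literal is only an escaped spelling of "á", so the string itself may already be correct and only its displayed form differs. The variable names are made up for the example.

```python
# Assumes Python 3, where str is Unicode text and bytes is a separate type.
escaped = "P\xe1gina N\xe3o Encontrada!"   # escaped spelling of the title
literal = "Página Não Encontrada!"         # same title typed directly

# Both literals denote exactly the same characters.
same_text = escaped == literal
print(same_text)

# Encoding to UTF-8 yields bytes; each accented letter becomes two bytes.
utf8_bytes = escaped.encode("utf-8")
print(utf8_bytes)
```

If the comparison is true, the accents were never lost; the escapes only appear when the value is shown in its escaped form, for instance inside a printed list.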
Edit
Instead of adding .encode('utf8'), have you tried just using empresa_str.decode('latin1')?
For example:
string = empresa_str.decode('latin_1')
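The decode-only suggestion can be sketched as follows, assuming Python 3 and assuming the title is available as raw Latin-1 bytes. In Python 3, str has no decode method, so calling .decode() on text raises AttributeError; only bytes can be decoded. The helper decode_title and its default encoding are hypothetical and not part of requests or BeautifulSoup.

```python
def decode_title(raw, encoding="latin-1"):
    """Hypothetical helper: return the title as text.

    Bytes are decoded with the given encoding (an assumption about the page);
    values that are already text are returned unchanged.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Sample bytes standing in for a Latin-1 encoded <title>.
raw_title = b"P\xe1gina N\xe3o Encontrada!"
title = decode_title(raw_title)
print(title == "Página Não Encontrada!")
```

Decoding once is enough to get readable text. Encoding back to UTF-8 is only needed when writing bytes to a file or socket, not for printing.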