Decoding problems in Django and lxml


Question

I have a strange problem with lxml when using the deployed version of my Django application. I use lxml to parse another HTML page which I fetch from my server. This works perfectly well on my development server on my own computer, but for some reason it gives me UnicodeDecodeError on the server.

('utf8', "\x85why hello there!", 0, 1, 'unexpected code byte')

I have made sure that Apache (with mod_python) runs with LANG='en_US.UTF-8'.

I've tried googling for this problem and tried different approaches to decoding the string correctly, but I can't figure it out.

In your answer, you may assume that my string is called hello or something.

Recommended answer

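"\x85why hello there!" is not a valid UTF-8 encoded string: the byte 0x85 can only appear as a continuation byte in UTF-8, never at the start of a character. You should decode the web page yourself before passing it to lxml. To find out which encoding it uses, look at the HTTP headers you get when you fetch the page (the charset in Content-Type); you may well find the problem there.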
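
As a rough illustration of "decode first, then parse", here is a minimal Python 3 sketch. The helper names charset_from_content_type and decode_page, the sample Content-Type values, and the cp1252 fallback are assumptions made for this example, not something stated in the question. In cp1252 the byte 0x85 is the ellipsis character, which makes it a plausible guess for this particular string, but the real encoding has to come from the server's headers.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns None when no charset is declared.
    return msg.get_content_charset() or default


def decode_page(raw, content_type, fallback="cp1252"):
    """Decode raw bytes using the declared charset.

    The cp1252 fallback is an assumption for this sketch; it is used when
    the declared charset is unknown or does not match the actual bytes.
    """
    declared = charset_from_content_type(content_type)
    try:
        return raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        return raw.decode(fallback, errors="replace")


# The string from the question, as raw bytes.
hello = b"\x85why hello there!"

# Hypothetical header value; use the one your server actually sends.
text = decode_page(hello, "text/html; charset=windows-1252")

# `text` is now a Unicode string, which can then be handed to lxml,
# for example with lxml.html.fromstring(text).
```

If the header claims UTF-8 but the body is really in another encoding, the first decode attempt fails in the same way as in the question, and the sketch falls back to the guessed encoding instead of raising.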
