How to convert a Net::HTTP response to a certain encoding in Ruby 1.9.1?
Problem description
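I have a Sinatra application (http://analyzethis.espace-technologies.com) that does the following:
- retrieves an HTML page (via net/http)
- creates a Nokogiri document from the response.body
- extracts some information and sends it back in the response. The response should be UTF-8 encoded.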
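I ran into the problem while trying to read sites that use the windows-1256 encoding, such as www.filfan.com or www.masrawy.com.
The problem is that the result of the encoding conversion is not correct, even though no errors are thrown.
The net/http response.body.encoding gives ASCII-8BIT, which cannot be converted to UTF-8.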
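To illustrate the ASCII-8BIT point above, here is a minimal Ruby sketch that fakes a Windows-1256 response body locally instead of fetching a real page (the sample Arabic word and the variable names are purely illustrative). It shows why calling encode directly on the binary body fails, and how relabelling the same bytes with force_encoding before transcoding succeeds.

```ruby
# Arabic sample text, written with \u escapes so this file stays plain ASCII
# (Ruby 1.9 would otherwise need a "# encoding: utf-8" magic comment).
original = "\u0645\u0631\u062D\u0628\u0627"

# Simulate what Net::HTTP returns: the page's Windows-1256 bytes,
# tagged as ASCII-8BIT (binary) because no charset was applied to them.
raw_body = original.encode("Windows-1256").force_encoding(Encoding::ASCII_8BIT)
puts raw_body.encoding   # prints ASCII-8BIT

# Transcoding straight from binary fails: Ruby has no mapping for the high bytes.
direct_error = begin
  raw_body.encode("UTF-8")
  nil
rescue Encoding::UndefinedConversionError => e
  e
end
puts direct_error.class  # prints Encoding::UndefinedConversionError

# Relabel the same bytes with their real encoding (force_encoding changes no
# bytes, only the label), then transcode. dup leaves raw_body itself untouched.
utf8 = raw_body.dup.force_encoding("Windows-1256").encode("UTF-8")
puts utf8 == original    # prints true
```

The key design point is that force_encoding and encode do different jobs: the first only tells Ruby how to interpret existing bytes, the second actually rewrites them into another encoding.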
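If I do Nokogiri::HTML(response.body) and use CSS selectors to get certain content from the page (say, the content of the title tag), I get a string for which string.encoding returns WINDOWS-1256. I call string.encode("utf-8") and send the response using that, but again the response is not correct.
Any suggestions or ideas about what's wrong in my approach?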
Answer
It's because Net::HTTP does not handle encoding correctly. See http://bugs.ruby-lang.org/issues/2567
You can parse response['content-type'], which contains the charset, instead of parsing the whole response.body. Then use force_encoding() to set the right encoding, for example
response.body.force_encoding("UTF-8")
if the site is served in UTF-8.
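The approach described in the answer could be sketched roughly like this. It is a minimal sketch, not the answerer's code: the helper names charset_from_content_type, body_as_utf8 and fetch_utf8 are hypothetical, the regex is a simple assumption about how the charset parameter is written in the header, and falling back to UTF-8 when no usable charset is declared is also an assumption.

```ruby
require "net/http"
require "uri"

# Extract the charset parameter from a Content-Type header value, e.g.
# "text/html; charset=windows-1256" -> "windows-1256". Returns nil if absent.
def charset_from_content_type(content_type)
  return nil if content_type.nil?
  match = content_type.match(/charset\s*=\s*["']?([^\s;"']+)/i)
  match && match[1]
end

# Relabel the raw (ASCII-8BIT) body with the declared charset, then transcode
# to UTF-8. Falls back to `default` (an assumption) when the header has no
# charset or names one Ruby does not recognise.
def body_as_utf8(body, content_type, default = "UTF-8")
  charset = charset_from_content_type(content_type) || default
  encoding = begin
    Encoding.find(charset)
  rescue ArgumentError
    Encoding.find(default)
  end
  # force_encoding mutates the receiver's label, so work on a copy of the body.
  # For a non-UTF-8 source, untranscodable bytes become replacement characters.
  body.dup.force_encoding(encoding).encode("UTF-8", invalid: :replace, undef: :replace)
end

# Hypothetical convenience wrapper around Net::HTTP (defined but not called here).
def fetch_utf8(url)
  response = Net::HTTP.get_response(URI.parse(url))
  body_as_utf8(response.body, response["content-type"])
end
```

Calling fetch_utf8("http://www.filfan.com/") would then return the page as a UTF-8 string, provided the server actually declares windows-1256 in its Content-Type header; if it does not, the fallback is used and the text may still come out wrong.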