Converting escaped XML entities back into UTF-8

Question

So I've got this UTF-8 string in an XML file:

Horrible place. ☠☠☠

And when I feed it to an external application, the funny characters come back escaped as XML entities:

Horrible place. &#x2620;&#x2620;&#x2620;

In Ruby, how do I convert that string back to UTF-8? There's probably a really easy solution for this, but I'm unable to find anything in the standard libraries; e.g. CGI.unescapeHTML (which works nicely for things like &gt;) seems to ignore them completely.

ree-1.8.7-2010.02 > CGI.unescapeHTML('&gt;')
 => ">"
ree-1.8.7-2010.02 > CGI.unescapeHTML('&#x2620;')
 => "&#x2620;"


Recommended answer

Well, since it's XML-encoded, I'd go for an XML parser:

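require 'nokogiri'

# The escaped string as returned by the external application
frag = 'Horrible place. &#x2620;&#x2620;&#x2620;'
# Parsing it as an XML fragment resolves the character references
doc = Nokogiri::XML.fragment(frag)
puts doc.text
# >> Horrible place. ☠☠☠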
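
If adding an XML parser as a dependency is not an option, numeric character references can also be decoded with core Ruby alone. The following is only a minimal sketch, not part of the original answer: the helper name `decode_numeric_entities` is made up for illustration, and it assumes the input contains only numeric references (hex such as `&#x2620;` or decimal such as `&#9760;`) with valid code points. Named entities like `&gt;` are deliberately left untouched.

```ruby
# Hypothetical helper (an assumption, not from the original answer):
# replaces numeric character references with the UTF-8 character
# for that code point. Named entities (&gt;, &amp;, ...) are not handled.
def decode_numeric_entities(str)
  str.gsub(/&#(?:x([0-9a-fA-F]+)|([0-9]+));/) do
    # $1 holds the hex digits, $2 the decimal digits; only one is set
    codepoint = $1 ? $1.hex : $2.to_i
    # pack('U') encodes a single code point as a UTF-8 string
    [codepoint].pack('U')
  end
end

puts decode_numeric_entities('Horrible place. &#x2620;&#x2620;&#x2620;')
# prints: Horrible place. ☠☠☠
```

A real parser is still the more robust choice, since it also handles named entities and rejects malformed input; this sketch raises a RangeError if a reference names a code point outside the Unicode range.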
