Convert HTML character entity encoding in R

Question

I would like to convert HTML character entities, for example &amp; to & or &gt; to >.

Perl has the HTML::Entities package, which can do this, but I couldn't find anything similar in R.

I also tried iconv() but couldn't get satisfactory results. There may also be a way to do this with the XML package, but I haven't figured it out yet.

Recommended answer

Update: this answer is outdated. Please check the answer below based on the newer xml2 package.

Try the following:

# Load the XML package
library(XML)

# Convenience function to convert HTML entities to plain text:
# parse the string as HTML and return the value of the first text node in <body>
html2txt <- function(str) {
  xpathApply(htmlParse(str, asText = TRUE),
             "//body//text()",
             xmlValue)[[1]]
}

# HTML-encoded string
( x <- paste("i", "s", "n", "&", "a", "p", "o", "s", ";", "t", sep = "") )
#> [1] "isn&apos;t"

# Converted string
html2txt(x)
#> [1] "isn't"

Update: edited the html2txt() function so that it applies to more situations.
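
The update note at the top of this answer points to the xml2 package, but that answer is not reproduced here. As a rough sketch of what such an approach might look like (assuming xml2 is installed; the helper name unescape_html is hypothetical and not part of xml2), the input can be wrapped in a dummy element so that read_html() treats it as literal markup, and xml_text() then returns the decoded text:

```r
# Sketch only: assumes the xml2 package is installed.
# unescape_html is a hypothetical helper name, not an xml2 function.
unescape_html <- function(str) {
  # Wrap the input in a dummy <x> element so the parser sees literal markup,
  # then extract the text content, in which entities have been decoded.
  xml2::xml_text(xml2::read_html(paste0("<x>", str, "</x>")))
}

# Single string
unescape_html("a &amp; b &gt; c")

# The helper handles one string at a time; for a character vector,
# apply it element-wise.
inputs <- c("fish &amp; chips", "1 &lt; 2")
vapply(inputs, unescape_html, character(1), USE.NAMES = FALSE)
```

Like html2txt(), this goes through an HTML parser, so any real tags in the input are dropped and only their text content is returned.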
