Convert HTML character entity encoding in R

Question

I would like to convert HTML character entities, for example &amp; to & and &gt; to >.

Perl has the HTML::Entities package for this, but I could not find anything similar for R.

I also tried iconv(), but could not get satisfactory results. There may also be a way to do this with the XML package, but I have not figured it out yet.

Recommended answer

Try the following:

# load XML package
library(XML)

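# Convenience function to convert HTML character entities to plain text.
# htmlParse() decodes the entities while parsing; the XPath query then
# selects the text nodes under <body>, and [[1]] keeps the first one.
html2txt <- function(str) {
  xpathApply(htmlParse(str, asText = TRUE),
             "//body//text()",
             xmlValue)[[1]]
}

# HTML-encoded string
( x <- paste("i", "s", "n", "&", "a", "p", "o", "s", ";", "t", sep = "") )
# [1] "isn&apos;t"

# converted string
html2txt(x)
# [1] "isn't"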

UPDATE: Edited the html2txt() function so that it applies to more situations.
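
Note that html2txt() takes a single string and returns only the first text node, so input containing inline markup (for example "a &lt; <b>b</b>") would be cut off at the first tag. A minimal sketch of a vectorized variant that joins all text nodes is given below. The name html2txt_all is hypothetical and not part of the original answer, and the sketch assumes every input is a non-empty string.

```r
library(XML)

# Hypothetical helper (not from the original answer): decode HTML entities
# in each element of a character vector and join all text nodes per element.
# Assumes every element is a non-empty string.
html2txt_all <- function(strs) {
  vapply(strs, function(s) {
    doc <- htmlParse(s, asText = TRUE)
    # Collect the value of every text node under <body>
    parts <- xpathSApply(doc, "//body//text()", xmlValue)
    paste(parts, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Example call on two encoded strings
html2txt_all(c("a &amp; b", "x &gt; y"))
```

Joining the text nodes with paste() also drops any tags in the input, which may or may not be what you want.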
