Convert HTML character entity encoding in R
Question
I would like to convert HTML character entities such as
&amp;
to &
or
&gt;
to >
Perl has the HTML::Entities package, which can do this, but I couldn't find anything similar for R.
I also tried iconv() but couldn't get satisfactory results. There may also be a way to do this with the XML package, but I haven't figured it out yet.
Recommended answer
Try the following:
# load XML package
library(XML)
# Convenience function to convert html codes
html2txt <- function(str) {
  xpathApply(htmlParse(str, asText = TRUE),
             "//body//text()",
             xmlValue)[[1]]
}
# html encoded string
( x <- paste("i", "s", "n", "&", "a", "p", "o", "s", ";", "t", sep = "") )
[1] "isn&apos;t"
# converted string
html2txt(x)
[1] "isn't"
UPDATE: Edited the html2txt() function so that it applies to more situations.
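If pulling in an XML parser feels heavy for the handful of entities mentioned in the question, a dependency-free variant is possible. The following is only a minimal sketch, not part of the original answer: the function name decode_basic_entities is hypothetical, and it assumes that only the five predefined XML entities (&lt;, &gt;, &quot;, &apos;, &amp;) need decoding. Numeric references such as &#39; and the full set of HTML named entities are not handled, so the parser-based html2txt() remains the more general approach.

```r
# Hypothetical helper (not from the original answer): decodes only the
# five predefined XML entities using base R string replacement.
decode_basic_entities <- function(str) {
  # "&amp;" is deliberately last so that input like "&amp;lt;" is decoded
  # once (to "&lt;") rather than twice (to "<").
  entities <- c("&lt;"   = "<",
                "&gt;"   = ">",
                "&quot;" = "\"",
                "&apos;" = "'",
                "&amp;"  = "&")
  for (name in names(entities)) {
    # fixed = TRUE treats the entity name as a literal string, not a regex
    str <- gsub(name, entities[[name]], str, fixed = TRUE)
  }
  str
}

# Build the encoded string the same way as in the answer above
x <- paste("i", "s", "n", "&", "a", "p", "o", "s", ";", "t", sep = "")
decode_basic_entities(x)
```

Because gsub() is vectorised, the same function can be applied to a whole character vector at once, which the single-string html2txt() above does not do.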