How can I extract data from an HTML file using R?

Question

I want to extract some data from the GEO website. How can I do this? The URL of the page is http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM410750, and I want to get the patient's "disease state". I used the command

readLines("http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM410750")

to import the HTML, and the information I need is on line 288. Could someone help me? Thank you very much; I'd appreciate it.
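Hard-coding line 288 breaks as soon as the page layout shifts. A minimal base-R sketch of a sturdier version of the readLines() route searches the lines for the label instead of indexing by position. It assumes (not checked against the live page) that the label and its value sit on one line in the form `disease state: <value>` and that the value ends at the next HTML tag; `extract_field()` and the sample lines are hypothetical illustrations.

```r
# Hypothetical helper: return the text after "<label>: " on the first line that
# contains it, cut at the next HTML tag. Assumes label and value share one line.
extract_field <- function(lines, label) {
  key <- paste0(label, ": ")
  hit <- grep(key, lines, fixed = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  # Position just past the first occurrence of "<label>: "
  start <- as.integer(regexpr(key, hit[1], fixed = TRUE)) + nchar(key)
  value <- substring(hit[1], start)
  # Drop everything from the next "<" (e.g. a <br>) onward, then trim spaces
  trimws(sub("<.*$", "", value))
}

# Made-up lines standing in for readLines() output; not copied from GEO.
sample_lines <- c(
  "<tr valign=\"top\">",
  "<td style=\"text-align: justify\">tissue: blood<br>disease state: Uninfected<br></td>",
  "</tr>"
)
extract_field(sample_lines, "disease state")
```

Against the real page this would be called as `extract_field(readLines(url, warn = FALSE), "disease state")` with `url` set to the GEO address above. If the value turns out to span several lines, a proper HTML parser is the safer choice.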

Answer

Usually, when a question like this is asked, some effort should be shown. Next time, please state the exact problem and describe at least some of what you have attempted. To get you started, here is an example that uses the XML package and applies an XPath query along with strsplit to grab the desired result.

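library(XML)
# Download first: htmlParse() (libxml2) cannot fetch HTTPS URLs itself,
# and NCBI now serves GEO pages over HTTPS.
page <- readLines("https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM410750", warn = FALSE)
doc <- htmlParse(paste(page, collapse = "\n"), asText = TRUE)
# In the justified <td>, take the first text node that follows a <br>,
# then keep the part after ": ".
x <- xpathSApply(doc, "//td[@style='text-align: justify']/text()[preceding-sibling::br][1]",
    function(X) strsplit(xmlValue(X), ": ")[[1]][2])
x
# [1] "Uninfected"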
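`strsplit(xmlValue(X), ': ')[[1]][2]` keeps only the piece between the first and second `': '`, so a value that itself contains `': '` would be cut short, and only one field comes back. A small base-R sketch, assuming the characteristic strings have already been pulled out (for example with `xpathSApply(doc, "//td[@style='text-align: justify']/text()", xmlValue)`), splits each string on its first `': '` only and returns every field as a named vector. `parse_characteristics()` and the sample strings are hypothetical.

```r
# Hypothetical helper: turn "key: value" strings into a named character vector.
# Splits on the FIRST ": " only, so values containing ": " stay intact.
parse_characteristics <- function(fields) {
  fields <- trimws(fields)
  fields <- fields[grepl(": ", fields, fixed = TRUE)]  # skip text without a key
  pos <- as.integer(regexpr(": ", fields, fixed = TRUE))
  values <- substring(fields, pos + 2)
  names(values) <- substr(fields, 1, pos - 1)
  values
}

# Made-up input; real GEO text nodes may differ.
chars <- parse_characteristics(c("tissue: blood", "disease state: Uninfected",
                                 "note: a: b", "  "))
chars[["disease state"]]
```

Returning a named vector means any characteristic can be looked up by its label, rather than relying on the disease state being the first text node after a `<br>`.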
