R not accepting xpath query

Problem description

Hi, I am using the XML package in R to scrape HTML pages. The page of interest is http://www.ncbi.nlm.nih.gov/protein/225903367?report=fasta, and on that page there is a sequence whose xpath, when inspecting the element in Chrome, is

//*[@id="gi_225903367_141"]

But when I try to use:

xpathSApply(htmlParse(fasta.url.content),"//*[@id="viewercontent1"]/pre")
Error: unexpected symbol in "xpathSApply(htmlParse(fasta.url.content),"//*[@id="viewercontent1"

I get the above error.

Is the XML package being fussy with the xpath?
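The `unexpected symbol` error is raised by R's parser before the XML package is ever called: the double quotes around `viewercontent1` close the R string literal early, so R then sees the bare symbol `viewercontent1`. A minimal base-R sketch of the usual quoting fixes; `doc` in the parsed text is only a placeholder name:

```r
# Three ways to write the XPath so that R can parse it:
xp_single_inner <- "//*[@id='viewercontent1']/pre"    # single quotes inside the XPath
xp_escaped      <- "//*[@id=\"viewercontent1\"]/pre"  # escaped double quotes
xp_single_outer <- '//*[@id="viewercontent1"]/pre'    # R string delimited by single quotes

# Escaping and single-quoting the R string produce the same XPath text
identical(xp_escaped, xp_single_outer)  # TRUE

# Parsing (not running) the question's form reproduces the syntax error
msg <- tryCatch({
  parse(text = 'xpathSApply(doc, "//*[@id="viewercontent1"]/pre")')
  "parsed"
}, error = function(e) conditionMessage(e))
grepl("unexpected symbol", msg)  # TRUE
```

With the quoting fixed, the call from the question would read `xpathSApply(htmlParse(fasta.url.content), '//*[@id="viewercontent1"]/pre', xmlValue)`. This removes the syntax error, but it cannot return content that is not present in the downloaded HTML.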

Here is the query using the xpath Mathius provided:

xpathSApply(htmlParse(fasta.url.content),"//span[contains(@id,'gi_225903367_1')]")
list()
attr(,"class")
[1] "XMLNodeSet"

This returns an empty list. The xpath may well be incorrect, but I wonder whether this is R-related.

Recommended answer

The problem is that the page is built dynamically with JavaScript, so the sequence is not present in the HTML that R receives.
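One way to tell an XPath mistake from missing content is to search the raw HTML for the id seen in Chrome's inspector. A minimal base-R sketch; `page_html` is a hypothetical stand-in for the downloaded page (in the question, that would be `fasta.url.content`), and `has_id()` is a hypothetical helper:

```r
# Hypothetical stand-in for the HTML that R downloads: the container is
# present, but the sequence span is only added later by JavaScript.
page_html <- paste0(
  "<html><body>",
  "<div id=\"viewercontent1\"></div>",
  "</body></html>"
)

# TRUE if the id string occurs anywhere in the raw HTML
has_id <- function(html, id) grepl(id, html, fixed = TRUE)

has_id(page_html, "gi_225903367")    # FALSE: span not in the served HTML
has_id(page_html, "viewercontent1")  # TRUE: only the empty container is there
```

If the id found in the browser is missing from the raw HTML, no XPath will match it, and the data has to come from another source such as an API.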

The CRAN package "rentrez" provides an interface to NCBI's E-utilities (eutils), which is the programmatic way to query Entrez:

library(rentrez)
# Fetch the protein record as FASTA text via NCBI E-utilities
fasta <- entrez_fetch(db = "protein", id = "225903367", rettype = "fasta")
cat(fasta)
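With `rettype = "fasta"`, `entrez_fetch()` returns the record as a single FASTA-formatted character string. A minimal base-R sketch for splitting one such record into header and sequence; `parse_fasta()` and the sample text are hypothetical, and the function assumes the string holds exactly one record:

```r
# Hypothetical FASTA text: one ">" header line, then wrapped sequence lines
fasta_txt <- ">example_header hypothetical protein\nMKTAYIAKQR\nQISFVKSHFS\n\n"

parse_fasta <- function(txt) {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]                # drop blank lines
  header <- sub("^>", "", lines[1])            # header without the leading ">"
  sequence <- paste(lines[-1], collapse = "")  # join the wrapped sequence lines
  list(header = header, sequence = sequence)
}

rec <- parse_fasta(fasta_txt)
rec$sequence  # "MKTAYIAKQRQISFVKSHFS"
```

Applied to the real record, this would be `parse_fasta(entrez_fetch(db = "protein", id = "225903367", rettype = "fasta"))`.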
