R not accepting xpath query
Problem description
I am using the XML package in R to scrape HTML pages. The page of interest is http://www.ncbi.nlm.nih.gov/protein/225903367?report=fasta, which contains a sequence. When I inspect that element in Chrome, its xpath is:
//*[@id="gi_225903367_141"]
But when I try to use:
xpathSApply(htmlParse(fasta.url.content),"//*[@id="viewercontent1"]/pre")
Error: unexpected symbol in "xpathSApply(htmlParse(fasta.url.content),"//*[@id="viewercontent1"
I get the error above.
Is the XML package being fussy with the xpath?
Here is the query using the xpath Mathius has provided:
xpathSApply(htmlParse(fasta.url.content),"//span[contains(@id,'gi_225903367_1')]")
list()
attr(,"class")
[1] "XMLNodeSet"
This returns an empty list. I don't doubt that the xpath is correct, but I wonder whether this is R related.
Recommended answer
The problem is that the page is created dynamically using JavaScript, so the sequence is not present in the HTML that is returned to R.
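Separately, the "Error: unexpected symbol" in the question comes from R's string quoting rather than from the XML package: the double quotes inside the xpath end the R string early. The sketch below shows two ways of quoting the same xpath, run against a small made-up HTML snippet (an assumption for illustration; it is not the NCBI page, whose sequence is still filled in by JavaScript).

```r
library(XML)

# Made-up static HTML standing in for a page; not the real NCBI markup.
html <- '<html><body><div id="viewercontent1"><pre>MKVLAAG</pre></div></body></html>'

# Option 1: single quotes around the R string, double quotes inside the xpath.
xpath_a <- '//*[@id="viewercontent1"]/pre'

# Option 2: double quotes around the R string, single quotes inside the xpath.
xpath_b <- "//*[@id='viewercontent1']/pre"

# Parse the HTML from a string and extract the text of the matching nodes.
doc <- htmlParse(html, asText = TRUE)
result_a <- xpathSApply(doc, xpath_a, xmlValue)
result_b <- xpathSApply(doc, xpath_b, xmlValue)

print(result_a)
print(identical(result_a, result_b))
```

Either form parses cleanly in R. On the real page the query would still return an empty node set, for the reason given above.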
The CRAN package "rentrez" provides an interface to eutils, which is the programmatic way to query Entrez:
library(rentrez)
entrez_fetch(db="protein", id="225903367", rettype="fasta")
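With rettype="fasta", the call returns the record as FASTA text. As a minimal sketch of what one might do next, the base R code below splits such text into a header and a single sequence string. The FASTA text here is a made-up stand-in (hypothetical identifier and residues), used so the example runs without a network call; it assumes the text holds exactly one record.

```r
# Hypothetical FASTA text standing in for the value returned by entrez_fetch();
# the header and residues are invented for illustration.
fasta_text <- ">example_id hypothetical protein\nMKVLAAGIVG\nALLAQ\n"

# Split into lines and drop any empty entries.
fasta_lines <- strsplit(fasta_text, "\n", fixed = TRUE)[[1]]
fasta_lines <- fasta_lines[nzchar(fasta_lines)]

# The first line is the header (leading ">" removed); the remaining lines
# are joined to form the sequence.
header <- sub("^>", "", fasta_lines[1])
sequence <- paste(fasta_lines[-1], collapse = "")

print(header)
print(sequence)
print(nchar(sequence))
```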