How to return NA when nothing is found in an xpath?
Problem description
It is difficult to formulate the question, but with an example it is easy to understand.
I use R to parse HTML code.
In the following, I have some HTML code stored in a variable called html, and I try to extract all values in //span[@class="number"] and all values in //span[@class="surface"]:
html <- '<div class="line">
<span class="number">Number 1</span>
<span class="surface">Surface 1</span>
</div>
<div class="line">
<span class="surface">Surface 2</span>
</div>'
library(XML)
page = htmlTreeParse(html,useInternal = TRUE,encoding="UTF-8")
number = unlist(xpathApply(page,'//span[@class="number"]',xmlValue))
surface = unlist(xpathApply(page,'//span[@class="surface"]',xmlValue))
The output of number is:
[1] "Number 1"
The output of surface is:
[1] "Surface 1" "Surface 2"
Then, when I try to cbind the two vectors, I can't pair them up correctly, because they don't have the same length: cbind() recycles the shorter vector instead of aligning each value with its enclosing div.
So my question is: what can I do to get the following output for number?
[1] "Number 1" NA
Then I can combine number and surface.
Recommended answer
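It's easier to select the enclosing tag (the div here) for each record, and then look for each tag inside it. The key is that html_node() (singular) returns exactly one result per input node, giving NA when nothing matches, whereas html_nodes() returns only the matches that exist. With rvest and purrr, which I find simpler: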
library(rvest)
library(purrr)
html %>% read_html() %>%
html_nodes('.line') %>%
map_df(~list(number = .x %>% html_node('.number') %>% html_text(),
surface = .x %>% html_node('.surface') %>% html_text()))
#> # A tibble: 2 × 2
#> number surface
#> <chr> <chr>
#> 1 Number 1 Surface 1
#> 2 <NA> Surface 2
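Since the question itself uses the XML package, here is a minimal sketch of the same enclosing-div idea without rvest or purrr, assuming the XML package is installed. The helper first_value() is hypothetical (it is not part of XML): it returns the text of the first node matched by a relative XPath, or NA_character_ when nothing matches.

```r
library(XML)

html <- '<div class="line">
<span class="number">Number 1</span>
<span class="surface">Surface 1</span>
</div>
<div class="line">
<span class="surface">Surface 2</span>
</div>'

page <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# Hypothetical helper: text of the first node matching `path` under `node`,
# or NA_character_ when the XPath matches nothing.
first_value <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# One entry per enclosing <div class="line">, so both results have the same length.
rows <- getNodeSet(page, '//div[@class="line"]')

# The leading "." makes each XPath relative to the current div;
# a bare "//span" would search the whole document instead.
number  <- vapply(rows, first_value, character(1), path = './/span[@class="number"]')
surface <- vapply(rows, first_value, character(1), path = './/span[@class="surface"]')

cbind(number, surface)
```

As in the rvest answer, iterating over the enclosing div guarantees one value per record, and the NA fallback keeps number and surface aligned row by row.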