How to return NA when nothing is found in an xpath?

Question

The question is hard to state in the abstract, but an example makes it simple to understand.

I use R to parse HTML.

Below, I have an HTML string called html, and I try to extract all values in //span[@class="number"] and all values in //span[@class="surface"]:

html <- '<div class="line">
<span class="number">Number 1</span>
<span class="surface">Surface 1</span>
</div>
<div class="line">
<span class="surface">Surface 2</span>
</div>' 

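library(XML)  # provides htmlTreeParse, xpathApply and xmlValue

# Parse the string into an internal document so that XPath queries work
page <- htmlTreeParse(html, useInternalNodes = TRUE, asText = TRUE, encoding = "UTF-8")

# Collect the text of every matching span in the whole document
number <- unlist(xpathApply(page, '//span[@class="number"]', xmlValue))
surface <- unlist(xpathApply(page, '//span[@class="surface"]', xmlValue))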

The output of number is:

[1] "Number 1"

The output of surface is:

[1] "Surface 1" "Surface 2"

Then, when I try to cbind the two vectors, I can't, because they don't have the same length.

So my question is: what can I do so that the output for number is:

[1] "Number 1" NA

Then I can combine number and surface.

Recommended answer

It's easier to select the enclosing tag (the div here) for each row, and then look for each tag inside it. html_node() returns exactly one result per input node, a missing node when nothing matches, so html_text() gives NA for that row. With rvest and purrr, which I find simpler:

library(rvest)
library(purrr)

html %>% read_html() %>% 
    html_nodes('.line') %>% 
    map_df(~list(number = .x %>% html_node('.number') %>% html_text(), 
                 surface = .x %>% html_node('.surface') %>% html_text()))

#> # A tibble: 2 × 2
#>     number   surface
#>      <chr>     <chr>
#> 1 Number 1 Surface 1
#> 2     <NA> Surface 2
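
For readers who want to stay with the XML package used in the question, the same idea (query each enclosing div separately and substitute NA when nothing matches) can be sketched as follows. This is only an illustration, not part of the original answer: first_or_na is a hypothetical helper name introduced here, and htmlParse() with asText = TRUE is used in place of htmlTreeParse() to get an internal document.

```r
library(XML)

html <- '<div class="line">
<span class="number">Number 1</span>
<span class="surface">Surface 1</span>
</div>
<div class="line">
<span class="surface">Surface 2</span>
</div>'

# Parse the string into an internal document that supports XPath
page <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# Hypothetical helper (not from the XML package): return the text of the
# first node matching a relative XPath under `node`, or NA if none matches
first_or_na <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# One entry per enclosing div, so both vectors have the same length
lines <- getNodeSet(page, '//div[@class="line"]')
number <- vapply(lines, first_or_na, character(1), path = './span[@class="number"]')
surface <- vapply(lines, first_or_na, character(1), path = './span[@class="surface"]')

# number is now c("Number 1", NA), so the two columns line up row by row
result <- cbind(number, surface)
```

The relative path (starting with ./) matters here: it restricts each query to the current div, whereas a path starting with // would search the whole document again.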
