Rvest: getting node text and not its children's text

Views: 34 | Published: 2021/7/14 18:39:22 | Tags: r, web-scraping, rvest


Problem description


The method html_text() (from the R package rvest) concatenates the text of a node with the text of all its children. I would like to extract only the parent node's own text.

For the following example, html_text() gives HELLO GOODBYE.

I want to get just GOODBYE. How can I get it?

<div class="joke">
  <div class="div_inside">
    <div class="title_inside">
      <a class="link" href="sompage.htm">HELLO</a>
    </div>
  </div>
  GOODBYE
</div>
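
To make the behaviour concrete, here is a minimal reproduction sketch. It assumes the HTML above is held in a string; the variable names html, joke and all_text are illustrative and not part of the original question.

```r
library(rvest)

# The HTML from the question, stored as a string (illustrative variable name)
html <- '<div class="joke">
  <div class="div_inside">
    <div class="title_inside">
      <a class="link" href="sompage.htm">HELLO</a>
    </div>
  </div>
  GOODBYE
</div>'

# Select the outer div by its class
joke <- html_node(read_html(html), "div.joke")

# html_text() gathers the text of every descendant, so both words come back
all_text <- html_text(joke)
grepl("HELLO", all_text)
grepl("GOODBYE", all_text)
```

Both checks should return TRUE, which is the concatenation the question is trying to avoid. The raw string also carries the indentation whitespace from the markup.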

Solution

Try grabbing the main div tag with class "joke" and selecting only those of its child nodes that are not div elements, using XPath:

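library(rvest)

# 'your_html_script' is a placeholder for your HTML string, file path, or URL
read_html('your_html_script') %>%
    # keep the direct children of div.joke that are not <div> elements
    html_nodes(xpath = '//div[@class="joke"]/node()[not(self::div)]') %>%
    html_text()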

Thanks!
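
As a worked variant of the answer above, the sketch below applies the same idea to the HTML from the question. Two things here are assumptions rather than part of the original answer: it uses the XPath text() node test instead of node()[not(self::div)], so that only direct text-node children are selected, and it trims the whitespace that comes from the indentation in the markup. The variable names html and own_text are illustrative.

```r
library(rvest)

# The HTML from the question, stored as a string (illustrative variable name)
html <- '<div class="joke">
  <div class="div_inside">
    <div class="title_inside">
      <a class="link" href="sompage.htm">HELLO</a>
    </div>
  </div>
  GOODBYE
</div>'

own_text <- read_html(html) %>%
    html_nodes(xpath = '//div[@class="joke"]/text()') %>%  # direct text-node children only
    html_text() %>%
    trimws()                                               # strip newlines and indentation

# Drop any pieces that were whitespace only
own_text <- own_text[own_text != ""]
own_text
```

For this input the result should be just GOODBYE. If the parent could also contain inline child elements such as span whose text you want to exclude, text() may be the safer choice, because node()[not(self::div)] only filters out div children.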
