Rvest: getting node text and not its children's text
Problem description
The html_text() method (from the R package rvest) concatenates the text of a node and all of its children. I would like to extract only the parent node's own text.
For the following example, html_text() gives HELLO GOODBYE.
I want to get just GOODBYE. How can I get it?
<div class="joke">
<div class="div_inside">
<div class="title_inside">
<a class="link" href="sompage.htm">HELLO</a>
</div>
</div>
GOODBYE
</div>
Solution
Try selecting the main div tag with class "joke" without picking up its child div elements, using XPath:
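library(rvest)

# 'your_html_script' is a placeholder for your own HTML string, file path, or URL.
read_html('your_html_script') %>%
  # Keep every direct child node of div.joke except child <div> elements.
  html_nodes(xpath = '//div[@class="joke"]/node()[not(self::div)]') %>%
  html_text()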
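For reference, here is a minimal self-contained sketch that runs against the HTML from the question. It is a variation on the XPath above, not the exact expression from the answer: it assumes that only the div's direct text nodes are wanted, so it uses text() in place of node()[not(self::div)] (the latter would still return non-div child elements such as a span). The variable names html, page, and own_text are made up for illustration.

```r
library(rvest)

# The HTML from the question, inlined so the example needs no external file.
html <- '<div class="joke">
  <div class="div_inside">
    <div class="title_inside">
      <a class="link" href="sompage.htm">HELLO</a>
    </div>
  </div>
  GOODBYE
</div>'

page <- read_html(html)

# text() matches only the text nodes that are direct children of div.joke,
# so text nested inside child elements (HELLO) is never selected.
own_text <- page %>%
  html_nodes(xpath = '//div[@class="joke"]/text()') %>%
  html_text()

# Text nodes keep their surrounding whitespace, and whitespace-only nodes
# may also be returned, so trim each string and drop the empty ones.
own_text <- trimws(own_text)
own_text <- own_text[nzchar(own_text)]

print(own_text)
```

With the sample HTML this should leave a single string, GOODBYE. If the parent has several separate text fragments, own_text will contain one element per fragment, which can be joined with paste(own_text, collapse = " ") if a single string is needed.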
Thanks!