How to retrieve data from HTML between <span> and </span>


Question

I want to get the rating (from 1 to 5) from Amazon customer reviews. I checked the page source and found that the relevant part looks like this:

<div style="margin-bottom:0.5em;">
    <span style="margin-right:5px;"><span class="swSprite s_star_5_0 " title="5.0 out of 5 stars" ><span>5.0 out of 5 stars</span></span> </span>
    <span style="vertical-align:middle;"><b>Works great right out of the box with Surface Pro</b>, <nobr>October 5, 2013</nobr></span>
  </div>

I want to extract the text "5.0 out of 5 stars" from

<span>5.0 out of 5 stars</span></span> </span>

How can I use xpathSApply to get it?

Thank you!

Solution

I would recommend using the selectr package, which lets you use CSS selectors in place of XPath.

library(XML)

# Parse the HTML fragment from a string
doc <- htmlParse('
  <div style="margin-bottom:0.5em;">
    <span style="margin-right:5px;">
      <span class="swSprite s_star_5_0 " title="5.0 out of 5 stars">
        <span>5.0 out of 5 stars</span>
      </span>
    </span>
    <span style="vertical-align:middle;">
      <b>Works great right out of the box with Surface Pro</b>,
      <nobr>October 5, 2013</nobr>
    </span>
  </div>', asText = TRUE)

library(selectr)

# Select the innermost span with a CSS selector and extract its text
xmlValue(querySelector(doc, 'div > span > span > span'))
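
A real review page will contain many such blocks, and the goal in the question is a number from 1 to 5 rather than a label. As a rough sketch, assuming every review uses the same nesting as the snippet above (the two-review HTML string below is made up for illustration and is not real Amazon markup), querySelectorAll from selectr returns all matches and the leading number can be parsed out of each label:

```r
library(XML)
library(selectr)

# Hypothetical fragment with two reviews (illustrative only)
html <- '
<div><span><span class="swSprite s_star_5_0" title="5.0 out of 5 stars"><span>5.0 out of 5 stars</span></span></span></div>
<div><span><span class="swSprite s_star_3_0" title="3.0 out of 5 stars"><span>3.0 out of 5 stars</span></span></span></div>'
doc <- htmlParse(html, asText = TRUE)

# querySelectorAll returns every node matching the CSS selector
nodes <- querySelectorAll(doc, "div > span > span > span")

# Extract the text of each matched node
labels <- sapply(nodes, xmlValue)

# Drop everything from " out of" onward and convert the rest to a number
ratings <- as.numeric(sub(" out of.*$", "", labels))
ratings
```

If the page layout differs from this assumption, the selector would need to be adjusted, for example by matching on the swSprite class instead of the nesting depth.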

UPDATE: If you would rather use XPath, you can use the css_to_xpath function in selectr to find the equivalent XPath expression, which in this case turns out to be

"descendant-or-self::div/span/span/span"

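To answer the original question directly, the XPath expression above can be passed to xpathSApply from the XML package. This is a minimal sketch on a trimmed copy of the snippet; the second query, which reads the title attribute of the span with the swSprite class, is an alternative added here for illustration and assumes that class is present on the real page:

```r
library(XML)

# Trimmed copy of the review markup from the question
doc <- htmlParse(
  '<div><span><span class="swSprite s_star_5_0 " title="5.0 out of 5 stars"><span>5.0 out of 5 stars</span></span></span></div>',
  asText = TRUE
)

# Apply xmlValue to every node matched by the XPath expression
rating_text <- xpathSApply(doc, "descendant-or-self::div/span/span/span", xmlValue)

# Alternative (assumption): read the title attribute of the star sprite span;
# the extra argument "title" is passed through to xmlGetAttr
rating_title <- xpathSApply(doc, "//span[contains(@class, 'swSprite')]",
                            xmlGetAttr, "title")

rating_text
rating_title
```

Matching on the class attribute tends to be less sensitive to changes in nesting depth than the positional path, though either approach depends on the page keeping this structure.
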