Scrape Data from "div" class

Views: 26 Published: 2021/9/24 18:51:34 Tags: xml r web-scraping


Question

I tried, and was able, to scrape data from a td class using the script below:

 nArticles <- getNodeSet(pagetree,"//*/td[@class='bg1 W1']//*/li[@class='LI2 font28 C bold W1']") #current price
 current.price <- xmlValue(nArticles[[1]])

Now I have a page source like the one below:

<div>
    <div style="float: left;">
            <ul class="BlockItemIndex" style="width:123px; height:92px">
                    <li class="font12 I1">
                            Index
                    </li>
                    <li class="I1" style="font:bold 20px Arial">
                            <span id="ctl00_ctl00_cphContent_cphContent_lblIndex">21,549.28</span></li>
                    <li class="I1" style="font:normal 15px Arial">
                            <span id="ctl00_ctl00_cphContent_cphContent_lblChange"><span class="pos bold">+70.56 (0.33%)</span></span></li>
                    <li class="I1">
                            <span class="font12">Turnover</span>&nbsp;<span id="ctl00_ctl00_cphContent_cphContent_lblTurnover">70.41B</span></li>
            </ul>
    </div>
    <div class="seperate"></div>
    <div style="float: left;">
            <ul class="BlockItemChange" style="width:75px">
                    <li class="font12 I1">
                            High
                    </li>
                    <li class="I2">
                            <span id="ctl00_ctl00_cphContent_cphContent_lblHigh">21,569.74</span></li>
            </ul>
            <ul class="BlockItemChange" style="width:75px; margin-top:2px;">
                    <li class="font12 I1">
                            Low
                    </li>
                    <li class="I2">
                            <span id="ctl00_ctl00_cphContent_cphContent_lblLow">21,302.19</span></li>
            </ul>
    </div>
    <div class="seperate"></div>
    <div style="float: left;">
            <ul class="BlockItemChange" style="width:75px">
                    <li class="font12 I1">
                            Open
                    </li>
                    <li class="I2">
                            <span id="ctl00_ctl00_cphContent_cphContent_lblOpen">21,339.02</span></li>
            </ul>
            <ul class="BlockItemChange" style="width:75px; margin-top:2px;">
                    <li class="font12 I1">
                            Prev Close
                    </li>
                    <li class="I2">
                            <span id="ctl00_ctl00_cphContent_cphContent_lblPreClose">21,478.72</span></li>
            </ul>
    </div>
</div>

I need to pick up 21,549.28, and I tried the following:

nArticles <- getNodeSet(pagetree,"//*/ul[@class='BlockItemChange']//*/li[@class='I2']")

But it fails. Can anyone help? Thanks.

Solution

It's hard to know what you're using to determine the value you're interested in, but

query = '//ul[@class="BlockItemIndex"]/li[2]/span/text()'
xpathSApply(xml, query, xmlValue)

picks out the text of the span inside the second li child of each ul whose class is BlockItemIndex, which is where 21,549.28 sits. Since the li elements holding the values in that list all have the same class (I1), it doesn't help to specify one. I'm not sure what you were trying to accomplish with *; I think it's redundant with //. Later in your query, // isn't what you want: ul//*/li only matches an li that is at least a grandchild of the ul, whereas you're interested in immediate children of the ul element.
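
As a minimal sketch of the points above, assuming the XML package is installed and using a trimmed copy of the markup from the question (the variable names by_position, by_id, no_match and index_value are made up for illustration), the queries can be compared side by side. Selecting the span by its id is an alternative I'm adding here; it only works if the id is stable on the real page.

```r
library(XML)

# Trimmed copy of the markup from the question (assumed to be representative).
html <- '
<div>
  <div style="float: left;">
    <ul class="BlockItemIndex">
      <li class="font12 I1">Index</li>
      <li class="I1"><span id="ctl00_ctl00_cphContent_cphContent_lblIndex">21,549.28</span></li>
    </ul>
  </div>
  <div style="float: left;">
    <ul class="BlockItemChange">
      <li class="font12 I1">High</li>
      <li class="I2"><span id="ctl00_ctl00_cphContent_cphContent_lblHigh">21,569.74</span></li>
    </ul>
  </div>
</div>'

pagetree <- htmlParse(html, asText = TRUE)

# Positional query: the span in the second li child of the BlockItemIndex list.
by_position <- xpathSApply(pagetree,
                           '//ul[@class="BlockItemIndex"]/li[2]/span',
                           xmlValue)

# Alternative (an assumption, not part of the answer): select the span by id.
by_id <- xpathSApply(pagetree,
                     '//span[@id="ctl00_ctl00_cphContent_cphContent_lblIndex"]',
                     xmlValue)

# The pattern from the question needs li to be at least a grandchild of ul,
# so it returns an empty node set for this markup.
no_match <- getNodeSet(pagetree,
                       "//ul[@class='BlockItemChange']//*/li[@class='I2']")

# Drop the thousands separator before converting the text to a number.
index_value <- as.numeric(gsub(",", "", by_position))
```

The positional form depends on the order of the li elements, while the id form depends on the generated id staying the same; which one is more robust depends on how the page is produced.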

