Scrape Data from "div" class


Question

I tried and was able to scrape data from a td class using the script below:

     nArticles <- getNodeSet(pagetree,"//*/td[@class='bg1 W1']//*/li[@class='LI2 font28 C bold W1']") #current price
     current.price <- xmlValue(nArticles[[1]])
    

Now I have a web page source like the one below:

    <div>
        <div style="float: left;">
                <ul class="BlockItemIndex" style="width:123px; height:92px">
                        <li class="font12 I1">
                                Index
                        </li>
                        <li class="I1" style="font:bold 20px Arial">
                                <span id="ctl00_ctl00_cphContent_cphContent_lblIndex">21,549.28</span></li>
                        <li class="I1" style="font:normal 15px Arial">
                                <span id="ctl00_ctl00_cphContent_cphContent_lblChange"><span class="pos bold">+70.56 (0.33%)</span></span></li>
                        <li class="I1">
                                <span class="font12">Turnover</span>&nbsp;<span id="ctl00_ctl00_cphContent_cphContent_lblTurnover">70.41B</span></li>
                </ul>
        </div>
        <div class="seperate"></div>
        <div style="float: left;">
                <ul class="BlockItemChange" style="width:75px">
                        <li class="font12 I1">
                                High
                        </li>
                        <li class="I2">
                                <span id="ctl00_ctl00_cphContent_cphContent_lblHigh">21,569.74</span></li>
                </ul>
                <ul class="BlockItemChange" style="width:75px; margin-top:2px;">
                        <li class="font12 I1">
                                Low
                        </li>
                        <li class="I2">
                                <span id="ctl00_ctl00_cphContent_cphContent_lblLow">21,302.19</span></li>
                </ul>
        </div>
        <div class="seperate"></div>
        <div style="float: left;">
                <ul class="BlockItemChange" style="width:75px">
                        <li class="font12 I1">
                                Open
                        </li>
                        <li class="I2">
                                <span id="ctl00_ctl00_cphContent_cphContent_lblOpen">21,339.02</span></li>
                </ul>
                <ul class="BlockItemChange" style="width:75px; margin-top:2px;">
                        <li class="font12 I1">
                                Prev Close
                        </li>
                        <li class="I2">
                                <span id="ctl00_ctl00_cphContent_cphContent_lblPreClose">21,478.72</span></li>
                </ul>
        </div>
    </div>
    

I need to pick up 21,549.28, and I tried the following:

    nArticles <- getNodeSet(pagetree,"//*/ul[@class='BlockItemChange']//*/li[@class='I2']") 
    

But it fails. Can anyone help? Thanks.

Solution

It's hard to know what you're using to identify the value you're interested in, but

    # `xml` is the parsed document (`pagetree` in the question)
    query = '//ul[@class="BlockItemIndex"]/li[2]/span/text()'
    xpathSApply(xml, query, xmlValue)

picks out the text of every span that is a child of the second li in a ul of class BlockItemIndex. Several li elements share the class I1, so specifying the class alone does not single out the one you want; the position does. I'm not sure what you were trying to accomplish with *; I think it's redundant with //. Later in your query, //*/ isn't what you want: the li elements are immediate children of the ul, and //*/li only matches an li nested at least one element below it. Note also that your query targets BlockItemChange, whereas 21,549.28 sits in the BlockItemIndex list.
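
To make the difference between the queries concrete, here is a minimal sketch using the XML package. It assumes a trimmed copy of the markup from the question, inlined as a string so it runs without network access. The variable names `html`, `by_position`, `by_id`, `failed`, and `index_value` are illustrative only. The id-based query is an alternative that may be sturdier if the list order changes; it is not part of the answer above.

```r
library(XML)

# Trimmed copy of the markup from the question (assumption: the real page
# has the same structure around these two lists).
html <- '
<div>
  <div style="float: left;">
    <ul class="BlockItemIndex" style="width:123px; height:92px">
      <li class="font12 I1">Index</li>
      <li class="I1" style="font:bold 20px Arial">
        <span id="ctl00_ctl00_cphContent_cphContent_lblIndex">21,549.28</span></li>
    </ul>
  </div>
  <div style="float: left;">
    <ul class="BlockItemChange" style="width:75px">
      <li class="font12 I1">High</li>
      <li class="I2">
        <span id="ctl00_ctl00_cphContent_cphContent_lblHigh">21,569.74</span></li>
    </ul>
  </div>
</div>'

pagetree <- htmlParse(html, asText = TRUE)

# Positional query: span inside the second li of the BlockItemIndex list.
by_position <- xpathSApply(pagetree,
                           '//ul[@class="BlockItemIndex"]/li[2]/span',
                           xmlValue)

# Alternative: address the span directly by its id attribute.
by_id <- xpathSApply(pagetree,
                     '//span[@id="ctl00_ctl00_cphContent_cphContent_lblIndex"]',
                     xmlValue)

# The query from the question: ul//*/li requires the li to sit below some
# other element inside the ul, so nothing matches.
failed <- getNodeSet(pagetree,
                     "//*/ul[@class='BlockItemChange']//*/li[@class='I2']")

# Strip the thousands separator before converting to a number.
index_value <- as.numeric(gsub(",", "", by_position))
```

Both `by_position` and `by_id` should hold the string "21,549.28", while `failed` is an empty node set, which is why `nArticles[[1]]` would raise an error in the original attempt.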
