No nested nodes: how to get one piece of information and then the additional info that belongs to it?


Problem description

For the markup below I need to get each date together with its own times, hrefs, formats and so on (not everything is shown).

<div class="showtimes">
    <h2>The Little Prince</h2>

    <div class="poster" data-poster-url="http://www.test.com">
        <img src="http://www.test.com">
    </div>

    <div class="showstimes">

        <div class="date">9 December, Wednesday</div>
        <span class="show-time techno-3d">
            <a href="http://www.test.com" class="link">12:30</a>
            <span class="show-format">3D</span>
        </span>

        <span class="show-time techno-3d">
            <a href="http://www.test.com" class="link">15:30</a>
            <span class="show-format">3D</span>
        </span>

        <span class="show-time techno-3d">
            <a href="http://www.test.com" class="link">18:30</a>
            <span class="show-format">3D</span>
        </span>


        <div class="date">10 December, Thursday</div>
        <span class="show-time techno-2d">
            <a href="http://www.test.com" class="link">12:30</a>
            <span class="show-format">2D</span>         
        </span>

        <span class="show-time techno-3d">
            <a href="http://www.test.com" class="link">15:30</a>
            <span class="show-format">3D</span>
        </span>
    </div>
</div>

To do this, I use the following Python code.

for dates in movie.xpath('.//div[@class="showstimes"]/div[@class="date"]'):
    date = dates.xpath('.//text()')[0]

    # for times in dates.xpath('//following-sibling::span[1 = count(preceding-sibling::div[1] | (.//div[@class="date"])[1])]'):
    # for times in dates.xpath('//following-sibling::span[contains(@class,"show-time")]'):
    # for times in dates.xpath('.//../span[contains(@class,"show-time")]'):
    # for times in dates.xpath('//following-sibling::span[preceding-sibling::div[1][.="date"]]'):
        time = times.xpath('.//a/text()')[0]
        url = times.xpath('.//a/@href')[0]
        format_type = times.xpath('.//span[@class="show-format"]/text()')[0]

Getting the dates is not a problem, but I cannot work out how to get the rest of the information for each particular date. I have tried many different approaches without luck (some of them are left in the comments). I can't find a way to handle the case where the nodes I need follow one another on the same level instead of being nested. In this case:

-> div Date1
-> span Time1
-> span href1
-> span Format1

-> span Time2
-> span href2
-> span Format2

-> span Time3
-> span href3
-> span Format3

-> div Date2
-> span Time1
-> span href1
-> span Format1
# etc etc

Recommended answer

It turns out that lxml supports referencing Python variables from an XPath expression, which proves useful here: for every date div, you can select the following-sibling spans whose nearest preceding-sibling div matches the current date div. The reference to the current date div is held in the Python variable dates and passed to the expression as $current:

for dates in movie.xpath('.//div[@class="showstimes"]/div[@class="date"]'):
    date = dates.xpath('normalize-space()')
    # $current is bound to the current date div through the keyword argument
    for times in dates.xpath('following-sibling::span[preceding-sibling::div[1]=$current]', current=dates):
        time = times.xpath('a/text()')[0]
        url = times.xpath('a/@href')[0]
        format_type = times.xpath('span/text()')[0]
        # Python 3 print; repr() shows each value quoted
        print(repr(date), repr(time), repr(url), repr(format_type), sep=', ')

Output:

'9 December, Wednesday', '12:30', 'http://www.test.com', '3D'
'9 December, Wednesday', '15:30', 'http://www.test.com', '3D'
'9 December, Wednesday', '18:30', 'http://www.test.com', '3D'
'10 December, Thursday', '12:30', 'http://www.test.com', '2D'
'10 December, Thursday', '15:30', 'http://www.test.com', '3D'
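
One caveat about the expression above: in XPath 1.0, `=` between two node-sets compares their string values, so two date divs with identical text would each match the other's spans. The sketch below is a self-contained variation that tests node identity instead, using the common `count(a | b) = 1` idiom. The markup is a trimmed copy of the question's HTML, and the names `PAGE`, `SHOWS_FOR_DATE` and `collect_showtimes` are illustrative only, not part of the original answer.

```python
from lxml import html

# Trimmed copy of the markup from the question (one film, two dates).
PAGE = """
<div class="showtimes">
  <h2>The Little Prince</h2>
  <div class="showstimes">
    <div class="date">9 December, Wednesday</div>
    <span class="show-time techno-3d">
      <a href="http://www.test.com" class="link">12:30</a>
      <span class="show-format">3D</span>
    </span>
    <span class="show-time techno-3d">
      <a href="http://www.test.com" class="link">15:30</a>
      <span class="show-format">3D</span>
    </span>
    <div class="date">10 December, Thursday</div>
    <span class="show-time techno-2d">
      <a href="http://www.test.com" class="link">12:30</a>
      <span class="show-format">2D</span>
    </span>
  </div>
</div>
"""

# Keep a span only if its nearest preceding date div is the very node bound
# to $current: the union of two single-node sets has size 1 only when both
# sets contain the same node.
SHOWS_FOR_DATE = (
    'following-sibling::span[contains(@class, "show-time")]'
    '[count(preceding-sibling::div[@class="date"][1] | $current) = 1]'
)


def collect_showtimes(movie):
    """Return (date, time, url, format) tuples for one film element."""
    rows = []
    for date_node in movie.xpath('.//div[@class="showstimes"]/div[@class="date"]'):
        date = date_node.xpath('normalize-space()')
        for show in date_node.xpath(SHOWS_FOR_DATE, current=date_node):
            rows.append((
                date,
                show.xpath('normalize-space(a)'),
                show.xpath('string(a/@href)'),
                show.xpath('normalize-space(span[@class="show-format"])'),
            ))
    return rows


movie = html.fromstring(PAGE)
for row in collect_showtimes(movie):
    print(row)
```

Collecting tuples rather than printing inside the loop keeps the grouping logic reusable. `normalize-space(...)` and `string(...)` return an empty string when a node is missing, so a showtime without a format does not raise `IndexError` the way `[0]` on an empty result list would.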

References:

  • https://stackoverflow.com/a/17750629/2998271
  • http://lxml.de/xpathxslt.html#the-xpath-method
