Failed to grab dates in a customized manner out of tabular content


Problem description

I've written a script in Python with Selenium to parse some dates from a table on a webpage. The table is located under the header NPL Victoria Betting Odds, and the tabular data sit within the id tournamentTable. Three dates appear there: 10 Aug 2018, 11 Aug 2018 and 12 Aug 2018. I wish to parse them and pair each date with the times listed under it, as in my expected output below.

Link to the webpage

This is my attempt so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

link = "find the link above"

def get_content(driver,url):
    driver.get(url)
    for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr"))):
        try:
            idate = items.find_element_by_css_selector("th span[class^='datet']").text
        except Exception: idate = ""
        try:
            itime = items.find_element_by_css_selector("td.table-time").text
        except Exception: itime = ""

        print(f'{idate}--{itime}')

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,10)
    try:
        get_content(driver,link)
    finally:
        driver.quit()

Currently, my output is as follows:

--
10 Aug 2018--
--
--09:30
--10:15
11 Aug 2018--
--
--05:00
--05:00
--09:00
12 Aug 2018--
--
--06:00
--06:00

My expected output:

10 Aug 2018--09:30
10 Aug 2018--10:15
11 Aug 2018--05:00
11 Aug 2018--05:00
11 Aug 2018--09:00
12 Aug 2018--06:00
12 Aug 2018--06:00

Recommended answer

Try the code below:

def get_content(driver,url):
    driver.get(url)
    # Count the date header rows once they are present in the table
    dates = len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr.center.nob-border"))))
    for d in range(dates):
        # Re-locate the d-th date header row on each pass
        item = driver.find_elements_by_css_selector("#tournamentTable tr.center.nob-border")[d]
        try:
            idate = item.find_element_by_css_selector("th span[class^='datet']").text
        except Exception: idate = ""
        # Time cells that follow this date row and have fewer than d + 2 date rows
        # before them, i.e. cells that come before the next date header row
        for time_td in item.find_elements_by_xpath(".//following::td[contains(@class, 'table-time') and not((preceding::tr[@class='center nob-border'])[%d])]" % (d + 2)):
            try:
                itime = time_td.text
            except Exception: itime = ""
            print(f'{idate}--{itime}')
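
A simpler alternative to the XPath index arithmetic may be to walk the rows in document order and remember the most recent date. The sketch below is not part of the original answer. It assumes the rows have already been reduced to (date, time) string pairs like the ones the question's loop prints, and `fill_dates` is a hypothetical helper name chosen for illustration.

```python
from typing import Iterable, Iterator, Tuple


def fill_dates(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (date, time) pairs, carrying the last seen date forward.

    Hypothetical helper: assumes each row is a (date, time) pair where
    either field may be an empty string, as in the question's output.
    """
    current_date = ""
    for idate, itime in rows:
        if idate:
            current_date = idate  # a date header row starts a new group
        if itime:
            yield current_date, itime  # only rows with a time are emitted


# Rows as (date, time) pairs, mirroring the current output in the question.
rows = [
    ("", ""), ("10 Aug 2018", ""), ("", ""), ("", "09:30"), ("", "10:15"),
    ("11 Aug 2018", ""), ("", ""), ("", "05:00"), ("", "05:00"), ("", "09:00"),
    ("12 Aug 2018", ""), ("", ""), ("", "06:00"), ("", "06:00"),
]

for idate, itime in fill_dates(rows):
    print(f"{idate}--{itime}")
```

In the Selenium script from the question, the same idea would mean keeping the previous `idate` when a row has no date instead of resetting it to an empty string, and printing only when `itime` is non-empty.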
