Parsing a Table from the following website


Problem Description

I want to collect the past weather details of a particular city in India for each day in the year 2016. The following website has this data:

https://www.timeanddate.com/weather/india/kanpur/historic?month=1&year=2016

This link has the data for January 2016. There is a nice table out there.

I want to extract this table.

I have tried enough to extract a different table, but I do not want that one; it does not serve my purpose.

I want the other big table, with data given over time for each day of that month, because then I can loop over all months using the URL.

The problem is I do not know HTML and related things, so I am not able to scrape things out myself.

Answer

It would have been better if you had provided some code that you tried. Anyway, this code works for the 1 Jan table. You can write a loop to extract data for the other days as well.

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.timeanddate.com/weather/india/kanpur/historic?month=1&year=2016"
page = urlopen(url)
soup = BeautifulSoup(page, 'lxml')

Data = []
# the detailed hourly-weather table on the page has id 'wt-his'
table = soup.find('table', attrs={'id': 'wt-his'})
for tr in table.find('tbody').find_all('tr'):
    row = {}
    row['time'] = tr.find('th').text.strip()
    all_td = tr.find_all('td')
    row['temp'] = all_td[1].text
    row['weather'] = all_td[2].text
    row['wind'] = all_td[3].text
    arrow = all_td[4].text
    if arrow == '↑':
        row['wind_dir'] = 'South to North'
    else:
        row['wind_dir'] = 'North to South'
    row['humidity'] = all_td[5].text
    row['barometer'] = all_td[6].text
    row['visibility'] = all_td[7].text
    Data.append(row)
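Once Data is filled, the rows can be written out, for example as CSV. Here is a minimal sketch using only the standard library; the sample row below is illustrative, not real scraped data:

```python
import csv
import io

# One row in the same shape the scraper builds (sample values only).
rows = [
    {'time': '12:00 am', 'temp': '12 °C', 'weather': 'Fog.',
     'wind': '7 km/h', 'wind_dir': 'North to South', 'humidity': '94%',
     'barometer': '1018 mbar', 'visibility': '2 km'},
]

# Write the list of dicts as CSV; the dict keys become the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

Replacing `io.StringIO` with an `open('weather.csv', 'w', newline='')` file handle writes the same content to disk.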

Note: add other cases for the wind_dir logic.
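To cover the whole year, as the question suggests, the same scrape can be repeated with the month parameter varied. A minimal sketch of building the twelve month URLs (the table-parsing code above would then run once per URL; the site is assumed to use the same layout each month):

```python
# Build one URL per month of 2016 by varying the query-string parameters.
base = "https://www.timeanddate.com/weather/india/kanpur/historic"
urls = [f"{base}?month={m}&year=2016" for m in range(1, 13)]

# urls[0] is the January page, urls[11] the December page.
```

A polite scraper would also sleep briefly between requests rather than fetching all twelve pages back to back.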
