Scraping data from a Wikipedia table


Problem description

I'm just trying to scrape data from a Wikipedia table into a pandas DataFrame.

I need to reproduce the three columns: "Postcode, Borough, Neighbourhood".

import requests
from bs4 import BeautifulSoup
import pandas as pd

website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url, 'xml')
print(soup.prettify())

My_table = soup.find('table', {'class': 'wikitable sortable'})
My_table

links = My_table.findAll('a')
links

Neighbourhood = []
for link in links:
    Neighbourhood.append(link.get('title'))

print(Neighbourhood)

df = pd.DataFrame([])
df['PostalCode', 'Borough', 'Neighbourhood'] = pd.Series(Neighbourhood)

df

And it returns only the boroughs...

Thanks
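For reference, the likely cause: `My_table.findAll('a')` flattens the `title` attribute of every link in the table into one list (mostly borough article titles), and `df['PostalCode', 'Borough', 'Neighbourhood'] = ...` creates a single column keyed by a tuple rather than three separate columns. If you want to stay with BeautifulSoup, the usual approach is to walk the table row by row. A minimal sketch, using a simplified inline stand-in for the page's markup so it runs offline:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Simplified stand-in for the Wikipedia table markup
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', {'class': 'wikitable sortable'})

rows = []
for tr in table.find_all('tr'):
    # Header row contains only <th> cells, so its list is empty and is skipped
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:
        rows.append(cells)

df = pd.DataFrame(rows, columns=['PostalCode', 'Borough', 'Neighbourhood'])
print(df)
```

Each row becomes one list of three cell texts, so the DataFrame gets all three columns instead of one flat list of link titles.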

Recommended answer

You may be overthinking the problem if you only want the script to pull one table from the page. One import, one line, no loops:

import pandas as pd

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url, header=0)[0]
df.head()

    Postcode    Borough             Neighbourhood
0   M1A         Not assigned        Not assigned
1   M2A         Not assigned        Not assigned
2   M3A         North York          Parkwoods
3   M4A         North York          Victoria Village
4   M5A         Downtown Toronto    Harbourfront
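On this particular dataset, a common follow-up step is to drop the rows whose borough is "Not assigned" before any further processing. A minimal sketch, using stand-in data with the same columns so it runs without fetching the page:

```python
import pandas as pd

# Stand-in frame with the same columns as the scraped table
df = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows with an assigned borough, then renumber the index
df = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)
print(df)
```

The boolean mask keeps the assigned rows, and `reset_index(drop=True)` renumbers them from 0 so the filtered frame has a clean index.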

