Extract specific columns from a given webpage


Problem description

I am trying to read a web page using Python and save the data in CSV format so it can be imported as a pandas DataFrame.

I have the following code, which extracts the links from all the pages; instead, I am trying to read certain column fields.

import urllib2
from bs4 import BeautifulSoup

for i in range(10):
    url = 'https://pythonexpress.in/workshop/' + str(i).zfill(3)
    try:
        page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(page, 'html.parser')
        # Each workshop page lists its details in 'col-xs-8' divs; take the first 9.
        for anchor in soup.find_all('div', {'class': 'col-xs-8'})[:9]:
            print i, anchor.text
    except:
        pass

Can I save these 9 columns as a pandas DataFrame?

df.columns=['Organiser', 'Instructors', 'Date', 'Venue', 'Level', 'participants', 'Section', 'Status', 'Description']

Recommended answer

This returns the correct results for the first 10 pages, but it takes a lot of time for 100 pages. Any suggestions to make it faster?

import urllib2
from bs4 import BeautifulSoup
import pandas as pd

finallist = list()
for i in range(10):
    url = 'https://pythonexpress.in/workshop/' + str(i).zfill(3)
    try:
        page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(page, 'html.parser')
        # Collect the first 9 'col-xs-8' fields on each workshop page as one row.
        mylist = list()
        for anchor in soup.find_all('div', {'class': 'col-xs-8'})[:9]:
            mylist.append(anchor.text)
        finallist.append(mylist)
    except:
        pass

df = pd.DataFrame(finallist)
df.columns = ['Organiser', 'Instructors', 'Date', 'Venue', 'Level', 'participants', 'Section', 'Status', 'Description']

# Convert the Date and participants columns to proper dtypes.
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
df['participants'] = df['participants'].astype(int)
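
Since most of the time here is spent waiting on the network rather than parsing, one possible speed-up (not part of the original answer) is to download the pages concurrently. Below is a minimal sketch using a thread pool from the standard library's multiprocessing.dummy module; the pool size of 10 and the range of 100 pages are illustrative assumptions.

import urllib2
from multiprocessing.dummy import Pool  # thread-based pool from the standard library
from bs4 import BeautifulSoup
import pandas as pd

def scrape(i):
    url = 'https://pythonexpress.in/workshop/' + str(i).zfill(3)
    try:
        page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(page, 'html.parser')
        # Same extraction as above: first 9 'col-xs-8' fields make one row.
        return [anchor.text for anchor in soup.find_all('div', {'class': 'col-xs-8'})[:9]]
    except Exception:
        return None

pool = Pool(10)  # 10 concurrent downloads; the pool size is an arbitrary choice
rows = [row for row in pool.map(scrape, range(100)) if row]
pool.close()
pool.join()

df = pd.DataFrame(rows, columns=['Organiser', 'Instructors', 'Date', 'Venue', 'Level',
                                 'participants', 'Section', 'Status', 'Description'])

Threads are a reasonable fit here because each worker spends its time blocked on urlopen; the same dtype conversions from the answer above can then be applied to the resulting DataFrame.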
