How to download an Excel file from behind a paywall into a pandas DataFrame?

Question

I have a website that requires a login to access its data.

import pandas as pd
import requests

r = requests.get(my_url, cookies=my_cookies) # my_cookies are imported from a selenium session.
df = pd.io.excel.read_excel(r.content, sheetname=0)

This returns:

IOError: [Errno 2] No such file or directory: 'Ticker\tAction\tName\tShares\tPrice\...

Apparently, the str is being treated as a filename. Is there a way to have it treated as the file's contents instead? Alternatively, can we pass cookies to pd.read_html?
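For context on the cookie side: the request above relies on cookies taken from a Selenium session. A minimal sketch of that conversion follows, assuming the cookies come from Selenium's `driver.get_cookies()`, which returns a list of dicts with `name` and `value` keys. The helper name `cookies_from_selenium` and the sample cookie values are hypothetical and only illustrate the shape of the data.

```python
def cookies_from_selenium(selenium_cookies):
    """Turn Selenium's list of cookie dicts into a name -> value dict.

    requests.get(..., cookies=...) accepts a plain dict like this.
    """
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


# Made-up cookies shaped like the output of driver.get_cookies()
example_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com", "path": "/"},
    {"name": "csrftoken", "value": "xyz789", "domain": "example.com", "path": "/"},
]

my_cookies = cookies_from_selenium(example_cookies)
```

The resulting dict can then be passed as `cookies=my_cookies` to `requests.get`, as in the snippet in the question.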

After further investigation, it turns out that this is actually a CSV file. The content of the downloaded file is:

In [201]: r.content
Out [201]: 'Ticker\tAction\tName\tShares\tPrice\tCommission\tAmount\tTarget Weight\nBRSS\tSELL\tGlobal Brass and Copper Holdings Inc\t400.0\t17.85\t-1.00\t7,140\t0.00\nCOHU\tSELL\tCohu Inc\t700.0\t12.79\t-1.00\t8,953\t0.00\nUNTD\tBUY\tUnited Online Inc\t560.0\t15.15\t-1.00\t-8,484\t0.00\nFLXS\tBUY\tFlexsteel Industries Inc\t210.0\t40.31\t-1.00\t-8,465\t0.00\nUPRO\tCOVER\tProShares UltraPro S&P500\t17.0\t71.02\t-0.00\t-1,207\t0.00\n'
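One way to catch this kind of mismatch before picking a parser is to look at the first few bytes of the response. The sketch below is illustrative only: `sniff_format` is a hypothetical helper, and it relies on the fact that .xlsx files are ZIP archives (starting with `PK\x03\x04`) while legacy .xls files use the OLE2 container signature. Anything else is simply assumed to be text.

```python
XLSX_MAGIC = b"PK\x03\x04"                        # ZIP local file header (.xlsx)
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file (.xls)


def sniff_format(content: bytes) -> str:
    """Guess whether downloaded bytes are a real Excel workbook or plain text."""
    if content.startswith(XLSX_MAGIC):
        return "xlsx"
    if content.startswith(XLS_MAGIC):
        return "xls"
    # Assumption: anything without an Excel signature is delimited text.
    return "text"


# The start of the payload shown above has no Excel signature
kind = sniff_format(b"Ticker\tAction\tName\tShares\tPrice\n")
```

Here `kind` comes back as `"text"`, which suggests that `read_csv` rather than `read_excel` is the right tool for this download.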

Notice that it is tab-delimited. Even so, none of the following attempts work:

import io

# csv version 1
df = pd.read_csv(r.content)
# Raises an error: file does not exist. Apparently read_csv() is also treating the string as a file path.

# csv version 2
fh = io.BytesIO(r.content)
df = pd.read_csv(fh)  # ValueError: No columns to parse from file.

# csv version 3
s = StringIO(r.content)
df = pd.read_csv(s)
# No error, but the resulting df is not parsed properly; \t's show up in the text of the dataframe.

Recommended answer

Simply wrap the file contents in a BytesIO:

import io

with io.BytesIO(r.content) as fh:
    df = pd.read_excel(fh, sheet_name=0)  # older pandas releases spelled this keyword sheetname
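Since the question's update shows that the payload is tab-delimited text rather than a real Excel workbook, `read_excel` may not be able to parse it even when wrapped. A minimal sketch of the same wrapping idea applied to `pd.read_csv` follows, assuming the response body looks like the output shown in the question. The helper name `tsv_bytes_to_frame` is hypothetical, and `sample` stands in for `r.content` using the first rows from the question.

```python
import io

import pandas as pd


def tsv_bytes_to_frame(content: bytes) -> pd.DataFrame:
    """Parse tab-delimited bytes (e.g. r.content) into a DataFrame."""
    # Wrap the bytes in a file-like object so pandas does not treat them as a path.
    with io.BytesIO(content) as fh:
        # sep="\t" splits on tabs; thousands="," lets values like 7,140 parse as numbers.
        return pd.read_csv(fh, sep="\t", thousands=",")


# Stand-in for r.content, copied from the output in the question
sample = (
    b"Ticker\tAction\tName\tShares\tPrice\tCommission\tAmount\tTarget Weight\n"
    b"BRSS\tSELL\tGlobal Brass and Copper Holdings Inc\t400.0\t17.85\t-1.00\t7,140\t0.00\n"
    b"COHU\tSELL\tCohu Inc\t700.0\t12.79\t-1.00\t8,953\t0.00\n"
    b"UNTD\tBUY\tUnited Online Inc\t560.0\t15.15\t-1.00\t-8,484\t0.00\n"
)

df = tsv_bytes_to_frame(sample)
```

Passing `sep="\t"` explicitly is what the three `read_csv` attempts in the question were missing; without it pandas splits on commas by default.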
