Loading a generic Google Spreadsheet in Pandas

Problem description

When I try to load a Google Spreadsheet into pandas like this:

from StringIO import StringIO  # Python 2
import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=<some_long_code>&output=csv')
data = r.content
df = pd.read_csv(StringIO(data), index_col=0)

I get the following error:

CParserError: Error tokenizing data. C error: Expected 1316 fields in line 73, saw 1386

Why? I would think that one could identify the set of spreadsheet rows and columns that contain data and use the spreadsheet's rows and columns as the DataFrame's index and columns respectively (with NaN for anything empty). Why does it fail?
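
To see why the tokenizer complains instead of padding with NaN, here is a minimal offline sketch (the tiny CSV strings are made up for illustration). `read_csv` does pad rows that have fewer fields than the header with NaN, but a row with more fields than expected raises `pandas.errors.ParserError` with the same "Expected N fields in line M, saw K" message. A response that is really an HTML page, with commas scattered through its markup and scripts, typically trips exactly this check.

```python
from io import StringIO

import pandas as pd

# Rows with FEWER fields than the header are padded with NaN,
# which is the behavior the question expects.
short_rows = "a,b,c\n1,2\n3,4,5\n"
padded = pd.read_csv(StringIO(short_rows))
print(padded)

# A row with MORE fields than the header makes the C parser stop,
# because it cannot tell which column the extra values belong to.
long_rows = "a,b\n1,2\n3,4,5\n"
try:
    pd.read_csv(StringIO(long_rows))
    parser_failed = False
except pd.errors.ParserError as exc:
    parser_failed = True
    print(exc)
```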

Recommended answer

As one of the commenters noted, you have not asked for the data in CSV format: your URL ends with an "edit" request, so what comes back is the HTML of the spreadsheet's edit page, and read_csv fails while trying to tokenize that HTML as CSV. You can use this code and see it work on the spreadsheet (which, by the way, needs to be public). Private sheets are possible as well, but that is another topic.
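
Because the real cause only surfaces as a confusing parser error, it can help to check the body before parsing it. This is a minimal sketch, assuming that a wrong URL or a non-public sheet returns an HTML page (an edit page or a sign-in page) rather than CSV; `read_sheet_csv` is a hypothetical helper name, not part of pandas or requests.

```python
from io import StringIO

import pandas as pd


def read_sheet_csv(text, **read_csv_kwargs):
    """Parse CSV text, failing early if it is actually an HTML page.

    Hypothetical helper: assumes a wrong URL or a non-public sheet comes
    back as HTML, which read_csv would otherwise try to tokenize.
    """
    head = text.lstrip()[:100].lower()
    if head.startswith(("<!doctype html", "<html")):
        raise ValueError(
            "Response is an HTML page, not CSV: request format=csv "
            "and make sure the sheet is shared publicly."
        )
    return pd.read_csv(StringIO(text), **read_csv_kwargs)


# Usage with a small inline sample (in Python 3, pass requests' r.text, a str).
df = read_sheet_csv("id,city\n0,Dothan\n10,Foley\n", index_col=0)
print(df)
```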

from StringIO import StringIO  # Python 2; in Python 3 this moved to the io module

import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

In [10]: df = pd.read_csv(StringIO(data), index_col=0, parse_dates=['Quradate'])

In [11]: df.head()
Out[11]: 
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
10       Foley                              South_Mobile-Baldwin  Residential   
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
44      Athens                 North_Huntsville-Decatur-Florence  Residential   

          mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural 2010-01-15 00:00:00             2            2          3   
10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
38           Rural 2010-01-15 00:00:00             3            3          3   
44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   

The new Google Sheets URL format for getting CSV output is:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id

Well, Google has since changed the URL format slightly again; now you need:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&gid=0 #for the 1st sheet
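
If you start from the link in the browser's address bar (the `.../edit#gid=...` form), you can derive this export URL from it. A small sketch: `csv_export_url` is a hypothetical helper, and it assumes the `/spreadsheets/d/<id>/export?format=csv&gid=<n>` layout shown above, which Google may change again. It does not handle the older `spreadsheet/ccc?key=...` style.

```python
import re
from urllib.parse import parse_qs, urlparse

# Sheet IDs in this layout consist of letters, digits, "_" and "-".
SHEET_ID_RE = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def csv_export_url(sheet_url, gid=None):
    """Build the CSV export URL for one tab of a Google Sheet (hypothetical helper)."""
    match = SHEET_ID_RE.search(sheet_url)
    if match is None:
        raise ValueError("not a /spreadsheets/d/<id> URL: %r" % sheet_url)
    sheet_id = match.group(1)

    if gid is None:
        parsed = urlparse(sheet_url)
        # Browser links usually carry the tab as "#gid=N"; some use "?gid=N".
        found = (parse_qs(parsed.fragment).get("gid")
                 or parse_qs(parsed.query).get("gid")
                 or ["0"])  # fall back to gid=0, the 1st sheet
        gid = found[0]

    return ("https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s"
            % (sheet_id, gid))


print(csv_export_url("https://docs.google.com/spreadsheets/d/"
                     "177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/edit#gid=0"))
```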

I also found that Python 3 needed a slight revision to the above. Import StringIO from io:

from io import StringIO 

And to fetch the file:

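import pandas as pd
import requests

gid = 0  # gid of the tab to export (0 for the 1st sheet)
act = requests.get('https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&gid=%s' % gid)
dataact = act.content.decode('utf-8')  # bytes -> str, since io.StringIO needs text
# DataFrame.sort() no longer exists in current pandas; sort_index() sorts by the index
actdf = pd.read_csv(StringIO(dataact), index_col=0, parse_dates=[0], thousands=',').sort_index()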

actdf is now a full pandas DataFrame with headers (column names).
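
Putting the Python 3 pieces together, here is one possible wrapper. It is a sketch, not the answer's exact code: `frame_from_csv_bytes` and `load_sheet` are hypothetical names; it hands `resp.content` to `read_csv` through `io.BytesIO` instead of decoding to a string first (read_csv assumes UTF-8 by default); and it keeps the answer's `index_col=0, parse_dates=[0], thousands=','` options with `sort_index()` in place of the removed `DataFrame.sort()`.

```python
import io

import pandas as pd


def frame_from_csv_bytes(raw):
    """Parse raw CSV bytes with the same read_csv options as the answer."""
    # io.BytesIO lets read_csv decode the bytes itself (UTF-8 by default),
    # so the explicit .decode('utf-8') + StringIO step is not needed.
    df = pd.read_csv(io.BytesIO(raw), index_col=0, parse_dates=[0], thousands=",")
    return df.sort_index()  # sort rows by the (date) index


def load_sheet(sheet_id, gid=0, timeout=30):
    """Download one tab of a public sheet as CSV (hypothetical helper)."""
    import requests  # imported here so the parser above works without requests

    url = "https://docs.google.com/spreadsheets/d/%s/export" % sheet_id
    resp = requests.get(url, params={"format": "csv", "gid": gid}, timeout=timeout)
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return frame_from_csv_bytes(resp.content)


# Offline check of the parsing options on a tiny made-up sample:
sample = b'Date,Value\n2010-01-15,"1,234"\n2009-12-01,5\n'
print(frame_from_csv_bytes(sample))
```

Keeping the parsing separate from the download means the read_csv options can be tried on a local sample without network access; `load_sheet("177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ")` would then fetch the first tab of the public sheet used above.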
