Pandas/Google BigQuery: Schema mismatch makes the upload fail


Problem description


The schema of my Google BigQuery table looks like this:

price_datetime : DATETIME,
symbol         : STRING,
bid_open       : FLOAT,
bid_high       : FLOAT,
bid_low        : FLOAT,
bid_close      : FLOAT,
ask_open       : FLOAT,
ask_high       : FLOAT,
ask_low        : FLOAT,
ask_close      : FLOAT

After calling pandas.read_gbq, I get a DataFrame with column dtypes like this:

price_datetime     object
symbol             object
bid_open          float64
bid_high          float64
bid_low           float64
bid_close         float64
ask_open          float64
ask_high          float64
ask_low           float64
ask_close         float64
dtype: object

Now I want to use to_gbq so I convert my local dataframe (which I just made) from these dtypes:

price_datetime    datetime64[ns]
symbol                    object
bid_open                 float64
bid_high                 float64
bid_low                  float64
bid_close                float64
ask_open                 float64
ask_high                 float64
ask_low                  float64
ask_close                float64
dtype: object

to these dtypes:

price_datetime     object
symbol             object
bid_open          float64
bid_high          float64
bid_low           float64
bid_close         float64
ask_open          float64
ask_high          float64
ask_low           float64
ask_close         float64
dtype: object

by doing:

df['price_datetime'] = df['price_datetime'].astype(object)
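
To make the effect of this cast concrete, here is a minimal sketch on a made-up one-row frame. Only the column names come from the question; the values are invented for illustration, and the cast result is kept in a separate variable rather than assigned back to the frame.

```python
import pandas as pd

# Made-up sample row; only the column names are taken from the question.
df = pd.DataFrame({
    "price_datetime": pd.to_datetime(["2017-05-01 09:30:00"]),
    "symbol": ["EURUSD"],
    "bid_open": [1.0895],
})

# dtype.kind is 'M' for datetime64 columns and 'O' for object columns.
before_kind = df["price_datetime"].dtype.kind

# The same cast as in the question.
converted = df["price_datetime"].astype(object)
after_kind = converted.dtype.kind

print(before_kind, after_kind)

# The values are still Timestamp objects; only the column dtype changed.
print(type(converted.iloc[0]))
```

The cast changes how the column is labelled, not what it contains, so the column no longer looks like a datetime column to anything that inspects dtypes.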

Now I (think) I am ready to use to_gbq, so I do:

import pandas
pandas.io.gbq.to_gbq(df, <table_name>, <project_name>, if_exists='append')

but I get the error:

---------------------------------------------------------------------------
InvalidSchema                             Traceback (most recent call last)
<ipython-input-15-d5a3f86ad382> in <module>()
      1 a = time.time()
----> 2 pandas.io.gbq.to_gbq(df, <table_name>, <project_name>, if_exists='append')
      3 b = time.time()
      4 
      5 print(b-a)

C:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
    825         elif if_exists == 'append':
    826             if not connector.verify_schema(dataset_id, table_id, table_schema):
--> 827                 raise InvalidSchema("Please verify that the structure and "
    828                                     "data types in the DataFrame match the "
    829                                     "schema of the destination table.")

InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.

Solution

I had to do two things to solve the issue. First, I deleted my table and re-uploaded it with the datetime column typed as TIMESTAMP rather than DATETIME. This made sure that the schemas matched when the pandas.DataFrame with a datetime64[ns] column was uploaded using to_gbq, which converts datetime64[ns] to the TIMESTAMP type and not to the DATETIME type (for now).
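
The type check behind the error can be pictured with a small pure-Python sketch. It is not the pandas implementation: the helper `schema_mismatches` and the `ASSUMED_BQ_TYPES` table are hypothetical, and only the datetime64[ns] to TIMESTAMP entry comes from the description above. The object to STRING and float64 to FLOAT entries are assumptions made for illustration.

```python
# Assumed mapping from pandas dtype names to BigQuery column types.
# Only datetime64[ns] -> TIMESTAMP is stated in the answer; the other
# entries are assumptions for illustration.
ASSUMED_BQ_TYPES = {
    "datetime64[ns]": "TIMESTAMP",
    "object": "STRING",
    "float64": "FLOAT",
}


def schema_mismatches(frame_dtypes, table_schema):
    """Return (column, inferred_type, table_type) for every column that differs."""
    mismatches = []
    for column, dtype_name in frame_dtypes.items():
        inferred = ASSUMED_BQ_TYPES[dtype_name]
        expected = table_schema.get(column)
        if inferred != expected:
            mismatches.append((column, inferred, expected))
    return mismatches


# A subset of the columns from the question, written as plain strings.
frame_dtypes = {
    "price_datetime": "datetime64[ns]",
    "symbol": "object",
    "bid_open": "float64",
}
old_table = {"price_datetime": "DATETIME", "symbol": "STRING", "bid_open": "FLOAT"}
new_table = dict(old_table, price_datetime="TIMESTAMP")

# Original table: the datetime column is inferred as TIMESTAMP, the table says DATETIME.
print(schema_mismatches(frame_dtypes, old_table))
# → [('price_datetime', 'TIMESTAMP', 'DATETIME')]

# Re-created table with a TIMESTAMP column: nothing differs.
print(schema_mismatches(frame_dtypes, new_table))
# → []

# Casting the column to object does not help under these assumptions.
cast_dtypes = dict(frame_dtypes, price_datetime="object")
print(schema_mismatches(cast_dtypes, old_table))
# → [('price_datetime', 'STRING', 'DATETIME')]
```

Under these assumptions, neither dtype in the DataFrame can match a DATETIME column, which is consistent with re-creating the table as the fix.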

The second thing I did was upgrade from pandas 0.19 to pandas 0.20. These two things solved my problem of a schema mismatch.
