Pandas/Google BigQuery: Schema mismatch makes the upload fail
Problem description
The schema of my Google BigQuery table looks like this:
price_datetime : DATETIME,
symbol : STRING,
bid_open : FLOAT,
bid_high : FLOAT,
bid_low : FLOAT,
bid_close : FLOAT,
ask_open : FLOAT,
ask_high : FLOAT,
ask_low : FLOAT,
ask_close : FLOAT
After I do a pandas.read_gbq, I get a DataFrame with column dtypes like this:
price_datetime object
symbol object
bid_open float64
bid_high float64
bid_low float64
bid_close float64
ask_open float64
ask_high float64
ask_low float64
ask_close float64
dtype: object
Now I want to use to_gbq, so I convert my local DataFrame (which I just made) from these dtypes:
price_datetime datetime64[ns]
symbol object
bid_open float64
bid_high float64
bid_low float64
bid_close float64
ask_open float64
ask_high float64
ask_low float64
ask_close float64
dtype: object
to these dtypes:
price_datetime object
symbol object
bid_open float64
bid_high float64
bid_low float64
bid_close float64
ask_open float64
ask_high float64
ask_low float64
ask_close float64
dtype: object
by doing:
df['price_datetime'] = df['price_datetime'].astype(object)
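A quick local check (a sketch on a small hypothetical frame, not the asker's data) shows what this conversion does: only the column's dtype changes to object, while each value is still a pandas Timestamp rather than a DATETIME-formatted string.

```python
import pandas as pd

# Hypothetical sample frame standing in for the local DataFrame in the question
df = pd.DataFrame({
    "price_datetime": pd.to_datetime(["2017-05-01 09:30:00", "2017-05-01 09:31:00"]),
    "symbol": ["EURUSD", "EURUSD"],
    "bid_open": [1.0890, 1.0891],
})
print(df["price_datetime"].dtype)  # a datetime64 dtype before the conversion

# The conversion from the question: the column dtype becomes object
df["price_datetime"] = df["price_datetime"].astype(object)
print(df["price_datetime"].dtype)  # object

# The individual values are still Timestamp objects, not strings
print(type(df["price_datetime"].iloc[0]))
```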
Now I think I am ready to use to_gbq, so I do:
import pandas
pandas.io.gbq.to_gbq(df, <table_name>, <project_name>, if_exists='append')
but I get the error:
---------------------------------------------------------------------------
InvalidSchema Traceback (most recent call last)
<ipython-input-15-d5a3f86ad382> in <module>()
1 a = time.time()
----> 2 pandas.io.gbq.to_gbq(df, <table_name>, <project_name>, if_exists='append')
3 b = time.time()
4
5 print(b-a)
C:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\io\gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
825 elif if_exists == 'append':
826 if not connector.verify_schema(dataset_id, table_id, table_schema):
--> 827 raise InvalidSchema("Please verify that the structure and "
828 "data types in the DataFrame match the "
829 "schema of the destination table.")
InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table.
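One way to see why the append check fails is to model how to_gbq infers a BigQuery type from each column's dtype. The sketch below assumes a mapping modelled on the schema generation in pandas' old gbq module (around 0.19/0.20: dtype kind M → TIMESTAMP, O → STRING); the mapping dict and the helpers generated_schema and mismatches are hypothetical, and the library's actual check may compare the schemas differently (for example, as whole field lists). Under that assumption, neither a datetime64 column nor an object column is ever inferred as DATETIME.

```python
import pandas as pd

# ASSUMPTION: dtype-kind -> BigQuery type mapping modelled on pandas' old gbq
# schema generation; this is not an official, public API.
DTYPE_KIND_TO_BQ = {
    "i": "INTEGER",
    "b": "BOOLEAN",
    "f": "FLOAT",
    "O": "STRING",
    "S": "STRING",
    "U": "STRING",
    "M": "TIMESTAMP",
}

def generated_schema(df):
    """Hypothetical helper: the BigQuery type inferred for each column."""
    return {col: DTYPE_KIND_TO_BQ.get(dtype.kind, "STRING")
            for col, dtype in df.dtypes.items()}

def mismatches(df, schema):
    """Hypothetical helper: columns whose inferred type differs from the table."""
    inferred = generated_schema(df)
    return {col: (inferred.get(col), bq_type)
            for col, bq_type in schema.items()
            if inferred.get(col) != bq_type}

# Two columns of the destination table from the question
table_schema = {"price_datetime": "DATETIME", "symbol": "STRING"}

df = pd.DataFrame({
    "price_datetime": pd.to_datetime(["2017-05-01 09:30:00"]),
    "symbol": ["EURUSD"],
})
# datetime64 column: inferred as TIMESTAMP, table expects DATETIME
print(mismatches(df, table_schema))

df["price_datetime"] = df["price_datetime"].astype(object)
# object column: inferred as STRING, table still expects DATETIME
print(mismatches(df, table_schema))
```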
I had to do two things to solve the issue. First, I deleted my table and re-uploaded it with the date-time column as TIMESTAMP rather than DATETIME. This made the schemas match when the pandas.DataFrame with a datetime64[ns] column was uploaded using to_gbq, which converts datetime64[ns] to the TIMESTAMP type and not to DATETIME (for now).
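Since this fix relies on to_gbq converting a datetime64[ns] column to TIMESTAMP, the astype(object) step from the question should be skipped, or undone before uploading. A minimal sketch, assuming a hypothetical frame left in the object-dtype state:

```python
import pandas as pd

# Hypothetical frame in the state the question left it: Timestamps in an object column
df = pd.DataFrame({
    "price_datetime": pd.Series(pd.to_datetime(["2017-05-01 09:30:00"])).astype(object),
    "symbol": ["EURUSD"],
})

# Convert back to a datetime64 column so to_gbq maps it to TIMESTAMP,
# matching the re-created table's column type.
df["price_datetime"] = pd.to_datetime(df["price_datetime"])

print(df.dtypes)
```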
The second thing I did was upgrade from pandas 0.19 to pandas 0.20. Together, these two changes fixed the schema mismatch.