I use to_gbq on pandas for updating Google BigQuery and get GenericGBQException
Problem description
While trying to use to_gbq to update a Google BigQuery table, I get the following response:
GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
My code:
gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)
and my dataframe mini_df looks like:
date request_number name feature_name value_name value
2018-01-10 1 1 "a" "b" 0.309457
2018-01-10 1 1 "c" "d" 0.273748
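For reference, a minimal reconstruction of such a frame (column names and values taken from the table above; the exact construction of the real mini_df is an assumption) shows which dtypes pandas infers on its own:

```python
import pandas as pd

# Hypothetical reconstruction of mini_df, letting pandas infer every dtype
mini_df = pd.DataFrame({
    "date": ["2018-01-10", "2018-01-10"],
    "request_number": [1, 1],
    "name": [1, 1],
    "feature_name": ["a", "c"],
    "value_name": ["b", "d"],
    "value": [0.309457, 0.273748],
})

# String-valued columns come out as object; numeric columns as int64/float64
print(mini_df.dtypes)
```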
While to_gbq is running and there is no table yet in BigQuery, I can see that the table is created with the following schema:
date STRING NULLABLE
request_number STRING NULLABLE
name STRING NULLABLE
feature_name STRING NULLABLE
value_name STRING NULLABLE
value FLOAT NULLABLE
What am I doing wrong? How can I solve this?
P.S. The rest of the exception:
BadRequest Traceback (most recent call last)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
589 destination_table,
--> 590 job_config=job_config).result()
591 except self.http_error as ex:
~/anaconda3/envs/env/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
527 # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528 return super(_AsyncJob, self).result(timeout=timeout)
529
~/anaconda3/envs/env/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
110 # Pylint doesn't recognize that this is valid in this case.
--> 111 raise self._exception
112
BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
During handling of the above exception, another exception occurred:
GenericGBQException Traceback (most recent call last)
<ipython-input-28-195df93249b6> in <module>()
----> 1 gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
106 chunksize=chunksize,
107 verbose=verbose, reauth=reauth,
--> 108 if_exists=if_exists, private_key=private_key)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver)
987 table.create(table_id, table_schema)
988
--> 989 connector.load_data(dataframe, dataset_id, table_id, chunksize)
990
991
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
590 job_config=job_config).result()
591 except self.http_error as ex:
--> 592 self.process_http_error(ex)
593
594 rows = []
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
454 # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
455
--> 456 raise GenericGBQException("Reason: {0}".format(ex))
457
458 def run_query(self, query, **kwargs):
GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
Recommended answer
I ran into the same problem. In my case it came down to the object data type of the DataFrame.
I had three columns: externalId, mappingId, info. For none of those fields did I set a data type; I let pandas do its magic.
It decided to set all three column data types to object. The problem is that, internally, the to_gbq component uses the to_json component. For one reason or another, this output omits the quotes around a data field if the field's type is object but it holds only numerical values.
So Google BigQuery needed this:
{"externalId": "12345", "mappingId":"abc123", "info":"blerb"}
but got:
{"externalId": 12345, "mappingId":"abc123", "info":"blerb"}
And because the field was mapped as STRING in Google BigQuery, the import process failed.
Two solutions come to mind.
Solution 1 - Change the data type of the column
A simple type conversion helped with this issue. I also had to change the data type in BigQuery to INTEGER.
df['externalId'] = df['externalId'].astype('int')
In this case, BigQuery can consume unquoted fields, as the JSON standard allows.
Solution 2 - Make sure string fields are strings
Again, this sets the data type. But since we set it explicitly to str, the export with to_json prints out a quoted field and everything works fine.
df['externalId'] = df['externalId'].astype('str')