I use to_gbq on pandas for updating Google BigQuery and get GenericGBQException

Problem description

While trying to update a Google BigQuery table with to_gbq, I get the following response:

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

My code:

gbq.to_gbq(mini_df, 'Name-of-Table', 'Project-id', chunksize=10000, reauth=False, if_exists='append', private_key=None)

and my dataframe mini_df looks like:

date        request_number  name  feature_name  value_name  value
2018-01-10  1               1     "a"           "b"         0.309457
2018-01-10  1               1     "c"           "d"         0.273748
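For reference, a minimal sketch that rebuilds a frame of this shape (values copied from the rows above; forcing dtype=object is an assumption that matches the all-STRING schema shown further down):

import pandas as pd

mini_df = pd.DataFrame(
    {
        "date": ["2018-01-10", "2018-01-10"],
        "request_number": [1, 1],
        "name": [1, 1],
        "feature_name": ["a", "c"],
        "value_name": ["b", "d"],
        "value": [0.309457, 0.273748],
    },
    dtype=object,  # assumption: every column starts out as object dtype
)
mini_df["value"] = mini_df["value"].astype(float)  # matches the FLOAT column below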

While to_gbq is running and there is no table on BigQuery yet, I can see that the table is created with the following schema:

date STRING NULLABLE
request_number STRING NULLABLE
name STRING NULLABLE
feature_name STRING NULLABLE
value_name STRING NULLABLE
value FLOAT NULLABLE
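
Inspecting the inferred dtypes shows where this schema comes from: pandas-gbq maps object columns to STRING and float64 columns to FLOAT. A diagnostic sketch, assuming the reconstruction of mini_df above:

print(mini_df.dtypes)
# date              object   -> STRING
# request_number    object   -> STRING
# name              object   -> STRING
# feature_name      object   -> STRING
# value_name        object   -> STRING
# value            float64   -> FLOAT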

What am I doing wrong? How can I solve this?

P.S., the rest of the exception:

BadRequest                                Traceback (most recent call last)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    589                         destination_table,
--> 590                         job_config=job_config).result()
    591                 except self.http_error as ex:

~/anaconda3/envs/env/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
    527         # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528         return super(_AsyncJob, self).result(timeout=timeout)
    529 

~/anaconda3/envs/env/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
    110             # Pylint doesn't recognize that this is valid in this case.
--> 111             raise self._exception
    112 

BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

During handling of the above exception, another exception occurred:

GenericGBQException                       Traceback (most recent call last)
<ipython-input-28-195df93249b6> in <module>()
----> 1 gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
    106                       chunksize=chunksize,
    107                       verbose=verbose, reauth=reauth,
--> 108                       if_exists=if_exists, private_key=private_key)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver)
    987         table.create(table_id, table_schema)
    988 
--> 989     connector.load_data(dataframe, dataset_id, table_id, chunksize)
    990 
    991 

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    590                         job_config=job_config).result()
    591                 except self.http_error as ex:
--> 592                     self.process_http_error(ex)
    593 
    594                 rows = []

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
    454         # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
    455 
--> 456         raise GenericGBQException("Reason: {0}".format(ex))
    457 
    458     def run_query(self, query, **kwargs):

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

Recommended answer

I had the same problem.

In my case, it came down to the object data type of the DataFrame.

I had three columns: externalId, mappingId, and info. I didn't set a data type for any of those fields and let pandas work its magic.

pandas decided to set all three column data types to object. The problem is that, internally, to_gbq uses to_json, and for one reason or another that output omits the quotes around a field whose dtype is object but which holds only numerical values.
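
A small demonstration of that quoting behavior, using the three column names from this answer (putting a plain Python int into an object-dtype column reproduces the problem):

import pandas as pd

df = pd.DataFrame(
    {
        "externalId": pd.Series([12345], dtype=object),  # object dtype holding an int
        "mappingId": ["abc123"],
        "info": ["blerb"],
    }
)

print(df.dtypes)
# externalId    object
# mappingId     object
# info          object

print(df.to_json(orient="records"))
# [{"externalId":12345,"mappingId":"abc123","info":"blerb"}]
#                ^^^^^ unquoted, so a STRING column in BigQuery rejects it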

So Google BigQuery needed this:

{"externalId": "12345", "mappingId":"abc123", "info":"blerb"}

but got:

{"externalId": 12345, "mappingId":"abc123", "info":"blerb"}

And because the field was mapped to STRING in Google BigQuery, the import process failed.

Two solutions come to mind.

Solution 1 - Change the column's data type

A simple type conversion helped with this issue. I also had to change the data type in BigQuery to INTEGER.

df['externalId'] = df['externalId'].astype('int')

In this case, BigQuery can consume the fields without quotes, as the JSON standard allows.

Solution 2 - Make sure your string fields are strings

Again, this is about setting the data type. But since we set it explicitly to string, the export with to_json prints out a quoted field and everything works fine.

df['externalId'] = df['externalId'].astype('str')
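
Putting this together for the question's frame: casting every column explicitly before the upload makes the to_json output match the generated schema. A sketch, assuming mini_df and the placeholder table/project names from the question:

from pandas.io import gbq

# Explicit casts: STRING columns become real Python strings (quoted in
# JSON), the FLOAT column stays numeric (unquoted in JSON).
mini_df = mini_df.astype(
    {
        "date": "str",
        "request_number": "str",
        "name": "str",
        "feature_name": "str",
        "value_name": "str",
        "value": "float",
    }
)

gbq.to_gbq(mini_df, 'Name-of-Table', 'Project-id',
           chunksize=10000, reauth=False,
           if_exists='append', private_key=None)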
