Making a Google BigQuery from Python on Windows
Problem description
I am trying to do something that is very simple in other data services: run a relatively simple SQL query and return the result as a dataframe in Python. I am on Windows 10 and using Python 2.7 (specifically Canopy 1.7.4).
Typically this would be done with pandas.read_sql_query, but BigQuery requires a different method, pandas.io.gbq.read_gbq.
This method works fine unless the query returns a large result. If the result is too large, BigQuery returns the error:
GenericGBQException: Reason: responseTooLarge, Message: Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
This was asked and answered before in the question below, but neither of the solutions works in my case:
Python BigQuery allowLargeResults with pandas.io.gbq
One solution is for Python 3, so it is a nonstarter. The other gives the following error because I am unable to set my credentials as an environment variable in Windows:
ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
I was able to download the JSON credentials file, and I have set it as an environment variable in the few ways I know, but I still get the above error. Do I need to load it in some way in Python? It seems to be looking for the file but unable to find it. Is there a special way to set it as an environment variable in this case?
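A common cause of this error on Windows is that a variable set through System Properties (or with setx) is only visible to programs started afterwards, so an already running Canopy session will not see it until it is restarted. As a workaround, the variable can be set for the current Python process only, before any BigQuery call is made. The sketch below assumes Google's client libraries read GOOGLE_APPLICATION_CREDENTIALS when credentials are requested; the function name set_credentials_path is hypothetical, not part of any library.

```python
import os


def set_credentials_path(json_path):
    """Hypothetical helper: point GOOGLE_APPLICATION_CREDENTIALS at a
    service-account JSON file for the current Python process only."""
    # Expand ~ and make the path absolute so a later change of the
    # working directory cannot break a relative path.
    full_path = os.path.abspath(os.path.expanduser(json_path))
    if not os.path.isfile(full_path):
        raise IOError("Credentials file not found: %s" % full_path)
    # Assumption: this runs before any credentials are requested, since
    # the lookup reads the variable at that moment.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = full_path
    return full_path
```

Usage, with a made-up path: call set_credentials_path(r"C:\Users\me\keys\my-key.json") at the top of the script, then run the query as usual. Raising an error for a missing file makes a typo in the path obvious instead of surfacing later as the generic "credentials not available" message.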
You can do this in Python 2.7 by changing the default dialect from legacy to standard in the pd.read_gbq function:
pd.read_gbq(query, 'my-super-project', dialect='standard')
Indeed, the BigQuery documentation says of the AllowLargeResults parameter:
AllowLargeResults: For standard SQL queries, this flag is ignored and large results are always allowed.
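One thing to watch when switching to dialect='standard': the query text itself may need changes, because legacy SQL writes table references as [project:dataset.table] while standard SQL expects `project.dataset.table` in backticks. The sketch below rewrites only those simple bracketed table references; it assumes plain names without spaces or extra dots, and the function name legacy_tables_to_standard is hypothetical. Other legacy-only syntax (for example a comma meaning UNION ALL between tables) still has to be rewritten by hand.

```python
import re

# Matches legacy table references such as [project:dataset.table] or
# [dataset.table]. Simplified: plain names only, no spaces or extra dots.
_LEGACY_TABLE = re.compile(r"\[(?:([\w-]+):)?(\w+)\.(\w+)\]")


def legacy_tables_to_standard(query):
    """Hypothetical helper: rewrite bracketed legacy-SQL table references
    into standard-SQL backtick form; everything else is left untouched."""
    def _replace(match):
        project, dataset, table = match.groups()
        # The project part is optional in legacy references.
        parts = [p for p in (project, dataset, table) if p]
        return "`" + ".".join(parts) + "`"

    return _LEGACY_TABLE.sub(_replace, query)
```

For example, "SELECT name FROM [my-super-project:my_dataset.my_table]" becomes "SELECT name FROM `my-super-project.my_dataset.my_table`", which can then be passed to pd.read_gbq with dialect='standard' (the project, dataset and table names here are made up).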