BigQuery Google Drive: query multiple sheets with googleSheetsOptions range

Problem description

I have created a Google Sheets spreadsheet with two worksheets, called Sheet1 and Sheet2:

https://docs.google.com/spreadsheets/d/1P1WH7cwVDaG6k-OQxKVXtnjBXI1NGFYvHD6IxCRFsZc/edit?usp=sharing

Sheet1 has columns col1 and col2 in row 1, and Sheet2 has columns col3 and col4 in row 1.

I would like to query both sheets using the bq command-line interface for BigQuery. This did not seem to be possible until a feature was added in May 2018:

https://issuetracker.google.com/issues/35905674#comment12

I tried to replicate what this comment describes, but I can't get it to work. Here is what I've done so far.

The description of the new feature shows an example google_sheets_tabeledef.json that looks like this:

{
  "autodetect": false,
  "sourceFormat": "GOOGLE_SHEETS", 
  "sourceUris": [
    "https://docs.google.com/spreadsheets/d/xxxxxxxxxxxxxx"
  ],
  "maxBadRecords": 1,
  "googleSheetsOptions":
  {
    "range": "test_sheet!A1:B20",
    "skipLeadingRows": 0
  },
  "schema" : {
    "fields": [
      {
        "name": "col1",
        "type": "string"
      },
      {
        "name": "col2",
        "type": "int64"
      }
    ]
  }
}

Given my spreadsheet with two sheets, Sheet1 and Sheet2, I created a table for each in my BigQuery account, then tried to query each of them with the command-line bq query. For Sheet1 it works as shown:

bq query --format=csv --use_legacy_sql=false 'SELECT * FROM `cegx-test-project1.multi_sheet_test.Sheet1`' 2>/dev/null

col1,col2
col1,col2
foo,bar
abc,def

If I try the same for Sheet2, without the external_table_definition flag, it shows:

bq query --format=csv --use_legacy_sql=false 'SELECT * FROM `cegx-test-project1.multi_sheet_test.Sheet2`' 2>/dev/null

col3,col4
col1,col2
foo,bar
abc,def

So it basically falls back to the first worksheet instead of querying Sheet2. So far, as expected.

I then created an equivalent google_sheets_sheet2_tabledef.json file like this:

{
  "autodetect": true, 
  "sourceFormat": "GOOGLE_SHEETS", 
  "sourceUris": [
    "https://docs.google.com/spreadsheets/d/1P1WH7cwVDaG6k-OQxKVXtnjBXI1NGFYvHD6IxCRFsZc"
  ],
  "maxBadRecords": 1,
  "googleSheetsOptions":
  {
    "range": "Sheet2!A1:B10",
    "skipLeadingRows": 0
  },
  "schema" : {
    "fields": [
      {"name": "col3", "type": "string"},
      {"name": "col4", "type": "string"}
    ]
  }
}

Notice that the range parameter points to Sheet2!A1:B10.
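
Since the range value is what selects the worksheet, one definition file per worksheet is needed. The following is a minimal sketch of generating such files from a shell function. The function name write_tabledef, the variable SHEET_URL, and the Sheet1 output path are illustrative assumptions that do not appear in the question; the JSON fields mirror the definition file shown above.

```shell
# Hypothetical helper, not part of the bq tool: writes one table
# definition file per worksheet range. SHEET_URL and write_tabledef
# are illustrative names.
SHEET_URL="https://docs.google.com/spreadsheets/d/1P1WH7cwVDaG6k-OQxKVXtnjBXI1NGFYvHD6IxCRFsZc"

write_tabledef() {
  # $1 = range (for example Sheet2!A1:B10)
  # $2 = first column name, $3 = second column name
  # $4 = output file
  cat > "$4" <<EOF
{
  "autodetect": true,
  "sourceFormat": "GOOGLE_SHEETS",
  "sourceUris": [
    "${SHEET_URL}"
  ],
  "maxBadRecords": 1,
  "googleSheetsOptions": {
    "range": "$1",
    "skipLeadingRows": 0
  },
  "schema": {
    "fields": [
      {"name": "$2", "type": "string"},
      {"name": "$3", "type": "string"}
    ]
  }
}
EOF
}

# Single quotes keep an interactive shell from treating "!" specially.
write_tabledef 'Sheet1!A1:B10' col1 col2 /tmp/google_sheets_sheet1_tabledef.json
write_tabledef 'Sheet2!A1:B10' col3 col4 /tmp/google_sheets_sheet2_tabledef.json
```

Only the range and the column names differ between the two generated files; everything else is copied unchanged from the definition in the question.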

If I query the table again, now with the external_table_definition flag, I get:

bq query --external_table_definition="Sheet2::/tmp/google_sheets_sheet2_tabledef.json" --format=csv --use_legacy_sql=false 'SELECT * FROM `cegx-test-project1.multi_sheet_test.Sheet2`' 2>/dev/null

col3,col4
col1,col2
foo,bar
abc,def

So I still get the results from Sheet1, even with the external definition file pointing to the range in Sheet2.

Any idea what is going wrong?

Recommended answer

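It looks like the command you used to query the temporary external table definition did not refer to the external table. The FROM clause still names the permanent table `cegx-test-project1.multi_sheet_test.Sheet2`, so the temporary table defined by the flag is never read.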

If you change the command to the following, it should work as expected:

bq query --external_table_definition="Sheet2::/tmp/google_sheets_sheet2_tabledef.json" --format=csv --use_legacy_sql=false 'SELECT * FROM Sheet2' 2>/dev/null
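
The name before the :: in --external_table_definition is the table name the query has to use in its FROM clause. The sketch below derives both from one variable so that they cannot drift apart. The variable names TABLE_NAME, DEF_FILE, EXT_DEF and QUERY are illustrative assumptions, and the script only prints the command instead of running it.

```shell
# Illustrative variables, not bq options: the same TABLE_NAME feeds
# both the flag and the FROM clause.
TABLE_NAME="Sheet2"
DEF_FILE="/tmp/google_sheets_sheet2_tabledef.json"

# Flag value in the form table_name::path_to_definition_file.
EXT_DEF="${TABLE_NAME}::${DEF_FILE}"

# Refer to the temporary table by its bare name, with no project or
# dataset prefix.
QUERY="SELECT * FROM ${TABLE_NAME}"

# Print the resulting command for inspection before running it.
echo "bq query --external_table_definition=\"${EXT_DEF}\" --format=csv --use_legacy_sql=false '${QUERY}'"
```

The printed line should match the corrected command above, minus the stderr redirect.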
