JSON Schema - BigQuery - What would my schema be (from my JSON file)? I cannot figure it out


Problem description

Google Cloud BigQuery - Load Data via JSON file

I am trying to load newline-delimited JSON data from a file into BigQuery. I'm stuck trying to figure out what my "schema" is, or which one I should be using.

The JSON file is a file of products.

What I have tried so far... NOTE: This is JUST for ONE product (of many), then it repeats the same pattern for all the other products:

[{"sku": INTEGER,"name": "STRING", "type": "STRING", "price": FLOAT, "upc": "INTEGER", "category": [{"id": "STRING", "name": "STRING"}, {"id": "STRING", "name": "STRING"}, {"id": "STRING", "name": "STRING"}, {"id": "STRING", "name": "STRING"}], "shipping": FLOAT, "description": "STRING", "manufacturer": "STRING", "model":"STRING", "url": "STRING","image": "STRING"}]

NOTE: the value of the "image" key is a URL to the image.

UNLESS THERE IS ANOTHER WAY... Is there a way to load the JSON file into BigQuery and have it "auto-generate" the table and dataset?

Solution

If you are using the CLI tool, then this is the schema for your data:

[{"name": "sku", "type": "INT64", "mode": "NULLABLE"},
   {"name": "name", "type": "STRING", "mode": "NULLABLE"},
   {"name": "type", "type": "STRING", "mode": "NULLABLE"},
   {"name": "price", "type": "FLOAT", "mode": "NULLABLE"},
   {"name": "upc", "type": "STRING", "mode": "NULLABLE"},
   {"fields":
     [{"name": "id", "type": "STRING", "mode": "NULLABLE"}, {"name": "name", "type": "STRING", "mode": "NULLABLE"}],
    "name": "category", "type": "RECORD", "mode": "REPEATED"},
   {"name": "shipping", "type": "FLOAT", "mode": "NULLABLE"},
   {"name": "description", "type": "STRING", "mode": "NULLABLE"},
   {"name": "manufacturer", "type": "STRING", "mode": "NULLABLE"},
   {"name": "model", "type": "STRING", "mode": "NULLABLE"},
   {"name": "url", "type": "STRING", "mode": "NULLABLE"},
   {"name": "image", "type": "STRING", "mode": "NULLABLE"}]

You can save it in a file (such as "schema.json") and then run the command:

bq load --source_format=NEWLINE_DELIMITED_JSON dataset_id.test_table path/to/json_data path/to/schema.json

Here path/to/json_data is the path to your data. It can be either a path on your local machine (such as /documents/so/jsondata.json) or a path in Google Cloud Storage (such as gs://analyzes/json_data.json).
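The load command above expects newline-delimited JSON, that is, one object per line. The attempt in the question wraps the product in [ ... ], so if the data file is a single JSON array (an assumption, since the file itself is not shown), it would need converting first. A minimal Python sketch of that conversion follows; the function name to_ndjson and the sample values are made up for illustration:

```python
import json


def to_ndjson(array_text):
    """Turn the text of one JSON array into newline-delimited JSON text."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One compact object per line. json.dumps escapes newlines inside
    # strings, so every record stays on a single physical line.
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


# Hypothetical two-product input, for illustration only.
demo = '[{"sku": 1, "name": "first"}, {"sku": 2, "name": "second"}]'
print(to_ndjson(demo), end="")
```

In practice you would read the array from your product file, pass its text to to_ndjson, and write the result to the file that you hand to bq load.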

The schema must either be in a file on your local machine or be specified inline on the command line; in this operation it has to be specified one way or the other.
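Before running the load, it can help to check a record against the schema locally. The sketch below is only a rough approximation and not what BigQuery does internally: the PY_TYPES mapping, the check_record helper, and the sample product values are all assumptions made for illustration, while SCHEMA is copied from the answer above.

```python
import json

# Schema copied from the answer above (bq CLI JSON schema format).
SCHEMA = [
    {"name": "sku", "type": "INT64", "mode": "NULLABLE"},
    {"name": "name", "type": "STRING", "mode": "NULLABLE"},
    {"name": "type", "type": "STRING", "mode": "NULLABLE"},
    {"name": "price", "type": "FLOAT", "mode": "NULLABLE"},
    {"name": "upc", "type": "STRING", "mode": "NULLABLE"},
    {"fields": [{"name": "id", "type": "STRING", "mode": "NULLABLE"},
                {"name": "name", "type": "STRING", "mode": "NULLABLE"}],
     "name": "category", "type": "RECORD", "mode": "REPEATED"},
    {"name": "shipping", "type": "FLOAT", "mode": "NULLABLE"},
    {"name": "description", "type": "STRING", "mode": "NULLABLE"},
    {"name": "manufacturer", "type": "STRING", "mode": "NULLABLE"},
    {"name": "model", "type": "STRING", "mode": "NULLABLE"},
    {"name": "url", "type": "STRING", "mode": "NULLABLE"},
    {"name": "image", "type": "STRING", "mode": "NULLABLE"},
]

# Assumed mapping from schema type names to Python types (this check only).
PY_TYPES = {"INT64": int, "STRING": str, "FLOAT": (int, float)}


def check_record(record, schema, path=""):
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    known = {field["name"] for field in schema}
    for key in record:
        if key not in known:
            problems.append(f"{path}{key}: not in schema")
    for field in schema:
        name = path + field["name"]
        value = record.get(field["name"])
        if value is None:
            continue  # NULLABLE and REPEATED fields may be absent
        if field["mode"] == "REPEATED":
            if not isinstance(value, list):
                problems.append(f"{name}: expected a list")
                continue
            items = value
        else:
            items = [value]
        for item in items:
            if field["type"] == "RECORD":
                if isinstance(item, dict):
                    problems.extend(check_record(item, field["fields"], name + "."))
                else:
                    problems.append(f"{name}: expected an object")
            elif isinstance(item, bool) or not isinstance(item, PY_TYPES[field["type"]]):
                problems.append(
                    f"{name}: expected {field['type']}, got {type(item).__name__}")
    return problems


# Hypothetical product line; the values are invented for this example.
sample = json.loads('{"sku": 43900, "name": "Example product", "price": 5.49, '
                    '"upc": "000000000000", '
                    '"category": [{"id": "c1", "name": "Example"}]}')
print(check_record(sample, SCHEMA))
```

Note that upc is checked as a STRING, as in the schema above, so a line that stores it as a bare number would be reported here.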

Now, in the comments on my first answer you mentioned a type of operation where BigQuery does not require a schema.

You can indeed do that, but only for federated tables, that is, tables created with an external file as their source (these files are usually in GCS or Google Drive).

To do so, you first need to have your JSON data in GCS, for instance, and then create the table in BQ. Using the CLI, this command creates a federated table that uses the JSON data in GCS as its source:

bq mk --external_table_definition=@NEWLINE_DELIMITED_JSON=gs://bucket_name/jsondata.json dataset_id.table_test 

This command does not specify a schema, so BQ does its best to work out what the schema should be from the data (I tested with your data and it worked just fine, but afterwards I could only query the table with legacy SQL).
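To get a feel for what inferring a schema from data involves, here is a toy Python sketch that derives a schema, in the same JSON format as above, from a single record. It is not BigQuery's detection logic, which is not described here; the type rules and the names infer_field and infer_schema are assumptions for illustration only.

```python
def infer_field(name, value):
    """Guess one schema field from one JSON value (toy rules only)."""
    if isinstance(value, list):
        # Use the first element as a representative; an empty list gives
        # no type information, so fall back to STRING.
        if value:
            field = infer_field(name, value[0])
        else:
            field = {"name": name, "type": "STRING"}
        field["mode"] = "REPEATED"
        return field
    field = {"name": name, "mode": "NULLABLE"}
    if isinstance(value, dict):
        field["type"] = "RECORD"
        field["fields"] = [infer_field(k, v) for k, v in value.items()]
    elif isinstance(value, bool):  # must come before int: bool is an int subclass
        field["type"] = "BOOLEAN"
    elif isinstance(value, int):
        field["type"] = "INT64"
    elif isinstance(value, float):
        field["type"] = "FLOAT"
    else:
        field["type"] = "STRING"  # strings, and null as a fallback
    return field


def infer_schema(record):
    """Infer a list of schema fields from one decoded JSON object."""
    return [infer_field(key, value) for key, value in record.items()]


# Hypothetical record; the values are invented for this example.
print(infer_schema({"sku": 43900, "price": 5.49,
                    "category": [{"id": "c1", "name": "Example"}]}))
```

Guessing from one record is fragile: a price that happens to be written as 5 would come out as INT64, which is one reason an explicit schema is the safer choice.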

Keep in mind, though, that this process is not guaranteed to work every time. Also, you should use federated tables only if they meet the requirements of your project; otherwise it's easier and faster to load the data into BQ and run queries from there. In the second reference that I suggested, you can read more about when it's best to use federated tables.
