Datalab does not populate bigQuery tables
Hi, I have a problem while using IPython notebooks on Datalab.
I want to write the result of a query into a BigQuery table, but it does not work. Everyone says to use the insert_data(dataframe) function, but it does not populate my table. To simplify the problem, I tried to read a table and write the result into a just-created table (with the same schema), but it still does not work. Can anyone tell me where I am going wrong?
import gcp
import gcp.bigquery as bq
# read a small sample into a dataframe
df = bq.Query('SELECT 1 as a, 2 as b FROM [publicdata:samples.wikipedia] LIMIT 3').to_dataframe()
# create a dataset and extract the schema from the dataframe
dataset = bq.DataSet('prova1')
dataset.create(friendly_name='aaa', description='bbb')
schema = bq.Schema.from_dataframe(df)
# create the table
temptable = bq.Table('prova1.prova2').create(schema=schema, overwrite=True)
# try to put the same data into the table just created
temptable.insert_data(df)
Calling insert_data will do an HTTP POST and return once that is done. However, it can take some time for the data to show up in the BQ table (up to several minutes), because streamed rows sit in a streaming buffer before they become visible. Try waiting a while before using the table. We may be able to address this in a future update; see this
The hacky way to block until ready right now should be something like:
import time

while True:
    info = temptable._api.tables_get(temptable._name_parts)
    # no streaming buffer at all means everything has been committed
    if 'streamingBuffer' not in info:
        break
    # rows visible in the buffer also count as "ready"
    if info['streamingBuffer']['estimatedRows'] > 0:
        break
    time.sleep(5)
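The polling loop above can be factored into a small reusable helper with a timeout, so a notebook cell cannot hang forever if the buffer never reports rows. This is only a sketch, not part of the Datalab API; the name `wait_until` and its parameters are made up for illustration:

```python
import time

def wait_until(predicate, timeout=300, interval=5):
    """Call predicate() every `interval` seconds until it returns True,
    giving up after `timeout` seconds.
    Returns True on success, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False

# In the notebook you would pass the streamingBuffer check as the predicate,
# e.g. wait_until(lambda: 'streamingBuffer' not in
#                 temptable._api.tables_get(temptable._name_parts))
```

This keeps the BigQuery-specific check separate from the generic polling logic, so the same helper can wait on any readiness condition.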