Reject data load attempt to BigQuery for existing data
Question
I'm loading data from pandas dataframes to BigQuery using the pandas-gbq package:
df.to_gbq('dataset.table', project_id, reauth=False, if_exists='append')
A typical dataframe looks like this:
key | value | order
"sd3e" | 0.3 | 1
"sd3e" | 0.2 | 2
"sd4r" | 0.1 | 1
"sd4r" | 0.5 | 2
Is there a way to reject the load attempt if the key already appears in the BigQuery table?
Recommended answer
Is there a way to reject the load attempt if the key already appears in the BigQuery table?
No. BigQuery doesn't enforce keys the way other databases do, so it can't reject a load because a key already exists. There are two typical ways to work around this:
Option 1:
Upload the data with a timestamp and use a MERGE statement to remove duplicates.
See this link on how to do this. Here is an example:
MERGE `DATA` AS target
USING `DATA` AS source
ON target.key = source.key
WHEN MATCHED AND target.ts < source.ts THEN
DELETE
Note: In this case you pay for the scan the MERGE performs, but the rows in your table stay unique.
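The "upload the data with a timestamp" step can be sketched in plain Python as follows. This is only an illustration under assumptions: the helper name add_load_timestamp and the column name ts are made up here (ts matches the column used in the MERGE above), and the rows are shown as dictionaries rather than a dataframe so the snippet runs without BigQuery.

```python
from datetime import datetime, timezone


def add_load_timestamp(rows, ts=None):
    """Return copies of `rows`, each with a `ts` field set to the load time.

    `add_load_timestamp` and the `ts` field are hypothetical names used for
    illustration; they are not part of pandas-gbq.
    """
    if ts is None:
        # Default to the current time in UTC.
        ts = datetime.now(timezone.utc)
    stamp = ts.strftime("%Y-%m-%d %H:%M:%S")
    # Build new dicts so the caller's rows are left unchanged.
    return [{**row, "ts": stamp} for row in rows]


rows = [
    {"key": "sd3e", "value": 0.3, "order": 1},
    {"key": "sd3e", "value": 0.2, "order": 2},
]
load_time = datetime(2019, 4, 14, 0, 0, 0, tzinfo=timezone.utc)
for row in add_load_timestamp(rows, ts=load_time):
    print(row)
```

With a pandas dataframe, the equivalent would presumably be to assign a ts column to df just before the df.to_gbq(...) call shown in the question.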
Option 2:
Upload the data with a timestamp and use the ROW_NUMBER window function to fetch the latest record. Here is an example with your data:
WITH DATA AS (
SELECT 'sd3e' AS key, 0.3 as value, 1 as r_order, '2019-04-14 00:00:00' as ts UNION ALL
SELECT 'sd3e' AS key, 0.2 as value, 2 as r_order, '2019-04-14 01:00:00' as ts UNION ALL
SELECT 'sd4r' AS key, 0.1 as value, 1 as r_order, '2019-04-14 00:00:00' as ts UNION ALL
SELECT 'sd4r' AS key, 0.5 as value, 2 as r_order, '2019-04-14 01:00:00' as ts
)
SELECT *
FROM (
SELECT * ,ROW_NUMBER() OVER(PARTITION BY key order by ts DESC) rn
FROM `DATA`
)
WHERE rn = 1
This produces the expected result: one row per key, the one with the latest timestamp.
Note: This approach doesn't incur extra charges, but you always have to apply the window function when fetching from the table.
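To make the ROW_NUMBER logic in Option 2 concrete, here is a plain-Python sketch of the same selection: for each key, keep only the row with the latest ts. The function name latest_per_key is made up for illustration, and the rows mirror the sample data in the query above; nothing here talks to BigQuery.

```python
def latest_per_key(rows):
    """Keep, for each key, the row with the greatest `ts`.

    This mirrors what `rn = 1` selects in the query above. On a tie this
    sketch keeps the first row seen; ROW_NUMBER gives no such guarantee.
    """
    latest = {}
    for row in rows:
        current = latest.get(row["key"])
        # All timestamps share one fixed 'YYYY-MM-DD HH:MM:SS' format,
        # so comparing the strings orders them chronologically.
        if current is None or row["ts"] > current["ts"]:
            latest[row["key"]] = row
    return list(latest.values())


data = [
    {"key": "sd3e", "value": 0.3, "r_order": 1, "ts": "2019-04-14 00:00:00"},
    {"key": "sd3e", "value": 0.2, "r_order": 2, "ts": "2019-04-14 01:00:00"},
    {"key": "sd4r", "value": 0.1, "r_order": 1, "ts": "2019-04-14 00:00:00"},
    {"key": "sd4r", "value": 0.5, "r_order": 2, "ts": "2019-04-14 01:00:00"},
]

for row in latest_per_key(data):
    print(row["key"], row["value"], row["r_order"], row["ts"])
```

The sketch only illustrates the selection rule. In practice the deduplication would stay in the SQL query, since pulling the whole table into Python just to filter it would defeat the purpose.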