如何在Big Query中更新表格并存储替换后的原始值及其与新值的差异 [英] How update a table in Big Query and store the original value replaced and her difference with the new value
问题描述
In a previous post, see here: How update a table in Big Query where the name of fields to update are values in another table
我有两个表和以下问题:
I had two table and the following problem:
表1:
+-------+------------+---------+
| ID | field_name | value |
+-------+------------+---------+
| 1 | usd | 10.08 |
| 1 | gross_amt | 52.0 |
| 1 | jpy | 30.05 |
| 2 | usd | 50.0 |
| 2 | eur | 50.0 |
| 3 | real_amt | 210.43 |
| 3 | total | 320 |
| 4 | jpy | 23.45 |
| 4 | name | john |
| 4 | city | utah |
+-------+------------+---------+
表2:
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| ID | name | last_name | date1 | counrty | city | usd | eur | jpy | gross_amt | real_amt | total | ... | field200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| 1 | jane | doe | 19900108 | usa | LA | 9.08 | 0.00 | 29.05 | 50.0 | 52.0 | 900.0 | ... | value200 |
| 2 | lane | smith | 19900108 | usa | LA | 40.8 | 40.0 | 0.00 | 100.0 | 70.0 | 290.0 | ... | value200 |
| 3 | mike | hoffa | 19900108 | usa | SF | 5.05 | 0.00 | 0.00 | 10.0 | 25.0 | 100.0 | ... | value200 |
| 4 | paul | doe | 19900108 | usa | NY | 1.00 | 0.00 | 29.05 | 45.0 | 55.0 | 110.0 | ... | value200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
需要用表1列value
的值更新表2中列field_name
中表1中的字段的值,两个表中的两个ID相同,除此之外,表1中列value
的数据类型是字符串,但表2中要更新的列的数据类型是不同的,尤其是数字(数字,int64,float64)
Need to update the values of the fields in the table 2, which are in table 1 in column field_name
, with the values of table 1 column value
, both IDs are the same in both tables, beside that, the datatype of column value
in table 1 are string, but the data type of the columns to update in the table 2 are diferent, especially the numbers(numeric, int64, float64)
上面的表是一个示例,实际问题的表2有200个字段,表1中的ID最多可以对每天要修改的数千条记录进行40个值修改
The tables above are an example, table 2 of the real problem has 200 fields and in table 1 for an ID there can be up to 40 value modifications for thousands of records to be modified daily
现在我有一个新问题
在新表1中(表3):
表3:
+-------+------------+---------+-----------------------+------------------+
| ID | field_name | value | value_replaced_table2 | diff |
+-------+------------+---------+-----------------------+------------------+
| 1 | usd | 10.08 | 9.08 | abs(10.08-9.08) |
| 1 | gross_amt | 52.0 | 50.0 | abs(52.0-50.0) |
| 1 | jpy | 30.05 | 29.05 | abs(30.05-29.05) |
| 2 | usd | 50.0 | 40.08 | abs(50.0-40.0) |
| 2 | eur | 50.0 | 40.0 | ...... |
| 3 | real_amt | 210.43 | 25.0 | ...... |
| 3 | total | 320 | 100.0 | ...... |
| 4 | jpy | 23.45 | 29.05 | abs(23.45-29.05) |
| 4 | name | john | paul | john |
| 4 | city | utah | NY | utah |
+-------+------------+---------+-----------------------+------------------+
我需要在表value_replaced_table2
中的新表1(表3)中插入表2中替换的值,从而将替换后的值存储在上面的表2中,并计算两个值之间的差值(新值要更新并替换表2中的旧值,请注意,新表1(表3)中的数据类型为字符串,表2中的数据类型为(数字,int64,float64)
I need insert into new table 1 (table 3) in the column value_replaced_table2
the value replaced in the table 2, thus storing the replaced value in table 2 above, and calculate the differences between both values, (the new value to update and the old value replaced in table 2, note that the data types in the new table 1 (table 3) are string, and those in table 2 are (numeric, int64, float64)
从现在开始,谢谢您的回答!
From now, thanks for your answers!
推荐答案
使用先前创建的pivot1
表,可以在使用table2
执行最终的MERGE
以获得将要更改的old_values
之前使用它
Using the previously created pivot1
table, you can use it before executing the final MERGE
with table2
to get the old_values
that will change.
然后,您将需要取消旋转结果以获得table3
.使用样本数据的示例
:
Then, you will need to unpivot the results to get table3
. Example using your sample data
:
-- Same pivot1 table as before
EXECUTE IMMEDIATE '''
CREATE TEMP TABLE pivot1 AS
SELECT id, ''' || (
SELECT STRING_AGG(DISTINCT "MAX(IF(field_name = '" || field_name || "', CAST(value AS " || data_type || "), NULL)) AS " || field_name)
FROM `project.dataset.table1`
JOIN (
SELECT column_name, data_type
FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'replica2'
) ON field_name = column_name
) || '''
FROM `project.dataset.table1`
GROUP BY id
''';
-- Table3
EXECUTE IMMEDIATE '''
CREATE OR REPLACE TABLE project.dataset.table3 AS
SELECT a.id, values.column_name as field_name, values.new_value as value, values.old_value as value_replaced_table2,
CASE
WHEN values.data_type = "STRING" THEN values.new_value
WHEN values.data_type = "INT64" THEN CAST(ABS(CAST(values.new_value AS INT64) - CAST(values.old_value AS INT64)) AS STRING)
ELSE CAST(ABS(CAST(values.new_value AS FLOAT64) - CAST(values.old_value AS FLOAT64)) AS STRING)
END as diff
FROM (
SELECT t1.id, [''' || (
SELECT STRING_AGG(DISTINCT "STRUCT('" || column_name || "' as column_name, CAST(t1." || column_name || " AS STRING) as new_value, CAST(t2." || column_name || " AS STRING) as old_value, '" || data_type || "' as data_type)")
FROM `project.dataset.table1`
JOIN (
SELECT column_name, data_type
FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'replica2'
) ON field_name = column_name
) || '''] AS values
FROM `project.dataset.table2` AS t2
JOIN pivot1 AS t1
ON t2.id = t1.id ) a
CROSS JOIN UNNEST(a.values) as values
WHERE values.new_value IS NOT NULL
''';
SELECT * FROM `project.dataset.table3` ORDER BY id;
请注意,将FLOAT64差异强制转换为STRING时将获得一个近似值,因此,如果要舍入该差异,可以使用强制转换为
Notice that the FLOAT64 difference will get an approximate value when casted to STRING, so if you want to round up the difference, you can use a cast to NUMERIC instead of FLOAT64, e.g.:
...
ELSE CAST(ABS(CAST(values.new_value AS NUMERIC) - CAST(values.old_value AS NUMERIC)) AS STRING)
...
-- Instead of 9.20000000000000028, it will appear as 9.2
这篇关于如何在Big Query中更新表格并存储替换后的原始值及其与新值的差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!