BigQuery UPDATE table using LIKE returns "UPDATE/MERGE must match at most one source row for each target row"
Question
Query error: UPDATE/MERGE must match at most one source row for each target row
Table 1 (data table)
textWithFoundItemInIt | foundItem
---------------------------------
hallo Adam |
Bert says hello |
Want to find "Caesar"bdjehg |
Table 2 (mapping table)
mappingItem
------------
Adam
Bert
Caesar
Expected result
textWithFoundItemInIt | foundItem
---------------------------------
hallo Adam | Adam
Bert says hello | Bert
Want to find "Caesar"bdjehg | Caesar
Queries I tried:
UPDATE `table1`
SET foundItem= mt.mappingItem
FROM `mappingTable` mt
WHERE textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%');
UPDATE `table1`
SET foundItem= mt.mappingItem
FROM `mappingTable` mt
WHERE INSTR(textWithFoundItemInIt , mt.mappingItem) >1;
UPDATE `table1`
SET foundItem = (SELECT mt.mappingItem FROM `table2` AS mt
WHERE textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%')
)
WHERE TRUE;
UPDATE `table1`
SET foundItem= mt.mappingItem
FROM `table1`
inner join `table2` mt on textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%');
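I also removed all duplicate values from table 1 and table 2, but I still get the same error message. I also tried a JOIN statement, but got the following error: "Alias table1 in the FROM clause was already defined as the UPDATE target"
I found these similar questions on SO and tried their approaches: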
- update columns values with column of another table based on condition
- Using one table's values to query another table in BigQuery
- SQL update from one Table to another based on a ID match
- How to efficiently select records matching substring in another table using BigQuery?
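Unfortunately, they did not help with my problem, so I do not think this is a duplicate question.
Many thanks for your ideas.
Follow-up question
I am referring to the solution posted by @Jon. Thanks again for your help. However, after testing with different data, it still does not work when there are duplicates in "table1". Of course, this comes from the GROUP BY clause: without it, the UPDATE query does not work and produces the error message from my original question. Grouping by every value does not work either.
However, there can be duplicates both in my "table1" (data) and in my mapping table "table2". So, to be precise, this is my goal:
Table 1 (data table)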
textWithFoundItemInIt | foundItem
-------------------------------------------
hallo Adam |
Bert says hello |
Bert says byebye |
Want to find "Caesar"bdjehg |
Want to find "Caesar"bdjehg |
Want to find "Caesar"again |
Want to find "CaesarCaesar"again and again | <== This is no problem, just finding one Caesar is enough
Table 2 (mapping table)
mappingItem
------------
Adam
Bert
Caesar
Bert
Caesar
Adam
Expected result
textWithFoundItemInIt | foundItem
--------------------------------------------
hallo Adam | Adam
Bert says hello | Bert
Bert says byebye | Bert
Want to find "Caesar"bdjehg | Caesar
Want to find "Caesar"bdjehg | Caesar
Want to find "Caesar"again | Caesar
Want to find "CaesarCaesar"again and again | Caesar
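It does not matter which Adam in table 2 is found and written to table 1; they are identical. So it is fine if the first Adam is overwritten by the second one, or if the query simply stops searching once it has found one Adam.
If I run Jon's SELECT query, the result is: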
textWithFoundItemInIt | foundItem
--------------------------------------------
hallo Adam | Adam
Bert says hello | Bert
Bert says byebye | Bert
Want to find "Caesar"bdjehg | Caesar
Want to find "Caesar"again | Caesar
Want to find "CaesarCaesar"again and again | Caesar
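It (correctly) omits the second 'Want to find "Caesar"bdjehg' row, but unfortunately that is not what I need.
If it is easier, finding two names in one row would also be fine: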
textWithFoundItemInIt | foundItem
---------------------------------------------
hallo Adam and Bert | Adam, Bert
Bert says hello to Caesar | Bert, Caesar
or
textWithFoundItemInIt | foundItem1 | foundItem2
---------------------------------------------------------------
hallo Adam and Bert | Adam | Bert
Bert says hello to Caesar | Bert | Caesar
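I hope this helps to understand my problem. In short: "it is just a mapping with multiple equal rows" ;-)
Many thanks :)
Recommended answer
Your logic does not guard against this case: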
mappingItem
-----------
item1
item12
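because the pattern %item1% matches both item1 and item12. There are many ways to avoid this, depending on how you want to handle such cases in poorly structured data. But that is the cause.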
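The overlap can be checked directly by evaluating the two LIKE tests on a literal. This is a minimal sketch in BigQuery Standard SQL; the sample string is illustrative. Keep in mind that LIKE in BigQuery is case-sensitive, so the text and the mapping items below deliberately use the same case.

```sql
-- Both patterns match the same text, so a single target row
-- would pair with two source rows in an UPDATE ... FROM.
SELECT
  'item12 is a problem' LIKE CONCAT('%', 'item1', '%')  AS matches_item1,
  'item12 is a problem' LIKE CONCAT('%', 'item12', '%') AS matches_item12;
```

Both columns evaluate to TRUE. Exact duplicates in the mapping table have the same effect: two identical mappingItem rows are still two source rows for the same target row.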
You can find the problem cases with something like this:
SELECT table1.textWithFoundItemInIt
, COUNT(*)
FROM table1
JOIN table2
ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
GROUP BY table1.textWithFoundItemInIt
HAVING COUNT(*) > 1
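Once you have decided how to handle these cases, you should be able to choose which match to use in the UPDATE.
Basically, make sure the logic limits the list of values to assign (per table1 row) to exactly one (1) value.
Here is one approach. I am not sure BigQuery supports this particular form (it is written in MySQL-style UPDATE ... JOIN ... SET syntax), but it shows the logic.
Looking at the data, note that more than one mappingItem matches a table1 row: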
SELECT table1.textWithFoundItemInIt
, COUNT(*)
, MIN(table2.mappingItem) AS theItem1
, MAX(table2.mappingItem) AS theItem2
FROM table1
JOIN table2
ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
GROUP BY table1.textWithFoundItemInIt
HAVING COUNT(*) > 1
;
+-----------------------+----------+----------+----------+
| textWithFoundItemInIt | COUNT(*) | theItem1 | theItem2 |
+-----------------------+----------+----------+----------+
| Item12 is a problem | 2 | item1 | item12 |
+-----------------------+----------+----------+----------+
Now adjust the UPDATE so that it picks the MIN(mappingItem) for each table1 row when assigning the new value:
UPDATE table1
JOIN ( SELECT textWithFoundItemInIt
, MIN(mappingItem) AS mappingItem
FROM table1
JOIN table2
ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
GROUP BY table1.textWithFoundItemInIt
) mt
ON table1.textWithFoundItemInIt = mt.textWithFoundItemInIt
SET foundItem = mt.mappingItem
;
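BigQuery's UPDATE statement takes the shape UPDATE ... SET ... FROM ... WHERE rather than UPDATE ... JOIN ... SET, so the statement above needs reshaping there. What follows is a sketch of the same MIN(mappingItem) logic in BigQuery Standard SQL, not a tested solution for the asker's data. The aliases t1, t, mt and m are my own, and the temporary tables with sample rows are assumptions added only to make the script self-contained.

```sql
-- Sample data (illustrative only), with duplicates on both sides.
CREATE TEMP TABLE table1 (textWithFoundItemInIt STRING, foundItem STRING);
CREATE TEMP TABLE table2 (mappingItem STRING);

INSERT INTO table1 (textWithFoundItemInIt)
VALUES ('hallo Adam'),
       ('Bert says hello'),
       ('Want to find "Caesar"bdjehg'),
       ('Want to find "Caesar"bdjehg');

INSERT INTO table2 (mappingItem)
VALUES ('Adam'), ('Bert'), ('Caesar'), ('Bert'), ('Caesar'), ('Adam');

-- The subquery returns exactly one row per distinct text, so every
-- target row (duplicates included) matches at most one source row.
UPDATE table1 AS t1
SET foundItem = m.mappingItem
FROM (
  SELECT t.textWithFoundItemInIt,
         MIN(mt.mappingItem) AS mappingItem
  FROM table1 AS t
  JOIN table2 AS mt
    ON t.textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%')
  GROUP BY t.textWithFoundItemInIt
) AS m
WHERE t1.textWithFoundItemInIt = m.textWithFoundItemInIt;

SELECT * FROM table1;
```

The target table appears inside the subquery under its own alias (t), which avoids the "already defined as the UPDATE target" error that comes from naming it directly in the FROM clause. Grouping collapses duplicate texts in the subquery only; the outer WHERE still matches every table1 row with that text, so duplicate data rows should all be updated.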
Check the result:
SELECT * FROM table1;
+----------------------------+-----------+
| textWithFoundItemInIt | foundItem |
+----------------------------+-----------+
| hallo Item1 | item1 |
| Item2 says hello | item2 |
| Item12 is a problem | item1 |
| Want to find "Item3"bdjehg | item3 |
+----------------------------+-----------+
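Note: this updates all target rows, as in the original request, even the problem rows. It can be adjusted to touch only those rows where foundItem has not been set yet, by adding WHERE foundItem IS NULL.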
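For the follow-up variant where one text may contain several names, the aggregate can be swapped from MIN to STRING_AGG. The sketch below is an assumption about what the asker wants, written in BigQuery Standard SQL; the inline sample data in the WITH clause is illustrative and only the table and column names come from the question.

```sql
WITH table1 AS (
  SELECT 'hallo Adam and Bert' AS textWithFoundItemInIt UNION ALL
  SELECT 'Bert says hello to Caesar'
),
table2 AS (
  -- Mapping table with duplicate entries, as in the follow-up.
  SELECT mappingItem
  FROM UNNEST(['Adam', 'Bert', 'Caesar', 'Bert', 'Caesar', 'Adam']) AS mappingItem
)
SELECT t.textWithFoundItemInIt,
       -- DISTINCT drops repeated mapping items before concatenation.
       STRING_AGG(DISTINCT mt.mappingItem, ', ' ORDER BY mt.mappingItem) AS foundItem
FROM table1 AS t
JOIN table2 AS mt
  ON t.textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%')
GROUP BY t.textWithFoundItemInIt;
```

This should yield "Adam, Bert" and "Bert, Caesar" for the two sample rows. Because it still produces one row per distinct text, the same SELECT could serve as the source subquery of an UPDATE ... FROM.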