How to join two tables on a condition that may contain a regex or array condition
Problem description
I have two tables, tab1 and tab2, with the following data:
tab1:
tab2:
The item descriptions in tab1 and tab2 do not match exactly. Is there any way to join these two tables to fetch the customer IDs?
Thanks.
Recommended answer
Try the following:
#standardSQL
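-- Despite its name, this UDF returns a normalized edit (Levenshtein) distance:
-- 0 means identical, and larger values mean less similar.
CREATE TEMPORARY FUNCTION similarity(Text1 STRING, Text2 STRING)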
RETURNS FLOAT64
LANGUAGE js AS """
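// Levenshtein edit distance: insertions, deletions and substitutions each
// cost 1; a single array row of the DP table is reused to save memory.
var Levenshtein = {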
get: function(str1, str2) {
// base cases
if (str1 === str2) return 0;
if (str1.length === 0) return str2.length;
if (str2.length === 0) return str1.length;
// two rows
var prevRow = new Array(str2.length + 1),
curCol, nextCol, i, j, tmp;
// initialise previous row
for (i=0; i<prevRow.length; ++i) {
prevRow[i] = i;
}
// calculate current row distance from previous row
for (i=0; i<str1.length; ++i) {
nextCol = i + 1;
for (j=0; j<str2.length; ++j) {
curCol = nextCol;
// substitution
nextCol = prevRow[j] + ( (str1.charAt(i) === str2.charAt(j)) ? 0 : 1 );
// insertion
tmp = curCol + 1;
if (nextCol > tmp) {
nextCol = tmp;
}
// deletion
tmp = prevRow[j + 1] + 1;
if (nextCol > tmp) {
nextCol = tmp;
}
// copy current col value into previous (in preparation for next iteration)
prevRow[j] = curCol;
}
// copy last col value into previous (in preparation for next iteration)
prevRow[j] = nextCol;
}
return nextCol;
}
};
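// Normalize both inputs: decode URI escapes (falling back to the raw string
// if decoding fails) and lowercase, so case differences are ignored.
var the_Text1, the_Text2;
try {the_Text1 = decodeURI(Text1).toLowerCase();} catch (ex) {the_Text1 = Text1.toLowerCase();}
try {the_Text2 = decodeURI(Text2).toLowerCase();} catch (ex) {the_Text2 = Text2.toLowerCase();}
// Divide by the length of Text1 (at least 1, to avoid dividing by zero for an empty string).
return Levenshtein.get(the_Text1, the_Text2) / Math.max(the_Text1.length, 1);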
""";
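-- For each tab2 row, keep the tab1 description with the smallest distance.
-- REPLACE turns tab1's 'a|b' format into tab2's 'a, b' format before comparing.
SELECT *, (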
SELECT t1.Item_description
FROM `project.dataset.tab1` t1
ORDER BY similarity(t2.Item_description, REPLACE(t1.Item_description, '|', ', '))
LIMIT 1
) matched_description
FROM `project.dataset.tab2` t2
Applied to the sample data from your question, the result is:
Row  Customer_Id  Item_description                                       matched_description
1    1001         Item Lenovo x1 Yoga, i7 14" is delivered               Lenovo x1 Yoga|i7 14"
2    1002         Lenovo x1 Yoga, i5 13" is delivered to customer        Lenovo x1 Yoga|i5 13"
3    1003         Lenovo Yoga, i7 14" is delivered to customer@1003      Lenovo Yoga|i7 14"
4    1004         Item lenovo x1 yoga, i7 14" is delivered successfully  Lenovo x1 Yoga|i7 14"
5    1005         Item Lenovo x1 Yoga, i7 14" is delivered@1005          Lenovo x1 Yoga|i7 14"
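To check the matching logic outside BigQuery, here is a minimal Node.js sketch; it is an illustration, not part of the original answer. It computes the same normalized Levenshtein score as the UDF, using a textbook row-by-row formulation instead of the answer's in-place variant (both give the same distance), and `bestMatch` mimics the `ORDER BY ... LIMIT 1` subquery. The names `levenshtein`, `normalize` and `bestMatch` are hypothetical, and because the question's tables are not reproduced here, the sketch assumes tab1 contains only the three descriptions that appear in the matched_description column.

```javascript
// Hypothetical local sketch of the matching done by the BigQuery query above.
// Run with Node.js; no dependencies.

// Levenshtein distance via dynamic programming: insertions, deletions and
// substitutions each cost 1. Only the previous row of the table is kept.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(
        prev[j] + 1,        // deletion
        cur[j - 1] + 1,     // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = cur;
  }
  return prev[b.length];
}

// Same normalization as the UDF: decode URI escapes if possible, then lowercase.
function normalize(s) {
  try {
    return decodeURI(s).toLowerCase();
  } catch (e) {
    return s.toLowerCase();
  }
}

// Edit distance divided by the length of text1: 0 means identical.
function similarity(text1, text2) {
  const t1 = normalize(text1);
  const t2 = normalize(text2);
  return levenshtein(t1, t2) / Math.max(t1.length, 1);
}

// Mimics the correlated subquery: turn 'a|b' into 'a, b' and keep the
// candidate with the lowest score (the first one wins on ties).
function bestMatch(description, candidates) {
  let best = null;
  let bestScore = Infinity;
  for (const c of candidates) {
    const score = similarity(description, c.split('|').join(', '));
    if (score < bestScore) {
      bestScore = score;
      best = c;
    }
  }
  return best;
}

// ASSUMPTION: tab1 holds only the descriptions seen in matched_description.
const tab1 = ['Lenovo x1 Yoga|i7 14"', 'Lenovo x1 Yoga|i5 13"', 'Lenovo Yoga|i7 14"'];
const tab2 = [
  'Item Lenovo x1 Yoga, i7 14" is delivered',
  'Lenovo x1 Yoga, i5 13" is delivered to customer',
  'Lenovo Yoga, i7 14" is delivered to customer@1003',
  'Item lenovo x1 yoga, i7 14" is delivered successfully',
  'Item Lenovo x1 Yoga, i7 14" is delivered@1005',
];

for (const d of tab2) {
  console.log(`${d} -> ${bestMatch(d, tab1)}`);
}
```

With these assumed inputs, each row picks the same tab1 description as in the result table. For row 1, the cleaned tab1 text `lenovo x1 yoga, i7 14"` (22 characters) is a substring of the 40-character tab2 text, so the distance is 18 deletions and the score is 18/40 = 0.45. Because every candidate for a given row is divided by the same tab2 length, the normalization never changes which candidate wins; it only makes scores comparable across rows.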