如何使用条件联接两个表可能包含正则表达式条件或数组条件 [英] how to join two tables with condition may contains regex condition or array condition

查看:29
本文介绍了如何使用条件联接两个表可能包含正则表达式条件或数组条件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个表tab1和tab2,数据如下所示

I have two tables tab1 and tab2 and the data like as follows

tab1:

tab2:

tab1和tab2中的项目描述不匹配,是否有任何方法可以将这两个表结合起来以获取客户ID

Here item description in tab1 and tab2 is not matching is there any way to join these two tables to fetch the customer ids

谢谢

推荐答案

尝试以下

#standardSQL
CREATE TEMPORARY FUNCTION similarity(Text1 STRING, Text2 STRING)
RETURNS FLOAT64
LANGUAGE js AS """
  var _extend = function(dst) {
    var sources = Array.prototype.slice.call(arguments, 1);
    for (var i=0; i<sources.length; ++i) {
      var src = sources[i];
      for (var p in src) {
        if (src.hasOwnProperty(p)) dst[p] = src[p];
      }
    }
    return dst;
  };
  var Levenshtein = {
    get: function(str1, str2) {
      // base cases
      if (str1 === str2) return 0;
      if (str1.length === 0) return str2.length;
      if (str2.length === 0) return str1.length;
      // two rows
      var prevRow  = new Array(str2.length + 1),
          curCol, nextCol, i, j, tmp;
      // initialise previous row
      for (i=0; i<prevRow.length; ++i) {
        prevRow[i] = i;
      }
      // calculate current row distance from previous row
      for (i=0; i<str1.length; ++i) {
        nextCol = i + 1;
        for (j=0; j<str2.length; ++j) {
          curCol = nextCol;

          // substution
          nextCol = prevRow[j] + ( (str1.charAt(i) === str2.charAt(j)) ? 0 : 1 );
          // insertion
          tmp = curCol + 1;
          if (nextCol > tmp) {
            nextCol = tmp;
          }
          // deletion
          tmp = prevRow[j + 1] + 1;
          if (nextCol > tmp) {
            nextCol = tmp;
          }
          // copy current col value into previous (in preparation for next iteration)
          prevRow[j] = curCol;
        }
        // copy last col value into previous (in preparation for next iteration)
        prevRow[j] = nextCol;
      }
      return nextCol;
    }
  };

  var the_Text1;
  try {the_Text1 = decodeURI(Text1).toLowerCase();} catch (ex) {the_Text1 = Text1.toLowerCase();}
  try {the_Text2 = decodeURI(Text2).toLowerCase();} catch (ex) {the_Text2 = Text2.toLowerCase();}
  return Levenshtein.get(the_Text1, the_Text2) / the_Text1.length;
""";

SELECT *, (
  SELECT t1.Item_description
  FROM `project.dataset.tab1` t1
  ORDER BY similarity(t2.Item_description, REPLACE(t1.Item_description, '|', ', ')) 
  LIMIT 1
  ) matched_description
FROM `project.dataset.tab2` t2

如果适用于您的问题的样本数据-结果将为

If to apply to sample data from your question - result will be

Row Customer_ld Item_description                                        matched_description  
1   1001        Item Lenovo x1 Yoga, i7 14" is delivered                Lenovo x1 Yoga|i7 14"    
2   1002        Lenovo x1 Yoga, i5 13" is delivered to customer         Lenovo x1 Yoga|i5 13"    
3   1003        Lenovo Yoga, i7 14" is delivered to customer@1003       Lenovo Yoga|i7 14"   
4   1004        Item lenovo x1 yoga, i7 14" is delivered successfully   Lenovo x1 Yoga|i7 14"    
5   1005        Item Lenovo x1 Yoga, i7 14" is delivered@1005           Lenovo x1 Yoga|i7 14"    

这篇关于如何使用条件联接两个表可能包含正则表达式条件或数组条件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆