使用levenshtein距离比率比较2条记录 [英] using levenshtein distance ratio to compare 2 records

查看:59
本文介绍了使用levenshtein距离比率比较2条记录的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经使用levenshtein距离和比率源代码创建了mysql用户函数.我正在比较2条记录,并基于75%的匹配度来选择记录.

I've created the mysql user function using the levenshtein distance and ratio source codes. I am comparing 2 records and based on a 75% match I want to select the record.

  1. 订单进入带有 ITEM 标题
  2. 的表 paypal_ipn_orders
  3. 对表 itemkey 执行查询,以在名为 ITEM 的记录中找到75%的匹配项
  4. 如果匹配了75%的标题,则会将表 itemkey 中的八位数字分配给表 paypal_ipn_orders
  1. Order comes into table paypal_ipn_orders with an ITEM title
  2. A query executes against a table itemkey to find a 75% match in a record called ITEM as well
  3. if a 75% title is match it assigns an eight digit number from table itemkey to table paypal_ipn_orders

这是查询

      UPDATE paypal_ipn_orders
      SET sort_num = (SELECT sort_id
      FROM itemkey
      WHERE  levenshtein_ratio(itemkey.item, paypal_ipn_orders.item_name) > 75)
      WHERE packing_slip_printed = 0 
      AND LOWER(payment_status) = 'completed' 
      AND address_name <> ''
      AND shipping < 100

我对此进行了几次调整,但是在levenshtein_ratio部分的第4行和第5行之间失败了.如果成功,则说明子查询返回多个行.我不知道如何解决它,使之返回正确的结果,我只是失去了对如何使这项工作.

I have adjusted this a few times but it's failing between line 4 and 5 at the levenshtein_ratio part. If it works it says that the subquery returns more than one row. I don't know how to fix it to make it return the correct result, I just lost as to how to make this work.

推荐答案

SET上的子查询只能返回一个值.如果itemkey有一个以上项占item_name的75%,您要怎么做?以下将使用最佳匹配之一:

A subquery on a SET should only return one value. If itemkey has more than one item that is 75% of item_name what do you want to do? The below will use one of the best matches:

UPDATE paypal_ipn_orders
SET sort_num = (SELECT sort_id
  FROM itemkey
  WHERE  levenshtein_ratio(itemkey.item, paypal_ipn_orders.item_name) > 75
  ORDER BY levenshtein_ratio(itemkey.item, paypal_ipn_orders.item_name) DESC
  LIMIT 1)
WHERE packing_slip_printed = 0 
  AND LOWER(payment_status) = 'completed' 
  AND address_name <> ''
  AND shipping < 100

这篇关于使用levenshtein距离比率比较2条记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆