Select rows from a table that contain any word from a long list of words in another table


Question

I have one table with every Fortune 1000 company name:

FortuneList:

------------------------------------------------
|fid        |  coname                          |
------------------------------------------------
| 1         | 3m                               |
| 2         | Amazon                           |
| 3         | Bank of America                  |
| 999       | Xerox                            |
------------------------------------------------

I have a second table with every user on my newsletter:

MyUsers:

------------------------------------------------
|uid    |  name      | companyname             |
------------------------------------------------
| 1350  | John Smith |  my own Co              |
| 2731  | Greg Jones |  Amazon.com, Inc        |
| 3899  | Mike Mars  |  Bank of America, Inc   |
| 6493  | Alex Smith |  Handyman America       |
------------------------------------------------

How do I pull out every one of my newsletter subscribers who works for a Fortune 1000 company? (That is, by scanning my entire MyUsers table for every record whose companyname contains any of the coname values from the FortuneList table.)

The output I want is:

------------------------------------------------
|uid    |  name      | companyname             |
------------------------------------------------
| 2731  | Greg Jones |  Amazon.com, Inc        |
| 3899  | Mike Mars  |  Bank of America, Inc   |
------------------------------------------------

(See how it finds "Amazon" within "Amazon.com, Inc".)

Recommended answer

If you were doing this in Oracle, the following would yield your desired result (with the example data):

with    fortunelist as(
        select 1 as fid, '3m' as coname from dual union all
        select 2, 'Amazon' from dual union all
        select 3, 'Bank of America' from dual union all
        select 999, 'Xerox' from dual
        )
        , myusers as(
        select 1350 as usrid, 'John Smith' as name, 'my own Co' as companyname from dual union all
        select 2731, 'Greg Jones', 'Amazon.com, Inc.' from dual union all
        select 3899, 'Mike Mars', 'Bank of America, Inc' from dual union all
        select 6493, 'Alex Smith', 'Handyman America' from dual 
        )
select  utl_match.jaro_winkler_similarity(myusers.companyname, fortunelist.coname) as sim
        , myusers.companyname
        , fortunelist.coname
from    fortunelist
        , myusers
where   utl_match.jaro_winkler_similarity(myusers.companyname, fortunelist.coname) >= 80

The two matches you're after score 87 and 95 on Jaro-Winkler (Amazon and Bank of America, respectively). You can move the 80 in the query up or down to tighten or loosen the matching threshold. The higher you go, the fewer matches you get, but the more likely each one is genuine. The lower you go, the more matches you get, at the risk of pulling back rows that aren't really matches. For instance, "Handyman America" vs. "Bank of America" scores 73/100, so lowering the threshold to 70 would produce a false positive with your example data. Jaro-Winkler is generally meant for people's names rather than company names, but because company names are typically also short strings, it may still be useful for you.
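To experiment with the threshold outside the database, here is a minimal Python sketch of a textbook Jaro-Winkler score. It is not Oracle's UTL_MATCH code: the matching window, the integer halving of transpositions, the 4-character prefix cap, and the 0.1 prefix scale are assumptions taken from the common description of the algorithm, and the function names are illustrative. Under those assumptions the three example pairs come out at roughly 0.875, 0.95 and 0.73, close to the 87, 95 and 73 quoted above.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; the comparison is case-sensitive."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    if n1 == 0 or n2 == 0:
        return 0.0
    # Characters only count as matching if they are within this distance.
    window = max(max(n1, n2) // 2 - 1, 0)
    used1 = [False] * n1
    used2 = [False] * n2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(n1):
        if used1[i]:
            while not used2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2  # assumption: integer halving
    return (matches / n1 + matches / n2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro score boosted for a shared prefix of up to 4 characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


THRESHOLD = 0.80  # same role as the 80 in the SQL, on a 0-1 scale
pairs = [
    ("Amazon.com, Inc.", "Amazon"),
    ("Bank of America, Inc", "Bank of America"),
    ("Handyman America", "Bank of America"),
]
for user_co, fortune_co in pairs:
    score = jaro_winkler(user_co, fortune_co)
    verdict = "match" if score >= THRESHOLD else "no match"
    print(f"{score:.3f}  {verdict}  {user_co!r} vs {fortune_co!r}")
```

Changing THRESHOLD to 0.70 in this sketch turns the "Handyman America" pair into a match, which is the false positive described above.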

I know you tagged this as MySQL, and while this function does not exist there, from what I've read people have already done the work of creating a custom function for it: http://androidaddicted.wordpress.com/2010/06/01/jaro-winkler-sql-code/ http://dannykopping.com/blog/fuzzy-text-search-mysql-jaro-winkler

You could also try string replacements, e.g. eliminating common reasons for a match not being found (such as there being an "Inc." in one table but not the other).
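The string-replacement idea can be sketched in Python as a normalization step applied to both tables before comparing. The list of noise patterns here (legal suffixes such as "Inc" and a trailing ".com") is hypothetical and would need to be built from your own data; the function and variable names are illustrative as well.

```python
import re

# Hypothetical noise patterns; extend these after inspecting real data.
LEGAL_SUFFIX = re.compile(r",?\s+(inc|corp|llc|ltd)$")
DOMAIN_SUFFIX = re.compile(r"\.com$")


def normalize(name: str) -> str:
    """Lower-case a company name and strip common trailing noise."""
    cleaned = name.lower().strip()
    cleaned = cleaned.rstrip(".,")           # trailing punctuation
    cleaned = LEGAL_SUFFIX.sub("", cleaned)  # ", Inc", " LLC", ...
    cleaned = DOMAIN_SUFFIX.sub("", cleaned) # "amazon.com" -> "amazon"
    return cleaned.strip()


fortune_list = ["3m", "Amazon", "Bank of America", "Xerox"]
my_users = [
    (1350, "John Smith", "my own Co"),
    (2731, "Greg Jones", "Amazon.com, Inc."),
    (3899, "Mike Mars", "Bank of America, Inc"),
    (6493, "Alex Smith", "Handyman America"),
]

# Compare normalized names exactly; no fuzzy score is involved here.
fortune_keys = {normalize(coname) for coname in fortune_list}
matched = [user for user in my_users if normalize(user[2]) in fortune_keys]

for uid, name, companyname in matched:
    print(uid, name, companyname)
```

Exact comparison after normalization avoids the "Handyman America" false positive entirely, but it only catches the variations the patterns anticipate, so it may work best combined with a similarity threshold.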

Edit 2/10/14:

You can do this in MySQL (via phpMyAdmin) by following these steps:

  1. Go into phpMyAdmin, open your database, paste the code from the URL below into a SQL window, and hit Go. This creates the custom function that you'll need in Step 2. I'm not going to paste the function's code here because it's long and it's not my work. It lets you use the Jaro-Winkler algorithm in MySQL the same way you would use utl_match in Oracle. http://androidaddicted.wordpress.com/2010/06/01/jaro-winkler-sql-code/

  2. After that function is created, run the following SQL:

select  jaro_winkler_similarity(myusers.companyname, fortunelist.coname) as similarity
        , myusers.uid
        , myusers.name
        , myusers.companyname as user_co
        , fortunelist.coname as matching_co
from    fortunelist
        , myusers
where   jaro_winkler_similarity(myusers.companyname, fortunelist.coname) >= 80

This should yield the exact result you're looking for, but as noted above, you'll want to experiment with the 80 in that SQL, moving it up or down until you have a good balance between avoiding false positives and still finding the matches you want.

I don't have a MySQL database to test with, so if you run into an issue please let me know, but this should work.
