Best way to search for names in MySQL

Question

I want to search for a user named "Martins Silva". I'm using a FULLTEXT search in BOOLEAN MODE:

 MATCH(name,lastname) AGAINST('+martins +silva' IN BOOLEAN MODE)

A search for "Martins Silva" returns:

Orleans Silva De Martins (1)
Armistrong Oliveira Martins Da Silva (2)
Douglas Martins Vieira Da Silva (3)
Glauciene Silva Martins (4)
Jose Martins Silva (5)
...

The problem is that there is a user named "Martins Silva" in the database, but it only appears at position 540 of this result set.

This is the result I should expect from reading the documentation and seeing how the rank is calculated, but that does not help me solve the problem. I also tried searching with LIKE, but I get the same result.

Given that result set, the best order for me would be:

Martins Silva (540) -> because it is the exact phrase
Jose Martins Silva (5) -> because it contains the exact phrase, and the match starts earlier than in (2)
Armistrong Oliveira Martins Da Silva (2) -> distance between martins and silva is shorter than in (3)
Douglas Martins Vieira Da Silva (3)
Glauciene Silva Martins (4) -> lower priority when the words are out of order
Orleans Silva De Martins (1)

So I think I could solve this problem with an algorithm that considers the order and position of the query words within the name.
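The ordering rules listed above can be sketched as an application-side re-ranking step. This is only an illustration in Python, not something MySQL provides: the function names `rank_key` and `rerank` are hypothetical, and it assumes the candidate names have already been fetched by the boolean FULLTEXT query, so every query word is present in each name. Names are sorted by a tuple: in-order matches first, then the smallest distance between the matched words, then the earliest match position.

```python
def rank_key(full_name, query):
    """Sort key for one candidate name; smaller tuples rank first."""
    tokens = full_name.lower().split()
    terms = query.lower().split()
    if not terms:
        return (2, 0, 0)
    try:
        # Position of the first occurrence of each query word in the name.
        positions = [tokens.index(term) for term in terms]
    except ValueError:
        # A query word is missing: push the name to the end.
        return (2, 0, 0)
    # 0 when the words appear in the same order as in the query, else 1.
    out_of_order = 0 if positions == sorted(positions) else 1
    # Distance between the first and last matched word (1 = adjacent for two words).
    span = max(positions) - min(positions)
    # Earlier matches rank ahead of later ones when everything else ties.
    return (out_of_order, span, min(positions))


def rerank(names, query):
    """Return the names sorted by rank_key (stable for ties)."""
    return sorted(names, key=lambda name: rank_key(name, query))
```

Applied to the six names from the question with the query "martins silva", this key reproduces the desired order given above. It only looks at the first occurrence of each word, so names that repeat a query word would need a more careful treatment.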

I tried calculating the Levenshtein distance, but it is really slow on a large database.

Is there a way to solve this in MySQL? Or would I have to use something like Apache Lucene? Or am I doing something wrong? This search is the main feature of my website, and it has to work really well.

Thank you so much, guys!

Recommended answer

In your particular case, you will need to implement a Levenshtein function to accomplish this; MATCH simply cannot rank the results this way. Sorting by the Levenshtein distance in ascending order (ASC) gives you the rows from most relevant to least relevant.

Levenshtein function to add to your database:

DELIMITER $$
CREATE FUNCTION levenshtein( s1 VARCHAR(255), s2 VARCHAR(255) )
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
DECLARE s1_char CHAR;
-- max strlen=255
DECLARE cv0, cv1 VARBINARY(256);
SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0;
IF s1 = s2 THEN
RETURN 0;
ELSEIF s1_len = 0 THEN
RETURN s2_len;
ELSEIF s2_len = 0 THEN
RETURN s1_len;
ELSE
WHILE j <= s2_len DO
SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1;
END WHILE;
WHILE i <= s1_len DO
SET s1_char = SUBSTRING(s1, i, 1), c = i, cv0 = UNHEX(HEX(i)), j = 1;
WHILE j <= s2_len DO
SET c = c + 1;
IF s1_char = SUBSTRING(s2, j, 1) THEN
SET cost = 0; ELSE SET cost = 1;
END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost;
IF c > c_temp THEN SET c = c_temp; END IF;
SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1;
IF c > c_temp THEN
SET c = c_temp;
END IF;
SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1;
END WHILE;
SET cv1 = cv0, i = i + 1;
END WHILE;
END IF;
RETURN c;
END$$
DELIMITER ;
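To make the logic of the stored function easier to follow, here is a minimal Python sketch of the same row-by-row dynamic programming idea. It is an illustration written for this answer, not a translation verified line by line against the SQL, and it compares characters case-sensitively, whereas the MySQL function follows whatever collation the arguments use.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Edit distance between s1 and s2 using two rows of the DP table."""
    if s1 == s2:
        return 0
    if not s1:
        return len(s2)
    if not s2:
        return len(s1)
    # previous[j] = distance between the empty prefix of s1 and s2[:j].
    previous = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        # current[0] = distance between s1[:i] and the empty string.
        current = [i]
        for j, ch2 in enumerate(s2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(
                previous[j] + 1,         # delete ch1
                current[j - 1] + 1,      # insert ch2
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]
```

A distance of 0 means the strings are identical, which is why sorting by the distance in ascending order puts the closest names first.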

This sample query sorts by the closest lastname first and the closest name second, so the most relevant row comes first. Add LIMIT 1 to return only the most relevant result:

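-- $var1 and $var2 are the application's search terms (last name, first name);
-- pass them as quoted or bound string values. `database` is a placeholder table name.
SELECT lastname, levenshtein(lastname, $var1) AS relevance1,
       name, levenshtein(name, $var2) AS relevance2
FROM `database`
ORDER BY relevance1 ASC, relevance2 ASC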
