从 n 列中选择最高匹配结果 [英] Select highest matching results from n columns

查看:52
本文介绍了从 n 列中选择最高匹配结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

按更高百分比匹配检查 30 列 mysql 的顺序

Order by higher percentage matching checking 30 columns mysql

我想做一个配置文件匹配项目.目标是首先返回匹配更好百分比的 100 个结果.风景是-

I would like to make a profile matching project. The target is to return say 100 results matching better percentage first. the scenery is -

A user has yes or no answer of 30 questions(all answered).
User is interested to see 100 people who has matching with him order by higher percentage

我需要建议来决定如何制作表和查询以确保最小的处理负载 -

I need suggestion to decide how I will make the table and query ensuring minimum processing load -

我应该将答案存储在单独的列中(每列中的值为是/否)还是在用逗号分隔的同一列中(只有是的答案受过教育、高大、富有、单身、关心)?

Should I store answers in separate columns (value is yes/no in each column) or in same column separated by comma (only yes answers educated,tall,rich,single,caring)?

表 A 和表 B 应按百分比返回最高匹配顺序的查询是什么.

What should be the query for Table A and Table B to return highest matching order by percentage.

这是表格(30个固定问题的答案,是/否类型答案)

Here is the Table (answers for 30 fixed questions, yes/no type answer)

.id | name | q01 | q02 | q03 | q04 | q05 | q06 |...continue...| q30

11 .|. tom ..|.. 1 ..|.. 0 ..|.. 0 ...|.. 1 ..|.. 0 ..|.. 1 ..|..... ............. |. 1

12 .|. mik ..|.. 0 ..|.. 0 ..|.. 1 ...|.. 1 ..|.. 0 ..|.. 0 ..|..... ............. |. 0

13 .|. jim ...|.. 1 ..|.. 1 ..|.. 1 ...|.. 1 ..|.. 0 ..|.. 1 ..|..... ............. |. 1

14 .|. don ..|.. 0 ..|.. 1 ..|.. 1 ...|.. 0 ..|.. 0 ..|.. 0 ..|..... ............. |. 1

15 .|. ric ....|.. 1 ..|.. 0 ..|.. 0 ...|.. 1 ..|.. 0 ..|.. 1 ..|..... ............. |. 0

16 .|. jam ..|.. 0 ..|.. 1 ..|.. 0 ...|.. 0 ..|.. 0 ..|.. 0 ..|..... ............. |. 1

17 .|. joe ...|.. 1 ..|.. 1 ..|.. 1 ...|.. 1 ..|.. 0 ..|.. 0 ..|..... ............. |. 1

18 .|. ima ..|.. 1 ..|.. 0 ..|.. 0 ...|.. 1 ..|.. 0 ..|.. 1 ..|..... ............. |. 1

19 .|. sun ..|.. 1 ..|.. 0 ..|.. 0 ...|.. 1 ..|.. 0 ..|.. 1 ..|..... ............. |. 0

20 .|. dim ..|.. 0 ..|.. 0 ..|.. 1 ...|.. 1 ..|.. 0 ..|.. 0 ..|.... .............. |. 0

21 .|. dic ...|.. 1 ..|.. 0 ..|.. 0 ...|.. 1 ..|.. 0 ..|.. 1 ..|.... .............. |. 1

xx .|. yyy ...|.. up to fifty thousand rows.. ...... |....................|. 

xuser (example:id 15) 想要得到 100 个按与他最佳匹配排序的结果(q01 到 q30 列匹配).匹配比例最高的应该首先返回.

x user (example:id 15) would like to get 100 result ordered by best match with him (q01 to q30 columns to match). Highest percentage of match should return first.

请帮我查询

SELECT * FROM table WHERE 条件 ORDER BY 匹配条件 LIMIT0,100

SELECT * FROM table WHERE condition ORDER BY matching condition LIMIT 0,100

我需要什么条件?

推荐答案

  • 完美匹配:
  • 在这种情况下,您应该创建每个答案列,手动创建此位图(每个问题 1 位).在此列上创建索引.

    In this case, you should create your per answer column where you create this bitmap manually (1 bit for each question). Create an index on this column.

    表格应如下所示:

    user_id  q1   q2 ... qn  accumulator (>n bits)
    1          red  no     yes 100110101 
    

    • 近似匹配:
    • 如果使用位图索引,则必须搜索键的所有 x 位变体.其中 x/Number_of_questions * 100 是最小百分比.

      If using a bitmap index, you have to search all x bit variations of the key. Where x / Number_of_questions * 100 is the minimum percentage.

      EX:1 位可变密钥:从 101 开始,您将拥有 001、111、100.

      EX: 1 bit varying keys: From 101 you would have 001, 111, 100.

      如果不同的问题具有不同的权重,则您无法在应用程序级别将其考虑在内.

      If different questions have different weights, you cant factor this in at the application level.

      如果问题编号不是静态的(以防您以后可能想要添加或删除问题),我建议您标准化您的答案表.这取决于存储引擎(对于 MongoDB 应该不是问题).

      I would suggest you normalize your answer table in case question number is not static(in case you might want to add or remove questions later). This depends on the storage engine (shouldn't be a problem for MongoDB).

      同样,使用累加器,表格应如下所示:

      Again, using the accumulator, table should look like:

      user_id answer_id  accumulator (>n bits)
      1       1          100110101 
      

      现在当您搜索时,您将对结果进行异或并以此排序.

      Now when you search, you will XOR your result and sort by this.

      SELECT * FROM answers ORDER BY BIT_COUNT(myAnswer ^ accumulator) ASC;
      

      这篇关于从 n 列中选择最高匹配结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆