在MySQL表中获取最相似的行并按相似性对其进行排序 [英] Getting most similar rows in MySQL table and order them by similarity

查看:173
本文介绍了在MySQL表中获取最相似的行并按相似性对其进行排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据库表,其中包含用户的车辆(汽车,摩托车).我想从那张桌子中拿出最相似的车辆.可以说该表包含以下几列(并带有一些上下文可以理解这一点):

I have a database table that holds user's vehicles (cars, motorcycles). I want to get the most similar vehicles out of that table. Lets say the table holds the following columns (with some context to get the idea):

table: vehicles


vehicle_id (pk, auto-increment)
model_id (BMW 3er, Honda Accord)
fuel_type (gasoline, diesel)
body_style (sedan, coupe)
year
engine_size (2.0L)
engine_power (150hp)

因此,简而言之,我想选择N个(通常是3个)具有相同make_id(至少)的行,并按它们与种子载体共享的相似度对它们进行排名,可以说如果fuel_type匹配,我就是d的得分为+3,但如果body_style匹配,则为+1.理想情况下,我将获得N点具有最高分数的车辆,但想法是当我没有时仍能得到一些东西.

So in short I want to select N (usually 3) rows that have the same make_id (at least) and rank them by the amount of similarities they share with the seed vehicle lets say if the fuel_type matches, I'd have rank points +3, but if the body_style matches, it would be +1. Ideally I would get N vehicles that have maximum points but the idea is to still get something when I don't.

推荐答案

由于在我的表中目前只有大约5k行,并且它们正在缓慢增长,所以我决定实际使用以下简单方法(在我遇到之后我写了问题).

As in my table currently I have only around 5k rows and they are slowly growing, I decided to actually use the following simple approach (it came to me just after I wrote the question).

种子可以说是本田雅阁(model_id 456),2004,汽油,2.0L,155hp,轿车,带自动档ID 123.

The seed lets say is Honda Accord (model_id 456), 2004, gasoline, 2.0L, 155hp, sedan with auto-inc ID 123.

SELECT vehicles.*,  
    (IF(`fuel_type`='gasoline', 3, 0) + 
     IF(`body_style`='sedan', 1, 0) + 
     IF(`year` > 2001 AND `year` < 2007, 2, 0) + 
     IF(`engine_size` >= 1.8 AND `engine_size` <= 2.2, 1, 0) + 
     IF(`engine_power`=155, 3, IF(`engine_power`>124 AND `engine_power`<186, 1, 0))) AS `rank`
FROM vehicles
WHERE vehicle_id!=123 AND model_id=456
ORDER BY `rank` DESC
LIMIT 3

只要我没有太多的行,它将起作用.如果表格变成50-100k,我可能不得不切换到Lucene?

It will work, as long as I don't too many rows. If the table becomes 50-100k, I probably will have to switch to something like Lucene?

这篇关于在MySQL表中获取最相似的行并按相似性对其进行排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆