Finding the most correlated item
Question
I have sales details for one restaurant, as shown below.
+----------+------------+---------+----------+
| Location | Units Sold | Revenue | Footfall |
+----------+------------+---------+----------+
| Loc - 01 | 100 | 1,150 | 85 |
+----------+------------+---------+----------+
From the restaurant data in the table below, I want to find the restaurant most correlated with the one above.
+----------+------------+---------+----------+
| Location | Units Sold | Revenue | Footfall |
+----------+------------+---------+----------+
| Loc - 02 | 100 | 1,250 | 60 |
| Loc - 03 | 90 | 990 | 90 |
| Loc - 04 | 120 | 1,200 | 98 |
| Loc - 05 | 115 | 1,035 | 87 |
| Loc - 06 | 89 | 1,157 | 74 |
| Loc - 07 | 110 | 1,265 | 80 |
+----------+------------+---------+----------+
Please guide me on how this can be done with Python or pandas.
Note: by correlation I mean the most closely matching (most similar) restaurant in terms of Units Sold, Revenue and Footfall.
Recommended answer
If by correlation you mean the smallest Euclidean distance, one solution is:
import numpy as np

# df1 holds the reference restaurant, df2 the candidate restaurants
# Convert Revenue from strings such as '1,150' to integers
df1['Revenue'] = df1['Revenue'].str.replace(',', '').astype(int)
df2['Revenue'] = df2['Revenue'].str.replace(',', '').astype(int)

# Euclidean distance between each row of df2 and the first row of df1
dist = np.sqrt((df2['Units Sold'] - df1.loc[0, 'Units Sold'])**2 +
               (df2['Revenue'] - df1.loc[0, 'Revenue'])**2 +
               (df2['Footfall'] - df1.loc[0, 'Footfall'])**2)
print(dist)
0 103.077641
1 160.390149
2 55.398556
3 115.991379
4 17.058722
5 115.542200
dtype: float64
# Get the index of the smallest distance and select that row of df2
print(df2.loc[[dist.idxmin()]])
Location Units Sold Revenue Footfall
4 Loc - 06 89 1157 74
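The answer's snippets assume that df1 and df2 already exist. The sketch below is one self-contained way to reproduce the result: it builds both tables from the question, cleans the Revenue column, and computes the distances with np.linalg.norm. The helper name euclidean_distances, its scale parameter and the COLS constant are hypothetical names introduced for illustration. The scaled variant at the end is not part of the original answer; it is an assumption-based option for cases where Revenue, which has the largest numeric range, should not dominate the distance.

```python
import numpy as np
import pandas as pd

# Columns used to compare restaurants (hypothetical constant name).
COLS = ["Units Sold", "Revenue", "Footfall"]

# Reference restaurant: the first table in the question.
df1 = pd.DataFrame({
    "Location": ["Loc - 01"],
    "Units Sold": [100],
    "Revenue": ["1,150"],
    "Footfall": [85],
})

# Candidate restaurants: the second table in the question.
df2 = pd.DataFrame({
    "Location": ["Loc - 02", "Loc - 03", "Loc - 04",
                 "Loc - 05", "Loc - 06", "Loc - 07"],
    "Units Sold": [100, 90, 120, 115, 89, 110],
    "Revenue": ["1,250", "990", "1,200", "1,035", "1,157", "1,265"],
    "Footfall": [60, 90, 98, 87, 74, 80],
})

# Strip thousands separators so Revenue becomes numeric.
for df in (df1, df2):
    df["Revenue"] = df["Revenue"].str.replace(",", "", regex=False).astype(int)


def euclidean_distances(reference, candidates, scale=None):
    """Distance from the first row of `reference` to each row of `candidates`.

    Hypothetical helper. If `scale` is given, each column difference is
    divided by the matching scale value before the distance is computed.
    """
    ref = reference[COLS].iloc[0].to_numpy(dtype=float)
    cand = candidates[COLS].to_numpy(dtype=float)
    diff = cand - ref  # broadcasts the reference row over all candidates
    if scale is not None:
        diff = diff / np.asarray(scale, dtype=float)
    dist = np.linalg.norm(diff, axis=1)
    return pd.Series(dist, index=candidates["Location"].to_numpy())


# Raw distances, matching the approach in the answer.
raw = euclidean_distances(df1, df2)
print(raw.idxmin())  # Loc - 06

# Assumption: scale each column by its standard deviation across the
# candidates so that no single column dominates the distance.
scaled = euclidean_distances(df1, df2, scale=df2[COLS].std().to_numpy())
print(scaled.sort_values())
```

Whether to scale depends on what "similar" should mean for the data. With raw values, a difference of 100 in Revenue counts the same as a difference of 100 in Footfall, even though the two columns have very different ranges.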