How to vectorize a search function in MATLAB?

Question

Here is a MATLAB coding problem (a slightly different version, using intersect instead of setdiff, is posted separately):

A is a rating matrix with three columns: the first is the user ID (which may repeat), the second is the item ID (which may repeat), and the third is the user's rating of the item, ranging from 1 to 5.

Now I have a subset of user IDs, smallUserIDList, and a subset of item IDs, smallItemIDList. For each user in smallUserIDList, I want to find the rows of A rated by that user, collect the items the user rated, and do some calculation on them, such as taking the setdiff with smallItemIDList and counting the result, as the following code does:

userStat = zeros(length(smallUserIDList), 1);
for i = 1:length(smallUserIDList)
    A2= A(A(:,1) == smallUserIDList(i), :);
    itemIDList_each = unique(A2(:,2));

    setDiff = setdiff(itemIDList_each , smallItemIDList);
    userStat(i) = length(setDiff);
end
userStat

Finally, the profile viewer shows that the loop above is inefficient. The question is how to improve this piece of code with vectorization, without the help of a for loop.

For example:

Input:

A = [
1 11 1
2 22 2
2 66 4
4 44 5
6 66 5
7 11 5
7 77 5
8 11 2
8 22 3
8 44 3
8 66 4
8 77 5    
]

smallUserIDList = [1 2 7 8]
smallItemIDList = [11 22 33 55 77]

Output:

userStat =

 0
 1
 0
 2

Recommended answer

Vanilla MATLAB:

As far as I can tell your code is equivalent to:

%// Create matrix such that: user_item_rating(user,item)==rating
user_item_rating = sparse(A(:,1),A(:,2),A(:,3));

%// Keep all BUT the items in smallItemIDList
user_item_rating(:,smallItemIDList) = [];

%// Keep only those users in `smallUserIDList` and use order of this list
user_item_rating = user_item_rating(smallUserIDList,:);

%// Count the number of ratings
userStat = sum(user_item_rating~=0, 2);

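This works if there is at most one rating per (user, item) combination, and it should be quite efficient. Note that it uses the IDs directly as row and column indices of the sparse matrix, so they must be positive integers, and every entry of smallUserIDList and smallItemIDList must lie within the matrix dimensions.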
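If the IDs cannot safely be used as matrix indices, or a (user, item) pair may appear more than once, a loop-free variant built on unique, ismember and accumarray is one possible alternative. This is only a sketch, not the answerer's method: the variable names pairs, inList and pos are made up for illustration, and it assumes smallUserIDList contains no repeated IDs.

```matlab
% Example data from the question.
A = [1 11 1; 2 22 2; 2 66 4; 4 44 5; 6 66 5; 7 11 5; 7 77 5; ...
     8 11 2; 8 22 3; 8 44 3; 8 66 4; 8 77 5];
smallUserIDList = [1 2 7 8];
smallItemIDList = [11 22 33 55 77];

% Collapse repeated (user, item) pairs so each pair is counted once
% (this plays the role of unique(A2(:,2)) inside the original loop).
pairs = unique(A(:, 1:2), 'rows');

% Keep only pairs whose item is NOT in smallItemIDList (the setdiff step).
pairs = pairs(~ismember(pairs(:, 2), smallItemIDList), :);

% Map each remaining pair's user to its position in smallUserIDList.
% pos is 0 for users outside the list; inList flags the ones inside.
% Assumption: smallUserIDList has no duplicate entries.
[inList, pos] = ismember(pairs(:, 1), smallUserIDList);

% Count the remaining pairs per listed user, in the order of smallUserIDList.
userStat = accumarray(pos(inList), 1, [numel(smallUserIDList), 1]);
```

For the example input this gives userStat = [0; 1; 0; 2], matching the loop version. Because the IDs are only compared, never used as subscripts, they may be zero, negative, or sparse without inflating memory use.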

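Alternatively, check out grpstats from the Statistics Toolbox. It returns a table with one row per user that still has rows after filtering, with the row count in the GroupCount column. An implementation could look similar to this: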

%// Create ratings table
ratings = array2table(A, 'VariableNames', {'user','item','rating'});

%// Remove items we don't care about (smallItemIDList)
ratings = ratings(~ismember(ratings.item, smallItemIDList),:);

%// Keep only users we care about (smallUserIDList) 
ratings = ratings(ismember(ratings.user, smallUserIDList),:);

%// Compute the statistics grouped by 'user'. 
userStat = grpstats(ratings, 'user');
