How to vectorize a search function and intersection in MATLAB?

Problem description

Here is a MATLAB coding problem (a slightly different version, using setdiff instead of intersect, was asked in an earlier question):

A rating matrix A has 3 columns: the 1st column is the user ID, which may be duplicated; the 2nd column is the item ID, which may be duplicated; the 3rd column is the user's rating of the item, ranging from 1 to 5.

Now I have a subset of user IDs, smallUserIDList, and a subset of item IDs, smallItemIDList. I want to find the rows of A rated by users in smallUserIDList, collect the items each user rated, and do some calculation on them, such as intersecting with smallItemIDList and counting the result, as the following code does:

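userStat = zeros(length(smallUserIDList), 1);
for i = 1:length(smallUserIDList)
    % Rows of A rated by the i-th user in the list
    A2 = A(A(:,1) == smallUserIDList(i), :);
    % Distinct items that user rated
    itemIDList_each = unique(A2(:,2));

    % Count how many of those items are in smallItemIDList
    setIntersect = intersect(itemIDList_each, smallItemIDList);
    userStat(i) = length(setIntersect);
end
userStat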

Finally, the profile viewer shows that the loop above is inefficient. The question is: how can this piece of code be improved with vectorization, without the help of a for loop?

For example:

Input:

A = [
1 11 1
2 22 2
2 66 4
4 44 5
6 66 5
7 11 5
7 77 5
8 11 2
8 22 3
8 44 3
8 66 4
8 77 5    
]

smallUserIDList = [1 2 7 8]
smallItemIDList = [11 22 33 55 77]

Output:

userStat =

 1
 1
 2
 3

Recommended answer

Ah! You need a tiny edit to the accepted solution to the previous question. Here's the solution -

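% R: row of A for each (rating row, listed user) match
% C: position of the matched user in smallUserIDList
[R,C] = find(bsxfun(@eq,A(:,1),smallUserIDList(:).'));
% mask: true where the matched row's item is in smallItemIDList
% (the edit was needed here)
mask = ismember(A(R,2),smallItemIDList(:).');

ARm = A(R,2);
Cm = C(mask);       % user positions of the rows that passed the item filter
ARm = ARm(mask);    % item IDs of those rows

userStat = zeros(numel(smallUserIDList),1);
if ~isempty(Cm)
    % Per user, number of repeated item IDs, so each distinct item counts once
    dup_counts = accumarray(Cm,ARm,[],@(x) numel(x)-numel(unique(x)));
    % Per user, number of rows that passed the item filter
    accums = accumarray(C,mask);
    userStat(1:numel(accums)) = accums;
    userStat(1:numel(dup_counts)) = userStat(1:numel(dup_counts)) - dup_counts;
end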
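
For comparison, the same counts can also be obtained with ismember, unique(...,'rows') and accumarray. This is only a sketch, not the answerer's method, and it assumes that smallUserIDList contains no repeated IDs (ismember returns the first matching position only, so a repeated user ID would get its count at the first position and 0 at the later ones). The names userHit, userIdx, itemHit, keep and pairs are introduced here for illustration.

```matlab
% Example data from the question
A = [1 11 1; 2 22 2; 2 66 4; 4 44 5; 6 66 5; 7 11 5; ...
     7 77 5; 8 11 2; 8 22 3; 8 44 3; 8 66 4; 8 77 5];
smallUserIDList = [1 2 7 8];
smallItemIDList = [11 22 33 55 77];

% For every row of A: is the user listed, and at which position in the list?
[userHit, userIdx] = ismember(A(:,1), smallUserIDList);
% For every row of A: is the item listed?
itemHit = ismember(A(:,2), smallItemIDList);
keep = userHit & itemHit;

% Drop repeated (user position, item) pairs so each item counts once per user
pairs = unique([userIdx(keep), A(keep,2)], 'rows');

% Count the remaining pairs per user position
userStat = accumarray(pairs(:,1), 1, [numel(smallUserIDList), 1]);
disp(userStat)
```

On the example input this gives the column [1; 1; 2; 3], matching the loop. Unlike the bsxfun approach, it never builds the full rows-by-users comparison matrix, which may matter when A and smallUserIDList are both large.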


As a bonus, you can replace the pre-allocation step -

userStat = zeros(numel(smallUserIDList),1);

with this much faster pre-allocation scheme -

userStat(1,numel(smallUserIDList)) = 0;

Read more about it in this MATLAB Undocumented post on Pre-allocation.
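
Two details of the indexing-based pre-allocation are worth checking before using it; the sketch below illustrates them with a made-up length n standing in for numel(smallUserIDList). First, userStat(1,n) = 0 creates a 1-by-n row, whereas zeros(n,1) is a column (the linear indexing used in the solution works with either shape, but the displayed result differs). Second, the trick only behaves like zeros if the variable does not exist yet. The variable names rowStat, colStat and stale are hypothetical.

```matlab
n = 4;                         % stand-in for numel(smallUserIDList)

clear rowStat colStat          % the trick assumes these do not exist yet
rowStat(1,n) = 0;              % form used in the answer: 1-by-n row of zeros
colStat(n,1) = 0;              % column variant, same shape as zeros(n,1)

disp(isequal(rowStat, zeros(1,n)))
disp(isequal(colStat, zeros(n,1)))

% Caveat: if the variable already exists, its old values are kept
stale = [9 9];
stale(1,n) = 0;                % only the missing elements are zero-filled
disp(stale)
```

If a column result is wanted, as in the expected output of the question, the colStat form is the closer drop-in replacement for zeros(numel(smallUserIDList),1).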
