Counting the occurrences of a string in a cell array of strings in Matlab


Question

I have two cell arrays of strings of different lengths, d={'nerve','body','muscle','bone'} and e={'body','body','muscle'}. I have to compare these two arrays and count how many times each string in d occurs in e. The expected result is a vector, count_string=(0,2,1,0). Below is the code I've written, but I get the error: Cell contents assignment to a non-cell array object. I am a beginner in Matlab programming. Any quick help on this is greatly appreciated.

count_string=size(d)
for i=1:length(d)    
count_string{i}=sum(ismember(e{i},d));
end

After all your suggestions below, this is the module I have.

for i=1:length(d_union)
count_string1=cellfun(@(x) sum(ismember(d1,x)), d_union);
count_string2=cellfun(@(x) sum(ismember(d2,x)), d_union);
count_string3=cellfun(@(x) sum(ismember(d3,x)), d_union);
count_string4=cellfun(@(x) sum(ismember(d4,x)), d_union);
count_string5=cellfun(@(x) sum(ismember(d5,x)), d_union);
count_string6=cellfun(@(x) sum(ismember(d6,x)), d_union);
count_string7=cellfun(@(x) sum(ismember(d7,x)), d_union);
count_string8=cellfun(@(x) sum(ismember(d8,x)), d_union);
count_string9=cellfun(@(x) sum(ismember(d9,x)), d_union);
count_string10=cellfun(@(x) sum(ismember(d10,x)), d_union);
count_string11=cellfun(@(x) sum(ismember(d11,x)), d_union);
count_string12=cellfun(@(x) sum(ismember(d12,x)), d_union);
count_string13=cellfun(@(x) sum(ismember(d13,x)), d_union);
count_string14=cellfun(@(x) sum(ismember(testdoc,x)), d_union);
end

Matlab is taking forever to execute this module. 'd_union' is a 1x1216 cell array, and each of d1 through testdoc is approximately a 1x240 cell array. I need to calculate the cosine similarity of the vectors I get from the above operation. Is there a way to speed up the process? Please help. Thank you.

Recommended answer

You can count the occurrences in e of each string from d like this:

count_string = cellfun(@(x) sum(ismember(e,x)), d);

For your sample data you will get the vector [0 2 1 0].
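
For reference, the error in the question comes from count_string=size(d), which creates a numeric array ([1 4] here) that is then assigned to with curly braces; the loop also indexes e{i} up to length(d), which runs past the end of e. A minimal sketch of a corrected loop, assuming the sample data from the question and using strcmp rather than ismember, might look like this:

```matlab
d = {'nerve','body','muscle','bone'};
e = {'body','body','muscle'};

% Preallocate a numeric row vector with one slot per string in d.
count_string = zeros(1, numel(d));
for i = 1:numel(d)
    % strcmp compares d{i} against every cell of e and returns a logical
    % array the same size as e; summing it gives the number of matches.
    count_string(i) = sum(strcmp(d{i}, e));
end
% count_string is now [0 2 1 0]
```

Parentheses are used for the assignment because count_string is a numeric array, not a cell array.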

Does the d array contain only unique strings?

Update:

Here is another method that temporarily converts the strings to numbers with GRP2IDX and counts them with HISTC. It assumes that all strings in e also exist in d.

[gi g] = grp2idx([d e]);
gn = histc(gi(numel(d)+1:end),1:numel(g));

g will contain the unique strings (it will probably be identical to d) and gn will be the counts. gi is a temporary numeric array used for counting.

You need the Statistics Toolbox to access the GRP2IDX function.
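
If the Statistics Toolbox is not available, a similar index-based count can be sketched with base functions only. This is not part of the original answer: it swaps in the second output of ismember and accumarray, and it assumes d contains only unique strings. Strings in e that are missing from d are simply ignored.

```matlab
d = {'nerve','body','muscle','bone'};
e = {'body','body','muscle'};

% loc(k) is the position in d of e{k}, or 0 when e{k} does not occur in d.
[tf, loc] = ismember(e, d);

% Tally the positions. accumarray expects a column of subscripts, and the
% size argument keeps a zero count for entries of d that never appear in e.
count_string = accumarray(loc(tf).', 1, [numel(d) 1]).';
% count_string is now [0 2 1 0]
```

The same pattern would apply with d_union in place of d and each of d1 through testdoc in place of e. Note that the enclosing for loop in the question is not needed, since its body does not depend on i and so repeats the same work on every pass. Given two resulting count vectors a and b (hypothetical names), their cosine similarity can be computed as dot(a,b)/(norm(a)*norm(b)).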
