How to look up the value of each word in a table, calculate the class probabilities, and build a VSM (Vector Space Model) from it
Problem description
Say that I have a table that contains the probabilities of each word from another table. This table has 2 classes: actual and non_actual. I will name it master_table.
% For every word, actual + non_actual = 1.
actual = [0.5;0.4;0.6;0.75;0.23;0.96;0.532];
non_actual = [0.5;0.6;0.4;0.25;0.77;0.04;0.468];
words = {'finn';'jake';'iceking';'marceline';'shelby';'bmo';'naptr'};
master_table = table(actual,non_actual,...
'RowNames',words)
Then I have a table that contains sentences. I will name it T2.
sentence = {'finn marceline naptr';'jake finn simon marceline haha';'jake finn finn jake iceking';'bmo shelby shelby finn naptr';'naptr naptr jake finn bmo shelby'}
T2 = table('RowNames',sentence)
How can I produce something like this? (Words that are not in master_table, such as "simon" and "haha", get the value 1, so they do not affect the calculation of the probabilities that determine the class.)
Each class column is the product of the probabilities of the words in the sentence. The two values are then compared: if actual > non_actual, the class should be "actual".
sentence | actual | non_actual
finn marceline naptr | 0.5 * 0.75 * 0.532 | 0.5 * 0.25 * 0.468
jake finn simon marceline haha | 0.4 * 0.5 * 1 * 0.75 * 1 | 0.6 * 0.5 * 1 * 0.25 * 1
jake finn finn jake iceking | |
bmo shelby shelby finn naptr | |
naptr naptr jake finn bmo shelby | |
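As a quick sanity check of the first row, the two products can be evaluated directly. This is only a sketch: the numbers are copied from master_table, and the variable names p_actual, p_non_actual and predicted are made up for illustration.

```matlab
% finn * marceline * naptr, using the "actual" column of master_table
p_actual = 0.5 * 0.75 * 0.532;        % 0.1995
% the same three words, using the "non_actual" column
p_non_actual = 0.5 * 0.25 * 0.468;    % 0.0585

% The class with the larger product wins.
if p_actual > p_non_actual
    predicted = 'actual';
else
    predicted = 'non_actual';
end
fprintf('%s (%.4f vs %.4f)\n', predicted, p_actual, p_non_actual);
```

So the first sentence should end up in the "actual" class.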
And how can I build the VSM (vector space model) for the problem above? Each cell should hold the number of times the word occurs in the sentence:
WORDS (sorted alphabetically)
sentence | bmo | finn | haha | iceking | jake | marceline | naptr | shelby | simon |
finn marceline naptr | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
jake finn simon marceline haha | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
jake finn finn jake iceking | 0 | 2 | 0 | 1 | 2 | 0 | 0 | 0 | 0 |
bmo shelby shelby finn naptr | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 2 | 0 |
naptr naptr jake finn bmo shelby | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 1 | 0 |
Recommended answer
This is a bit loopy, but I feel like performance is not an issue here. I would create a bigger table first and then fill in the values in a loop:
T2 = table(ones(height(T2),1),ones(height(T2),1),repmat({''},height(T2),1),...
    'RowNames',sentence,'VariableNames',{'actual' 'non_actual' 'outcome'});
known = master_table.Properties.RowNames; % words that have probabilities
for i=1:height(T2)
    % split the row name (the sentence) into words
    A=strsplit(T2.Properties.RowNames{i});
    actual=1; %which is neutral for multiplication
    non_actual=1;
    for j=1:length(A)
        % words missing from master_table (e.g. 'simon') keep the factor 1
        if ismember(A{j},known)
            actual = actual * master_table{A(j),1};
            non_actual = non_actual * master_table{A(j),2};
        end
    end
    %if you need those
    T2.actual(i)=actual;
    T2.non_actual(i)=non_actual;
    if actual > non_actual
        T2.outcome(i)={'actual'};
    else
        T2.outcome(i)={'non_actual'};
    end
end
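For the VSM part, one possible approach is to count how often each word occurs in each sentence. The following is only a sketch under a few assumptions: words are separated by single spaces, every word is a valid MATLAB variable name (true for the example data), and the names tokens, vocab, counts and VSM are made up for illustration.

```matlab
sentence = {'finn marceline naptr'; ...
            'jake finn simon marceline haha'; ...
            'jake finn finn jake iceking'; ...
            'bmo shelby shelby finn naptr'; ...
            'naptr naptr jake finn bmo shelby'};

% Split every sentence into its words.
tokens = cellfun(@strsplit, sentence, 'UniformOutput', false);

% Vocabulary: all distinct words; unique() returns them sorted alphabetically.
vocab = unique([tokens{:}]);

% counts(i,k) = number of times vocab{k} occurs in sentence i.
counts = zeros(numel(sentence), numel(vocab));
for i = 1:numel(sentence)
    [~, idx] = ismember(tokens{i}, vocab);   % column index of each word
    counts(i, :) = accumarray(idx(:), 1, [numel(vocab) 1])';
end

% Wrap the count matrix in a table with sentences as rows and words as columns.
VSM = array2table(counts, 'RowNames', sentence, 'VariableNames', vocab)
```

Because the vocabulary is built from the sentences themselves, words such as "simon" and "haha" that are missing from master_table still get their own columns.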