MATLAB Tree Construction

Question

Now, I have to separate out any pair that is common to both input files and compute a mean for that pair as: (correlation in first text file) × (correlation in second text file) / ((correlation in first text file) + (correlation in second text file)). Again, store these in a separate matrix.
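To make the pair formula concrete, here is a small MATLAB check. Note that x*y/(x+y) is half the usual harmonic mean 2*x*y/(x+y); the constant factor does not change which pairs rank highest. The function handles pair_mean and harmonic are hypothetical names and the correlation values are made up. The last line shows a pitfall: when the two correlations have opposite signs and a negative sum, the ratio can exceed 0.5, the largest value two correlations in (0, 1] can produce.

```matlab
% Pair score from the question: x*y / (x + y), applied element-wise.
pair_mean = @(x, y) x .* y ./ (x + y);
% Standard harmonic mean of two numbers: 2*x*y / (x + y).
harmonic  = @(x, y) 2 .* x .* y ./ (x + y);

s = pair_mean(0.6, 0.3);          % 0.18 / 0.9, about 0.2
h = harmonic(0.6, 0.3);           % exactly twice s, about 0.4

% Mixed-sign correlations: -0.45 / -0.4, about 1.125 (a spurious high score).
spurious = pair_mean(-0.9, 0.5);
```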

Building a tree: Now, out of all the elements in both input files, select the 10 most frequent ones. Each of these forms the root of a separate K tree. The algorithm goes like this: for the word at the root level, check all its harmonic mean values with the other tags in the matrix developed in the previous step. Select the two highest harmonic means, and put the other word of each of those tag pairs as a child node of the root.

Can someone please guide me through the MATLAB steps for this? Thank you for your time.

Answer

Okay, so start by putting the data in a useful format: count the number of distinct words, and make an N-by-M matrix of binary values (I'll call this data1). Each of the N rows describes the words associated with a single image, and each of the M columns describes the images tagged with a single word. So the value at (n, m) is 1 if tag m appears in image n, and 0 if it does not.
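A minimal sketch of building the binary matrices, assuming each file has already been parsed into a cell array with one cell of tag strings per image (images1, images2, and to_binary are hypothetical names; the file-reading step is left out). Building one shared vocabulary for both files keeps column m meaning the same tag in data1 and data2, which the element-wise steps later rely on.

```matlab
% Hypothetical parsed input: one cell per image, each holding that image's tags.
images1 = {{'sky', 'sea'}, {'sky', 'tree'}, {'tree', 'grass', 'sky'}};
images2 = {{'sea', 'sand'}, {'sky', 'sea'}};

% Shared, sorted vocabulary across both files (M distinct tags).
vocab = unique([images1{:}, images2{:}]);
vocab = vocab(:).';   % force a row so each image below yields a row

% Row n is 1 in column m when image n carries tag vocab{m}.
to_binary = @(images) double(cell2mat(cellfun(@(tags) ismember(vocab, tags), ...
                                              images(:), 'UniformOutput', false)));
data1 = to_binary(images1);   % N1-by-M
data2 = to_binary(images2);   % N2-by-M
```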

From this matrix, you could find the correlation between every pair of words like this:

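M = size(data1, 2);                 % number of distinct tags (columns)
correlations1 = zeros(M, M);
for i=1:M
  for j=1:M
    % corr (Statistics and Machine Learning Toolbox) of tag columns i and j
    correlations1(i, j) = corr(data1(:, i), data1(:, j));
  end
end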
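If the toolbox is not available, core MATLAB's corrcoef builds the same M-by-M matrix in one call, treating columns as variables and rows as observations (corr(data1) does the same with the toolbox). The toy matrix below is made up. A tag whose column is constant (in every image, or in none) has zero variance, so you may see NaN in its row and column.

```matlab
% Hypothetical 4-image, 3-tag binary matrix.
data1 = [1 1 0;
         0 0 1;
         1 1 1;
         0 0 0];

% Columns are variables, rows are observations: entry (i, j) is the
% correlation between tag i and tag j, as in the double loop.
correlations1 = corrcoef(data1);
```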

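Now the matrix correlations1 holds the correlation between every pair of tags. Do the same for the other text file to get correlations2, building its data matrix over the same tag columns so the two matrices line up entry by entry. You can then make a matrix of harmonic means with: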

% element-wise x*y/(x+y) for every tag pair
h_means = correlations1.*correlations2./(correlations1+correlations2);
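A quick check on toy values (made up for illustration) shows why the diagonal needs care: every tag correlates perfectly with itself in both files, so its self-score is 1*1/(1+1) = 0.5, and for correlations in (0, 1] no other pair can score higher. One option, sketched here with the hypothetical name h_no_self, is to mask the diagonal before ranking.

```matlab
% Toy 2-tag correlation matrices for the two files (hypothetical values).
correlations1 = [1 0.6; 0.6 1];
correlations2 = [1 0.3; 0.3 1];

h_means = correlations1 .* correlations2 ./ (correlations1 + correlations2);
% h_means is about [0.5 0.2; 0.2 0.5]: each tag's self-score (0.5) beats
% its score with the other tag, so exclude the diagonal before ranking.
h_no_self = h_means;
h_no_self(logical(eye(size(h_means)))) = -Inf;
```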

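You can find the 10 most frequent tags (the roots the question asks for) by counting the number of 1s in each column of the data matrix. Since we want the most common tags across both files, we'll add the per-file column counts; unlike adding the data matrices directly, this also works when the two files contain different numbers of images: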

[~, tag_ranks] = sort(sum(data1, 1) + sum(data2, 1), 'descend'); % tag indices, most frequent first
top_tags = tag_ranks(1:10);
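Here is the same ranking on a toy example, mapping the winning column indices back to tag names. The vocabulary and matrices are made up, and num_roots and root_names are hypothetical names; min(10, ...) just guards against vocabularies with fewer than 10 tags. Note that data1 and data2 have different row counts here, which is why the column sums are added rather than the matrices.

```matlab
% Hypothetical binary matrices over a shared 5-tag vocabulary;
% the two files contain different numbers of images (rows).
vocab = {'grass', 'sand', 'sea', 'sky', 'tree'};
data1 = [0 0 1 1 0;
         0 0 0 1 1;
         1 0 0 1 1];
data2 = [0 1 1 0 0;
         0 0 1 1 0];

counts = sum(data1, 1) + sum(data2, 1);      % per-tag frequency over both files
num_roots = min(10, numel(counts));          % at most 10 roots
[~, tag_ranks] = sort(counts, 'descend');
top_tags = tag_ranks(1:num_roots);
root_names = vocab(top_tags(1:3));           % three most frequent tag names
```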

For the tree building at the end, you will either want to create a tree class (see classdef) or store the tree in an array. To find the two highest harmonic means, look in the h_means matrix; for a tag m1, we can do:

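scores = h_means(m1, :);
scores(m1) = -Inf;              % a tag's self-pair always scores highest; skip it
scores(isnan(scores)) = -Inf;   % NaNs would otherwise sort first in 'descend' order
[~, tag_ranks] = sort(scores, 'descend');
top_tag = tag_ranks(1);
second_tag = tag_ranks(2);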

You will then need to insert these tags into the tree as children of m1, and repeat the process for each of the other root tags.
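One way to tie the pieces together is sketched below, assuming the "store the tree in an array" option is implemented as two parallel arrays (a parent-pointer layout) and reading "repeat" as running the root-level step for every root tag. node_tag and node_parent are hypothetical names, and the h_means values and top_tags are made up for illustration; the NaN stands for a pair that was not scored in both files.

```matlab
% Hypothetical inputs: h_means (M-by-M) and top_tags (root tag indices).
h_means = [0.5  0.2  0.1  0.4;
           0.2  0.5  0.3  NaN;
           0.1  0.3  0.5  0.05;
           0.4  NaN  0.05 0.5];
top_tags = [1 2];

node_tag = [];      % tag index stored at each node
node_parent = [];   % index of each node's parent, 0 for a root

for r = top_tags
  node_tag(end+1) = r;  node_parent(end+1) = 0;   % tag r starts a new tree
  root = numel(node_tag);

  scores = h_means(r, :);
  scores(r) = -Inf;                % skip the self-pair
  scores(isnan(scores)) = -Inf;    % skip pairs without a score
  [~, ranked] = sort(scores, 'descend');

  for child = ranked(1:2)          % the two highest harmonic means
    node_tag(end+1) = child;  node_parent(end+1) = root;
  end
end
```

To read the forest back, node_parent == k selects the children of node k; for example, node_tag(node_parent == 1) gives [4 2]. A classdef-based tree would hold the same information with named methods instead of index bookkeeping.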
