MATLAB: which letter does each bar in a histogram correspond to?
Problem description
I have 400 files, each containing about 500,000 characters, and those characters are drawn from only about 20 distinct letters. I want to make a histogram showing the 10 most frequently used letters (x-axis) and the number of times each one is used (y-axis). The code below is missing one thing: I want to know which letter each bar corresponds to. What should I add to the code? You may change the whole code, but I would prefer to keep this version. Please provide the complete code so I can copy it directly into a script and run it.
i = 1;
z = zeros(1, 10);
for i=1:400
    j = num2str(i);
    file_name = strcat('part',j,'txt');
    file_id = fopen(file_name);
    part = fread(file_id, inf, 'uchar');
    h = hist(part,10);
    z = z + h;
    fclose(file_id);
end
Recommended answer
First of all, your use of hist is wrong. hist(data,10) builds a histogram with 10 equal-width bins spanning the range of the data, so a single bin can cover more than one character in your files.
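To make the problem concrete, here is a minimal sketch using made-up data (the 20 letters 'a' to 't', one occurrence each, which is an assumption and not the asker's files). With 10 bins, hist has to merge neighbouring character codes into the same bin:

```matlab
% Illustrative data only: 20 distinct letters 'a'..'t' (codes 97..116)
data = double('a':'t');

% Ask for 10 bins, as in the question's code
[counts, centers] = hist(data, 10);

% 20 distinct codes spread over 10 equal-width bins,
% so every bin ends up holding two different letters
disp(counts);
disp(centers);   % bin centers are not integer character codes
```

Because the bin centers no longer line up with individual character codes, there is no way to tell afterwards which letter a bar stands for.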
One way to solve this is to call hist with predefined bin centers, for example:
bins = 1:255;                        % one bin per possible character code
histSum = zeros(numel(bins),1);      % counts accumulated over all files
for file = 1:10
    data = randi([0 25],100) + 'a';  % random stand-in data: codes of the letters 'a' to 'z'
    data = reshape(data,numel(data),1);  % flatten the 100x100 matrix into a column vector
    histSum = histSum + hist(data,bins)';  % add the counts for this "file"
end
Note that the bins must be defined so that they accommodate every possible value, hence the range from 1 to 255.
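With one bin per character code, the letter behind each bar can be recovered from the bin index. The following is a rough sketch of how the 10 most frequent letters might be picked out and labelled. The sample string and the names sample, nTop, topCounts and topLetters are illustrative assumptions, and in a real run histSum would be the total accumulated over the files:

```matlab
% Stand-in for the accumulated counts (hypothetical sample, not real file data)
bins = 1:255;
sample = 'aaaabbbccd';
histSum = hist(double(sample), bins)';   % column vector, one entry per code

% Sort the counts so the most frequent characters come first
[sortedCounts, order] = sort(histSum, 'descend');

% Keep at most 10 characters, and only those that actually occur
nTop = min(10, nnz(histSum));
topCounts = sortedCounts(1:nTop);
topLetters = char(bins(order(1:nTop)));  % bin center equals the character code

% Draw the bars and put the matching letter under each one
bar(topCounts);
set(gca, 'XTick', 1:nTop, 'XTickLabel', cellstr(topLetters(:)));
xlabel('Letter');
ylabel('Number of occurrences');
```

Sorting before plotting keeps the bars in descending order of frequency; if alphabetical order is preferred, the selected indices could be sorted again before calling bar.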