Matlab. Replace missing values with an average

Question

How can I replace missing column values with the column average for numeric columns, and with the most frequently occurring class for string columns?

Sample data from:

UCI ML Repository: Iris

For example, replace NaN with 'Iris-setosa'.

I have the following code.

It replaces only the numeric values; how can I replace the strings too?

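function dataWithReplaced = replaceNaNWithAvg(data)
% Replace NaN values in the numeric columns 1 to 4 of table DATA with the
% mean of each column. Returns a numeric matrix; the string column (5) is
% not handled.

dataWithReplaced = [];

% Per-column means that ignore NaN (nanmean is part of the
% Statistics and Machine Learning Toolbox)
averagePerCol = table2array(varfun(@nanmean, data(:, 1:4)));

for i = 1:4
    dataColumn = table2array(data(:, i));                 % column i as a numeric vector
    dataColumn(isnan(dataColumn)) = averagePerCol(1, i);  % fill the gaps with the mean
    dataWithReplaced = [dataWithReplaced dataColumn];     % append to the output matrix
end

end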
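A possible variant of the numeric part, sketched under the assumption that columns 1 to 4 of the table are numeric: it fills each column in place, so the result stays a table instead of becoming a plain matrix, and it uses `mean(..., 'omitnan')` (available since R2015a) instead of `nanmean`, which needs the Statistics and Machine Learning Toolbox. The small table below is made-up sample data, not the Iris set.

```matlab
% Hypothetical sample table: four numeric columns with missing values (NaN)
data = table([4.9; NaN; 6.7], [2.5; 2.9; NaN], [4.5; 6.3; 5.8], [NaN; 1.8; 2.0]);

for i = 1:4
    col = data{:, i};                          % numeric contents of column i
    col(isnan(col)) = mean(col, 'omitnan');    % replace NaN with the column mean
    data{:, i} = col;                          % write back, keeping the table type
end

disp(data)
```

Writing back with brace indexing (`data{:, i} = ...`) keeps the variable names and any other non-numeric columns untouched.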

I'm new to MATLAB, so many things are not obvious to me.

Recommended answer

The following solution solves the problem:

  • Convert the last table column to a cell array (a cell array is required to hold strings of different lengths).
  • Remove all NaN elements from the cell array (NaN elements would break the next step).
  • Find the most repeated string in the cell array.
  • Find the indices of all NaN elements in stringColumn (using cellfun, as in the previous step).
  • Replace the elements at those indices with the most common string.
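The steps above can be sketched on the example column alone, without a table. This is a minimal sketch, not the answer's own code: it swaps the answer's counting loop for `unique` with its third output plus `accumarray`, and it assumes missing entries are stored as NaN, as in the example table.

```matlab
% Example column: character vectors, with NaN marking the missing entries
stringColumn = {NaN; 'aa'; 'aa'; 'bbb'; NaN; 'ccc'; 'ccc'; 'ccc'; 'ccc'; 'dddd'; 'dddd'};

isPresent = cellfun(@ischar, stringColumn);   % true where the entry is a string
present = stringColumn(isPresent);            % drop the NaN entries

% Count each distinct string: j maps every entry of present to its index in y
[y, ~, j] = unique(present);
counts = accumarray(j(:), 1);
[~, k] = max(counts);                         % index of the highest count
commonStr = y{k};

stringColumn(~isPresent) = {commonStr};       % fill the missing entries
disp(stringColumn)
```

On a tie, `max` returns the first maximum, and because `unique` returns sorted output, the alphabetically first of the tied strings is chosen.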

Since you are new to MATLAB, my solution may look extremely complicated to you (it looks complicated to me too).
There may be a simpler solution that I couldn't find...

See the following code sample:

%Create data table for the example.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
VarName1 = [4.9; 7.3; 6.7; 7.2; 6.5; 6.4; 6.8; 5.7; 5.8; 6.4; 6.5];
VarName2 = [2.5; 2.9; 2.5; 3.6; 3.2; 2.7; 3.0; 2.5; 2.8; 3.2; 3.0];
VarName3 = [4.5; 6.3; 5.8; 6.1; 5.1; 5.3; 5.5; 5.0; 5.1; 5.3; 5.5];
VarName4 = [1.7; 1.8; 1.8; 2.5; 2.0; 1.9; 2.1; 2.0; 2.4; 2.3; 1.8];
VarName5 = {NaN; 'aa'; 'aa'; 'bbb'; NaN; 'ccc'; 'ccc'; 'ccc'; 'ccc'; 'dddd'; 'dddd'};
data = table(VarName1, VarName2, VarName3, VarName4, VarName5);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%Convert last table column to cell array.
stringColumn = table2cell(data(:, 5));

%Remove all NaN elements from cell array (keep only the char entries).
%Reference: https://www.mathworks.com/matlabcentral/newsreader/view_thread/314852
x = stringColumn(cellfun(@ischar, stringColumn));

%Find most repeated string in cell array:
%Reference: https://www.mathworks.com/matlabcentral/answers/7973-how-to-find-out-which-item-is-mode-of-cell-array
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
y = unique(x);
n = zeros(length(y), 1);
for iy = 1:length(y)
    n(iy) = sum(strcmp(y{iy}, x));
end
[~, itemp] = max(n);
commonStr = y(itemp);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%Find all indices of NaN (non-char) elements in stringColumn.
nanIdx = find(~cellfun(@ischar, stringColumn));

%Replace the NaN elements with commonStr (a 1x1 cell, expanded to every index).
stringColumn(nanIdx) = commonStr;

%Replace the last column of the original table.
data.VarName5 = stringColumn;
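To answer the whole question in one pass (mean for numeric columns, most frequent string for text columns), one possible sketch loops over the table variables and branches on their type. It assumes numeric gaps are NaN and that text columns are cell arrays in which a missing entry is either NaN or an empty char vector; treating empty strings as missing is an assumption of this sketch, not something the answer does. The small table is hypothetical.

```matlab
% Hypothetical table in the same shape as above: numeric plus text columns
VarName1 = [4.9; NaN; 6.7; 7.2];
VarName2 = {NaN; 'aa'; 'aa'; 'bbb'};
data = table(VarName1, VarName2);

for v = 1:width(data)
    col = data.(v);                                 % contents of variable v
    if isnumeric(col)
        col(isnan(col)) = mean(col, 'omitnan');     % numeric: column mean
    elseif iscell(col)
        % Text: an entry counts as present only if it is a non-empty char vector
        isPresent = cellfun(@(s) ischar(s) && ~isempty(s), col);
        if any(isPresent)                           % skip columns with no strings at all
            [y, ~, j] = unique(col(isPresent));
            [~, k] = max(accumarray(j(:), 1));
            col(~isPresent) = y(k);                 % most frequent string (1x1 cell)
        end
    end
    data.(v) = col;                                 % write the filled column back
end

disp(data)
```

Dot indexing with a number (`data.(v)`) selects the v-th variable, so the loop does not depend on the column names.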
