MATLAB: Equal rows of table OR Equal words of strings


Question

I would like to make different tables from different strings. The strings have different lengths, so the tables will have different numbers of rows. I would like to combine these tables at the end, and therefore need them all to have the same number of rows. My plan is to pad with NaNs, but so far without success.

My code attempt is below; the part I'm struggling with is marked "Problem location". Code:

String = ["Random info in middle one, " + ...
          "Random info still continues. ",
          "Random info in middle two. " + ...
          "Random info still continues. ExtraWord1 ExtraWord2 ExtraWord3 "];  % String 2 has three more words than string 1

%%%%%% FOCUS AREA BEGINS %%%%%%%%
for x = 1:length(String)
    % Plan to add NaNs
    documents_Overall = tokenizedDocument(String(x,1));
    tdetails = tokenDetails(documents_Overall);
    StringTable = tdetails(:,{'Token','Type'});
    StringHeight(x) = height(StringTable);

    MaxHeight = max(StringHeight);
    StringTable(end+1:MaxHeight,1) = NaN; % Problem location.

    % Plan to convert table back to string
    DataCell = table2cell(StringTable);
    String(x,1) = [DataCell{:}];
end
%%%%%% FOCUS AREA ENDS %%%%%%%%

% Plan to combine tables
documents_Middle = tokenizedDocument(String);
tdetails = tokenDetails(documents_Middle);

t = table();
d = tokenizedDocument(String);
variableNames = [];
variables = [];

for n = 1:length(d)
    variableNames = [variableNames {sprintf('Tokens for sentence %d',n)} {sprintf('Type for sentence %d',n)}];
    variables = [variables {d(n).tokenDetails.Token} {d(n).tokenDetails.Type}];
end

%Table = cell2table(variables);
table(variables{:},'VariableNames',variableNames)

The goal is to end up with the same number of rows for any number of strings, with every shorter string padded to match the longest one. My plan is to use NaNs to achieve this, but so far without success. This is what the result of this example should look like:

All help appreciated. Thank you.

Recommended answer

This builds on the answer to your previous question.

The logic below has three steps. First, we find the size of the largest column (14 in this example). Then we find the indices of the columns that need padding; the columns come in Token/Type pairs, so we only need to consider every other column. Finally, we iterate over the columns that need padding, padding each Token column with <missing> (the NaN equivalent for strings) and the Type column that follows it with letters.

s = ["Random info in middle one, " + ...
     "Random info still continues. ",
     "Random info in middle two. " + ...
     "Random info still continues. ExtraWord ExtraWord ExtraWord "];

d = tokenizedDocument(s);

variableNames = {};
variables = {};
max_column_size = 1;

% Collect a Token column and a Type column for each sentence,
% tracking the height of the tallest column along the way
for n = 1:length(d)
    details = tokenDetails(d(n));
    variableNames = [variableNames {sprintf('Tokens for sentence %d',n)} {sprintf('Type for sentence %d',n)}];
    variables = [variables {details.Token} {details.Type}];
    max_column_size = max(max_column_size, size(details.Token,1));
end

% Anonymous function: true for any column shorter than the tallest one
f = @(x) size(x,1) < max_column_size;

% Find the columns that need padding; they come in Token/Type pairs,
% so keep only the Token column of each pair
indices_to_pad = find(cellfun(f, variables));
indices_to_pad(2:2:end) = [];

% Pad each short Token column with NaN (stored as <missing> in a string
% array) and the matching Type column with "letters"
for n = 1:length(indices_to_pad)
    index_to_pad = indices_to_pad(n);
    column_size_diff = max_column_size - size(variables{index_to_pad},1);
    variables{index_to_pad} = [variables{index_to_pad}; NaN(column_size_diff, 1)];
    variables{index_to_pad+1} = [variables{index_to_pad+1}; categorical(repmat("letters", column_size_diff, 1))];
end

table(variables{:}, 'VariableNames', variableNames)

This produces the following table:

ans =

  14×4 table

    Tokens for sentence 1    Type for sentence 1    Tokens for sentence 2    Type for sentence 2
    _____________________    ___________________    _____________________    ___________________

         "Random"                letters                 "Random"                letters        
         "info"                  letters                 "info"                  letters        
         "in"                    letters                 "in"                    letters        
         "middle"                letters                 "middle"                letters        
         "one"                   letters                 "two"                   letters        
         ","                     punctuation             "."                     punctuation    
         "Random"                letters                 "Random"                letters        
         "info"                  letters                 "info"                  letters        
         "still"                 letters                 "still"                 letters        
         "continues"             letters                 "continues"             letters        
         "."                     punctuation             "."                     punctuation    
         <missing>               letters                 "ExtraWord"             letters        
         <missing>               letters                 "ExtraWord"             letters        
         <missing>               letters                 "ExtraWord"             letters   
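
The padding step can also be tried on its own, without the Text Analytics Toolbox. The following is a minimal sketch, not part of the original answer: the two token columns are hypothetical stand-ins for the `Token` output of `tokenDetails`, and the names `columns`, `maxHeight`, `Tokens1` and `Tokens2` are made up for illustration. It pads each string column directly by indexed assignment of `missing`, then shows one possible way to turn a padded column back into a single string.

```matlab
% Hypothetical token columns of unequal height (stand-ins for tokenDetails output)
columns = {["Random"; "info"; "."], ...
           ["Random"; "info"; "still"; "continues"; "."]};

% Height of the tallest column
maxHeight = max(cellfun(@(c) size(c,1), columns));

% Grow every column to maxHeight; the new rows hold <missing>.
% For a column that is already maxHeight the index range is empty,
% so the assignment does nothing.
for k = 1:numel(columns)
    col = columns{k};
    col(end+1:maxHeight, 1) = missing;
    columns{k} = col;
end

% All columns now have equal height and can share one table
T = table(columns{:}, 'VariableNames', {'Tokens1', 'Tokens2'});

% Converting a padded column back to one string: drop the padding, then join
sentence1 = join(rmmissing(T.Tokens1), " ");
```

Padding the string column before building the table sidesteps the need to assign NaN into a table that mixes string and categorical variables. A categorical `Type` column would still need its own filler value, as in the answer above.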
