R: How to Count All Character Values Separated By Commas In A Column?


Problem description

Below are a couple of rows of the test data I am using. I want to count the frequency of all the character values in the ICD10Code column, which are separated by commas. In the segment of code below, I used group_by because every "PatientId" value had duplicates in that column but had unique values in other columns. How can I count the frequency of all the character values?

PatientId  ReferralSource     NextAppt  Age  InsuranceName           ICD10Code
1584       St Francis         Y         34   SLIDING FEE SCHEDULE    M5136, N809, R51, Z6831
2655       Piedmont Hospital  Y         60   Medicaid-GA (Medicaid)  E119, E782, I10, L729, R809

The result should look like this:

M5136=1
N809=1
R51=1

Being fairly new to R, I tried this segment of code, adapted from one found on Stack Overflow (using sapply), but it only produced a total count for each row.

data.id <- data.1 %>%
  group_by(PatientId) %>%
  summarise(ReferralSource = first(ReferralSource),
            NextAppt = first(NextAppt),
            Age = max(Age),
            InsuranceName = toString(unique(InsuranceName)),
            ICD10Code = toString(unique(ICD10Code)))
sapply(strsplit(data.id$ICD10Code, ","), FUN = function(x) length(x[x != "Null"]))

This gives the total for each row:

  [1] 10 17  5 18  6  5  8  7  2  8  3  8 10 14  5  5  9  8 11  5  6  5  9 16  9  4  3  9 18  9 12 12 12  2 16  6 10
 [38]  2  2  3  4  9  7 12  5 10 16 13  9  1  6  2  7  9  8  5  5  4  3 11 19  6  4  3  7  8  6 10  8  6 16 11  5  9
 [75] 13  5  8  4 10  3  7  5  6  4  3  4  8  7  7  4  5  9  2  6  1 20  3  3  3  4  5  5  7  3 12  7 16  1  7  6  3
[112]  4  2  7  8  4  1  9  3  8  3  8  5  8  2  4  4  8  4  7 10  8  2  4  4  2  9  7  7  5  1  8  6 10  9  3 11 10
[149]  3  6  4  6 13  3  7 11  6  5  4  3  1  4 10 10 10 10 11  2  1  5  4  5  5  5  5  9  5  7  7  2  6  7  7  6  5
[186]  7  8  9

Recommended answer

To count the frequency of each ICD10Code value across the entire column, we can split each string on commas, unlist the result into a single vector, and count the values with table.

# split on a comma plus any following spaces, so " N809" and "N809" count as the same code
table(unlist(strsplit(as.character(data.1$ICD10Code), ',\\s*')))
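
As a minimal sketch of the same split, unlist, and table approach, the example below rebuilds only the two sample rows from the question; the real data.1 has more rows and columns, so this small data frame is an assumption made for illustration. It also trims whitespace around each code and drops the "Null" placeholder that the question's own code filters out, which is an extra step not stated in the answer. The names codes and code_counts are hypothetical.

```r
# Hypothetical reconstruction of the two sample rows shown in the question
data.1 <- data.frame(
  PatientId = c(1584, 2655),
  ICD10Code = c("M5136, N809, R51, Z6831", "E119, E782, I10, L729, R809"),
  stringsAsFactors = FALSE
)

# Split each string on commas, flatten to one vector, and strip surrounding whitespace
codes <- trimws(unlist(strsplit(as.character(data.1$ICD10Code), ",")))

# Drop empty strings and the "Null" placeholder (assumed from the question's code)
codes <- codes[codes != "" & codes != "Null"]

# Frequency of each code across the whole column, most frequent first
code_counts <- sort(table(codes), decreasing = TRUE)
print(code_counts)
```

If a plain data frame is easier to work with than a table object, as.data.frame(code_counts) converts the counts into one row per code.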
