R: How to Count All Character Values Separated By Commas In A Column?

Question

Below are a couple of rows of some test data I am using. I want to count the frequency of all the character values in the ICD10Code column, which are separated by commas. In the segment of code below, I used group_by because every "PatientId" value had duplicates in that column but had unique values in other columns. How can I go about counting the frequency of all character values?

PatientId ReferralSource    NextAppt Age InsuranceName          ICD10Code
1584      St Francis        Y        34  SLIDING FEE SCHEDULE   M5136, N809, R51, Z6831
2655      Piedmont Hospital Y        60  Medicaid-GA (Medicaid) E119, E782, I10, L729, R809

The results would look something like this.

M5136=1
N809=1
R51=1

Being fairly new to R, I tried this segment of code (using sapply) that I found on Stack Overflow, but it only produced a total count for each row.

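library(dplyr)
# Collapse duplicate PatientId rows into one row per patient
data.id <- data.1 %>%
  group_by(PatientId) %>%
  summarise(ReferralSource = first(ReferralSource), NextAppt = first(NextAppt),
            Age = max(Age), InsuranceName = toString(unique(InsuranceName)),
            ICD10Code = toString(unique(ICD10Code)))
# Count the comma-separated entries in each row, ignoring "Null"
sapply(strsplit(data.id$ICD10Code, ","), FUN = function(x) length(x[x != "Null"]))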

This produced the total count for each row.

  [1] 10 17  5 18  6  5  8  7  2  8  3  8 10 14  5  5  9  8 11  5  6  5  9 16  9  4  3  9 18  9 12 12 12  2 16  6 10
 [38]  2  2  3  4  9  7 12  5 10 16 13  9  1  6  2  7  9  8  5  5  4  3 11 19  6  4  3  7  8  6 10  8  6 16 11  5  9
 [75] 13  5  8  4 10  3  7  5  6  4  3  4  8  7  7  4  5  9  2  6  1 20  3  3  3  4  5  5  7  3 12  7 16  1  7  6  3
[112]  4  2  7  8  4  1  9  3  8  3  8  5  8  2  4  4  8  4  7 10  8  2  4  4  2  9  7  7  5  1  8  6 10  9  3 11 10
[149]  3  6  4  6 13  3  7 11  6  5  4  3  1  4 10 10 10 10 11  2  1  5  4  5  5  5  5  9  5  7  7  2  6  7  7  6  5
[186]  7  8  9

Recommended answer

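To count how often each ICD10Code value appears across the entire column, split each string on commas, unlist the result into a single vector, and tabulate it with table. Because the codes in the sample data are separated by a comma followed by a space, trimws is applied first so that " N809" and "N809" are counted as the same code.

table(trimws(unlist(strsplit(as.character(data.1$ICD10Code), ','))))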
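
For anyone who wants to try the split, unlist, and table steps end to end, here is a minimal sketch on the two sample rows from the question. The data frame is rebuilt by hand for illustration, and the names codes and code.freq are hypothetical. Filtering out "Null" is an assumption carried over from the sapply attempt in the question.

```r
# Hypothetical two-row sample mirroring the question's layout
data.1 <- data.frame(
  PatientId = c(1584, 2655),
  ICD10Code = c("M5136, N809, R51, Z6831", "E119, E782, I10, L729, R809"),
  stringsAsFactors = FALSE
)

# Split each row on commas, flatten to one vector, and strip stray spaces
codes <- trimws(unlist(strsplit(as.character(data.1$ICD10Code), ",")))

# Drop empty strings and the "Null" placeholder (assumed from the question's code)
codes <- codes[codes != "" & codes != "Null"]

# Frequency of each code across the whole column, most frequent first
code.freq <- sort(table(codes), decreasing = TRUE)
print(code.freq)
```

If the duplicated PatientId rows repeat the same code list, running the same steps on the collapsed data.id$ICD10Code instead of data.1$ICD10Code may avoid counting one patient's codes more than once.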
