How to convert the characters within each column into sub-columns without duplication
Question
I have a data.frame like this. Input:
1 200 444 444
2 310 NA 444
3 310 NA 444
4 NA 444 444
5 200 444 444
6 200 NA 444
7 310 444 444
8 310 876 444
9 310 876 444
10 NA 876 444
I want to convert each character within each column into a sub-column, and fill the rows with 1 or 0 to indicate whether that sub-column's value was observed in that row. Output data.frame:
c1.200 c1.310 c2.444 c2.876 c3.444
1 1 0 1 0 1
2 0 1 0 0 1
3 0 1 0 0 1
4 0 0 1 0 1
5 1 0 1 0 1
6 1 0 0 0 1
7 0 1 1 0 1
8 0 1 0 1 1
9 0 1 0 1 1
10 0 0 0 1 1
Is there a solution in R to do this? Note that my real data has 117,000 rows and 10,000 columns.
Recommended answer
We can do this with table from base R. We unlist the dataset, paste the values onto new column-name prefixes that start with c, remove the NA elements using is.na, and then call table on the row indices and the pasted vector.
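The code that follows expects the input in a data frame named dat. As a minimal sketch, the question's input can be recreated like this; the column names V1 to V3 are placeholders, since the original names are not shown, and the values are assumed to be numeric.

```r
# Recreate the question's input. V1..V3 are hypothetical column names;
# the row names default to 1..10, matching the first column of the printout.
dat <- data.frame(
  V1 = c(200, 310, 310, NA, 200, 200, 310, 310, 310, NA),
  V2 = c(444, NA, NA, 444, 444, NA, 444, 876, 876, 876),
  V3 = rep(444, 10)
)
```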
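# column prefix ("c1.", "c2.", "c3.") for every cell, in column-major order
nm1 <- paste0('c', 1:3, '.')[col(dat)]
# flatten the data frame into one vector (also column-major)
v1 <- unlist(dat)
# flag the non-NA cells
i1 <- !is.na(v1)
# cross-tabulate row index against prefix + value, then convert to a data frame
newdat <- as.data.frame.matrix(table((1:nrow(dat))[row(dat)][i1],
                                     paste0(nm1[i1], v1[i1])))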
newdat
# c1.200 c1.310 c2.444 c2.876 c3.444
# 1 1 0 1 0 1
# 2 0 1 0 0 1
# 3 0 1 0 0 1
# 4 0 0 1 0 1
# 5 1 0 1 0 1
# 6 1 0 0 0 1
# 7 0 1 1 0 1
# 8 0 1 0 1 1
# 9 0 1 0 1 1
# 10 0 0 0 1 1
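With 117,000 rows and 10,000 columns, a dense table of indicators may not fit in memory. One possible variation, not part of the answer above, is to build the same indicators as a sparse matrix. The sketch below assumes the Matrix package is installed; the names vals, keep, lab and sm, and the placeholder column names V1 to V3, are introduced here for illustration only.

```r
library(Matrix)  # assumption: the Matrix package is available

# Small example input (V1..V3 are hypothetical column names)
dat <- data.frame(
  V1 = c(200, 310, 310, NA, 200, 200, 310, 310, 310, NA),
  V2 = c(444, NA, NA, 444, 444, NA, 444, 876, 876, 876),
  V3 = rep(444, 10)
)

vals <- unlist(dat, use.names = FALSE)   # all cells, column-major order
keep <- !is.na(vals)                     # drop NA cells

# Label each kept cell as "c<column index>.<value>"; factor() sorts the levels
lab <- factor(paste0("c", as.vector(col(dat)), ".", vals)[keep])

# One non-zero entry per kept cell: (row index, label index) -> 1
sm <- sparseMatrix(
  i = as.vector(row(dat))[keep],
  j = as.integer(lab),
  x = 1,
  dims = c(nrow(dat), nlevels(lab)),
  dimnames = list(NULL, levels(lab))
)
sm
```

Only the non-NA cells are stored, so memory use grows with the number of observed values rather than with rows times sub-columns. as.matrix(sm) converts back to a dense matrix when the result is small enough.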