How to create a binary matrix of inventory per row? (R)
Question
I have a data frame of 9 columns consisting of an inventory of factors. Each row can have all 9 columns filled (as in that row is holding 9 "things"), but most don't (most have 3-4). The columns aren't specific either: if item 200 shows up in column 1 or in column 3, it's the same thing. I'd like to create a binary matrix with one row per original row and one column per factor.
Example (shortened to 4 columns just to get the point across):
R1 3 4 5 8
R2 4 6 7 NA
R3 1 5 NA NA
R4 2 6 8 9
should become
   1 2 3 4 5 6 7 8 9
r1 0 0 1 1 1 0 0 1 0
r2 0 0 0 1 0 1 1 0 0
r3 1 0 0 0 1 0 0 0 0
r4 0 1 0 0 0 1 0 1 1
I've looked into writeBin/readBin, k-means clustering (which is something I'd like to do, but I need to get rid of the NAs first), fuzzy clustering, and tag clustering. I'm just kind of lost about which direction to go.
I've tried writing two for loops that pull the data from the matrix by column/row and then save 0s and 1s in a new matrix, but I think there were scope issues.
You guys are the best. Thanks!
Recommended answer
Here's a base R solution:
# Read in the data, and convert to matrix form
df <- read.table(text = "
3 4 5 8
4 6 7 NA
1 5 NA NA
2 6 8 9", header = FALSE)
m <- as.matrix(df)
# Create a two column matrix containing row/column indices of cells to be filled
# with 'one's
id <- cbind(rowid = as.vector(t(row(m))),
colid = as.vector(t(m)))
id <- id[complete.cases(id), ]
# Create output matrix
out <- matrix(0, nrow = nrow(m), ncol = max(m, na.rm = TRUE))
out[id] <- 1
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 0 0 1 1 1 0 0 1 0
# [2,] 0 0 0 1 0 1 1 0 0
# [3,] 1 0 0 0 1 0 0 0 0
# [4,] 0 1 0 0 0 1 0 1 1
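For comparison, here is a minimal alternative sketch that may be easier to read than the index-matrix approach above. It is not the answerer's method: it checks, row by row, which item codes are present using `%in%`. It assumes the item codes are the integers 1 through the largest value seen; the names `items` and `out2` are made up for this illustration.

```r
# Same example data as above, built directly as a numeric matrix
m <- matrix(c(3, 4, 5, 8,
              4, 6, 7, NA,
              1, 5, NA, NA,
              2, 6, 8, 9),
            nrow = 4, byrow = TRUE)

# All item codes that can occur (assumption: codes run from 1 to the max seen)
items <- seq_len(max(m, na.rm = TRUE))

# For each row, flag which item codes are present.
# NAs in the row never match an item code, so they need no special handling.
# apply() returns one column per input row, hence the transpose.
out2 <- t(apply(m, 1, function(r) as.integer(items %in% r)))

# Label rows and columns to match the layout asked for in the question
dimnames(out2) <- list(paste0("r", seq_len(nrow(m))), as.character(items))
out2
```

If the real data frame holds factor columns rather than numbers, `as.matrix()` would give a character matrix, so `items` would need to be the set of distinct labels (for example `sort(unique(na.omit(as.vector(m))))`) instead of `seq_len(...)`.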