From dataframe to vertex/edge array
Problem description
I have the dataframe
test <- structure(list(
    y2002 = c("freshman","freshman","freshman","sophomore","sophomore","senior"),
    y2003 = c("freshman","junior","junior","sophomore","sophomore","senior"),
    y2004 = c("junior","sophomore","sophomore","senior","senior",NA),
    y2005 = c("senior","senior","senior",NA, NA, NA)),
    .Names = c("2002","2003","2004","2005"),
    row.names = c(c(1:6)),
    class = "data.frame")
> test
       2002      2003      2004   2005
1  freshman  freshman    junior senior
2  freshman    junior sophomore senior
3  freshman    junior sophomore senior
4 sophomore sophomore    senior   <NA>
5 sophomore sophomore    senior   <NA>
6    senior    senior      <NA>   <NA>
and I need to create a vertex/edge list (for use with igraph) with one row for every time a student's category changes between consecutive years, ignoring years in which there is no change, as in
testvertices <- structure(list(
    vertex = c("freshman","junior", "freshman","junior","sophomore","freshman",
               "junior","sophomore","sophomore","sophomore"),
    edge = c("junior","senior","junior","sophomore","senior","junior",
             "sophomore","senior","senior","senior"),
    id = c("1","1","2","2","2","3","3","3","4","5")),
    .Names = c("vertex","edge", "id"),
    row.names = c(1:10),
    class = "data.frame")
> testvertices
      vertex      edge id
1   freshman    junior  1
2     junior    senior  1
3   freshman    junior  2
4     junior sophomore  2
5  sophomore    senior  2
6   freshman    junior  3
7     junior sophomore  3
8  sophomore    senior  3
9  sophomore    senior  4
10 sophomore    senior  5
At this point I'm ignoring the ids; my graph should weight edges by count (e.g., freshman -> junior = 3). The idea is to make a tree graph. I know this is beside the main munging point, but I mention it in case you ask...
Solution
If I understand you correctly, you need something like this:
elist <- lapply(seq_len(nrow(test)), function(i) {
  x <- as.character(test[i,])
  x <- unique(na.omit(x))
  x <- rep(x, each=2)
  x <- x[-1]
  x <- x[-length(x)]
  r <- matrix(x, ncol=2, byrow=TRUE)
  if (nrow(r) > 0) { r <- cbind(r, i) } else { r <- cbind(r, numeric()) }
  r
})

do.call(rbind, elist)
#                               i
#  [1,] "freshman"  "junior"    "1"
#  [2,] "junior"    "senior"    "1"
#  [3,] "freshman"  "junior"    "2"
#  [4,] "junior"    "sophomore" "2"
#  [5,] "sophomore" "senior"    "2"
#  [6,] "freshman"  "junior"    "3"
#  [7,] "junior"    "sophomore" "3"
#  [8,] "sophomore" "senior"    "3"
#  [9,] "sophomore" "senior"    "4"
# [10,] "sophomore" "senior"    "5"
It is not the most efficient solution, but I think it is fairly didactic. We create edges separately for each row of your input data frame, hence the lapply. To create the edges from a row, we first remove NAs and duplicates, and then include each vertex twice. Finally, we drop the first and last elements of this doubled vector, so that each consecutive pair of elements is one edge, and format the result in two columns (actually it would be more efficient to leave it as a vector, never mind).
When adding the extra column, we must be careful to check whether our edge list matrix has zero rows.
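To make the per-row steps concrete, here is a small sketch that traces them for two rows of the question's data, typed in by hand: row 1, which has three distinct categories, and row 6, which has only one. The variable names edges and no_edges are illustrative only. Note that unique() removes every repeat, not just consecutive ones, so a student who returned to an earlier category would lose that transition; this does not happen in the example data.

```r
# Row 1 of `test`: freshman, freshman, junior, senior
x <- c("freshman", "freshman", "junior", "senior")

x <- unique(na.omit(x))   # "freshman" "junior" "senior"
x <- rep(x, each = 2)     # each category appears twice in a row
x <- x[-1]                # drop the first element
x <- x[-length(x)]        # drop the last element
# x is now: "freshman" "junior" "junior" "senior"
edges <- matrix(x, ncol = 2, byrow = TRUE)   # one edge per matrix row

# Row 6 of `test`: a single category, so there is nothing to connect
y <- unique(na.omit(c("senior", "senior", NA, NA)))
y <- rep(y, each = 2)[-1]
y <- y[-length(y)]        # empty character vector
no_edges <- matrix(y, ncol = 2, byrow = TRUE)   # a matrix with zero rows
```

The zero-row case is the reason the answer's code branches before calling cbind: the row index is only attached when there is at least one edge.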
The do.call function will just glue everything together. The result is a matrix, which you can convert to a data frame if you like, via as.data.frame(), and then you can also convert the third column to numeric. You can also change the column names if you like.
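As a rough sketch of that last step, the snippet below converts the matrix to a data frame, renames the columns, makes the id numeric, and then counts how often each transition occurs, which is the edge weight the question asks for. The matrix m is typed in by hand to stand in for the output of do.call(rbind, elist), and the column names from, to and weight are assumptions chosen for illustration, not names from the original answer.

```r
# Stand-in for the matrix returned by do.call(rbind, elist)
m <- rbind(
  c("freshman",  "junior",    "1"),
  c("junior",    "senior",    "1"),
  c("freshman",  "junior",    "2"),
  c("junior",    "sophomore", "2"),
  c("sophomore", "senior",    "2"),
  c("freshman",  "junior",    "3"),
  c("junior",    "sophomore", "3"),
  c("sophomore", "senior",    "3"),
  c("sophomore", "senior",    "4"),
  c("sophomore", "senior",    "5")
)

# Matrix -> data frame, with illustrative column names and a numeric id
edges <- as.data.frame(m, stringsAsFactors = FALSE)
names(edges) <- c("from", "to", "id")
edges$id <- as.numeric(edges$id)

# Count the rows for each distinct (from, to) pair, ignoring which id they came from
weights <- aggregate(id ~ from + to, data = edges, FUN = length)
names(weights)[3] <- "weight"
```

A data frame shaped like weights, with the two endpoint columns first, should be usable as an edge list for igraph's graph_from_data_frame(), where the extra weight column would become an edge attribute.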