From dataframe to vertex/edge array


Problem description

I have this data frame:

test <- structure(list(
     y2002 = c("freshman","freshman","freshman","sophomore","sophomore","senior"),
     y2003 = c("freshman","junior","junior","sophomore","sophomore","senior"),
     y2004 = c("junior","sophomore","sophomore","senior","senior",NA),
     y2005 = c("senior","senior","senior",NA, NA, NA)), 
              .Names = c("2002","2003","2004","2005"),
              row.names = 1:6,
              class = "data.frame")
> test
       2002      2003      2004   2005
1  freshman  freshman    junior senior
2  freshman    junior sophomore senior
3  freshman    junior sophomore senior
4 sophomore sophomore    senior   <NA>
5 sophomore sophomore    senior   <NA>
6    senior    senior      <NA>   <NA>

and I need to create a vertex/edge list (for use with igraph) with one entry for every time a student's category changes between consecutive years, ignoring years in which there is no change, as in

testvertices <- structure(list(
 vertex = 
  c("freshman","junior", "freshman","junior","sophomore","freshman",
    "junior","sophomore","sophomore","sophomore"),
 edge = 
  c("junior","senior","junior","sophomore","senior","junior",
    "sophomore","senior","senior","senior"),
 id =
  c("1","1","2","2","2","3","3","3","4","5")),
                       .Names = c("vertex","edge", "id"),
                       row.names = c(1:10),
                       class = "data.frame")
> testvertices
      vertex      edge id
1   freshman    junior  1
2     junior    senior  1
3   freshman    junior  2
4     junior sophomore  2
5  sophomore    senior  2
6   freshman    junior  3
7     junior sophomore  3
8  sophomore    senior  3
9  sophomore    senior  4
10 sophomore    senior  5

At this point I'm ignoring the ids; my graph should weight edges by count (e.g., freshman -> junior = 3). The idea is to make a tree graph. I know this is beside the main data-munging point, but I mention it in case you ask.

Solution

If I understand you correctly, you need something like this:

elist <- lapply(seq_len(nrow(test)), function(i) {
  x <- as.character(test[i,])
  x <- unique(na.omit(x))
  x <- rep(x, each=2)
  x <- x[-1]
  x <- x[-length(x)]
  r <- matrix(x, ncol=2, byrow=TRUE)
  if (nrow(r) > 0) { r <- cbind(r, i) } else { r <- cbind(r, numeric()) }
  r
})

do.call(rbind, elist)

#                              i  
# [1,] "freshman"  "junior"    "1"
# [2,] "junior"    "senior"    "1"
# [3,] "freshman"  "junior"    "2"
# [4,] "junior"    "sophomore" "2"
# [5,] "sophomore" "senior"    "2"
# [6,] "freshman"  "junior"    "3"
# [7,] "junior"    "sophomore" "3"
# [8,] "sophomore" "senior"    "3"
# [9,] "sophomore" "senior"    "4"
#[10,] "sophomore" "senior"    "5"

It is not the most efficient solution, but I think it is fairly didactic. We create the edges separately for each row of your input data frame, hence the lapply. To create the edges from a row, we first remove NAs and duplicates, and then include each vertex twice. Finally, we drop the first and last element, so that each consecutive pair in the vector is one edge, and format the result as a two-column matrix (it would actually be more efficient to leave it as a vector, but never mind).
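To make the doubling step concrete, here is a minimal sketch on a single student's categories. The variable names (`doubled`, `trimmed`, `edges_alt`) are illustrative and not part of the original answer; the `head()`/`tail()` pairing is shown only as an equivalent way to form the same consecutive pairs.

```r
# One student's distinct categories, in order (NAs and repeats already removed).
x <- c("freshman", "junior", "sophomore", "senior")

# Repeat every vertex twice, then drop the first and the last element,
# so that each consecutive pair in the vector is one edge.
doubled <- rep(x, each = 2)
trimmed <- doubled[-c(1, length(doubled))]
edges <- matrix(trimmed, ncol = 2, byrow = TRUE)

# Equivalent pairing without the doubling: each element next to its successor.
edges_alt <- cbind(head(x, -1), tail(x, -1))

print(edges)
```

Note that `unique()` removes every repeated value, not only consecutive repeats, so a student who returned to an earlier category would lose that transition. If that can happen in your data, something like `rle(x)$values` applied after removing NAs may be a closer fit.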

When adding the extra id column, we must be careful to check whether our edge list matrix has zero rows, which happens when a student's category never changes (row 6 in the example).
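One possible way to avoid the explicit zero-row branch is to build a small data frame per student and repeat the id once per edge. This is a sketch under that assumption, not the answer's method; `edges_for_row` is a hypothetical helper name.

```r
# Hypothetical helper: edges for one student's row `x` with id `i`.
edges_for_row <- function(x, i) {
  x <- unique(x[!is.na(x)])          # drop NAs and repeated categories
  n <- max(length(x) - 1L, 0L)       # number of edges; 0 if no change
  data.frame(from = head(x, n),      # every vertex except the last
             to   = tail(x, n),      # every vertex except the first
             id   = rep(i, n),       # id repeated once per edge
             stringsAsFactors = FALSE)
}

changed   <- edges_for_row(c("freshman", "freshman", "junior", "senior"), 1L)
no_change <- edges_for_row(c("senior", "senior", NA, NA), 6L)

# The zero-row result still has three columns, so it binds cleanly.
combined <- do.call(rbind, list(changed, no_change))
print(combined)
```

Because `rep(i, 0)` is simply an empty vector, the student with no change yields a data frame with zero rows and the same three columns as everyone else.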

The do.call call just glues the per-row matrices together with rbind. The result is a character matrix (a matrix can hold only one type, so the ids become strings), which you can convert to a data frame via as.data.frame() if you like; you can then convert the third column to numeric and change the column names.
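The conversion described above, followed by the count-based edge weights the question asks for, might look like the sketch below. The matrix `m` is typed in from the output shown earlier so that the block is self-contained; the column names `from`, `to`, `id` and `weight` are illustrative choices, not something the answer prescribes.

```r
# The edge matrix from the answer's output, entered row by row.
m <- matrix(c(
  "freshman",  "junior",    "1",
  "junior",    "senior",    "1",
  "freshman",  "junior",    "2",
  "junior",    "sophomore", "2",
  "sophomore", "senior",    "2",
  "freshman",  "junior",    "3",
  "junior",    "sophomore", "3",
  "sophomore", "senior",    "3",
  "sophomore", "senior",    "4",
  "sophomore", "senior",    "5"
), ncol = 3, byrow = TRUE)

# Matrix -> data frame, with readable names and a numeric id column.
edges <- as.data.frame(m, stringsAsFactors = FALSE)
names(edges) <- c("from", "to", "id")
edges$id <- as.numeric(edges$id)

# Ignore the ids and count how often each transition occurs.
edges$weight <- 1L
weights <- aggregate(weight ~ from + to, data = edges, FUN = sum)
print(weights)
```

A data frame shaped like `weights` (edge endpoints in the first two columns) should be usable with igraph's `graph_from_data_frame()`, which treats additional columns as edge attributes; check the igraph documentation for your installed version before relying on that.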
