Proportions tree graph in R


Problem description


I need to build an algorithm that, given a data.frame made up of n factors, returns a tree graph where each node represents a level of a factor and the proportion of rows classified by the level of that factor and by the level of the upper nodes (for example, each node could display: factorX.levelY=30%).

The first node represents the total number of rows and serves as the base (100%). The second level of the tree has k nodes, one for each of the k levels of the first factor; the third level has k*m nodes, where m is the number of levels of the second factor; and so on.

The data.frame used as input for the function will have its columns ordered so that they define the hierarchy of the nodes. For instance, data[,1] will be the top-level factor in the tree, data[,2] the next level, and so on.

Here's an example of the data.frame that would be used as input:

df <- data.frame(f1 = factor(rep(LETTERS[1:2], each = 50)),
                 f2 = rep(letters[1:4], each = 25),
                 f3 = rep(colors(1)[1:2], 25, each = 2))
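To make the target numbers concrete, the share of rows behind each node can be tabulated directly with base R before any tree is built. This is only a sketch: `node_props` is a hypothetical helper name that does not appear in the question, and it reports percentages of all rows for every observed combination of the first `depth` columns.

```r
df <- data.frame(f1 = factor(rep(LETTERS[1:2], each = 50)),
                 f2 = rep(letters[1:4], each = 25),
                 f3 = rep(colors(1)[1:2], 25, each = 2))

# Hypothetical helper: percentage of all rows for each observed
# combination of the first `depth` columns (one row per tree node at that depth)
node_props <- function(data, depth) {
  counts <- as.data.frame(table(data[seq_len(depth)]), responseName = "n")
  counts <- counts[counts$n > 0, ]           # drop combinations that never occur
  counts$pct <- 100 * counts$n / nrow(data)  # share of all rows, in percent
  counts
}

node_props(df, 1)  # nodes of the second tree level (first factor)
node_props(df, 2)  # nodes of the third tree level (first two factors)
```

Each row of the result corresponds to one node at that depth, so the percentages at any depth add up to 100.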

The graph would look like the examples (images not shown), but with the nodes labeled in the format indicated above: (factorX.levelY=30%)

I've noticed that the rpart package can produce similar graphs, but its plotting functions only accept a fitted model object as input.

Solution

Here is a recursive approach. First, there is a function to build the tree structure, gathering the proportions at each split level into a named, nested list. Second, there is a function to convert the nested list to an edgelist to use with igraph. Lastly, igraph provides the plotting capability.

## Create tree structure in nested list
makePtree <- function(data, prev = 1) {
    counts <- table(data[, 1L])
    tab <- counts[counts > 0] / nrow(data) * prev                             # proportions at current level, scaled by the parent's share
    ns <- sprintf("%s.%s=%.2f", names(data)[1L], names(tab), unname(c(tab)))  # node names
    if (NCOL(data) < 2L) return(ns)                                           # last column: we are done, return names only
    setNames(mapply(makePtree, split(data[, -1L, drop = FALSE], data[, 1L], drop = TRUE),
                    tab, SIMPLIFY = FALSE), ns)                               # recurse into each level's subset
}

## Create edgelist from nested list for igraph::graph_from_data_frame
lst2edge <- function(lst) {
    if (!is.list(lst)) return( data.frame(a=character(0), b=character(0)) )
    do.call(rbind,
            c(lapply(names(lst), function(x) {
                if (!is.list(lst[[x]])) return( data.frame(a=x, b=lst[[x]]) )
                data.frame(a=x, b=names(lst[[x]]))
            }), lapply(lst, lst2edge)))
}

## Apply functions
lst <- makePtree(df)                                   # nested list
dat <- lst2edge(lst)                                   # edgelist
dat <- rbind(dat, data.frame(a="root", b=names(lst)))  # add a root node 

## Make an igraph
library(igraph)
g <- graph_from_data_frame(dat)
plot(g, layout=layout.reingold.tilford(g, root="root"))
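The values in the node labels are shares of the whole data set, because makePtree multiplies each within-parent proportion by the parent's own share (the `prev` argument). A small check of that arithmetic on the example data, using only base R (the variable names are illustrative, not part of the answer):

```r
df <- data.frame(f1 = factor(rep(LETTERS[1:2], each = 50)),
                 f2 = rep(letters[1:4], each = 25),
                 f3 = rep(colors(1)[1:2], 25, each = 2))

p_A         <- mean(df$f1 == "A")                 # share of all rows with f1 == "A"
p_a_given_A <- mean(df$f2[df$f1 == "A"] == "a")   # share of f2 == "a" within those rows
p_joint     <- mean(df$f1 == "A" & df$f2 == "a")  # share of all rows with both levels

# Parent share times within-parent share equals the joint share
all.equal(p_A * p_a_given_A, p_joint)
```

This is why a node such as f2.a is labeled with its proportion of all rows, not its proportion within f1.A.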

If you want each final node to be drawn separately under its own parent, you can make the leaf names unique so that igraph treats them as distinct vertices. Here, lst2edge is modified to prefix each final-level name with its parent's name, and a regular expression then shortens the names again for the final figure.

## Create edgelist from nested list for igraph::graph_from_data_frame
lst2edge <- function(lst) {
    if (!is.list(lst)) return( data.frame(a=character(0), b=character(0)) )
    do.call(rbind,
            c(lapply(names(lst), function(x) {
                if (!is.list(lst[[x]])) return( data.frame(a=x, b=paste0(x, lst[[x]])) )
                data.frame(a=x, b=names(lst[[x]]))
            }), lapply(lst, lst2edge)))
}

## Apply functions
lst <- makePtree(df)                                           # nested list
dat <- lst2edge(lst)                                           # edgelist
dat <- rbind(dat, data.frame(a="root", b=names(lst)))          # add a root node 

## Make an igraph
g <- graph_from_data_frame(dat)

## Fix the names of the last level (they are lengthened in lst2edge
## so igraph doesn't show multiple incoming arrows to single nodes)
V(g)$name <- gsub(".*?([^\\.]+=[^=]+$)", "\\1", V(g)$name)
plot(g, layout=layout.reingold.tilford(g, root="root"),
     vertex.label.dist=-0.1, vertex.label.degree=c(rep(pi/2, 7), rep(c(pi/2, 3*pi/2), 4)))
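The regular expression above keeps only the last `level=value` piece of each vertex name and leaves names without an `=` untouched. A quick illustration on hand-written names of the shape lst2edge produces (the sample strings are made up for this example):

```r
# Sample vertex names: the root, an inner node, and a lengthened leaf name
nm <- c("root", "f1.A=0.50", "f2.a=0.25f3.white=0.13")

# Same pattern as in the plotting code: lazily skip a prefix, then capture
# the final run of non-dot characters, "=", and everything up to the end
short <- gsub(".*?([^\\.]+=[^=]+$)", "\\1", nm)
print(short)
```

Because the shortened names are no longer unique, this step belongs after the graph has been built, as in the code above.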

You can adjust the position of the vertex labels with the vertex.label.degree argument to the plotting function.
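As an alternative to renaming vertices after the fact, the edge list can be built so that every node is identified by its full path from the root. That keeps leaves separate by construction and lets the label carry a percentage, as the question requested. The following is only a sketch of that different technique, not the answer's method: `path_tree` and the "/"-separated path ids are hypothetical names chosen for illustration, while `graph_from_data_frame`, `layout_as_tree`, and `V` are igraph functions.

```r
df <- data.frame(f1 = factor(rep(LETTERS[1:2], each = 50)),
                 f2 = rep(letters[1:4], each = 25),
                 f3 = rep(colors(1)[1:2], 25, each = 2))

# Hypothetical helper: one vertex per observed path through the factors
path_tree <- function(data) {
  parent <- rep("root", nrow(data))
  edges  <- list()
  verts  <- list(data.frame(name = "root", label = "total=100%",
                            stringsAsFactors = FALSE))
  for (i in seq_along(data)) {
    node  <- paste(names(data)[i], data[[i]], sep = ".")  # e.g. "f1.A"
    child <- paste(parent, node, sep = "/")               # unique path id
    pct   <- 100 * table(child) / nrow(data)              # share of all rows per path
    keep  <- !duplicated(child)                           # first row of each path
    edges[[i]] <- data.frame(from = parent[keep], to = child[keep],
                             stringsAsFactors = FALSE)
    verts[[i + 1]] <- data.frame(
      name  = child[keep],
      label = sprintf("%s=%.0f%%", node[keep], as.numeric(pct[child[keep]])),
      stringsAsFactors = FALSE)
    parent <- child
  }
  list(edges = do.call(rbind, edges), vertices = do.call(rbind, verts))
}

tr <- path_tree(df)

# Plot only if igraph is installed
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(tr$edges, vertices = tr$vertices)
  plot(g, layout = igraph::layout_as_tree(g, root = "root"),
       vertex.label = igraph::V(g)$label)
}
```

The labels are rounded to whole percentages by the `%.0f` format, which is an arbitrary choice here; a different format string would show decimals.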
