How to format data for plotly sunburst diagram

Question

I'm trying to make a sunburst diagram using Plotly via R. I'm struggling with the data model required for the hierarchy, both in conceptualizing how it works and in finding an easy way to transform a regular data frame, with columns representing the different hierarchical levels, into the format needed.

I've looked at examples for plotly sunburst charts in R, e.g., here, and seen the reference page but don't totally get the model for data formatting.

library(plotly)

# Create some fake data: ownership and land use, with acreage
df <- data.frame(ownership = c(rep("private", 3), rep("public", 3), rep("mixed", 3)),
                 landuse = rep(c("residential", "recreation", "commercial"), 3),
                 acres = c(108, 143, 102, 300, 320, 500, 37, 58, 90))

# Just try some quick pie charts of acreage by landuse and ownership
plot_ly(data = df, labels = ~landuse, values = ~acres, type = 'pie')
plot_ly(data = df, labels = ~ownership, values = ~acres, type = 'pie')

# This doesn't render anything... not that I'd expect it to, given the data format doesn't seem to match what's needed,
# but this is what I'd intuitively expect to work
plot_ly(data = df, labels = ~landuse, parents = ~ownership, values = ~acres, type = 'sunburst')

It would be helpful, given the example code above, or similar, to see how one might go from the data (df) to the format required for the plotly sunburst diagram.

Recommended answer

You are absolutely right: compared to the otherwise intuitive usage of plotly's R API, preparing data for a sunburst chart is rather annoying.

I had the same problem and wrote a function based on library(data.table) to prepare the data, accepting two different data.frame input formats.

See the section on sunbursts with repeated labels in the plotly R sunburst documentation.

For your example it should look like this:

         labels values         parents                           ids
 1:       total   1658            <NA>                         total
 2:     private    353           total               total - private
 3:      public   1120           total                total - public
 4:       mixed    185           total                 total - mixed
 5: residential    108 total - private total - private - residential
 6:  recreation    143 total - private  total - private - recreation
 7:  commercial    102 total - private  total - private - commercial
 8: residential    300  total - public  total - public - residential
 9:  recreation    320  total - public   total - public - recreation
10:  commercial    500  total - public   total - public - commercial
11: residential     37   total - mixed   total - mixed - residential
12:  recreation     58   total - mixed    total - mixed - recreation
13:  commercial     90   total - mixed    total - mixed - commercial
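
To make the data model concrete, here is a minimal base-R sketch (no data.table) that builds the same four columns by hand for the two-level example from the question. The object names (root, level1, level2, sunburst_manual) are made up for illustration. The points to take away: every node gets its own row, including the inner ones; ids must be unique; and each parents entry must match the ids of another row, with the root's parent left as NA.

```r
# Example data from the question: one row per (ownership, landuse) leaf.
df <- data.frame(
  ownership = rep(c("private", "public", "mixed"), each = 3),
  landuse   = rep(c("residential", "recreation", "commercial"), times = 3),
  acres     = c(108, 143, 102, 300, 320, 500, 37, 58, 90),
  stringsAsFactors = FALSE
)

# Level 0: a single root node holding the grand total; it has no parent.
root <- data.frame(ids = "total", labels = "total",
                   parents = NA_character_, values = sum(df$acres),
                   stringsAsFactors = FALSE)

# Level 1: one node per ownership class, valued by the sum of its leaves.
own <- aggregate(acres ~ ownership, data = df, FUN = sum)
level1 <- data.frame(ids = paste("total", own$ownership, sep = " - "),
                     labels = own$ownership,
                     parents = "total",
                     values = own$acres,
                     stringsAsFactors = FALSE)

# Level 2: the leaves. The id carries the full path so that repeated labels
# such as "residential" stay distinguishable between branches.
level2 <- data.frame(ids = paste("total", df$ownership, df$landuse, sep = " - "),
                     labels = df$landuse,
                     parents = paste("total", df$ownership, sep = " - "),
                     values = df$acres,
                     stringsAsFactors = FALSE)

sunburst_manual <- rbind(root, level1, level2)
```

The result can be passed to plot_ly with ids = ~ids, labels = ~labels, parents = ~parents, values = ~values, type = 'sunburst' and branchvalues = 'total'. Row order differs from the table above (aggregate sorts the ownership classes), which should not matter because nodes are linked through ids and parents.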

Here is the code to get there:

library(data.table)
library(plotly)

DF <- data.table(ownership=c(rep("private", 3), rep("public",3),rep("mixed", 3)),
                  landuse=c(rep(c("residential", "recreation", "commercial"),3)),
                  acres=c(108, 143, 102, 300, 320, 500, 37, 58, 90))

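as.sunburstDF <- function(DF, valueCol = NULL){
  require(data.table)

  # Work on a copy and prepend a synthetic root level named "total"
  DT <- data.table(DF, stringsAsFactors = FALSE)
  DT[, root := "total"]
  setcolorder(DT, c("root", names(DF)))

  hierarchyList <- list()
  # If a value column is given, rename it to "values"; otherwise rows are counted below
  if(!is.null(valueCol)){setnames(DT, valueCol, "values", skip_absent=TRUE)}
  hierarchyCols <- setdiff(names(DT), "values")

  # Aggregate once per depth: root, root + level 1, root + level 1 + level 2, ...
  for(i in seq_along(hierarchyCols)){
    # Index into hierarchyCols so the value column can sit anywhere in DF
    currentCols <- hierarchyCols[seq_len(i)]
    if(is.null(valueCol)){
      # No value column: a node's value is the number of rows below it
      currentDT <- unique(DT[, ..currentCols][, values := .N, by = currentCols], by = currentCols)
    } else {
      # Value column given: a node's value is the sum over the rows below it
      currentDT <- DT[, lapply(.SD, sum, na.rm = TRUE), by=currentCols, .SDcols = "values"]
    }
    # The deepest grouping column becomes the node label; the others are its ancestors
    setnames(currentDT, length(currentCols), "labels")
    hierarchyList[[i]] <- currentDT
  }

  # Stack all depths; ancestor columns missing at shallower depths are filled with NA
  hierarchyDT <- rbindlist(hierarchyList, use.names = TRUE, fill = TRUE)

  # parents = path of all ancestors (NA for the root); ids = parents path plus own label
  parentCols <- setdiff(names(hierarchyDT), c("labels", "values", valueCol))
  hierarchyDT[, parents := apply(.SD, 1, function(x){if(all(is.na(x))) NA_character_ else paste(x[!is.na(x)], collapse = " - ")}), .SDcols = parentCols]
  hierarchyDT[, ids := apply(.SD, 1, function(x){paste(x[!is.na(x)], collapse = " - ")}), .SDcols = c("parents", "labels")]
  hierarchyDT[, c(parentCols) := NULL]
  return(hierarchyDT)
}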

sunburstDF <- as.sunburstDF(DF, valueCol = "acres")

plot_ly(data = sunburstDF, ids = ~ids, labels= ~labels, parents = ~parents, values= ~values, type='sunburst', branchvalues = 'total')
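
A note on branchvalues = 'total': with this setting each node's value is taken to already include all of its descendants, and as far as I know the chart comes out blank when the children of a node add up to more than the node itself. If a sunburst refuses to render, a quick consistency check like the sketch below can help. check_branch_totals is a hypothetical helper written for this answer, not part of plotly, and it assumes a data frame with the ids, parents and values columns described above.

```r
# Hypothetical helper (not a plotly function): returns the ids of all nodes
# whose children sum to more than the node's own value, or whose id is
# referenced as a parent but missing from the data.
check_branch_totals <- function(sb, tol = 1e-9) {
  # Sum the values of the direct children of each parent.
  # Rows with an NA parent (the root) are dropped by tapply.
  child_sums <- tapply(sb$values, sb$parents, sum)
  parent_ids <- names(child_sums)
  child_sums <- as.vector(child_sums)

  # Look up each parent's own value; NA means the parent id does not exist.
  parent_values <- sb$values[match(parent_ids, sb$ids)]

  ok <- !is.na(parent_values) & child_sums <= parent_values + tol
  parent_ids[!ok]
}

# Tiny illustration with made-up numbers
sb_ok <- data.frame(ids     = c("total", "total - a", "total - b"),
                    parents = c(NA, "total", "total"),
                    values  = c(10, 4, 6),
                    stringsAsFactors = FALSE)
sb_bad <- sb_ok
sb_bad$values[1] <- 9   # root is now smaller than the sum of its children

check_branch_totals(sb_ok)    # character(0)
check_branch_totals(sb_bad)   # "total"
```

The output of as.sunburstDF should pass this check by construction, since every parent value is an aggregate over the rows beneath it.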

Here is an example for the second data.frame format accepted by the function (valueCol = NULL, because it is calculated from the data):

DF2 <- data.frame(sample(LETTERS[1:3], 100, replace = TRUE),
                 sample(LETTERS[4:6], 100, replace = TRUE),
                 sample(LETTERS[7:9], 100, replace = TRUE),
                 sample(LETTERS[10:12], 100, replace = TRUE),
                 sample(LETTERS[13:15], 100, replace = TRUE),
                 stringsAsFactors = FALSE)

plot_ly(data = as.sunburstDF(DF2), ids = ~ids, labels= ~labels, parents = ~parents, values= ~values, type='sunburst', branchvalues = 'total')

Please also see library(sunburstR) as an alternative.
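
For comparison, here is a rough sketch of how the question's data might be fed to sunburstR. To my understanding, sunburstR::sunburst() accepts a two-column data frame: a path string with the levels joined by "-" and a numeric size, so no parent rows or ids are needed. The column names path and size are my own choice, and the sketch assumes that none of the labels contain a hyphen, since that would be read as a level separator.

```r
# Example data from the question
df <- data.frame(
  ownership = rep(c("private", "public", "mixed"), each = 3),
  landuse   = rep(c("residential", "recreation", "commercial"), times = 3),
  acres     = c(108, 143, 102, 300, 320, 500, 37, 58, 90),
  stringsAsFactors = FALSE
)

# One row per leaf: the full path joined by "-", followed by the leaf's size
seq_df <- data.frame(path = paste(df$ownership, df$landuse, sep = "-"),
                     size = df$acres,
                     stringsAsFactors = FALSE)

# Only build the widget if the package is installed; print p to display it
if (requireNamespace("sunburstR", quietly = TRUE)) {
  p <- sunburstR::sunburst(seq_df)
}
```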
