How to format data for plotly sunburst diagram


Question

I'm trying to make a sunburst diagram using Plotly via R. I'm struggling with the data model required for the hierarchy, both in terms of conceptualizing how it works and in finding an easy way to transform a regular data frame, with columns representing different hierarchical levels, into the format needed.

I've looked at examples for plotly sunburst charts in R, e.g., here, and seen the reference page but don't totally get the model for data formatting.

library(plotly)

# Create some fake data - say ownership and land use data with acreage
df <- data.frame(ownership=c(rep("private", 3), rep("public",3),rep("mixed", 3)), 
                 landuse=c(rep(c("residential", "recreation", "commercial"),3)),
                 acres=c(108,143,102, 300,320,500, 37,58,90))

# Just try some quick pie charts of acreage by landuse and ownership
plot_ly(data=df, labels= ~landuse, values= ~acres, type='pie')
plot_ly(data=df, labels= ~ownership, values= ~acres, type='pie')

# This doesn't render anything... not that I'd expect it to given the data format doesn't seem to match what's needed, 
# but this is what I'd intuitively expect to work
plot_ly(data=df, labels= ~landuse, parents = ~ownership, values= ~acres, type='sunburst')

It would be helpful, given the example code above, or similar, to see how one might go from the data (df) to the format required for the plotly sunburst diagram.

Answer

You are absolutely right: compared to the otherwise intuitive usage of plotly's R API, preparing data for a sunburst (or treemap) chart is rather annoying.

I had the same problem and wrote a function based on library(data.table) to prepare the data, accepting two different data.frame input formats.

The format required to generate a sunburst plot from data structured like yours can be seen here under the section Sunburst with Repeated Labels.

For your example it should look like this:

         labels values         parents                           ids
 1:       Total   1658            <NA>                         Total
 2:     private    353           Total               Total - private
 3:      public   1120           Total                Total - public
 4:       mixed    185           Total                 Total - mixed
 5: residential    108 Total - private Total - private - residential
 6:  recreation    143 Total - private  Total - private - recreation
 7:  commercial    102 Total - private  Total - private - commercial
 8: residential    300  Total - public  Total - public - residential
 9:  recreation    320  Total - public   Total - public - recreation
10:  commercial    500  Total - public   Total - public - commercial
11: residential     37   Total - mixed   Total - mixed - residential
12:  recreation     58   Total - mixed    Total - mixed - recreation
13:  commercial     90   Total - mixed    Total - mixed - commercial
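
To make the structure of that table easier to see, here is a minimal base R sketch that builds the same four columns by hand for the question's data, one hierarchy level at a time. The object names (root, level1, level2, sunburst_by_hand) are made up for illustration, and the row order differs from the table because aggregate() sorts the groups alphabetically.

```r
# Source data from the question: one row per ownership/landuse combination.
df <- data.frame(
  ownership = rep(c("private", "public", "mixed"), each = 3),
  landuse   = rep(c("residential", "recreation", "commercial"), times = 3),
  acres     = c(108, 143, 102, 300, 320, 500, 37, 58, 90),
  stringsAsFactors = FALSE
)

# Level 0: a single root node that holds the grand total and has no parent.
root <- data.frame(
  ids = "Total", labels = "Total", parents = NA_character_,
  values = sum(df$acres), stringsAsFactors = FALSE
)

# Level 1: one node per ownership; its value is the sum of its land uses.
own <- aggregate(acres ~ ownership, data = df, FUN = sum)
level1 <- data.frame(
  ids     = paste("Total", own$ownership, sep = " - "),
  labels  = as.character(own$ownership),
  parents = "Total",
  values  = own$acres,
  stringsAsFactors = FALSE
)

# Level 2: one node per original row. Labels repeat across branches,
# so the id carries the full path to keep every node unique.
level2 <- data.frame(
  ids     = paste("Total", df$ownership, df$landuse, sep = " - "),
  labels  = df$landuse,
  parents = paste("Total", df$ownership, sep = " - "),
  values  = df$acres,
  stringsAsFactors = FALSE
)

sunburst_by_hand <- rbind(root, level1, level2)
print(sunburst_by_hand)
```

The result should be usable with the same plot_ly() arguments shown further down (ids, labels, parents, values, type = 'sunburst', branchvalues = 'total'); the function below generalizes this idea to any number of levels.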

Here is the code to get there:

library(data.table)
library(plotly)

DF <- data.table(ownership=c(rep("private", 3), rep("public",3),rep("mixed", 3)),
                 landuse=c(rep(c("residential", "recreation", "commercial"),3)),
                 acres=c(108, 143, 102, 300, 320, 500, 37, 58, 90))

as.sunburstDF <- function(DF, value_column = NULL, add_root = FALSE){
  require(data.table)
  
  colNamesDF <- names(DF)
  
  if(is.data.table(DF)){
    DT <- copy(DF)
  } else {
    DT <- data.table(DF, stringsAsFactors = FALSE)
  }
  
  if(add_root){
    DT[, root := "Total"]  
  }
  
  colNamesDT <- names(DT)
  hierarchy_columns <- setdiff(colNamesDT, value_column)
  DT[, (hierarchy_columns) := lapply(.SD, as.factor), .SDcols = hierarchy_columns]
  
  if(is.null(value_column) && add_root){
    setcolorder(DT, c("root", colNamesDF))
  } else if(!is.null(value_column) && !add_root) {
    setnames(DT, value_column, "values", skip_absent=TRUE)
    setcolorder(DT, c(setdiff(colNamesDF, value_column), "values"))
  } else if(!is.null(value_column) && add_root) {
    setnames(DT, value_column, "values", skip_absent=TRUE)
    setcolorder(DT, c("root", setdiff(colNamesDF, value_column), "values"))
  }
  
  hierarchyList <- list()
  
  for(i in seq_along(hierarchy_columns)){
    current_columns <- colNamesDT[1:i]
    if(is.null(value_column)){
      currentDT <- unique(DT[, ..current_columns][, values := .N, by = current_columns], by = current_columns)
    } else {
      currentDT <- DT[, lapply(.SD, sum, na.rm = TRUE), by=current_columns, .SDcols = "values"]
    }
    setnames(currentDT, length(current_columns), "labels")
    hierarchyList[[i]] <- currentDT
  }
  
  hierarchyDT <- rbindlist(hierarchyList, use.names = TRUE, fill = TRUE)
  
  parent_columns <- setdiff(names(hierarchyDT), c("labels", "values", value_column))
  hierarchyDT[, parents := apply(.SD, 1, function(x){fifelse(all(is.na(x)), yes = NA_character_, no = paste(x[!is.na(x)], sep = ":", collapse = " - "))}), .SDcols = parent_columns]
  hierarchyDT[, ids := apply(.SD, 1, function(x){paste(x[!is.na(x)], collapse = " - ")}), .SDcols = c("parents", "labels")]
  hierarchyDT[, c(parent_columns) := NULL]
  return(hierarchyDT)
}


sunburstDF <- as.sunburstDF(DF, value_column = "acres", add_root = TRUE)

plot_ly(data = sunburstDF, ids = ~ids, labels= ~labels, parents = ~parents, values= ~values, type='sunburst', branchvalues = 'total')
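
If a chart comes out empty, one thing worth checking is whether the hierarchy is consistent. The sketch below uses a hypothetical helper, check_sunburst_df(), which is not part of plotly or of the function above; it only tests three properties that the table format relies on: unique ids, every non-root parent present among the ids, and at least one root row.

```r
# Hypothetical helper (an assumption for illustration, not a plotly function):
# reports whether a data frame follows the ids/labels/parents/values layout.
check_sunburst_df <- function(x) {
  stopifnot(all(c("ids", "labels", "parents", "values") %in% names(x)))
  ids     <- as.character(x$ids)
  parents <- as.character(x$parents)
  is_root <- is.na(parents) | parents == ""
  c(
    unique_ids    = anyDuplicated(ids) == 0,            # no id used twice
    parents_exist = all(parents[!is_root] %in% ids),    # no dangling parent
    has_root      = any(is_root)                        # at least one top node
  )
}

# A tiny well-formed hierarchy: Total > private > residential.
good_df <- data.frame(
  ids     = c("Total", "Total - private", "Total - private - residential"),
  labels  = c("Total", "private", "residential"),
  parents = c(NA, "Total", "Total - private"),
  values  = c(108, 108, 108),
  stringsAsFactors = FALSE
)

# The same data with a parent that does not exist as an id.
bad_df <- good_df
bad_df$parents[3] <- "Total - public"

print(check_sunburst_df(good_df))
print(check_sunburst_df(bad_df))
```

Calling check_sunburst_df(sunburstDF) on the output of as.sunburstDF() would be one way to apply it to the real data.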

Here is an example for the second data.frame format accepted by the function (value_column = NULL, because it is calculated from the data):

DF2 <- data.frame(sample(LETTERS[1:3], 100, replace = TRUE),
                  sample(LETTERS[4:6], 100, replace = TRUE),
                  sample(LETTERS[7:9], 100, replace = TRUE),
                  sample(LETTERS[10:12], 100, replace = TRUE),
                  sample(LETTERS[13:15], 100, replace = TRUE),
                  stringsAsFactors = FALSE)

plot_ly(data = as.sunburstDF(DF2, add_root = TRUE), ids = ~ids, labels= ~labels, parents = ~parents, values= ~values, type='sunburst', branchvalues = 'total')
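
To illustrate what "calculated from the data" means here, the following base R sketch counts rows per level for a reduced two-column version of DF2. The column names level1 and level2 and the seed are assumptions added for the example; the function above does the equivalent with data.table's .N.

```r
set.seed(1)  # assumed seed, only so the example is reproducible

DF2_small <- data.frame(
  level1 = sample(LETTERS[1:3], 100, replace = TRUE),
  level2 = sample(LETTERS[4:6], 100, replace = TRUE),
  stringsAsFactors = FALSE
)

# Inner ring: number of rows per level1 value.
inner <- as.data.frame(table(level1 = DF2_small$level1),
                       stringsAsFactors = FALSE)

# Outer ring: number of rows per level1/level2 combination.
outer <- as.data.frame(table(level1 = DF2_small$level1,
                             level2 = DF2_small$level2),
                       stringsAsFactors = FALSE)
outer <- outer[outer$Freq > 0, ]  # drop combinations that never occur

# Each parent's count equals the sum of its children's counts.
children_sum <- tapply(outer$Freq, outer$level1, sum)
print(cbind(inner, children_sum = as.vector(children_sum[inner$level1])))
```

Because every parent count equals the sum of its children, the counts are consistent with branchvalues = 'total', where a parent's value is interpreted as the total of its whole branch.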

Please also see library(sunburstR) as an alternative.


Edit: Added a benchmark of the dplyr-based count_to_sunburst() function from library(plotme) (see below), which on my system is around 5 times slower than the data.table version.

Unit: milliseconds
          expr     min       lq     mean   median       uq      max neval
        plotme 50.4618 53.09425 60.92404 55.37815 63.62315 122.3842   100
 ismirsehregal  8.6553 10.28870 12.63881 11.53760 12.26620 108.2025   100
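
As a quick check of the "around 5 times" figure, the ratio can be computed directly from the table above. The data frame below simply retypes the reported median and mean timings; nothing is re-measured.

```r
# Timings copied from the microbenchmark output above (milliseconds).
bench <- data.frame(
  expr      = c("plotme", "ismirsehregal"),
  median_ms = c(55.37815, 11.53760),
  mean_ms   = c(60.92404, 12.63881),
  stringsAsFactors = FALSE
)

# How many times slower plotme is than the data.table version.
ratio_median <- bench$median_ms[1] / bench$median_ms[2]
ratio_mean   <- bench$mean_ms[1] / bench$mean_ms[2]

print(round(c(median = ratio_median, mean = ratio_mean), 1))
```

Both ratios come out slightly below 5, which matches the rounded statement in the text. Note that the ismirsehregal expression also includes the plot_ly() call, so the comparison is not purely about data preparation.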


Code to reproduce the benchmark:

# devtools::install_github("yogevherz/plotme")

library(microbenchmark)
library(plotme)
library(dplyr)
library(data.table)
library(plotly)

DF <- data.frame(ownership=c(rep("private", 3), rep("public",3),rep("mixed", 3)),
                 landuse=c(rep(c("residential", "recreation", "commercial"),3)),
                 acres=c(108, 143, 102, 300, 320, 500, 37, 58, 90))

as.sunburstDF <- function(DF, value_column = NULL, add_root = FALSE){
  require(data.table)
  
  colNamesDF <- names(DF)
  
  if(is.data.table(DF)){
    DT <- copy(DF)
  } else {
    DT <- data.table(DF, stringsAsFactors = FALSE)
  }
  
  if(add_root){
    DT[, root := "Total"]  
  }
  
  colNamesDT <- names(DT)
  hierarchy_columns <- setdiff(colNamesDT, value_column)
  DT[, (hierarchy_columns) := lapply(.SD, as.factor), .SDcols = hierarchy_columns]
  
  if(is.null(value_column) && add_root){
    setcolorder(DT, c("root", colNamesDF))
  } else if(!is.null(value_column) && !add_root) {
    setnames(DT, value_column, "values", skip_absent=TRUE)
    setcolorder(DT, c(setdiff(colNamesDF, value_column), "values"))
  } else if(!is.null(value_column) && add_root) {
    setnames(DT, value_column, "values", skip_absent=TRUE)
    setcolorder(DT, c("root", setdiff(colNamesDF, value_column), "values"))
  }
  
  hierarchyList <- list()
  
  for(i in seq_along(hierarchy_columns)){
    current_columns <- colNamesDT[1:i]
    if(is.null(value_column)){
      currentDT <- unique(DT[, ..current_columns][, values := .N, by = current_columns], by = current_columns)
    } else {
      currentDT <- DT[, lapply(.SD, sum, na.rm = TRUE), by=current_columns, .SDcols = "values"]
    }
    setnames(currentDT, length(current_columns), "labels")
    hierarchyList[[i]] <- currentDT
  }
  
  hierarchyDT <- rbindlist(hierarchyList, use.names = TRUE, fill = TRUE)
  
  parent_columns <- setdiff(names(hierarchyDT), c("labels", "values", value_column))
  hierarchyDT[, parents := apply(.SD, 1, function(x){fifelse(all(is.na(x)), yes = NA_character_, no = paste(x[!is.na(x)], sep = ":", collapse = " - "))}), .SDcols = parent_columns]
  hierarchyDT[, ids := apply(.SD, 1, function(x){paste(x[!is.na(x)], collapse = " - ")}), .SDcols = c("parents", "labels")]
  hierarchyDT[, c(parent_columns) := NULL]
  return(hierarchyDT)
}

microbenchmark(plotme = {
  DF %>% 
    rename(n = acres) %>% 
    count_to_sunburst()
}, ismirsehregal = {
  plot_ly(data = as.sunburstDF(DF, value_column = "acres", add_root = TRUE), ids = ~ids, labels= ~labels, parents = ~parents, values= ~values, type='sunburst', branchvalues = 'total')  
})
