pivottabler Counting Small Ns and Row Percentages (R)


Question

I was having difficulty with the pivottabler package and was wondering if you could assist.

library(pivottabler)
library(dplyr)  # provides %>%, group_by(), summarise() and n()

# perform the aggregation in R code explicitly
trains <- bhmtrains %>%
  group_by(TrainCategory, TOC) %>%
  summarise(NumberOfTrains=n()) %>%
  ungroup()

# display this pre-calculated data
pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains",
                     type="value", valueName="NumberOfTrains", 
                     summariseExpression="sum(NumberOfTrains)")
pt$renderPivot()

This produces a great pivot-like table, with one row per TOC and one column per TrainCategory.

Does anyone know how I can add a percent-of-row column, so that each count is also shown as a percentage of its row total?

I added columns to my dataset for the total by TOC and the total by TOC & TrainCategory. I tried to calculate a percentage from those, but the attempt below fails:

# total calculations
bhmtrains <- bhmtrains %>%
  group_by(TOC) %>%
  mutate(TOCCount = n())

bhmtrains <- bhmtrains %>%
  group_by(TrainCategory) %>%
  mutate(TrainCategoryCCount = n())

pt <- PivotTable$new()
pt$addData(trains)
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="TotalTrains",
                     type="value", valueName="NumberOfTrains", 
                     summariseExpression="sum(NumberOfTrains)")
##my attempt to calculate row percentage
pt$defineCalculation(calculationName="Percent", caption="%", 
                     type="calculation", basedOn=c("TOCCount", "TrainCategoryCCount"), 
                     format="%.1f %%",
                     calculationExpression="values$TOCCount/values$TrainCategoryCCount*100")    
pt$renderPivot()

I received this error:

Error in if (calc$type == "value") { : argument is of length zero

Can anyone help?

Recommended answer

I am the package author.

The row percentage is slightly more complex, since a given % cell in the body of the pivot table needs both the number of trains of that category (Express/Ordinary) and the number of trains across all categories. There are a couple of enhancements on the backlog that will help with this. In the meantime, the following will work (explanation after the code):

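# Custom calculation function: pivottabler calls it once for each % cell.
# Note: newer pivottabler releases document an extra calcFuncArgs parameter
# after netFilters; check the calculations vignette for your installed version.
getPercentageOfAllCategories <- function(pivotCalculator, netFilters, format, baseValues, cell) {
  # data frame registered under the name "bhmtrains" by pt$addData(bhmtrains)
  trains <- pivotCalculator$getDataFrame("bhmtrains")
  # clear the TrainCategory criteria so the filters select the whole row (TOC)
  netFilters$setFilterValues(variableName="TrainCategory", type="ALL", values=NULL, action="replace")
  filteredTrains <- pivotCalculator$getFilteredDataFrame(trains, netFilters)
  # denominator: number of trains for this TOC across all categories
  totalTrainsAllCategories <- nrow(filteredTrains)
  # numerator: baseValues$N is this cell's value of the "N" calculation
  percentageOfAllCategories <- baseValues$N / totalTrainsAllCategories * 100
  value <- list()
  value$rawValue <- percentageOfAllCategories
  value$formattedValue <- pivotCalculator$formatValue(percentageOfAllCategories, format=format)
  return(value)
}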

library(pivottabler)
pt <- PivotTable$new()
pt$addData(bhmtrains) 
pt$addColumnDataGroups("TrainCategory")
pt$addRowDataGroups("TOC")
pt$defineCalculation(calculationName="N", summariseExpression="n()")
pt$defineCalculation(calculationName="Percentage", caption="%", format="%.1f %%", basedOn="N",
  type="function", calculationFunction=getPercentageOfAllCategories)
pt$renderPivot()

This works by defining a custom calculation function that is invoked once per % cell in the pivot table. The custom calculation function gets the filters for a given cell (i.e. which TOC and TrainCategory), then overrides the category filter to clear the TrainCategory criteria. The filters are then applied to the data frame, the resulting number of rows counted and the percentage calculated. There is a little bit more information on custom calculation functions in the calculations vignette.
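
As a quick sanity check on the arithmetic described above, the same row percentage can be computed outside pivottabler. The following is only a minimal sketch in base R, not part of the original answer: the `toy` data frame is made up for illustration and merely stands in for `bhmtrains`, and `table()` with `prop.table(margin = 1)` is swapped in for the custom calculation function.

```r
# Toy data standing in for bhmtrains (values are invented for illustration).
toy <- data.frame(
  TOC = c("A", "A", "A", "B", "B", "B", "B", "B"),
  TrainCategory = c("Express", "Express", "Ordinary",
                    "Express", "Ordinary", "Ordinary", "Ordinary", "Ordinary")
)

# Count trains per TOC (rows) and TrainCategory (columns).
counts <- table(toy$TOC, toy$TrainCategory)

# Divide each cell by its row total; margin = 1 means "by row".
# This mirrors what the custom function does per cell:
# cell count / count for the same TOC across all categories * 100.
row_pct <- prop.table(counts, margin = 1) * 100

print(round(row_pct, 1))
```

Each row of `row_pct` sums to 100, which is the property to look for in the % columns of the rendered pivot table as well.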
