在dc.js中用子列绘制汇总数据 [英] Plotting aggregated data with sub-columns in dc.js

查看:91
本文介绍了在dc.js中用子列绘制汇总数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下形式的数据:

data = [..., {id:X,..., turnover:[[2015,2017,2018],[2000000,3000000,2800000]]}, ...];

我的目标是在x轴上绘制年份,而在y轴上绘制当前通过交叉过滤器选择的所有公司的平均营业额.

My goal is to plot the year in the x-axis, against the average turnover for all companies currently selected via crossfilter in the y-axis.

每个公司的记录年份不一致,但是应该始终为三年.

The years recorded per company are inconsistent, but there should always be three years.

如果有帮助,我可以将数据重新组织为以下形式:

If it would help, I can reorganise the data to be in the form:

data = [..., {id:X,..., turnover:{2015:2000000, 2017:3000000, 2018:2800000}}, ...];

如果我能够进一步重组数据,使其看起来像这样:

Had I been able to reorganise the data further to look like:

[...{id:X, ..., year:2015, turnover:2000000},{id:X,...,year:2017,turnover:3000000},{id:X,...,year:2018,turnover:2800000}]; 

然后此问题将提供解决方案.

但是将公司分为几行对我所做的其他事情都没有意义.

But splitting the companies into separate rows doesn't make sense with everything else I'm doing.

推荐答案

除非我没有记错,否则您拥有我所说的,又称带有数组键的尺寸.

Unless I'm mistaken, you have what I call a "tag dimension", aka a dimension with array keys.

您希望每行包含的每一年都记录一次,但只希望它影响此维度.您不想在其他维度上多次观察该行,这就是为什么您不想展平的原因.

You want each row to be recorded once for each year it contains, but you only want it to affect this dimension. You don't want to observe the row multiple times in the other dimensions, which is why you don't want to flatten.

使用原始数据格式,维度定义将类似于:

With your original data format, your dimension definition would look something like:

var yearsDimension = cf.dimension(d => d.turnover[0], true);

标签维度的键函数应该返回一个数组,这里是几年.

The key function for a tag dimension should return an array, here of years.

随着交叉过滤器的使用,此功能仍是相当新的功能,还有几个次要今年发现了错误.这些错误应该很容易避免.该功能已得到广泛使用,但未发现重大错误.

This feature is still fairly new, as crossfilter goes, and a couple of minor bugs were found this year. These bugs should be easy to avoid. The feature has gotten a lot of use and no major bugs have been found.

请始终注意标签尺寸,因为任何聚合加起来都超过100%-在您的情况下为300%.但是,如果您正在对公司的平均值进行一年的计算,那应该不会有问题.

Always beware with tag dimensions, since any aggregations will add up to more than 100% - in your case 300%. But if you are doing averages across companies for a year, this should not be a problem.

您的问题的独特之处在于,不仅每行有多个键,而且还具有与这些键关联的多个值.

What's unique about your problem is that you not only have multiple keys per row, you also have multiple values associated with those keys.

尽管交叉过滤器标签尺寸功能很方便,但它使您无法知道在缩小时正在查看的哪个标签.此外,最强大,最通用的组减少方法 group.reduce()不会告诉您要减少的密钥..

Although the crossfilter tag dimension feature is handy, it gives you no way to know which tag you are looking at when you reduce. Further, the most powerful and general group reduction method, group.reduce(), doesn't tell you which key you are reducing..

但是有一种更强大的方法可以一次减少整个交叉过滤器:

But there is one even more powerful way to reduce across the entire crossfilter at once: dimension.groupAll()

groupAll 对象的行为类似于一个组,只是它被喂入所有行,并且仅返回一个bin.如果使用dimension.groupAll(),则会得到一个groupAll对象,该对象观察除该维上的所有过滤器之外的所有过滤器.如果您想要观察到的groupAll,也可以使用 crossfilter.groupAll 所有过滤器.

A groupAll object behaves like a group, except that it is fed all of the rows, and it returns only one bin. If you use dimension.groupAll() you get a groupAll object that observes all filters except those on that dimension. You can also use crossfilter.groupAll if you want a groupAll that observes all filters.

这里是groupAll.reduce()的归约函数的一种解决方案(为简便起见,使用ES6语法),该函数将所有行归约为year => {count,total}的对象.

Here is a solution (using ES6 syntax for brevity) of reduction functions for groupAll.reduce() that reduces all of the rows into an object of year => {count, total}.

function avg_paired_tag_reduction(idTag, valTag) {
  return {
    add(p, v) {
      v[idTag].forEach((id, i) => {
        p[id] = p[id] || {count: 0, total: 0};
        ++p[id].count;
        p[id].total += v[valTag][i];
      });
      return p;
    },
    remove(p, v) {
      v[idTag].forEach((id, i) => {
        console.assert(p[id]);
        --p[id].count;
        p[id].total -= v[valTag][i];
      })
      return p;
    },
    init() {
      return {};
    }
  };
}

它将被馈送到每一行,并且它将遍历该行中的键和值,从而为每个键产生一个计数和总计.假定键数组和值数组的长度相同.

It will be fed every row and it will loop over the keys and values in the row, producing a count and total for every key. It assumes that the length of the key array and the value array are the same.

然后,我们可以使用假组" 将对象按需转换为dc.js图表​​期望的{key,value}对数组:

Then we can use a "fake group" to turn the object on demand into the array of {key,value} pairs that dc.js charts expect:

function groupall_map_to_group(groupAll) {
  return {
    all() {
      return Object.entries(groupAll.value())
        .map(([key, value]) => ({key,value}));
    }
  };
}

像这样使用这些功能:

const red = avg_paired_tag_reduction('id', 'val');
const avgPairedTagGroup = turnoverYearsDim.groupAll().reduce(
  red.add, red.remove, red.init
);
console.log(groupall_map_to_group(avgPairedTagGroup).all());

尽管可以计算移动平均值,但如上所述,计算一个计数和总计,然后告诉图表如何在值访问器中计算平均值,效率更高:

Although it's possible to compute a running average, it's more efficient to instead calculate a count and total, as above, and then tell the chart how to compute the average in the value accessor:

chart.dimension(turnoverYearsDim)
  .group(groupall_map_to_group(avgPairedTagGroup))
  .valueAccessor(kv => kv.value.total / kv.value.count)

演示小提琴.

这篇关于在dc.js中用子列绘制汇总数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆