How NoSQL databases perform on aggregate functions (AVG, SUM, etc.)

Question

We periodically need to work with a pretty big dataset (30-40 GB). It has a lot of values ordered by time (plus other information), but we basically need to perform some mathematical operations by month.

Our first approach was to use a MySQL database to back the data, as we have reasonable experience with the engine and with the relational approach. However, the process takes too long, and we were wondering whether a NoSQL approach could do it better.

Basically the data that we need to express is:

Value: { NumericalValue, Year, Month }
Entity: List of 'Value'
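To make the shape of this data concrete, here is a minimal Python sketch of the structure above. It assumes that `NumericalValue` is a floating-point number and that an entity is nothing more than a time-ordered list of values; the class, field, and sample values are illustrative, not taken from the original system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Value:
    """One measurement: a number tagged with the month it belongs to."""
    numerical_value: float  # assumed to be a float
    year: int
    month: int  # 1-12


# An entity is simply a list of values, ordered by time.
Entity = List[Value]

# Hypothetical sample data: two values in January 2012, one in February.
entity: Entity = [
    Value(10.0, 2012, 1),
    Value(20.0, 2012, 1),
    Value(30.0, 2012, 2),
]
```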

We process this list three times, doing simple mathematical operations; when I say 'process' I mean iterating through the dataset and performing the calculations. When everything is over, we have the same structure (but with different data):

Value: { NumericalValue, Year, Month }
Entity: List of 'Value'

This is where we hit the biggest problems, since we need to calculate some AVERAGES and that takes a long time. As we repeat this process several times, I think the most time-consuming tasks are:

1) Exporting the dataset to MySQL, which means a lot of inserts from text files.

Once the data has been transformed:

2) Computing some queries that contain aggregate functions (AVG, SUM) with LIMIT.

3) Computing some queries that contain aggregate functions over the whole dataset.
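For reference, a per-month AVG/SUM is the same computation as a SQL `GROUP BY year, month`, and it can be done in a single sequential pass without loading the data into a database first. The sketch below assumes each row is available as a hypothetical `(numerical_value, year, month)` tuple (for example, parsed from the text files); `monthly_aggregates` is an illustrative name, not part of any library.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[float, int, int]  # (numerical_value, year, month), assumed layout
Key = Tuple[int, int]         # (year, month)


def monthly_aggregates(rows: Iterable[Row]) -> Dict[Key, Tuple[float, float]]:
    """Return {(year, month): (sum, avg)} after one pass over rows.

    Only a running sum and a count are kept per month, so memory grows
    with the number of distinct months, not with the number of rows.
    """
    sums: Dict[Key, float] = defaultdict(float)
    counts: Dict[Key, int] = defaultdict(int)
    for value, year, month in rows:
        key = (year, month)
        sums[key] += value
        counts[key] += 1
    # AVG is derived from SUM and COUNT at the end; keys come out sorted by date.
    return {key: (sums[key], sums[key] / counts[key]) for key in sorted(sums)}


rows = [(10.0, 2012, 1), (20.0, 2012, 1), (30.0, 2012, 2)]
print(monthly_aggregates(rows))
# {(2012, 1): (30.0, 15.0), (2012, 2): (30.0, 30.0)}
```

Because `rows` can be any iterable, it could also be a generator that reads a file line by line, so the full 30-40 GB never has to be held in memory at once.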

Usually, even with some indexes added, we feel that things take too long (some queries take 20 minutes). Any tips or strategies would be much appreciated. I feel that NoSQL databases aren't designed specifically for this, but maybe some experience could help :).

Thanks for your time,

Recommended answer

Your task fits very well with columnar databases. Column-oriented NoSQL databases (e.g. Cassandra) store data tables as sections of columns of data rather than as rows of data. This can improve the speed of aggregations drastically. It mainly matters for systems that rely on hard disks for storage, because an aggregate over one column only has to read that column from disk instead of every full row. If this is not the case (in-memory databases, for example), there are many more options for squeezing out performance.
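The difference between the two layouts can be illustrated with a toy in-memory Python sketch. It only mimics the access pattern; it does not model disk I/O, and it is not how Cassandra or any particular database lays out its files. In the row layout every record is visited to compute the average of one field, while in the column layout only the single list holding that field is read.

```python
from typing import Dict, List, Tuple

# Row-oriented layout: each record keeps all of its fields together.
rows: List[Tuple[float, int, int]] = [
    (10.0, 2012, 1),
    (20.0, 2012, 1),
    (30.0, 2012, 2),
]

# Column-oriented layout: each field is stored as its own list.
columns: Dict[str, list] = {
    "numerical_value": [r[0] for r in rows],
    "year": [r[1] for r in rows],
    "month": [r[2] for r in rows],
}


def avg_row_store(records: List[Tuple[float, int, int]]) -> float:
    # Every full record is visited, even though only field 0 is needed.
    return sum(r[0] for r in records) / len(records)


def avg_column_store(cols: Dict[str, list]) -> float:
    # Only the column involved in the aggregate is read.
    values = cols["numerical_value"]
    return sum(values) / len(values)


print(avg_row_store(rows), avg_column_store(columns))  # 20.0 20.0
```

On disk, the same idea means a column store can skip the bytes of the `year` and `month` fields entirely when computing an AVG over `numerical_value`.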
