Can OLAP be done in BigTable?


Problem description

In the past I used to build web analytics using OLAP cubes running on MySQL. An OLAP cube, the way I used it, is simply a large table (OK, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (e.g. which pagename, useragent, ip, etc.) and a bunch of values (e.g. how many pageviews, how many visitors, etc.).

The queries that you run on a table like this are usually of the form (meta-SQL):

SELECT SUM(hits), SUM(bytes)
FROM MyCube
WHERE date='20090914' AND pagename='Homepage' AND browser!='googlebot'
GROUP BY hour

So you get the totals for each hour of the selected day with the mentioned filters. One snag was that queries on these cubes usually required a full table scan (for various reasons), which put a practical limit on how large (in MiB) you could make these things.

I'm currently learning the ins and outs of Hadoop and the like.

Running the above query as a MapReduce job on a BigTable looks easy enough: simply make 'hour' the key, filter in the map step, and reduce by summing the values.
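To make that concrete, here is a minimal sketch of the map/filter/reduce idea in plain Python. It only imitates the data flow in memory and is not Hadoop or BigTable code. The sample rows and the function names (`map_row`, `reduce_hour`, `run_job`) are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; the field names mirror the columns in the query.
ROWS = [
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "firefox", "hits": 3, "bytes": 1200},
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "googlebot", "hits": 50, "bytes": 9000},
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "safari", "hits": 2, "bytes": 800},
    {"date": "20090914", "hour": 10, "pagename": "Homepage", "browser": "firefox", "hits": 1, "bytes": 400},
    {"date": "20090914", "hour": 10, "pagename": "About", "browser": "firefox", "hits": 7, "bytes": 700},
    {"date": "20090913", "hour": 9, "pagename": "Homepage", "browser": "firefox", "hits": 4, "bytes": 100},
]


def map_row(row):
    """Map step: apply the WHERE filters and emit (hour, (hits, bytes))."""
    if (row["date"] == "20090914"
            and row["pagename"] == "Homepage"
            and row["browser"] != "googlebot"):
        yield row["hour"], (row["hits"], row["bytes"])


def reduce_hour(hour, values):
    """Reduce step: sum the values emitted for one hour."""
    total_hits = sum(v[0] for v in values)
    total_bytes = sum(v[1] for v in values)
    return hour, total_hits, total_bytes


def run_job(rows):
    """Group mapped pairs by key (the 'shuffle'), then reduce each group."""
    grouped = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            grouped[key].append(value)
    return [reduce_hour(hour, values) for hour, values in sorted(grouped.items())]


if __name__ == "__main__":
    for hour, hits, nbytes in run_job(ROWS):
        print(hour, hits, nbytes)
```

Note that the map step still touches every row, so this is the same full scan as before, only spread over many machines.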

Can you run a query like the one I showed above (or at least one with the same output) on a BigTable kind of system in 'real time' (i.e. via a user interface, where the user gets their answer ASAP) instead of in batch mode?

If not, what is the appropriate technology to do something like this in the realm of BigTable/Hadoop/HBase/Hive and the like?

Solution

It has even been done already (sort of).

Last.fm's aggregation/summary engine: http://github.com/zohmg/zohmg

A Google search also turned up a Google Code project called "mroll", but it has nothing except contact info (no code, nothing). Still, you might want to reach out to that guy and see what's up. http://code.google.com/p/mroll/
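The general idea behind an aggregation/summary approach can be sketched as follows: precompute totals per dimension combination in a batch step, store them sorted by key, and answer the interactive query with a short range scan over the summaries instead of a scan over the raw data. The code below is only an illustration of that idea in plain Python. It is not zohmg's API or storage layout, and the key layout `(date, pagename, hour, browser)`, the sample rows and the function names are assumptions.

```python
import bisect
from collections import defaultdict

# Hypothetical raw rows: (date, pagename, hour, browser, hits, bytes).
ROWS = [
    ("20090914", "Homepage", 9, "firefox", 3, 1200),
    ("20090914", "Homepage", 9, "firefox", 1, 100),
    ("20090914", "Homepage", 9, "googlebot", 50, 9000),
    ("20090914", "Homepage", 10, "safari", 2, 800),
    ("20090913", "Homepage", 9, "firefox", 4, 100),
]


def build_rollup(rows):
    """Batch step: sum hits and bytes per (date, pagename, hour, browser)."""
    totals = defaultdict(lambda: [0, 0])
    for date, pagename, hour, browser, hits, nbytes in rows:
        key = (date, pagename, hour, browser)
        totals[key][0] += hits
        totals[key][1] += nbytes
    # A sorted key list stands in for a table whose rows are ordered by key.
    return sorted(totals), totals


def hourly_totals(keys, totals, date, pagename, exclude_browser):
    """Query step: scan only the keys that share the (date, pagename) prefix."""
    prefix = (date, pagename)
    result = defaultdict(lambda: [0, 0])
    i = bisect.bisect_left(keys, prefix)  # jump to the first matching key
    while i < len(keys) and keys[i][:2] == prefix:
        _, _, hour, browser = keys[i]
        if browser != exclude_browser:
            hits, nbytes = totals[keys[i]]
            result[hour][0] += hits
            result[hour][1] += nbytes
        i += 1
    return [(hour, v[0], v[1]) for hour, v in sorted(result.items())]


if __name__ == "__main__":
    keys, totals = build_rollup(ROWS)
    print(hourly_totals(keys, totals, "20090914", "Homepage", "googlebot"))
```

The trade-off in this sketch is that the key order decides which queries are cheap: a filter on the leading dimensions becomes a range scan, while a filter on a trailing dimension (here `browser`) still has to be applied row by row within that range.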
