Can OLAP be done in BigTable?

Question

In the past I built web analytics using OLAP cubes running on MySQL. An OLAP cube, the way I used it, is simply a large table (OK, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (e.g. which pagename, useragent, ip, etc.) and a bunch of values (e.g. how many pageviews, how many visitors, etc.).

The queries that you run on a table like this are usually of the form (meta-SQL):

SELECT hour, SUM(hits), SUM(bytes)
FROM MyCube
WHERE date='20090914' AND pagename='Homepage' AND browser!='googlebot'
GROUP BY hour

So you get the totals for each hour of the selected day with the mentioned filters. One snag was that these cubes usually meant a full table scan (for various reasons), which put a practical limit on the size (in MiB) these tables could reach.

I'm currently learning the ins and outs of Hadoop and the like.

Running the above query as a MapReduce job on a BigTable looks easy enough: simply make 'hour' the key, filter in the map step, and reduce by summing the values.
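A minimal sketch of that idea in plain Python, not Hadoop or BigTable code: the map step applies the WHERE filters and emits the hour as key, and the reduce step sums hits and bytes per hour. The sample rows and the function names (map_row, reduce_hour, run_query) are made up for illustration; only the column names come from the query above.

```python
from collections import defaultdict

# Hypothetical sample rows; the field names mirror the columns in the meta-SQL query.
ROWS = [
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "firefox", "hits": 3, "bytes": 1200},
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "googlebot", "hits": 10, "bytes": 5000},
    {"date": "20090914", "hour": 9, "pagename": "Homepage", "browser": "safari", "hits": 2, "bytes": 800},
    {"date": "20090914", "hour": 10, "pagename": "Homepage", "browser": "firefox", "hits": 1, "bytes": 400},
    {"date": "20090914", "hour": 10, "pagename": "About", "browser": "firefox", "hits": 7, "bytes": 700},
    {"date": "20090913", "hour": 9, "pagename": "Homepage", "browser": "firefox", "hits": 4, "bytes": 900},
]


def map_row(row, date, pagename, excluded_browser):
    """Map step: apply the WHERE filters and emit (hour, (hits, bytes))."""
    if (
        row["date"] == date
        and row["pagename"] == pagename
        and row["browser"] != excluded_browser
    ):
        yield row["hour"], (row["hits"], row["bytes"])


def reduce_hour(values):
    """Reduce step: sum hits and bytes over all values emitted for one hour."""
    total_hits = sum(hits for hits, _ in values)
    total_bytes = sum(nbytes for _, nbytes in values)
    return total_hits, total_bytes


def run_query(rows, date, pagename, excluded_browser):
    """Run map, group by key (the 'shuffle'), then reduce each group."""
    grouped = defaultdict(list)
    for row in rows:
        for hour, value in map_row(row, date, pagename, excluded_browser):
            grouped[hour].append(value)
    return {hour: reduce_hour(values) for hour, values in sorted(grouped.items())}


if __name__ == "__main__":
    for hour, (hits, nbytes) in run_query(ROWS, "20090914", "Homepage", "googlebot").items():
        print(hour, hits, nbytes)
```

Note that the map step still visits every row, so this sketch has the same full-scan cost described earlier; it only shows how the query decomposes into map and reduce.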

Can you run a query like the one I showed above (or at least one with the same output) on a BigTable kind of system in 'real time' (i.e. via a user interface, with the user getting their answer ASAP) instead of in batch mode?

If not, what is the appropriate technology for doing something like this in the realm of BigTable/Hadoop/HBase/Hive and the like?

Recommended answer

It has even been done already (kind of).

Last.fm's aggregation/summary engine: http://github.com/zohmg/zohmg

A Google search turned up a Google Code project, "mroll", but it has nothing except contact info (no code, nothing). Still, you might want to reach out to its author and see what's up. http://code.google.com/p/mroll/
