Architecture for database analytics

Problem description

We have an architecture where we provide each customer with Business Intelligence-like services for their website (our customers are internet merchants). Now I need to analyze that data internally (for algorithmic improvement, performance tracking, etc.), and the volumes are potentially quite heavy: we have up to millions of rows per customer per day, and I may want to know how many queries we had in the last month, compared week by week, etc. That is on the order of billions of entries, if not more.

The way it is currently done is quite standard: daily scripts scan the databases and generate big CSV files. I don't like this solution for several reasons:

  • as is typical with those kinds of scripts, they fall into the write-once-and-never-touch-again category
  • tracking things in "real time" is necessary (at the moment we have a separate toolset to query the last few hours)
  • it is slow and not "agile"

Although I have some experience in dealing with huge datasets for scientific usage, I am a complete beginner as far as traditional RDBMSs go. It seems that using a column-oriented database for analytics could be a solution (the analytics don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of problem.
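A toy Python sketch may help show why a column-oriented layout suits this kind of analytics: an aggregate over one field only needs that field's values. The records and field names below are hypothetical, and the Python lists merely stand in for on-disk storage; real column stores add compression and other optimizations on top of this layout.

```python
from collections import Counter

# Hypothetical event records; field names and values are made up for illustration.
rows = [
    {"day": "2024-01-01", "customer": "shop-a", "url": "/a", "referrer": "r1"},
    {"day": "2024-01-01", "customer": "shop-b", "url": "/b", "referrer": "r2"},
    {"day": "2024-01-02", "customer": "shop-a", "url": "/c", "referrer": "r3"},
]

def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists (one list per field)."""
    columns = {field: [] for field in rows[0]}
    for row in rows:
        for field, value in row.items():
            columns[field].append(value)
    return columns

def count_per_day_rows(rows):
    # Row layout: all fields of a record sit together, so on disk a scan like
    # this would read whole records even though only "day" is needed.
    return Counter(row["day"] for row in rows)

def count_per_day_columns(columns):
    # Column layout: the "day" values are stored contiguously, so only that
    # one list is scanned and the other columns are never touched.
    return Counter(columns["day"])

columns = to_columns(rows)
print(count_per_day_columns(columns))  # Counter({'2024-01-01': 2, '2024-01-02': 1})
```

Both functions give the same answer; the difference that matters at billions of rows is how much data has to be read to get it.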

Solution

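You will want to google Star Schema. The basic idea is to model a special data warehouse / OLAP instance of your existing OLTP system in a way that is optimized to provide the type of aggregations you describe. This instance will be composed of facts (numeric measurements of business events, such as a sale or a query) and dimensions (the descriptive context used to slice them, such as customer, product or date).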
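As a hedged sketch of what a star schema could look like for the workload described in the question (counting queries per customer and comparing weeks), the following uses Python's built-in `sqlite3` module purely so that it runs anywhere. The table and column names (`fact_queries`, `dim_date`, `dim_customer`, `query_count`) and the helper functions are hypothetical. A real warehouse would more likely live in a dedicated OLAP or columnar engine and be loaded from the OLTP databases by an ETL job.

```python
import sqlite3
from datetime import date

def build_star_schema(conn):
    """Create one fact table and two dimension tables (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE dim_date (
            date_key  INTEGER PRIMARY KEY,  -- e.g. 20240105
            full_date TEXT NOT NULL,        -- ISO date 'YYYY-MM-DD'
            iso_week  TEXT NOT NULL,        -- e.g. '2024-W01'
            month     TEXT NOT NULL         -- e.g. '2024-01'
        );
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,
            name         TEXT NOT NULL
        );
        -- One row per customer per day; query_count is the additive measure.
        -- The REFERENCES clauses are only enforced if PRAGMA foreign_keys = ON.
        CREATE TABLE fact_queries (
            date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
            customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
            query_count  INTEGER NOT NULL
        );
        """
    )

def date_row(d):
    """Derive the dim_date attributes for one calendar day."""
    iso_year, iso_week, _ = d.isocalendar()
    return (int(d.strftime("%Y%m%d")), d.isoformat(),
            f"{iso_year}-W{iso_week:02d}", d.strftime("%Y-%m"))

def weekly_totals(conn, month):
    """Total queries per ISO week, counting only the days of one calendar month."""
    return conn.execute(
        """
        SELECT d.iso_week, SUM(f.query_count)
        FROM fact_queries AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        WHERE d.month = ?
        GROUP BY d.iso_week
        ORDER BY d.iso_week
        """,
        (month,),
    ).fetchall()

# Toy data: two days in January 2024 that fall in different ISO weeks.
conn = sqlite3.connect(":memory:")
build_star_schema(conn)
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                 [date_row(date(2024, 1, 5)), date_row(date(2024, 1, 8))])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "shop-a"), (2, "shop-b")])
conn.executemany("INSERT INTO fact_queries VALUES (?, ?, ?)",
                 [(20240105, 1, 1000), (20240105, 2, 500), (20240108, 1, 700)])
print(weekly_totals(conn, "2024-01"))  # [('2024-W01', 1500), ('2024-W02', 700)]
```

This sketch stores one pre-aggregated row per customer per day instead of one row per raw query, which keeps the fact table much smaller. Whether that granularity is fine enough depends on the questions you need to answer.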

For example, sales 'facts' can be modeled to provide analytics based on customer, store, product, time and other 'dimensions'.
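The sales example can also be illustrated without a database. In the following Python sketch, all keys, values and names (`SaleFact`, `DIM_STORE`, `rollup`, etc.) are made up. Each sales fact holds only surrogate keys plus additive measures (`quantity`, `amount_cents`), and the descriptive attributes (month, customer segment, store city, product category) live in separate dimension lookups. A "roll-up" then groups facts by whichever dimension attributes are requested. This is only a conceptual model of what a star-schema query computes, not how an OLAP engine executes it.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SaleFact:
    """One sales fact: surrogate keys into each dimension plus additive measures."""
    date_key: int
    customer_key: int
    store_key: int
    product_key: int
    quantity: int
    amount_cents: int  # integer cents avoid floating-point rounding in sums

# Dimension tables: surrogate key -> descriptive attributes (all values made up).
DIM_DATE = {20240105: {"month": "2024-01"}, 20240210: {"month": "2024-02"}}
DIM_CUSTOMER = {100: {"segment": "retail"}, 101: {"segment": "wholesale"}}
DIM_STORE = {1: {"city": "Paris"}, 2: {"city": "Lyon"}}
DIM_PRODUCT = {10: {"category": "Books"}, 11: {"category": "Music"}}

FACTS = [
    SaleFact(20240105, 100, 1, 10, 2, 3000),
    SaleFact(20240105, 101, 1, 11, 1, 1500),
    SaleFact(20240210, 100, 2, 10, 3, 4500),
    SaleFact(20240210, 101, 1, 10, 1, 1500),
]

def rollup(facts, group_by, measure):
    """Sum `measure` over `facts`, grouped by dimension attributes.

    `group_by` maps a fact key field to a (dimension_table, attribute) pair;
    the result maps a tuple of attribute values to the summed measure.
    """
    totals = defaultdict(int)
    for fact in facts:
        group = tuple(
            table[getattr(fact, key_field)][attribute]
            for key_field, (table, attribute) in group_by.items()
        )
        totals[group] += getattr(fact, measure)
    return dict(totals)

# Revenue by store city and month.
print(rollup(FACTS, {"store_key": (DIM_STORE, "city"),
                     "date_key": (DIM_DATE, "month")}, "amount_cents"))
# Units sold by product category.
print(rollup(FACTS, {"product_key": (DIM_PRODUCT, "category")}, "quantity"))
```

Adding a new way to slice the data only means adding a dimension attribute; the facts themselves stay narrow, which is what makes large aggregations over them cheap.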

You will find Microsoft's Adventure Works sample databases instructive, in that they provide both the OLTP and OLAP schemas along with representative data.
