Real-time statistics: MySQL (/Drizzle) or MongoDB?

Question

We are working on a project that will feature real-time statistics on certain actions (e.g. clicks). On every click, we will log information such as the date, the user's age and gender (both come from Facebook), location, etc.

We are discussing the best place to store this information and use it for real-time stats. We will display aggregate statistics: for example, the total number of clicks, the number of clicks by male/female users, and the number of clicks broken down by age group (e.g. 18-24, 24-30...).

Since we use MongoDB everywhere on the site, my colleague thinks we should store the statistics in it as well. I, however, would prefer a SQL-based database for this task, such as MySQL (or maybe Drizzle), because I believe SQL is better at operations like data aggregation. Although there is the overhead of parsing the SQL, I think MySQL/Drizzle may actually be faster than NoSQL databases here. Inserts should not be slow either when using INSERT DELAYED queries.

Please note that we do not need to perform JOINs or collect data from multiple tables/collections, so we don't mind if the statistics live in a different database from the rest of the site. However, we do care about scalability and reliability. We are building something that will (hopefully) become very big, and we have designed every single line of code with scalability in mind.

What do you think about this? Is there any reason to prefer MongoDB over MySQL/Drizzle for this, or does it make no difference? Which one would you use if you were us?

Thank you, Alessandro

Recommended answer

So BuddyMedia is using some of this. The Gilt Groupe has done something pretty cool with Hummingbird (node.js + MongoDB).

Having worked for a large online advertiser in the social media space, I can attest that real-time reporting is really a pain. Trying to "roll up" 500M impressions a day is already a challenge. Doing it in real time worked, but it carried some significant limitations (for instance, it was actually delayed by 5 minutes :)

Frankly, this type of problem is one of the reasons I started using MongoDB, and I'm not the only one. People are using MongoDB for all kinds of real-time analytics: server monitoring, centralized logging, and dashboard reporting.

The real key when doing this type of reporting is to understand that the data structure is completely different with MongoDB. You are going to avoid "aggregation" queries, so the queries and the output charts are going to be different, and there is some extra coding work on the client side.

Here's the key that may point you in the right direction for doing this with MongoDB. Take a look at the following data structure:

{
  date: "20110430",
  gender: "M",
  age: 1, // 1 is probably a bucket
  impression_hour: [ 100, 50, ...], // 24 of these
  impression_minute: [ 2, 5, 19, 8, ... ], // 1440 of these
  clicks_hour: [ 10, 2, ... ],
  ...
}
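
As a minimal sketch of how one such document might be initialized, the snippet below builds a zero-filled document for a single date, gender, and age bucket. The helper names `ageBucket` and `emptyDayDoc` and the age boundaries in `AGE_BOUNDS` are assumptions made for illustration; the structure above only hints that the age value is probably a bucket.

```javascript
// Hypothetical age boundaries; the structure above does not define the buckets.
const AGE_BOUNDS = [18, 24, 30, 40, 50];

// Map an age to a bucket index: 0 for under 18, 1 for 18-23, 2 for 24-29, ...
function ageBucket(age) {
  let bucket = 0;
  for (const bound of AGE_BOUNDS) {
    if (age >= bound) bucket += 1;
  }
  return bucket;
}

// Build a zero-filled document for one (date, gender, age bucket) combination.
function emptyDayDoc(date, gender, age) {
  return {
    date: date,                                  // e.g. "20110430"
    gender: gender,                              // e.g. "M"
    age: ageBucket(age),                         // bucket index, not the raw age
    impression_hour: new Array(24).fill(0),      // one counter per hour
    impression_minute: new Array(1440).fill(0),  // one counter per minute of the day
    clicks_hour: new Array(24).fill(0),
  };
}

const doc = emptyDayDoc("20110430", "M", 21);
```

Pre-filling the arrays keeps every counter slot present from the start, so a report can read any hour or minute without checking for missing entries.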

There are obviously some tweaks to make here: appropriate indexes, and maybe mashing date+gender+age into the _id. But that's the basic structure of click analytics with MongoDB. It's really easy to update impressions and clicks with { $inc : { "clicks_hour.0" : 1 } }. You get to update the whole document atomically. And it's actually pretty natural to report on: you already have an array containing your hourly or minute-level data points.
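
To make the update and reporting steps a little more concrete, here is a rough sketch in plain JavaScript. It only builds the `_id` string and the `$inc` update objects that would be handed to an update call, and sums counter arrays on the client. It does not talk to a database. All function names, the `_id` format, and the use of UTC time are assumptions for illustration, not part of the answer above.

```javascript
// Hypothetical helpers: none of these names come from the answer itself.

// Compose an _id from date + gender + age bucket, e.g. "20110430:M:1".
function statsId(date, gender, ageBucket) {
  return `${date}:${gender}:${ageBucket}`;
}

// Build the $inc update for one impression at the given time (UTC is an assumption).
function impressionUpdate(when) {
  const hour = when.getUTCHours();
  const minuteOfDay = hour * 60 + when.getUTCMinutes();
  return {
    $inc: {
      [`impression_hour.${hour}`]: 1,
      [`impression_minute.${minuteOfDay}`]: 1,
    },
  };
}

// Build the $inc update for one click at the given time.
function clickUpdate(when) {
  return { $inc: { [`clicks_hour.${when.getUTCHours()}`]: 1 } };
}

// Client-side roll-up: sum one counter array across documents, slot by slot.
function sumSlots(docs, field) {
  const totals = [];
  for (const doc of docs) {
    (doc[field] || []).forEach((count, slot) => {
      totals[slot] = (totals[slot] || 0) + count;
    });
  }
  return totals;
}

// A click shortly after midnight UTC increments slot 0 of clicks_hour.
const when = new Date(Date.UTC(2011, 3, 30, 0, 5));
const update = clickUpdate(when);
```

Summing the "M" and "F" documents for a day with `sumSlots` gives the overall hourly curve, while charting a single document's array gives the per-gender, per-age-bucket view without any server-side aggregation.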

Hopefully that points you in the right direction.
