Database architecture for millions of new rows per day

Views: 175 | Published: 2017/3/14 2:13:13 | Tags: mysql database scalability analytics data-warehouse


Question

I need to implement a custom-developed web analytics service for a large number of websites. The key entities here are:

Websites

Visitors

Each unique visitor will have a single row in the database with information such as landing page, time of day, OS, browser, referrer, and IP.

I will need to run aggregate queries on this database, such as 'COUNT all visitors who have Windows as their OS and came from Bing.com'.

I have hundreds of websites to track, and the number of visitors for those websites ranges from a few hundred a day to a few million a day. In total, I expect this database to grow by about a million rows per day.

My questions are:

1) Is MySQL a good database for this purpose?

2) What would be a good architecture? I am thinking of creating a new table for each website. Or perhaps I could start with a single table and then spawn a new table (daily) if the number of rows in an existing table exceeds 1 million (is my assumption correct?). My only worry is that if a table grows too big, the SQL queries can get dramatically slow. So, what is the maximum number of rows I should store per table? Moreover, is there a limit on the number of tables that MySQL can handle?

3) Is it advisable to run aggregate queries over millions of rows? I'm ready to wait a couple of seconds for the results of such queries. Is that good practice, or is there another way to do aggregate queries?

In a nutshell, I am designing a large-scale, data-warehouse-style setup that will be write-heavy.

Recommended answer

If you're talking about larger volumes of data, then look at MySQL partitioning. For these tables, partitioning by date/time would certainly help performance. There's a decent article about partitioning here.
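As a rough illustration of partitioning by date, the Python sketch below builds a MySQL `CREATE TABLE` statement with one range partition per day. The table name `visits` and its columns (`site_id`, `visited_at`, `os`, `referrer_host`) are assumptions made for the example, not part of the original answer, and daily granularity is only one possible choice.

```python
from datetime import date, timedelta


def daily_partition_ddl(table: str, first_day: date, days: int) -> str:
    """Build MySQL DDL for a table range-partitioned by day.

    Table and column names are hypothetical; adapt them to the real schema.
    """
    parts = []
    for offset in range(days):
        day = first_day + timedelta(days=offset)
        upper = day + timedelta(days=1)
        # Each partition holds rows whose visited_at falls before `upper`.
        parts.append(
            f"  PARTITION p{day:%Y%m%d} "
            f"VALUES LESS THAN (TO_DAYS('{upper:%Y-%m-%d}'))"
        )
    # Catch-all partition so rows dated past the last listed day are accepted.
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")
    # MySQL requires the partitioning column to be part of every unique key,
    # hence the composite primary key (id, visited_at).
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT NOT NULL AUTO_INCREMENT,\n"
        "  site_id INT NOT NULL,\n"
        "  visited_at DATETIME NOT NULL,\n"
        "  os VARCHAR(32),\n"
        "  referrer_host VARCHAR(255),\n"
        "  PRIMARY KEY (id, visited_at)\n"
        ")\n"
        "PARTITION BY RANGE (TO_DAYS(visited_at)) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


if __name__ == "__main__":
    print(daily_partition_ddl("visits", date(2017, 3, 14), 3))
```

With a layout like this, a query restricted to a date range only needs to touch the matching partitions, and old days can be dropped a partition at a time instead of deleted row by row.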

Look at creating two separate databases: one for all the raw data, taking the writes with minimal indexing; a second for reporting, using the aggregated values. Update the reporting database from the raw data database either with a batch process or by using replication to do that for you.
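The batch variant of this split can be sketched as follows. This is a minimal illustration, not the answerer's implementation: it uses two in-memory SQLite databases as stand-ins for the raw and reporting MySQL databases, and the table and column names (`visits`, `daily_counts`, `referrer_host`, and so on) are assumptions.

```python
import sqlite3


def make_raw_db() -> sqlite3.Connection:
    raw = sqlite3.connect(":memory:")
    # Write-optimized: no secondary indexes on the raw table.
    raw.execute(
        "CREATE TABLE visits (site_id INTEGER, day TEXT, os TEXT, referrer_host TEXT)"
    )
    return raw


def make_reporting_db() -> sqlite3.Connection:
    rep = sqlite3.connect(":memory:")
    # One row per (site, day, os, referrer) with a precomputed visitor count.
    rep.execute(
        "CREATE TABLE daily_counts ("
        " site_id INTEGER, day TEXT, os TEXT, referrer_host TEXT, visitors INTEGER,"
        " PRIMARY KEY (site_id, day, os, referrer_host))"
    )
    return rep


def run_batch(raw: sqlite3.Connection, rep: sqlite3.Connection, day: str) -> int:
    """Aggregate one day of raw visits and store the counts for reporting."""
    rows = raw.execute(
        "SELECT site_id, day, os, referrer_host, COUNT(*) FROM visits"
        " WHERE day = ? GROUP BY site_id, day, os, referrer_host",
        (day,),
    ).fetchall()
    # INSERT OR REPLACE keeps the batch safe to re-run for the same day.
    rep.executemany("INSERT OR REPLACE INTO daily_counts VALUES (?, ?, ?, ?, ?)", rows)
    rep.commit()
    return len(rows)


def count_visitors(rep: sqlite3.Connection, site_id: int, os: str, referrer_host: str) -> int:
    """Answer 'how many visitors used this OS and came from this referrer'."""
    (total,) = rep.execute(
        "SELECT COALESCE(SUM(visitors), 0) FROM daily_counts"
        " WHERE site_id = ? AND os = ? AND referrer_host = ?",
        (site_id, os, referrer_host),
    ).fetchone()
    return total


if __name__ == "__main__":
    raw, rep = make_raw_db(), make_reporting_db()
    raw.executemany(
        "INSERT INTO visits VALUES (?, ?, ?, ?)",
        [(1, "2017-03-14", "Windows", "bing.com")] * 2,
    )
    run_batch(raw, rep, "2017-03-14")
    print(count_visitors(rep, 1, "Windows", "bing.com"))
```

The reporting query reads a handful of pre-aggregated rows instead of scanning every raw visit, which is the point of keeping the two workloads apart.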

EDIT

If you want to be really clever with your aggregation reports, create a set of aggregation tables ("today", "week to date", "month to date", "by year"). Aggregate from raw data to "today" either daily or in "real time"; aggregate from "today" to "week to date" on a nightly basis; from "week to date" to "month to date" on a weekly basis, etc. When executing queries, join (UNION) the appropriate tables for the date ranges you're interested in.
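The cascade of aggregation tables can be sketched roughly as below, again with in-memory SQLite standing in for MySQL. The table names (`agg_today`, `agg_week_to_date`, `agg_month_to_date`) and columns are assumptions for illustration. The sketch uses `UNION ALL` rather than plain `UNION`, because `UNION` removes duplicate rows and would undercount when two tables happen to hold identical rows.

```python
import sqlite3

# Hypothetical aggregation tables, from finest to coarsest.
TABLES = ("agg_today", "agg_week_to_date", "agg_month_to_date")


def setup() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    for name in TABLES:
        db.execute(f"CREATE TABLE {name} (site_id INTEGER, os TEXT, visitors INTEGER)")
    return db


def roll_up(db: sqlite3.Connection, source: str, target: str) -> None:
    """Fold all rows of `source` into `target`, then empty `source`."""
    with db:  # one transaction: both statements apply or neither does
        db.execute(
            f"INSERT INTO {target} SELECT site_id, os, SUM(visitors)"
            f" FROM {source} GROUP BY site_id, os"
        )
        db.execute(f"DELETE FROM {source}")


def month_to_date(db: sqlite3.Connection, site_id: int, os: str) -> int:
    """Sum visitors across all three tables for the current month."""
    union = " UNION ALL ".join(
        f"SELECT site_id, os, visitors FROM {t}" for t in TABLES
    )
    (total,) = db.execute(
        f"SELECT COALESCE(SUM(visitors), 0) FROM ({union}) AS combined"
        " WHERE site_id = ? AND os = ?",
        (site_id, os),
    ).fetchone()
    return total


if __name__ == "__main__":
    db = setup()
    db.execute("INSERT INTO agg_today VALUES (1, 'Windows', 5)")
    db.execute("INSERT INTO agg_month_to_date VALUES (1, 'Windows', 30)")
    roll_up(db, "agg_today", "agg_week_to_date")  # the nightly step
    print(month_to_date(db, 1, "Windows"))
```

A roll-up moves counts between tables without changing the combined total, so a report over the union gives the same answer before and after the nightly job runs.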

EDIT #2

Rather than one table per client, we work with one database schema per client. Depending on the size of the client, we might have several schemas in a single database instance, or a dedicated database instance per client. We use separate schemas for raw data collection and for aggregation/reporting for each client. We run multiple database servers, restricting each server to a single database instance. For resilience, databases are replicated across multiple servers and load balanced for improved performance.
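One way to picture the schema-per-client layout is a small placement map that the application consults before building a query. The sketch below is purely illustrative: the client names, host names, and schema naming pattern are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientPlacement:
    host: str           # database server holding this client's data
    raw_schema: str     # schema receiving raw visit rows
    report_schema: str  # schema holding aggregated/reporting tables


# Hypothetical placements: two small clients share one instance,
# while a large client gets a dedicated one.
PLACEMENTS = {
    "small_a": ClientPlacement("db1.example.internal", "small_a_raw", "small_a_report"),
    "small_b": ClientPlacement("db1.example.internal", "small_b_raw", "small_b_report"),
    "big_c": ClientPlacement("db2.example.internal", "big_c_raw", "big_c_report"),
}


def qualified_table(client: str, table: str, reporting: bool = False) -> tuple[str, str]:
    """Return (host, schema-qualified table name) for a client's table."""
    placement = PLACEMENTS[client]
    schema = placement.report_schema if reporting else placement.raw_schema
    return placement.host, f"`{schema}`.`{table}`"


if __name__ == "__main__":
    print(qualified_table("small_a", "visits"))
    print(qualified_table("big_c", "daily_counts", reporting=True))
```

Moving a growing client to its own instance then becomes a change to the placement entry rather than to the queries themselves.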
