谷歌如何保持在最近的X分钟,进行总搜索? [英] How google maintains total searches conducted in last X minutes?

查看:145
本文介绍了谷歌如何保持在最近的X分钟,进行总搜索?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果您滚动到本网站的末尾:
http://www.google.com/insidesearch/howsearchworks/thestory/index.html

If you scroll to the end of this site:
http://www.google.com/insidesearch/howsearchworks/thestory/index.html

您会看到谷歌显示在过去的Y秒进行X查询。

You'll see google displaying X queries performed in last Y seconds.

我的问题是:谷歌有成千上万它们同时接收查询服务器。所有的服务器都将更新相同的整数,我们需要把锁解决这个号码,是这样的:

My question is: google has thousands of servers which are receiving queries simultaneously. All the servers should update the same integer and we need to put lock around this number, something like:

function IncrementQueryCount()
{
  lock(..)
  {
    _queryCount++;
  }
}

我有三个疑问:
1)这是怎么变的服务器之间共享。
2)同步(锁定)时,你有说每秒50K的查询将是太慢了。他们做什么..返回结果,并增加在后台线程计数?
3)该网站显示没有。自时间Y秒的查询。谷歌是如何保持每秒查询记录 必须有像

I have three doubts:
1) How is this variable shared among servers.
2) synchronization (lock) when you have say 50K queries per second would be too slow. What do they do.. return the result and increment the count in background thread?
3) The website displays no. of queries since time Y seconds. How is google maintaining record of queries per second There must be some data structure like

class QueryCountEverySecond
{
  int queries; //queries performed at 'time'
  long time; //time in second from epoch
}

谷歌是如何做到的呢?

How does google do it?

请提及任何数据结构,算法或只是并发或设计方法(关键字不仅会的工作,我可以探索更多)设计这样一个系统时是相关的。

Please mention any data structure, algorithm or just concurrency or design approach (keywords only would work that I can explore more) that are relevant when designing such a system.

推荐答案

我会使用在所有服务器之间共享一个数据库表实现这一点。显然,我们不能进行一次查询每次递增一个计数器。这将是一个瓶颈。相反,我有每个服务器(其中有大概数百个)插入每秒一行到表:

I'd implement this using a database table that is shared among all servers. Clearly, we cannot increment a single counter each time a query is performed. That would be a bottleneck. Instead, I'd have each server (of which there are probably hundreds) insert one row per second into a table:

CREATE TABLE T (DateTime datetime, ServerID int, QueryCount int)

和当在最后N秒查询的数量要计算,则聚合过的行

And when the number of queries in the last N seconds is to be calculated, you aggregate over the rows:

SELECT SUM(QueryCount) FROM T WHERE DateTime >= nowMinusNSeconds

这是一个简单的分布式和可扩展的计数器,使用任何旧的关系型数据库实现的。

That's a simple distributed and scalable counter, implemented using any old RDBMS.

这篇关于谷歌如何保持在最近的X分钟,进行总搜索?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆