Azure - Querying 200 million entities

Question

I have a need to query a store of 200 million entities in Windows Azure. Ideally, I would like to use the Table Service, rather than SQL Azure, for this task.

The use case is this: a POST containing a new entity will arrive through a web-facing API. We must query about 200 million entities to determine whether we may accept the new entity.

Given the limit of 1,000 entities per query: does it apply to this type of query, i.e., do I have to query 1,000 entities at a time and perform my comparisons / business rules on each batch, or can I query all 200 million entities in one shot? I think I would hit a timeout in the latter case.

Thoughts?

Recommended answer

Expanding on Shiraz's comment about Table storage: tables are organized into partitions, and within each partition your entities are indexed by a row key. So each row can be found extremely fast using the combination of partition key + row key. The trick is to choose the best possible partition key and row key for your particular application.
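
To make the addressing model concrete, here is a minimal in-memory sketch, assuming nothing about the real SDK: the class `InMemoryTable` and its methods are hypothetical. It only illustrates why a lookup by (PartitionKey, RowKey) does not get slower as the table grows, while a partition query returns every row under one partition key.

```python
from typing import Any, Dict, Optional


# Hypothetical in-memory stand-in for a table: every entity is addressed by
# the pair (PartitionKey, RowKey), as in Azure Table storage.
class InMemoryTable:
    def __init__(self) -> None:
        # partition key -> (row key -> entity properties)
        self._partitions: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def insert(self, partition_key: str, row_key: str, props: Dict[str, Any]) -> None:
        rows = self._partitions.setdefault(partition_key, {})
        if row_key in rows:
            # The real service also rejects an insert whose key pair already exists.
            raise KeyError(f"entity ({partition_key}, {row_key}) already exists")
        rows[row_key] = dict(props)

    def get(self, partition_key: str, row_key: str) -> Optional[Dict[str, Any]]:
        # Point lookup: two hash lookups, independent of the total number of rows.
        row = self._partitions.get(partition_key, {}).get(row_key)
        return dict(row) if row is not None else None

    def partition(self, partition_key: str) -> Dict[str, Dict[str, Any]]:
        # Partition query: all rows sharing one partition key, in RowKey order.
        rows = self._partitions.get(partition_key, {})
        return {rk: dict(rows[rk]) for rk in sorted(rows)}
```

Against the real service, the point lookup corresponds to fetching a single entity by both keys (for example `TableClient.get_entity` in the current Python `azure-data-tables` package); the sketch deliberately avoids any network calls.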

For your example above, where you're searching by telephone number, you can make TelephoneNumber the partition key. You could very easily find all rows related to that telephone number (though, not knowing your application, I don't know just how many rows you'd be expecting). To refine things further, you'd want to define a row key that you can index into within that partition. This would give you a very fast response to let you know whether a record exists.
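
A minimal sketch of that existence check, assuming a hypothetical layout where PartitionKey is the telephone number and RowKey is some application-specific identifier (called `account_id` here purely for illustration), and assuming the business rule is simply "reject a key pair that is already stored". A dict of sets stands in for the table.

```python
from typing import Dict, Set

# Hypothetical layout: telephone number (PartitionKey) -> set of RowKeys.
Index = Dict[str, Set[str]]


def record_exists(index: Index, phone: str, account_id: str) -> bool:
    # Equivalent of one point query on (PartitionKey, RowKey).
    return account_id in index.get(phone, set())


def rows_for_phone(index: Index, phone: str) -> Set[str]:
    # Equivalent of a partition query: every RowKey under one telephone number.
    return set(index.get(phone, set()))


def try_accept(index: Index, phone: str, account_id: str) -> bool:
    """Accept the new entity only if its (phone, account_id) pair is not stored yet."""
    if record_exists(index, phone, account_id):
        return False
    index.setdefault(phone, set()).add(account_id)
    return True
```

Against the real service, a check followed by a separate insert can race with a concurrent POST; simply attempting the insert and treating a 409 Conflict response as "already exists" lets the service enforce uniqueness in one transaction.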

Table storage (actually Azure Storage in general: tables, blobs, queues) has a well-known SLA. You can execute up to 500 transactions per second on a given partition. With the example above, the query for rows for a given telephone number would equate to one transaction (unless more than 1,000 rows are returned; to see all rows, you'd need additional fetches). Adding a row key to narrow the search would, indeed, yield a single transaction. So would inserting a new row. You can also batch multiple row inserts within a single partition and save them in a single transaction.
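
The paging and batching rules above can be sketched as follows. This is a simulation, not the SDK: `query_page`, `query_all`, `round_trips` and `group_into_batches` are hypothetical helpers, and the continuation token is just an integer offset. It follows continuation tokens until none is returned (the real service can also return fewer than 1,000 entities together with a token, so looping on the token rather than on page size is the safe pattern), shows that scanning 200 million entities at 1,000 per response takes 200,000 fetches, and groups inserts into batches that each stay inside one partition, using the 100-operation cap on an entity group transaction.

```python
import math
from itertools import groupby
from typing import Iterator, List, Optional, Sequence, Tuple

PAGE_SIZE = 1000   # max entities per query response, as described above
BATCH_LIMIT = 100  # max operations per entity group transaction

Entity = Tuple[str, str]  # (PartitionKey, RowKey)


def query_page(rows: Sequence[Entity], token: Optional[int]) -> Tuple[List[Entity], Optional[int]]:
    """Simulated service call: one page plus a continuation token (None when done)."""
    start = token or 0
    end = start + PAGE_SIZE
    return list(rows[start:end]), (end if end < len(rows) else None)


def query_all(rows: Sequence[Entity]) -> Iterator[List[Entity]]:
    """Follow continuation tokens; each loop iteration is one more fetch."""
    token: Optional[int] = None
    while True:
        page, token = query_page(rows, token)
        yield page
        if token is None:
            return


def round_trips(total_entities: int) -> int:
    # Number of fetches needed to read every entity at PAGE_SIZE per response.
    return max(1, math.ceil(total_entities / PAGE_SIZE))


def group_into_batches(entities: Sequence[Entity]) -> List[List[Entity]]:
    """Split inserts so each batch has one PartitionKey and at most BATCH_LIMIT rows."""
    batches: List[List[Entity]] = []
    for _, group in groupby(sorted(entities), key=lambda e: e[0]):
        items = list(group)
        for i in range(0, len(items), BATCH_LIMIT):
            batches.append(items[i:i + BATCH_LIMIT])
    return batches
```

For example, `round_trips(200_000_000)` is 200,000 sequential fetches for every incoming POST, which is why a point lookup on a well-chosen partition key and row key is the better fit for this use case.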

For a nice overview of Azure Table Storage, with some good labs, check out the Platform Training Kit (http://www.microsoft.com/downloads/en/details.aspx?FamilyID=413e88f8-5966-4a83-b309-53b7b77edf78&displaylang=en).

For more info about transactions within tables, see this MSDN blog post: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/07/09/understanding-windows-azure-storage-billing-bandwidth-transactions-and-capacity.aspx