Fastest way of querying for latest items in an Azure table?


Question

I have an Azure table where customers post messages; there may be millions of messages in a single table. I want to find the fastest way of getting the messages posted within the last 10 minutes (which is how often I refresh the web page). Since only the partition key is indexed, I have played with the idea of using the date and time the message was posted as the partition key, for example a string in ISO 8601 format such as "2009-06-15T13:45:30.0900000".

Example pseudo code:

var message = "Hello world!";
var messagePartitionKey = DateTime.Now.ToString("o");
var messageEntity = new MessageEntity(messagePartitionKey, message);
dataSource.Insert(messageEntity);

and then query for the messages posted within the last 10 minutes like this (untested pseudo code again):

// Get the date and time 10 minutes ago
var tenMinutesAgo = DateTime.Now.Subtract(new TimeSpan(0, 10, 0)).ToString("o");

// Query for the latest messages
// Keys are ISO 8601 strings, so newer messages compare greater than or equal to the cutoff
var latestMessages = (from t in
   context.Messages
   where t.PartitionKey.CompareTo(tenMinutesAgo) >= 0
   select t
   );

But will this query be served by the index, or will it cause a full table scan? Does anyone have a better idea for doing this? I know there is a timestamp on each table item, but it is not indexed, so it would be too slow for my purpose.

Recommended answer

I think you've got the right basic idea. The query you've designed should be about as efficient as you could hope for. But there are some improvements I can offer.

Rather than using DateTime.Now, use DateTime.UtcNow. From what I understand, instances are set to use UTC as their base time anyway, but this makes sure you're comparing apples with apples, and you can reliably convert the time back into whatever timezone you want when displaying it.
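To illustrate why mixing time bases matters for string keys, here is a minimal sketch showing how the "o" format changes with the DateTime's Kind. The class and method names are hypothetical, chosen only for this illustration.

```csharp
using System;

public static class RoundTripFormatDemo
{
    // Formats the same clock reading with the given DateTimeKind using the
    // round-trip ("o") specifier. Utc values end in "Z", Local values end in
    // a UTC offset such as "+02:00", and Unspecified values have no suffix.
    public static string Format(DateTimeKind kind)
    {
        long ticks = new DateTime(2009, 6, 15, 13, 45, 30).Ticks;
        return new DateTime(ticks, kind).ToString("o");
    }
}
```

Because the suffix differs by Kind, keys written from DateTime.Now and DateTime.UtcNow would not share one consistent string format, which is one reason to pick UTC everywhere.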

Rather than storing the time as .ToString("o"), turn the time into ticks and store that. You'll end up with fewer formatting problems (sometimes you'll get the timezone specification at the end, sometimes not). Also, if you always want to see these messages sorted from most recent to oldest, you can subtract the number of ticks from the maximum number of ticks, e.g.

var messagePartitionKey = (DateTime.MaxValue.Ticks - _contactDate.Ticks).ToString("d19");
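As a rough sketch of how such reversed-tick keys behave, the helper below builds the key and checks whether a key falls at or after a cutoff. The ReverseTickKey class and its method names are assumptions made for illustration, not part of any Azure library.

```csharp
using System;

public static class ReverseTickKey
{
    // Newer timestamps yield smaller numbers; zero-padding to 19 digits
    // makes ordinal string order match numeric order.
    public static string FromUtc(DateTime utc)
    {
        return (DateTime.MaxValue.Ticks - utc.Ticks).ToString("d19");
    }

    // True when partitionKey belongs to a message posted at or after cutoffUtc.
    // With reversed ticks, "newer" means "less than or equal" as a string.
    public static bool IsAtOrAfter(string partitionKey, DateTime cutoffUtc)
    {
        return string.CompareOrdinal(partitionKey, FromUtc(cutoffUtc)) <= 0;
    }
}
```

With keys built this way, the "last 10 minutes" filter would compare PartitionKey as less than or equal to ReverseTickKey.FromUtc(DateTime.UtcNow.AddMinutes(-10)), the opposite direction from a filter over ISO 8601 string keys.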

It would also be a good idea to specify a row key. While it is highly unlikely that two messages will be posted at exactly the same time, it's not impossible. If you don't have an obvious row key, just set it to a Guid.
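A minimal sketch of pairing a time-based partition key with a Guid row key might look like the following. MessageKeys and Create are hypothetical names, and the tuple return type assumes C# 7 or later.

```csharp
using System;

public static class MessageKeys
{
    // Hypothetical helper: returns a reversed-tick partition key plus a
    // random Guid row key, so two messages posted in the same tick still
    // get distinct (PartitionKey, RowKey) pairs.
    public static (string PartitionKey, string RowKey) Create(DateTime utc)
    {
        string partitionKey = (DateTime.MaxValue.Ticks - utc.Ticks).ToString("d19");
        string rowKey = Guid.NewGuid().ToString("N"); // 32 hex digits, no dashes
        return (partitionKey, rowKey);
    }
}
```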
