Why is ListBlobsSegmentedAsync only returning results on the second page?

Question

I'm trying to grab one page of up to 5,000 blobs, with no prefix. The container in question has roughly 26,000 blobs in it. I consistently get no results on my first page, but I noticed that the BlobContinuationToken that's returned isn't null, so I can page again and get results on the second page. Why aren't there any results on the first page when there are on the second?

I'd like to be able to do this, and grab only one page:

var response = await container.ListBlobsSegmentedAsync(null).ConfigureAwait(false);

But this returns no results, so instead I have to call it again, passing in the continuation token from the first response, at which point I do get results.

  • This only started happening when the container got smaller (it used to have over 100,000 blobs)
  • I'm doing frequent deletes on this container, but I couldn't find anything that said this should impact availability
  • I tried passing in true for useFlatBlobListing and it didn't change anything, but I don't really understand the option (as far as I'm aware, my container's contents are flat)
  • I've used ListBlobsSegmentedAsync before and never noticed this problem (but the containers were larger)
  • I'm using version 4.3.0 of the Storage SDK, which is outdated. I tried updating, but it didn't fix the problem, so I went back
  • I've tried passing in a null continuation token as well as just new BlobContinuationToken(). I'm not sure whether one is preferable
  • I can verify via the Cloud Explorer in Visual Studio that there are still 26,000 blobs in the container, but in code the first page of results is empty. What is the Cloud Explorer doing differently, I wonder?

On a larger container, after a while it started taking more than two page fetches to get results. Each page fetch (including the empty ones) took right around 5 seconds, until one finally returned results. At its peak I saw it take up to 12 page fetches, over 60 seconds in total, to return results on a container that had over 300,000 blobs. This was shortly after doing massive deletes on the container.

Recommended answer

It's not at all unexpected that you can occasionally get empty pages, or pages with fewer than the maximum number of results, along with a continuation token. Why is this a problem if the continuation token returned takes you to your next page? If you don't want to deal with continuation tokens, ListBlobs (not the segmented version) will give you an iterator that lazily fetches more blobs and follows the continuation tokens for you.
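To make "follow the continuation token" concrete, here is a minimal sketch of a loop that keeps calling ListBlobsSegmentedAsync until a page actually contains items or the listing ends. It assumes the Microsoft.WindowsAzure.Storage.Blob namespace used by the 4.x SDK mentioned in the question; the class and method names (BlobPaging, FirstNonEmptyPageAsync) are hypothetical and only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob; // assumption: classic WindowsAzure.Storage SDK (4.x)

public static class BlobPaging
{
    // Hypothetical helper: returns the first segment that contains any blobs,
    // skipping empty segments the service may return along with a token.
    public static async Task<List<IListBlobItem>> FirstNonEmptyPageAsync(
        CloudBlobContainer container)
    {
        BlobContinuationToken token = null; // null means "start from the beginning"
        do
        {
            BlobResultSegment segment =
                await container.ListBlobsSegmentedAsync(token).ConfigureAwait(false);

            List<IListBlobItem> results = segment.Results.ToList();
            if (results.Count > 0)
            {
                return results; // a page with data: stop here
            }

            // Empty page: a non-null token means there is more to scan.
            token = segment.ContinuationToken;
        } while (token != null);

        // Token came back null with no items: the container really is empty.
        return new List<IListBlobItem>();
    }
}
```

The loop treats a null continuation token, not an empty page, as the end of the listing, which is the point the answer is making. If only a bounded number of blobs is needed and synchronous code is acceptable, something like container.ListBlobs(null, true).Take(5000) should behave similarly, since the enumerable is fetched lazily; treat that as a sketch rather than a tested recipe.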

As for the root cause, there are a lot of reasons this could happen. My guess is that it's the frequent deletes in your case, but that is only a guess. Returning fewer than the maximum number of results together with a continuation token happens for multiple reasons, but a couple I suspect here are:
  1. We hit the server-side timeout, so we return what we have so far.
  2. We hit the edge of a partition, which happens more frequently when the blob list is large and may span several machines.
If you're frequently deleting blobs and have a lot of them, it may take some time to actually garbage collect them, so we spend all our time scanning through entries we don't return.
