Firestore: How to get random documents in a collection


Question

It is crucial for my application to be able to select multiple documents at random from a collection in Firebase.

Since there is no native function built into Firebase (that I know of) for this kind of query, my first thought was to use query cursors to select a random start and end index, provided that I know the number of documents in the collection.

This approach would work, but only in a limited fashion, since every document would be served up in sequence with its neighboring documents every time. If I could select a document by its index in its parent collection, I could achieve a random document query, but I can't find any documentation that describes how to do this, or even whether it is possible.

Here's what I'd like to be able to do. Consider the following Firestore schema:

root/
  posts/
     docA
     docB
     docC
     docD

Then in my client (I'm in a Swift environment) I'd like to write a query that can do this:

db.collection("posts")[0, 1, 3] // would return: docA, docB, docD

Is there any way I can do something along these lines? Or is there a different way to select random documents in a similar fashion?

Please help.

Recommended answer

Using randomly generated indexes and simple queries, you can randomly select documents from a collection or collection group in Cloud Firestore.

This answer is broken into 4 sections, with different options in each section:

  1. How to generate the random indexes
  2. How to query the random indexes
  3. Selecting multiple random documents
  4. Reseeding for ongoing randomness

How to generate the random indexes

The basis of this answer is creating an indexed field that, when ordered ascending or descending, results in all the documents being randomly ordered. There are different ways to create this, so let's look at 2, starting with the most readily available.

Auto-Id version

If you are using the randomly generated automatic IDs provided in our client libraries, you can use this same system to randomly select a document. In this case, the randomly ordered index is the document ID.

Later in our query section, the random value you generate is a new auto-ID (iOS, Android, Web), the field you query is the __name__ field, and the 'low value' mentioned later is an empty string. This is by far the easiest method to generate the random index and works regardless of the language and platform.

By default, the document name (__name__) is only indexed ascending, and you cannot rename an existing document short of deleting and recreating it. If you need a descending index or the ability to change the value, you can still use this method: store an auto-ID in an actual field called random rather than overloading the document name for this purpose.
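As a rough illustration of storing an auto-ID-style value in a random field, here is a minimal Swift sketch. The helper name makeRandomKey and the title field are hypothetical, and the generator only imitates the shape of a Firestore auto-ID (assumed here to be 20 alphanumeric characters). With the SDK available you would more likely reuse the ID of a freshly created document reference.

```swift
// Hypothetical helper: imitates the shape of a Firestore auto-ID
// (assumed: 20 characters drawn from A-Z, a-z, 0-9).
// This is not the SDK's own generator.
func makeRandomKey(length: Int = 20) -> String {
    let alphabet = Array("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")
    var key = ""
    for _ in 0..<length {
        // randomElement() is non-nil because the alphabet is not empty.
        key.append(alphabet.randomElement()!)
    }
    return key
}

// Store the key in a regular field so it can be indexed in both directions
// and regenerated without recreating the document.
let postData: [String: Any] = ["title": "Hello", "random": makeRandomKey()]
```

Keeping the value in its own field means the document name stays free for whatever identifier the application already uses.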

Random Integer version

When you write a document, first generate a random integer in a bounded range and set it as a field called random. Depending on the number of documents you expect, you can use a different bounded range to save space or reduce the risk of collisions (which reduce the effectiveness of this technique).

You should consider which languages you need, as there will be different considerations. While Swift is easy, JavaScript notably can have a gotcha:

  • 32-bit integer: Great for small datasets (at ~10K documents a collision is unlikely)
  • 64-bit integer: Large datasets (note: JavaScript's Number type doesn't natively support the full 64-bit integer range)

This will create an index with your documents randomly sorted. Later in our query section, the random value you generate will be another one of these values, and the 'low value' mentioned later will be -1 (assuming your range is non-negative).
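A minimal Swift sketch of the random integer option, assuming a non-negative 64-bit range. The helper name makePostData and the title field are illustrative only; the random field name follows the answer.

```swift
// Hypothetical helper: builds the data for a new post, including
// a random index value in the "random" field.
func makePostData(title: String) -> [String: Any] {
    // A non-negative 64-bit range keeps collisions unlikely for large
    // collections and leaves -1 free to act as the "low value".
    let randomIndex = Int64.random(in: 0...Int64.max)
    return ["title": title, "random": randomIndex]
}
```

The resulting dictionary is the kind of value you would pass when creating the document. For a small collection, a narrower range could be used instead to save space.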

How to query the random indexes

Now that you have a random index, you'll want to query it. Below we look at some simple variants to select 1 random document, as well as options to select more than 1.

For all these options, you'll want to generate a new random value in the same form as the indexed values you created when writing the document, denoted by the variable random below. We'll use this value to find a random spot on the index.

Wrap-around

Now that you have a random value, you can query for a single document:

let postsRef = db.collection("posts")
// Smallest indexed value at or above the random position.
let queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
                       .order(by: "random")
                       .limit(to: 1)

Check that this has returned a document. If it hasn't, query again but use the 'low value' for your random index. For example, if you used random integers, lowValue is the bottom of your range (0 for non-negative integers; -1 works equally well, since the query uses >=):

// Wrap around: restart from the low end of the index.
let wrapRef = postsRef.whereField("random", isGreaterThanOrEqualTo: lowValue)
                      .order(by: "random")
                      .limit(to: 1)

As long as the collection contains at least one document, this is guaranteed to return a document.
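To make the wrap-around logic easier to follow, here is a small in-memory Swift sketch. It does not call Firestore: the sorted array stands in for the random index, and the function name pickWrapAround is hypothetical.

```swift
// In-memory stand-in for the "random" index: values sorted ascending,
// in the order an ascending query on the field would return them.
func pickWrapAround(sortedIndex: [Int64], random: Int64) -> Int64? {
    // First query: smallest value >= random.
    if let hit = sortedIndex.first(where: { $0 >= random }) {
        return hit
    }
    // No match means random is above every stored value, so wrap around.
    // This mirrors querying again from the low value.
    return sortedIndex.first
}
```

For example, with index values 10, 20 and 30, a random value of 15 selects 20, while a random value of 31 finds nothing above it and wraps to 10.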

The wrap-around method is simple to implement and lets you optimize storage with only an ascending index enabled. One downside is the possibility of values being unfairly shielded. For example, if the first 3 documents (A, B, C) out of 10K have random index values of A:409496, B:436496, C:818992, then A and C each have just less than a 1/10K chance of being selected, whereas B is effectively shielded by the proximity of A and has only roughly a 1/160K chance (B's gap of 27,000 out of a 32-bit range of about 4.29 billion).
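The figures in this example can be checked with a little arithmetic. The sketch below assumes the values come from a 32-bit range (0 up to 2^32) and that A holds the lowest value in the index. It ignores the extra share the lowest document collects from wrap-around, and the function name selectionOdds is hypothetical.

```swift
// Chance that a >= query lands on a document, in "1 in N" form:
// the size of the range divided by the gap between the document's value
// and the previous value in the index (range assumed to be 0..<2^32).
func selectionOdds(previous: Double, value: Double, range: Double = 4_294_967_296) -> Double {
    return range / (value - previous)
}

let oddsA = selectionOdds(previous: 0, value: 409_496)        // roughly 1 in 10.5K
let oddsB = selectionOdds(previous: 409_496, value: 436_496)  // roughly 1 in 159K
let oddsC = selectionOdds(previous: 436_496, value: 818_992)  // roughly 1 in 11.2K
```

B's gap is about 15 times narrower than A's or C's, which is why it is picked so much less often.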

Bi-directional

Rather than querying in a single direction and wrapping around if a value is not found, you can instead randomly select between >= and <=, which reduces the probability of unfairly shielded values by half, at the cost of double the index storage.

If one direction returns no results, switch to the other direction:

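// Descending direction: largest value at or below random.
let descendingRef = postsRef.whereField("random", isLessThanOrEqualTo: random)
                            .order(by: "random", descending: true)
                            .limit(to: 1)

// Ascending direction: smallest value at or above random.
let ascendingRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
                           .order(by: "random")
                           .limit(to: 1)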
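The direction choice and fallback can be sketched in memory as follows. This is only an illustration under the assumption that the sorted array stands in for the random index; the function name pickBidirectional is hypothetical, and no Firestore calls are made.

```swift
// Picks one value from an ascending-sorted index, trying the requested
// direction first and falling back to the other if it finds nothing.
func pickBidirectional(sortedIndex: [Int64], random: Int64, ascending: Bool) -> Int64? {
    let above = sortedIndex.first(where: { $0 >= random })  // >= query, ascending order
    let below = sortedIndex.last(where: { $0 <= random })   // <= query, descending order
    return ascending ? (above ?? below) : (below ?? above)
}

// The direction itself is chosen at random for each lookup.
let picked = pickBidirectional(sortedIndex: [10, 20, 30], random: 25, ascending: Bool.random())
```

Passing the direction in as a parameter keeps the function deterministic, which makes the fallback behaviour easy to check.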

Selecting multiple random documents

Often, you'll want to select more than 1 random document at a time. There are 2 different ways to adjust the above techniques, depending on what trade-offs you want.

Rinse & Repeat

This method is straightforward. Simply repeat the process, including selecting a new random integer each time.

This method will give you random sequences of documents without worrying about seeing the same patterns repeatedly.

The trade-off is that it will be slower than the next method, since it requires a separate round trip to the service for each document.
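A minimal in-memory sketch of repeating the single-document lookup, assuming the wrap-around variant is used for each pick. The function name pickRepeatedly and the nextRandom parameter are hypothetical, and the array again stands in for the random index.

```swift
// Repeats the single-document lookup `count` times, drawing a fresh
// random value for every lookup. In Firestore each iteration would be
// its own round trip.
func pickRepeatedly(sortedIndex: [Int64], count: Int, nextRandom: () -> Int64) -> [Int64] {
    var picks: [Int64] = []
    for _ in 0..<count {
        let random = nextRandom()
        // Smallest value >= random, wrapping to the lowest value if none.
        if let hit = sortedIndex.first(where: { $0 >= random }) ?? sortedIndex.first {
            picks.append(hit)
        }
    }
    return picks
}
```

Note that independent lookups can return the same document more than once, so a caller that needs distinct documents would have to filter duplicates and retry.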

Keep it coming

In this approach, simply increase the number in the limit to the number of documents you want. It's a little more complex, as the call might return anywhere from 0 to limit documents. You'll then need to get the missing documents in the same manner, but with the limit reduced to only the difference. If you know there are more documents in total than the number you are asking for, you can optimize by ignoring the edge case of never getting back enough documents on the second call (but not the first).

The trade-off with this solution is repeated sequences. While the documents are randomly ordered, if you ever end up with overlapping ranges you'll see the same pattern you saw before. There are ways to mitigate this concern, discussed in the next section on reseeding.

This approach is faster than 'Rinse & Repeat', as you'll be requesting all the documents in a single call in the best case, or 2 calls in the worst case.
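The two-call pattern might look like the following in-memory sketch. The function name pickBatch is hypothetical, and stopping the second call before the random position (so no document comes back twice) is an assumption of this sketch rather than something the answer spells out.

```swift
// Fetches up to `limit` values from an ascending-sorted index, starting
// at the random position and wrapping to the low end if needed.
func pickBatch(sortedIndex: [Int64], random: Int64, limit: Int) -> [Int64] {
    // First call: up to `limit` values at or above the random position.
    var picks = Array(sortedIndex.filter { $0 >= random }.prefix(limit))
    let missing = limit - picks.count
    if missing > 0 {
        // Second call: start from the low end with the reduced limit,
        // stopping before the random position to avoid duplicates.
        picks += sortedIndex.filter { $0 < random }.prefix(missing)
    }
    return picks
}
```

With index values 10, 20, 30 and 40, a random value of 25 and a limit of 3, the first call yields 30 and 40 and the second call supplies 10.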

Reseeding for ongoing randomness

While this method gives you documents randomly, if the document set is static, the probability of each document being returned will be static as well. This is a problem, as some values might have unfairly low or high probabilities based on the initial random values they got. In many use cases this is fine, but in some you may want to increase the long-term randomness to have a more uniform chance of returning any 1 document.

Note that inserted documents will end up woven in between, gradually changing the probabilities, as will deleting documents. If the insert/delete rate is too small given the number of documents, there are a few strategies for addressing this.

Multi-Random

Rather than worrying about reseeding, you can always create multiple random indexes per document, then randomly select one of those indexes each time. For example, have the field random be a map with subfields 1 to 3:

{'random': {'1': 32456, '2': 3904515723, '3': 766958445}}

Now you'll be querying against random.1, random.2, or random.3 at random, creating a greater spread of randomness. This essentially trades increased storage for the increased compute (document writes) that reseeding would require.
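Choosing which of the indexes to query can be as small as the sketch below. The function name randomIndexField is hypothetical, and the dotted path assumes the map layout shown in the example document.

```swift
// Returns the field path of one of the random indexes, chosen uniformly:
// "random.1" through "random.<indexCount>".
func randomIndexField(indexCount: Int = 3) -> String {
    return "random.\(Int.random(in: 1...indexCount))"
}
```

The returned path would then take the place of the plain random field name in both the filter and the ordering of the query, so that the two refer to the same index.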

Reseed on writes

Any time you update a document, regenerate the random value(s) of the random field. This will move the document around in the random index.

Reseed on reads

If the random values generated are not uniformly distributed (they're random, so this is expected), then the same document might be picked a disproportionate amount of the time. This is easily counteracted by updating the randomly selected document with new random values after it is read.

Since writes are more expensive and can hotspot, you can elect to update on read only a subset of the time (e.g., if (random(0,100) === 0) update;).
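In Swift, the same sampling check could be sketched as follows. The function name shouldReseed and its oneIn parameter are hypothetical, and the 1-in-100 default simply mirrors the example above.

```swift
// Returns true roughly once every `chance` calls, so that only a
// fraction of reads trigger a reseeding write.
func shouldReseed(oneIn chance: Int = 100) -> Bool {
    return Int.random(in: 0..<chance) == 0
}
```

After reading a randomly selected document, the caller would write a new random value back only when this returns true, which keeps write volume to about 1% of reads with the default setting.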
