Firestore: How to get random documents in a collection


Question

It is crucial for my application to be able to select multiple documents at random from a collection in Firebase.

Since there is no native function built in to Firebase (that I know of) to achieve a query that does just this, my first thought was to use query cursors to select a random start and end index provided that I have the number of documents in the collection.

This approach would work, but only in a limited fashion, since every document would be served up in sequence with its neighboring documents every time. If I were able to select a document by its index in its parent collection, I could achieve a random document query, but I can't find any documentation that describes how to do this, or even whether it is possible.

Here's what I'd like to be able to do. Consider the following Firestore schema:

root/
  posts/
     docA
     docB
     docC
     docD

Then in my client (I'm in a Swift environment) I'd like to write a query that can do this:

db.collection("posts")[0, 1, 3] // would return: docA, docB, docD

Is there any way I can do something along these lines? Or is there a different way to select random documents in a similar fashion?

Please help.

Solution

Using randomly generated indexes and simple queries, you can randomly select documents from a collection or collection group in Cloud Firestore.

This answer is broken into 4 sections with different options in each section:

  1. How to generate the random indexes
  2. How to query the random indexes
  3. Selecting multiple random documents
  4. Reseeding for ongoing randomness

How to generate the random indexes

The basis of this answer is creating an indexed field that, when ordered ascending or descending, results in all the documents being randomly ordered. There are different ways to create this, so let's look at 2, starting with the most readily available.

Auto-Id version

If you are using the randomly generated automatic ids provided in our client libraries, you can use this same system to randomly select a document. In this case, the randomly ordered index is the document id.

Later in our query section, the random value you generate is a new auto-id (iOS, Android, Web), the field you query is the __name__ field, and the 'low value' mentioned later is an empty string. This is by far the easiest method to generate the random index and works regardless of the language and platform.

By default, the document name (__name__) is only indexed ascending, and you also cannot rename an existing document short of deleting and recreating. If you need either of these, you can still use this method and just store an auto-id as an actual field called random rather than overloading the document name for this purpose.
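
For the auto-id variant, the random value used at query time needs the same shape as a document id. The following is a minimal sketch, assuming auto-ids are 20-character strings drawn from A-Z, a-z and 0-9; the helper name `makeAutoIdLikeValue` is hypothetical and not part of any SDK.

```swift
// Assumption: auto-ids are 20 characters drawn from this 62-character alphabet.
let autoIdAlphabet = Array("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")

// Hypothetical helper: builds a random string in the same shape as an auto-id,
// usable as the comparison value when querying on __name__ or a stored `random` field.
func makeAutoIdLikeValue(length: Int = 20) -> String {
    var id = ""
    for _ in 0..<length {
        id.append(autoIdAlphabet.randomElement()!)
    }
    return id
}

let random = makeAutoIdLikeValue()
print(random)
```

In an app you could likely skip the helper and read the id of a freshly created, unsaved document reference instead; the sketch only shows what kind of value is being compared.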

Random Integer version

When you write a document, first generate a random integer in a bounded range and set it as a field called random. Depending on the number of documents you expect, you can use a different bounded range to save space or reduce the risk of collisions (which reduce the effectiveness of this technique).

You should consider which languages you need to support, as each has different considerations. Swift handles both sizes easily, but JavaScript has a notable gotcha:

  • 32-bit integer: Great for small datasets (at ~10K documents a collision is unlikely)
  • 64-bit integer: For large datasets (note: JavaScript's Number type cannot represent the full 64-bit integer range exactly)

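This will create an index with your documents randomly sorted. Later in our query section, the random value you generate will be another one of these values, and the 'low value' mentioned later will be any value at or below the smallest integer you can generate (for example, 0 or -1 for a non-negative range).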
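
As a rough illustration of the Random Integer version, the sketch below generates a non-negative value in the 32-bit range. The helper name `makeRandomIndexValue` and the choice of range are assumptions for illustration; pick the range that suits your expected document count.

```swift
// Hypothetical helper: a non-negative random value in the unsigned 32-bit range,
// held in an Int64 so it fits a 64-bit integer field.
func makeRandomIndexValue() -> Int64 {
    return Int64.random(in: 0...Int64(UInt32.max))
}

// Low value for the wrap-around query. Any value at or below the bottom of the
// range works with a >= filter; 0 is used here.
let lowValue: Int64 = 0

// The same helper serves both purposes: the value written to the `random`
// field of a new document, and the value generated at query time.
let value = makeRandomIndexValue()
print(value)
```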

How to query the random indexes

Now that you have a random index, you'll want to query it. Below we look at some simple variants to select 1 random document, as well as options to select more than 1.

For all these options, you'll want to generate a new random value in the same form as the indexed values you created when writing the document, denoted by the variable random below. We'll use this value to find a random spot on the index.

Wrap-around

Now that you have a random value, you can query for a single document:

let postsRef = db.collection("posts")
// Find the first document at or after the random value on the index.
let queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
                       .order(by: "random")
                       .limit(to: 1)

Check that this has returned a document. If it hasn't, query again but use the 'low value' for your random index. For example, if you used the Random Integer version, then lowValue is 0:

let postsRef = db.collection("posts")
// Wrap around: start again from the bottom of the index.
let queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: lowValue)
                       .order(by: "random")
                       .limit(to: 1)

As long as the collection contains at least one document, this second query is guaranteed to return a document.
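
The wrap-around logic can be tried out without a database. The sketch below uses a small in-memory array as a stand-in for the indexed `random` field; the data and the names `firstAtOrAbove` and `pickWrapAround` are hypothetical, and the filter-then-minimum step only mimics what the `>=`, ascending order and limit of 1 do in the query.

```swift
// In-memory stand-in for the indexed `random` field: (document id, random value).
let posts: [(id: String, random: Int)] = [
    ("docA", 12), ("docB", 40), ("docC", 77), ("docD", 93)
]

// Mimics: where random >= start, order by random ascending, limit 1.
func firstAtOrAbove(_ start: Int) -> String? {
    return posts
        .filter { $0.random >= start }
        .min(by: { $0.random < $1.random })?
        .id
}

// Wrap-around: try from the random value, then retry from the low value.
func pickWrapAround(random: Int, lowValue: Int = 0) -> String? {
    return firstAtOrAbove(random) ?? firstAtOrAbove(lowValue)
}

print(pickWrapAround(random: 50) ?? "none")  // docC
print(pickWrapAround(random: 95) ?? "none")  // docA, after wrapping around
```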

Bi-directional

The wrap-around method is simple to implement and allows you to optimize storage with only an ascending index enabled. One downside is the possibility of values being unfairly shielded. For example, if the first 3 documents (A, B, C) out of 10K have random index values of A: 409496, B: 436496, C: 818992, then A and C each have just less than a 1/10K chance of being selected, whereas B is effectively shielded by the proximity of A and has only roughly a 1/160K chance.
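
These figures can be reproduced with a little arithmetic. The sketch below assumes the random values are uniform over the unsigned 32-bit range (2^32 possible values), which the text does not state explicitly, and it ignores the small extra chance A gains from wrap-around. With a `>=` query, a document is returned when the random value lands in the gap between the previous document's value and its own.

```swift
// Assumption: values are uniform over the unsigned 32-bit range.
let rangeSize = 4_294_967_296.0  // 2^32

// The three lowest index values from the example above.
let indexValues: [(name: String, value: Double)] = [
    ("A", 409_496), ("B", 436_496), ("C", 818_992)
]

// Chance of landing in a gap of the given width, expressed as "1 in N".
func oneInChance(gap: Double) -> Double {
    return rangeSize / gap
}

var previous = 0.0
for doc in indexValues {
    let gap = doc.value - previous
    print("\(doc.name): about 1 in \(Int(oneInChance(gap: gap).rounded()))")
    previous = doc.value
}
```

Under that assumption the gaps work out to roughly 1 in 10.5K for A, 1 in 159K for B and 1 in 11.2K for C, in line with the figures quoted above.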

Rather than querying in a single direction and wrapping around if a value is not found, you can instead randomly select between >= and <=, which reduces the probability of unfairly shielded values by half, at the cost of double the index storage.

If one direction returns no results, switch to the other direction:

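// Descending direction: the closest document at or below the random value.
let descendingQueryRef = postsRef.whereField("random", isLessThanOrEqualTo: random)
                                 .order(by: "random", descending: true)
                                 .limit(to: 1)

// Ascending direction: the closest document at or above the random value.
let ascendingQueryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
                                .order(by: "random")
                                .limit(to: 1)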
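
The direction choice and fallback can be sketched in plain Swift against an in-memory stand-in for the index. The data and function names below are hypothetical; `ascendingFirst` defaults to a coin flip, matching the random choice between >= and <= described above.

```swift
// In-memory stand-in for the indexed `random` field: (document id, random value).
let posts: [(id: String, random: Int)] = [
    ("docA", 12), ("docB", 40), ("docC", 77), ("docD", 93)
]

// Mimics: where random >= start, order ascending, limit 1.
func firstAtOrAbove(_ start: Int) -> String? {
    return posts.filter { $0.random >= start }
        .min(by: { $0.random < $1.random })?.id
}

// Mimics: where random <= start, order descending, limit 1.
func firstAtOrBelow(_ start: Int) -> String? {
    return posts.filter { $0.random <= start }
        .max(by: { $0.random < $1.random })?.id
}

// Pick a direction (a coin flip by default); if it finds nothing, try the other.
func pickBidirectional(random: Int, ascendingFirst: Bool = Bool.random()) -> String? {
    if ascendingFirst {
        return firstAtOrAbove(random) ?? firstAtOrBelow(random)
    }
    return firstAtOrBelow(random) ?? firstAtOrAbove(random)
}

print(pickBidirectional(random: 50) ?? "none")  // docB or docC, depending on the flip
```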

Selecting multiple random documents

Often, you'll want to select more than 1 random document at a time. There are 2 different ways to adjust the above techniques, depending on which trade-offs you want.

Rinse & Repeat

This method is straightforward. Simply repeat the process, including selecting a new random value each time.

This method will give you random sequences of documents without worrying about seeing the same patterns repeatedly.

The trade-off is it will be slower than the next method since it requires a separate round trip to the service for each document.

Keep it coming

In this approach, simply raise the limit to the number of documents you want. It's a little more complex, as the call might return anywhere from 0 to limit documents. You'll then need to get the missing documents in the same manner, but with the limit reduced to only the difference. If you know there are more documents in total than the number you are asking for, you can optimize by ignoring the edge case of still not getting back enough documents on the second call (it can only happen on the first).

The trade-off with this solution is in repeated sequences. While the documents are randomly ordered, if you ever end up overlapping ranges you'll see the same pattern you saw before. There are ways to mitigate this concern discussed in the next section on reseeding.

This approach is faster than 'Rinse & Repeat', as you'll be requesting all the documents in a single call in the best case, or 2 calls in the worst case.
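
A sketch of 'Keep it coming' against an in-memory stand-in for the index is shown below. The data and the name `pickMany` are hypothetical. The duplicate check in the second step is an addition for the case where the collection holds fewer documents than the limit; it is not something the text above prescribes.

```swift
// In-memory stand-in for the indexed `random` field: (document id, random value).
let posts: [(id: String, random: Int)] = [
    ("docA", 12), ("docB", 40), ("docC", 77), ("docD", 93)
]

func pickMany(random: Int, limit: Int, lowValue: Int = 0) -> [String] {
    let ordered = posts.sorted { $0.random < $1.random }

    // First call: where random >= random, order ascending, limit `limit`.
    var picked = ordered.filter { $0.random >= random }.prefix(limit).map { $0.id }

    let missing = limit - picked.count
    if missing > 0 {
        // Second call: wrap around to the low value, asking only for the difference.
        let wrapped = ordered.filter { $0.random >= lowValue }.prefix(missing).map { $0.id }
        // Guard against duplicates when the collection is smaller than `limit`.
        let extra = wrapped.filter { !picked.contains($0) }
        picked.append(contentsOf: extra)
    }
    return picked
}

print(pickMany(random: 50, limit: 2))  // ["docC", "docD"], one call is enough
print(pickMany(random: 80, limit: 3))  // ["docD", "docA", "docB"], second call needed
```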

Reseeding for ongoing randomness

While this method gives you documents randomly, if the document set is static the probability of each document being returned will be static as well. This is a problem, as some documents might have unfairly low or high probabilities based on the initial random values they got. In many use cases this is fine, but in some you may want to increase the long-term randomness so that every document has a more uniform chance of being returned.

Note that inserted documents will end up woven in between existing ones, gradually changing the probabilities, as will deleting documents. If the insert/delete rate is too small given the number of documents, there are a few strategies to address this.

Multi-Random

Rather than worrying about reseeding, you can always create multiple random indexes per document, then randomly select one of those indexes each time. For example, have the field random be a map with subfields 1 to 3:

{'random': {'1': 32456, '2': 3904515723, '3': 766958445}}

Now you'll be querying against random.1, random.2, random.3 randomly, creating a greater spread of randomness. This essentially trades increased storage to save increased compute (document writes) of having to reseed.
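
Choosing which index to query can be as small as picking a sub-field name at random. A minimal sketch, assuming the sub-fields are named 1 to 3 as in the map above; the helper name `randomIndexFieldPath` is hypothetical.

```swift
// Assumption: each document stores sub-fields random.1 ... random.3.
let indexCount = 3

// Hypothetical helper: the field path to filter and order on for this request.
func randomIndexFieldPath(indexCount: Int) -> String {
    let chosen = Int.random(in: 1...indexCount)
    return "random.\(chosen)"
}

let fieldPath = randomIndexFieldPath(indexCount: indexCount)
print(fieldPath)  // one of random.1, random.2, random.3
```

The returned path would then take the place of "random" in the queries shown earlier.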

Reseed on writes

Any time you update a document, re-generate the random value(s) of the random field. This will move the document around in the random index.

Reseed on reads

If the random values generated are not uniformly distributed (they're random, so this is expected), then the same document might be picked a disproportionate amount of the time. This is easily counteracted by updating the randomly selected document with new random values after it is read.

Since writes are more expensive and can hotspot, you can elect to update on read only a subset of the time (e.g., if (random(0,100) === 0) update;).
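
The sampling check in that example could look like the following in Swift. This is a sketch only: the helper name `shouldReseedOnRead` and the 1-in-100 rate are illustrative, and the actual write of the new random value is left as a comment.

```swift
// Hypothetical helper: decides whether this read should also reseed the document.
// With `oneIn: 100`, roughly 1% of reads trigger a reseed.
func shouldReseedOnRead(oneIn: Int = 100) -> Bool {
    return Int.random(in: 0..<oneIn) == 0
}

if shouldReseedOnRead() {
    // Here you would write a freshly generated random value back to the document.
    print("reseed this document")
}
```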
