Firestore, how to structure a "likedBy" query

Problem description

I'm having a little trouble wrapping my head around how to best structure my (very simple) Firestore app. I have a set of users like this:

users: {
   'A123': {
      'name':'Adam'
   },
   'B234': {
      'name':'Bella'
   },
   'C345': {
      'name':'Charlie'
   }
}

...and each user can 'like' or 'dislike' any number of other users (like Tinder).

I'd like to structure a "likes" table (or Firestore equivalent) so that I can list people who I haven't yet liked or disliked. My initial thought was to create a "likes" object within the user table with boolean values like this:

users: {
   'A123': {
      'name':'Adam',
      'likedBy': {
         'B234':true,
      },
      'disLikedBy': {
         'C345':true
      }
   },
   'B234': {
      'name':'Bella'
   },
   'C345': {
      'name':'Charlie'
   }
}

That way if I am Charlie and I know my ID, I could list users that I haven't yet liked or disliked with:

var usersRef = firebase.firestore().collection('users')
.where('likedBy.C345','==',false)
.where('dislikedBy.C345','==',false)

This doesn't work (everyone gets listed), so I suspect that my approach is wrong, especially the '==false' part. Could someone please point me in the right direction on how to structure this? As a bonus question, what happens if somebody changes their name? Do I need to change all of the embedded "likedBy" data? Or could I use a Cloud Function to achieve this?

Thanks!

Solution

There isn't a perfect solution for this problem, but there are alternatives you can choose from, depending on which trade-offs you want to make.

The options: Overscan vs Underscan

Remember that Cloud Firestore only allows queries that scale with the size of the result set, independent of the total size of your dataset.

This is really helpful in preventing you from building something that works in testing with 10 documents but blows up as soon as you go to production and become popular. Unfortunately, this type of problem doesn't fit that scalable pattern: the more profiles you have, and the more likes people create, the longer it takes to answer the query you want here.

The solution, then, is to find one or more queries that do scale and most closely represent what you want. I can think of two options that make trade-offs in different ways:

  1. Overscan --> Do a broader query and then filter on the client-side
  2. Underscan --> Do one or more narrower queries that might miss a few results.

Overscan

In the Overscan option, you're basically trading increased cost to get 100% accuracy.

Given your use case, I imagine this might actually be your best option. Since the total number of profiles is likely orders of magnitude larger than the number of profiles an individual has liked, the increased cost of overscanning is probably inconsequential.

Simply select all profiles that match any other conditions you have, and then on the client side, filter out any that the user has already liked.

First, get all the profiles liked by the user:

var likedUsers = firebase.firestore().collection('users').where('likedBy.C345','==',true).get()

Then get all users, checking against the first list and discarding anything that matches.

var allUsers = firebase.firestore().collection('users').get()
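The client-side filtering step could look something like the sketch below. It assumes both reads have already been converted into plain arrays of `{ id, ... }` objects (for example with `snapshot.docs.map((d) => ({ id: d.id, ...d.data() }))`); the helper `filterUnseen` is hypothetical, and skipping the user's own profile is an extra assumption.

```javascript
// Hypothetical helper: keep only profiles the current user has not liked yet,
// also skipping the user's own profile. The order of allUsers is preserved.
function filterUnseen(allUsers, likedUsers, selfId) {
  // A Set makes each membership check O(1), so this is linear in allUsers.
  const likedIds = new Set(likedUsers.map((user) => user.id));
  return allUsers.filter((user) => user.id !== selfId && !likedIds.has(user.id));
}

// Sample data mirroring the question: Bella (B234) has liked Adam (A123).
const allUsers = [
  { id: "A123", name: "Adam" },
  { id: "B234", name: "Bella" },
  { id: "C345", name: "Charlie" },
];
const likedByBella = [{ id: "A123", name: "Adam" }];

console.log(filterUnseen(allUsers, likedByBella, "B234").map((u) => u.name));
// prints [ 'Charlie' ]
```

Passing the user's disliked profiles in the same list would hide those as well, which the question also asks for.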

Depending on the scale, you'll probably want to optimize the first step. For example, every time the user likes someone, add that person's ID to an array in a single document belonging to the user, so that it lists everyone they have liked. That way you only need to get a single document for the first step.

var likedUsers = firebase.firestore().collection('likedUsers').doc('C345').get()
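Keeping that per-user document up to date might look like the in-memory sketch below, which runs without Firebase. The document shape `{ liked: [...] }` and the helpers `recordLike` and `getLiked` are assumptions. With the Firebase modular SDK the write would typically be `setDoc(doc(db, 'likedUsers', userId), { liked: arrayUnion(targetId) }, { merge: true })`, where `arrayUnion` only adds an element that is not already present; the sketch mirrors that behavior.

```javascript
// In-memory stand-in for the hypothetical 'likedUsers' collection:
// one entry per user, holding an array of the IDs that user has liked.
const likedUsersDocs = new Map();

// Mirrors arrayUnion semantics: append targetId only if it is not already there.
function recordLike(userId, targetId) {
  const docData = likedUsersDocs.get(userId) ?? { liked: [] };
  if (!docData.liked.includes(targetId)) {
    docData.liked.push(targetId);
  }
  likedUsersDocs.set(userId, docData);
}

// Reading the list back is a single lookup, like a single-document get().
function getLiked(userId) {
  return likedUsersDocs.get(userId)?.liked ?? [];
}

recordLike("B234", "A123");
recordLike("B234", "A123"); // a repeated like is ignored
console.log(getLiked("B234")); // [ 'A123' ]
```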

Since this query scales with the size of the result set (by defining the result set to be the whole data set), Cloud Firestore can answer it without a lot of hidden, unscalable work. The unscalable part is left to you to optimize (see the two examples above).

Underscan

In the Underscan option, you're basically trading accuracy to get a narrower (hence cheaper) set of results.

This method is more complex, so you probably only want to consider it if, for some reason, the ratio of liked to unliked profiles turns out not to be what I assumed in the Overscan option.

The basic idea is to exclude someone if you've definitely liked them, and accept the trade-off that you might also exclude someone you haven't yet liked. Yes, this is basically a Bloom filter.

In each user's profile, store a map of true/false values keyed from 0 to m - 1 (we'll get to what m is later), with every entry set to false initially.

When a user likes a profile, hash the liking user's ID to get its bit positions in the Bloom filter, and set all of those bits in the liked profile's map to true.

For example, if we used m = 4 and C345 hashed to 0110 (bits 1 and 2 set), then after C345 likes a profile, that profile's map would look like:

likedBy: {
   0: false,
   1: true,
   2: true,
   3: false
}
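A minimal sketch of the insert step follows. The answer doesn't specify a hash function, so this assumes 32-bit FNV-1a combined with double hashing to derive k positions in [0, m); any well-distributed hash works as long as every client uses the same one. The names `fnv1a`, `bitPositions`, `emptyFilter` and `recordLikeInFilter`, the second seed, and k = 2 are all illustrative choices, and the real positions will not reproduce the 0110 example above.

```javascript
// Illustrative parameters (assumptions): m bits per profile, k hash functions.
const M = 1675;
const K = 2;

// 32-bit FNV-1a over the string's UTF-16 code units. Passing a different
// `seed` instead of the standard offset basis yields a second, different hash.
function fnv1a(str, seed = 0x811c9dc5) {
  let h = seed >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Double hashing: position_i = (h1 + i * h2) mod m for i = 0..k-1.
// Duplicate positions are collapsed, so an ID may set fewer than k bits.
function bitPositions(userId, m = M, k = K) {
  const h1 = fnv1a(userId);
  const h2 = fnv1a(userId, 0x01234567);
  const positions = new Set();
  for (let i = 0; i < k; i++) {
    positions.add((h1 + i * h2) % m);
  }
  return [...positions].sort((a, b) => a - b);
}

// Every bit from 0 to m - 1 starts out false.
function emptyFilter(m = M) {
  const map = {};
  for (let bit = 0; bit < m; bit++) map[bit] = false;
  return map;
}

// When likerId likes a profile, set likerId's bits in that profile's likedBy map.
function recordLikeInFilter(likedByMap, likerId) {
  for (const bit of bitPositions(likerId)) likedByMap[bit] = true;
  return likedByMap;
}

// Bella (B234) likes Adam, so Bella's bits are set in Adam's likedBy map.
const adamLikedBy = recordLikeInFilter(emptyFilter(), "B234");
console.log(bitPositions("B234"));
```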

Now, to find people you definitely haven't liked, apply the same idea to the query: for any bit from 0 to m - 1 that is true in your own ID's hash, query for profiles where that bit is false:

var usersRef = firebase.firestore().collection('users')
.where('likedBy.1','==',false)

Repeat this for each of your true bits. (This will get easier when we support OR queries in the future.) Any profile that has a false value on a bit where your ID hashes to true has definitely not been liked by you.
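To make the "definitely not liked" rule concrete, here is a sketch of the check that the `== false` queries perform, using the m = 4 example above (C345's true bits are 1 and 2). `queryFieldPaths` and `definitelyNotLiked` are hypothetical helpers. The check uses `=== false` on purpose: like a Firestore `==` filter, it does not treat a missing key as false, which is why every bit has to be initialized to false up front.

```javascript
// Bits that are true in the current user's ID hash. [1, 2] matches the
// m = 4 example above, where C345 hashes to 0110.
const myBits = [1, 2];

// Field paths for .where(path, '==', false), one query per true bit.
function queryFieldPaths(bits) {
  return bits.map((bit) => `likedBy.${bit}`);
}

// Bloom filter membership test: if any of my bits is false in a profile's map,
// I have definitely not liked that profile. If all of them are true, I might
// have liked it (or it is a false positive), so it gets excluded.
// `=== false` mirrors a Firestore == false filter: a missing key does not match.
function definitelyNotLiked(likedByMap, bits) {
  return bits.some((bit) => likedByMap[bit] === false);
}

const profileLikedByC345 = { 0: false, 1: true, 2: true, 3: false }; // map from the example
const profileNotLiked = { 0: true, 1: false, 2: false, 3: true }; // hypothetical: liked by others only

console.log(queryFieldPaths(myBits)); // [ 'likedBy.1', 'likedBy.2' ]
console.log(definitelyNotLiked(profileLikedByC345, myBits)); // false
console.log(definitelyNotLiked(profileNotLiked, myBits)); // true
```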

Since you probably don't want to display ALL profiles, only enough to fill a single page, you can randomly pick just one of the bits that is true in your ID's hash and query against that bit alone. If you run out of profiles, pick another true bit and start again.
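The "pick one true bit, fall back to another" strategy could be sketched as follows. `runQuery(bit)` is a hypothetical stand-in for running `.where('likedBy.' + bit, '==', false)` against the users collection; a real version would also page through results with `limit()` and a query cursor. The sample profiles D456 and E567 are made up.

```javascript
// Fisher-Yates shuffle, so the true bits are tried in a random order.
function shuffled(items, random = Math.random) {
  const a = [...items];
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Query one bit at a time and only move on to the next bit when the current
// one runs out of profiles. Profiles returned by several bits are kept once.
function nextPage(myBits, runQuery, pageSize, random = Math.random) {
  const page = [];
  const seen = new Set();
  for (const bit of shuffled(myBits, random)) {
    for (const profile of runQuery(bit)) {
      if (seen.has(profile.id)) continue;
      seen.add(profile.id);
      page.push(profile);
      if (page.length === pageSize) return page;
    }
  }
  return page; // shorter than pageSize: every true bit has been exhausted
}

// In-memory stand-in for the users collection (m = 4, my true bits = [1, 2]).
const profiles = [
  { id: "A123", likedBy: { 0: false, 1: true, 2: true, 3: false } },
  { id: "D456", likedBy: { 0: false, 1: false, 2: true, 3: false } },
  { id: "E567", likedBy: { 0: true, 1: true, 2: false, 3: false } },
];
const runQuery = (bit) => profiles.filter((p) => p.likedBy[bit] === false);

console.log(nextPage([1, 2], runQuery, 10).map((p) => p.id).sort()); // [ 'D456', 'E567' ]
```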

Assuming most profiles are liked 500 times or fewer, you can keep the false positive rate at about 20% or less by using m = 1675.
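The m = 1675 figure matches the standard Bloom filter sizing formulas, m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, with n = 500 expected likes and a target false positive rate p = 0.2. A quick check (the helper names are illustrative):

```javascript
// Standard Bloom filter sizing for n expected entries and target rate p:
//   m = -n * ln(p) / (ln 2)^2   bits, rounded up
//   k = (m / n) * ln 2          hash functions, rounded to the nearest integer
function bloomSize(n, p) {
  const m = Math.ceil((-n * Math.log(p)) / (Math.LN2 ** 2));
  const k = Math.max(1, Math.round((m / n) * Math.LN2));
  return { m, k };
}

// Expected false positive rate for n entries, m bits and k hashes:
//   (1 - e^(-k * n / m))^k
function falsePositiveRate(n, m, k) {
  return (1 - Math.exp((-k * n) / m)) ** k;
}

const { m, k } = bloomSize(500, 0.2);
console.log(m, k); // 1675 2
console.log(falsePositiveRate(500, m, k).toFixed(3)); // 0.202
```

Rounding k to 2 puts the rate just above 20%, which is what "about 20%" refers to here.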

There are handy online calculators to help you work out ratios of likes per profile, desired false positive ratio, and m, for example here.

Overscan - bonus

You'll quickly realize with the Overscan option that every time you run the query, the same profiles the user didn't like last time will be shown again. I'm assuming you don't want that. Worse, all the profiles the user has already liked will come early in the results, so you'll end up skipping over them every time, which increases your costs.

There is an easy fix for that: use the method I describe in my answer to the question "Firestore: How to get random documents in a collection". This lets you pull random profiles from the set, giving you a more even distribution and reducing the chance of stumbling on lots of previously liked profiles.
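The linked answer isn't reproduced here, so the sketch below shows one common variant of the idea, as an assumption: give every profile a `random` field in [0, 1), pick a random starting point r, read profiles with `random >= r` in ascending order (in Firestore, `.where('random', '>=', r).orderBy('random').limit(count)`), and wrap around to the start of the ordering if that runs short. `randomPage` and the `random` values are hypothetical.

```javascript
// Take up to `count` profiles starting at a random point in the `random`
// ordering, wrapping around to the beginning if the tail runs out.
function randomPage(profiles, count, r = Math.random()) {
  const ordered = [...profiles].sort((a, b) => a.random - b.random);
  const tail = ordered.filter((p) => p.random >= r); // like where('random', '>=', r)
  const head = ordered.filter((p) => p.random < r); // second, wrap-around query
  return [...tail, ...head].slice(0, count);
}

const profiles = [
  { id: "A123", random: 0.12 },
  { id: "B234", random: 0.57 },
  { id: "C345", random: 0.91 },
];

console.log(randomPage(profiles, 2, 0.5).map((p) => p.id)); // [ 'B234', 'C345' ]
console.log(randomPage(profiles, 2, 0.95).map((p) => p.id)); // [ 'A123', 'B234' ]
```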

Underscan - bonus

One problem I suspect you'll have with the Underscan option is really popular profiles. If someone is almost always liked, you might exceed the usefulness of a Bloom filter, because the filter that profile needs may be too large to keep in a single document (you'll want m to stay below, say, 8000 to avoid running into Cloud Firestore's per-document index limits).

For this problem, you'll want to combine in the Overscan option for just these profiles. Using Cloud Functions, set a popular flag to true on any profile that has more than x% of its map set to true. Overscan everyone with the popular flag set and weave them into your results from the Underscan (remember to discard the profiles the user has already liked, as in the Overscan setup).
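A sketch of the popular-profile handling, with its assumptions flagged: `isPopular` is the kind of check a Cloud Function could run whenever a profile's `likedBy` map changes, with `thresholdFraction` standing in for the answer's unspecified x% (as x / 100); `weave` alternates Underscan and popular results while discarding already-liked and duplicate profiles. Alternating one-for-one is just one possible weaving policy, and the profile IDs are hypothetical.

```javascript
// True when more than thresholdFraction of the profile's bits are set.
function isPopular(likedByMap, thresholdFraction) {
  const bits = Object.values(likedByMap);
  const setCount = bits.filter((bit) => bit === true).length;
  return setCount / bits.length > thresholdFraction;
}

// Alternate Underscan and popular (Overscanned) results as u0, p0, u1, p1, ...
// while discarding profiles the user already liked and any duplicates.
function weave(underscan, popular, likedIds) {
  const skip = new Set(likedIds);
  const out = [];
  const rounds = Math.max(underscan.length, popular.length);
  for (let i = 0; i < rounds; i++) {
    for (const profile of [underscan[i], popular[i]]) {
      if (profile && !skip.has(profile.id)) {
        skip.add(profile.id);
        out.push(profile);
      }
    }
  }
  return out;
}

console.log(isPopular({ 0: true, 1: true, 2: true, 3: false }, 0.5)); // true
console.log(isPopular({ 0: false, 1: true, 2: true, 3: false }, 0.5)); // false

// Hypothetical pages: the user has already liked A123, a popular profile.
const underscanPage = [{ id: "D456" }, { id: "E567" }];
const popularPage = [{ id: "F678" }, { id: "A123" }];
console.log(weave(underscanPage, popularPage, ["A123"]).map((p) => p.id));
// prints [ 'D456', 'F678', 'E567' ]
```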
