How would you design an AppEngine datastore for a social site like Twitter?


Question

I'm wondering what would be the best way to design a social application where members post activities and follow other members' activities using Google AppEngine.

To be more specific, let's assume we have these entities:

  • Users, who have friends
  • Activities, which represent actions made by users (let's say each has a string message and a ReferenceProperty to its owner user, or it can use parent association via appengine's key)

The hard part is following your friends' activities, which means aggregating the latest activities from all your friends. Normally, that would be a join between the Activities table and your friends list, but that's not a viable design on appengine because there are no joins. Simulating one would require firing N queries (where N is the number of friends) and then merging the results in memory, which is very expensive and will probably exceed the request deadline.
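To make the cost of the simulated join concrete, here is a minimal plain-Python sketch of the "N queries, then merge in memory" approach. It does not use the App Engine API: `ACTIVITIES`, `query_latest`, and `feed_by_merging` are hypothetical names, and the dictionary is only a stand-in for the datastore.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-in for the datastore: user -> activities,
# each list already sorted newest first as (timestamp, message).
ACTIVITIES = {
    "alice": [(30, "alice: c"), (10, "alice: a")],
    "bob": [(25, "bob: b"), (5, "bob: z")],
    "carol": [(40, "carol: d")],
}

def query_latest(user, limit):
    """Stand-in for one datastore query: a user's newest activities."""
    return ACTIVITIES.get(user, [])[:limit]

def feed_by_merging(friends, limit):
    """Simulate the join: one query per friend, then an in-memory merge."""
    per_friend = [query_latest(f, limit) for f in friends]  # N queries
    # Each input is sorted newest first, so merge in descending order.
    merged = heapq.merge(*per_friend, key=lambda a: a[0], reverse=True)
    return list(islice(merged, limit))

feed = feed_by_merging(["alice", "bob", "carol"], 3)
```

The merge itself is cheap; the work that grows with the number of friends is the list comprehension that issues one query per friend on every read.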

I'm currently thinking of implementing this using inbox queues, where creating a new Activity fires a background process that puts the new activity's key in the "inbox" of every follower:

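  • "Get all users who follow X" is a feasible appengine query
  • A batch put into a new "Inbox" entity, which basically stores (user, activity key) tuples, is not very expensive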
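The inbox idea described above can be sketched in plain Python as follows. This is only an in-memory illustration under assumed names (`FOLLOWERS`, `INBOX`, `fan_out`, `read_feed` are all hypothetical), not App Engine code: on App Engine the loop inside `fan_out` would be a batch put issued from a background task.

```python
from collections import defaultdict

# Hypothetical stand-ins for datastore entities.
FOLLOWERS = {"alice": ["bob", "carol"]}  # user -> users who follow them
INBOX = defaultdict(list)                # follower -> activity keys, oldest first

def followers_of(user):
    """Stand-in for the 'get all users who follow X' query."""
    return FOLLOWERS.get(user, [])

def fan_out(owner, activity_key):
    """Background step: write one (user, activity key) row per follower."""
    rows = [(follower, activity_key) for follower in followers_of(owner)]
    for follower, key in rows:  # stands in for a single batch put
        INBOX[follower].append(key)
    return len(rows)

def read_feed(user, limit=20):
    """Reading a feed touches only the user's own inbox, newest first."""
    return list(reversed(INBOX[user][-limit:]))
```

The trade-off is that the work moves from read time to write time: a user with many followers produces many inbox rows per activity, while every read stays a single lookup.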

I'd be happy to hear thoughts on this design, or alternative suggestions.

Recommended answer

Take a look at Building Scalable, Complex Apps on App Engine (pdf), a fascinating talk given at Google I/O by Brett Slatkin. He addresses the problem of building a scalable messaging service like Twitter.

Here's his solution using a list property:

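from google.appengine.ext import db

class Message(db.Model):
    sender = db.StringProperty()
    body = db.TextProperty()

class MessageIndex(db.Model):
    # parent = the Message this index belongs to
    receivers = db.StringListProperty()

# Keys-only query: find the index entities that list user_id as a receiver.
indexes = MessageIndex.all(keys_only=True).filter('receivers =', user_id)
# Each index key's parent is the key of the corresponding Message.
keys = [k.parent() for k in indexes]
# Fetch the messages themselves in a single batch get.
messages = db.get(keys)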

This keys-only query finds the message indexes whose receivers include the one you specified, without deserializing and serializing the list of receivers. You then use the parent keys of those indexes to fetch only the messages you want.
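The two-step read can be mimicked in plain Python to show which data each step touches. This is a rough in-memory sketch, not the datastore: `MESSAGES`, `MESSAGE_INDEXES`, `send`, and `inbox` are hypothetical names, and the scan over receivers here merely stands in for the index lookup that the answer says a keys-only query performs without loading the list.

```python
# Hypothetical in-memory stand-ins for the two entity kinds.
MESSAGES = {}         # message key -> {"sender": ..., "body": ...}
MESSAGE_INDEXES = {}  # index key -> (parent message key, receivers)

def send(key, sender, body, receivers):
    """Store the message and, separately, its index of receivers."""
    MESSAGES[key] = {"sender": sender, "body": body}
    MESSAGE_INDEXES["idx-" + key] = (key, frozenset(receivers))

def inbox(user_id):
    """Step 1: collect parent keys only. Step 2: batch-get the messages."""
    parent_keys = [parent for parent, receivers in MESSAGE_INDEXES.values()
                   if user_id in receivers]
    # The returned messages never carry a receivers list.
    return [MESSAGES[k] for k in parent_keys]
```

Because the receivers live on a separate child entity, the objects returned by `inbox` contain only `sender` and `body`, however many receivers each message has.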

Here's the wrong way to do it:

class Message(db.Model):
    sender = db.StringProperty()
    receivers = db.StringListProperty()
    body = db.TextProperty()

messages = Message.all().filter('receivers =', user_id)

This is inefficient because the query has to unpack every result it returns, receivers list included. If it returns 100 messages with 1,000 users in each receivers list, you have to deserialize 100,000 (100 x 1,000) list property values. That is far too expensive in datastore latency and CPU.

I was pretty confused by all of this at first, so I wrote up a short tutorial about using the list property. Enjoy :)
