How to implement the activity stream in a social network


Question

I'm developing my own social network, and I haven't found any examples on the web of how to implement a stream of users' actions. For example, how do I filter actions for each user? How do I store the action events? Which data model and object model can I use for the action stream and for the actions themselves?

Solution

Summary: For about 1 million active users and 150 million stored activities, I keep it simple:

  • Use a relational database to store unique activities (1 record per activity / "thing that happened"). Make the records as compact as you can. Structure them so that you can quickly grab a batch of activities by activity ID or by using a set of friend IDs with time constraints.
  • Publish the activity IDs to Redis whenever an activity record is created, adding the ID to an "activity stream" list for every user who is a friend/subscriber that should see the activity.

Query Redis to get the activity stream for any user and then grab the related data from the db as needed. Fall back to querying the db by time if the user needs to browse far back in time (if you even offer this).
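The read path described above could look roughly like the sketch below. It is only an illustration under stated assumptions: SQLite stands in for the relational database, a plain Python list stands in for the per-user list of IDs that would come back from Redis, and the function names `recent_from_cache` and `older_from_db` are hypothetical.

```python
import sqlite3

COLUMNS = "id, user_id, time"


def recent_from_cache(db, cached_ids, limit):
    """Turn the newest cached IDs into rows, keeping the cache's order."""
    page = cached_ids[:limit]
    if not page:
        return []
    marks = ",".join("?" * len(page))
    rows = db.execute(
        f"SELECT {COLUMNS} FROM activity WHERE id IN ({marks})", page
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Skip IDs whose rows no longer exist in the database.
    return [by_id[i] for i in page if i in by_id]


def older_from_db(db, friend_ids, before_time, limit):
    """Fallback for paging past the cached window: query the db by time."""
    if not friend_ids:
        return []
    marks = ",".join("?" * len(friend_ids))
    return db.execute(
        f"SELECT {COLUMNS} FROM activity "
        f"WHERE user_id IN ({marks}) AND time < ? "
        "ORDER BY time DESC LIMIT ?",
        (*friend_ids, before_time, limit),
    ).fetchall()


# Tiny in-memory demo with made-up data.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE activity (id INTEGER PRIMARY KEY, user_id INTEGER, time INTEGER)"
)
db.executemany(
    "INSERT INTO activity (id, user_id, time) VALUES (?, ?, ?)",
    [(1, 2, 100), (2, 3, 200), (3, 2, 300), (4, 3, 400), (5, 2, 500)],
)

cached_ids = [5, 4, 3]  # newest first, as the per-user list would hold them
newest = recent_from_cache(db, cached_ids, limit=3)
# Page further back than the cache reaches, starting from the oldest cached row.
older = older_from_db(db, [2, 3], before_time=newest[-1][2], limit=10)
```

Here the cache only decides which IDs to show and in what order; the rows themselves always come from the database.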


I use a plain old MySQL table for dealing with about 15 million activities.

It looks something like this:

id             
user_id       (int)
activity_type (tinyint)
source_id     (int)  
parent_id     (int)
parent_type   (tinyint)
time          (datetime but a smaller type like int would be better) 

activity_type tells me the type of activity, source_id tells me the record that the activity is related to. So if the activity type means "added favorite" then I know that the source_id refers to the ID of a favorite record.

The parent_id/parent_type are useful for my app - they tell me what the activity is related to. If a book was favorited, then parent_id/parent_type would tell me that the activity relates to a book (type) with a given primary key (id).
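To make the roles of these columns concrete, here is a minimal sketch of one activity record in application code. The numeric codes, the enum members, and the `describe` helper are hypothetical; only the field names come from the table layout above.

```python
from dataclasses import dataclass
from enum import IntEnum


class ActivityType(IntEnum):
    # Hypothetical codes; each would fit in a tinyint column.
    ADDED_FAVORITE = 1
    POSTED_COMMENT = 2


class ParentType(IntEnum):
    # Hypothetical codes for the kinds of things an activity can relate to.
    BOOK = 1
    AUTHOR = 2


@dataclass(frozen=True)
class Activity:
    id: int
    user_id: int
    activity_type: ActivityType
    source_id: int    # ID of the record the activity created (e.g. the favorite)
    parent_id: int    # primary key of the thing the activity relates to
    parent_type: ParentType
    time: int         # Unix seconds, assuming an int column instead of datetime


def describe(a: Activity) -> str:
    """Render a record as text, using activity_type to interpret source_id."""
    target = f"{a.parent_type.name.lower()} #{a.parent_id}"
    if a.activity_type == ActivityType.ADDED_FAVORITE:
        return f"user {a.user_id} added favorite #{a.source_id} on {target}"
    return f"user {a.user_id} did {a.activity_type.name.lower()} #{a.source_id} on {target}"


example = Activity(
    id=1,
    user_id=7,
    activity_type=ActivityType.ADDED_FAVORITE,
    source_id=42,
    parent_id=9,
    parent_type=ParentType.BOOK,
    time=1_300_000_000,
)
summary = describe(example)
```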

I index on (user_id, time) and query for activities where user_id IN (...friends...) AND time > some-cutoff-point. Ditching the id and choosing a different clustered index might be a good idea - I haven't experimented with that.
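A minimal sketch of that table, index, and query follows. It uses Python's built-in SQLite module so it runs anywhere, which is an assumption: the setup described here is MySQL, where the column types would be int/tinyint. The index name `activity_user_time` and the function `friends_activity` are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE activity (
    id            INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL,
    activity_type INTEGER NOT NULL,
    source_id     INTEGER NOT NULL,
    parent_id     INTEGER NOT NULL,
    parent_type   INTEGER NOT NULL,
    time          INTEGER NOT NULL  -- Unix seconds rather than datetime
);
CREATE INDEX activity_user_time ON activity (user_id, time);
"""


def friends_activity(db, friend_ids, cutoff):
    """Activities by any of friend_ids that are newer than cutoff, newest first."""
    if not friend_ids:
        return []
    marks = ",".join("?" * len(friend_ids))
    sql = (
        "SELECT id, user_id, activity_type, source_id, parent_id, parent_type, time "
        f"FROM activity WHERE user_id IN ({marks}) AND time > ? "
        "ORDER BY time DESC"
    )
    return db.execute(sql, (*friend_ids, cutoff)).fetchall()


# Demo with made-up rows.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany(
    "INSERT INTO activity "
    "(user_id, activity_type, source_id, parent_id, parent_type, time) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2, 1, 10, 100, 1, 1000),  # friend, after the cutoff
        (3, 1, 11, 101, 1, 2000),  # friend, after the cutoff
        (4, 1, 12, 102, 1, 3000),  # not a friend
        (2, 1, 13, 103, 1, 500),   # friend, but before the cutoff
    ],
)
rows = friends_activity(db, [2, 3], cutoff=900)
```

Only the placeholders are built dynamically; the friend IDs and the cutoff are always passed as bound parameters.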

Pretty basic stuff, but it works, it's simple, and it is easy to work with as your needs change. Also, if you aren't using MySQL you might be able to do better index-wise.


For faster access to the most recent activities, I've been experimenting with Redis. Redis stores all of its data in-memory, so you can't put all of your activities in there, but you could store enough for most of the commonly-hit screens on your site. The most recent 100 for each user or something like that. With Redis in the mix, it might work like this:

  • Create your MySQL activity record
  • For each friend of the user who created the activity, push the ID onto their activity list in Redis.
  • Trim each list to the last X items
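The push-and-trim steps above can be sketched as follows. So that the example runs without a Redis server, it uses a small in-memory stand-in (`InMemoryLists`, hypothetical) that copies the method names of a redis-py client: `pipeline`, `lpush`, `ltrim`, `lrange`, and `execute`. The key format `stream:<user_id>` and the `fan_out` function are also assumptions, not something this answer specifies.

```python
from collections import defaultdict


class InMemoryLists:
    """Stand-in for a Redis client; only handles non-negative list indexes."""

    def __init__(self):
        self.lists = defaultdict(list)

    def pipeline(self):
        # A real pipeline buffers commands until execute(); this applies them at once.
        return self

    def lpush(self, key, value):
        self.lists[key].insert(0, value)

    def ltrim(self, key, start, stop):
        # Redis ranges are inclusive on both ends.
        self.lists[key] = self.lists[key][start:stop + 1]

    def lrange(self, key, start, stop):
        return self.lists[key][start:stop + 1]

    def execute(self):
        return []


def stream_key(user_id):
    return f"stream:{user_id}"


def fan_out(client, activity_id, friend_ids, keep=100):
    """Push one activity ID onto each friend's list and trim it to `keep` items."""
    pipe = client.pipeline()
    for friend_id in friend_ids:
        key = stream_key(friend_id)
        pipe.lpush(key, activity_id)   # newest ID goes to the head of the list
        pipe.ltrim(key, 0, keep - 1)   # keep only the newest `keep` IDs
    pipe.execute()


store = InMemoryLists()
for activity_id in range(1, 6):
    fan_out(store, activity_id, friend_ids=[10, 11], keep=3)

latest = store.lrange(stream_key(10), 0, 2)
```

Passing a real redis-py client instead of `store` should work the same way, with one difference to watch for: redis-py returns list items as bytes unless the client was created with `decode_responses=True`.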

Redis is fast and offers a way to pipeline commands across one connection - so pushing an activity out to 1000 friends takes milliseconds.

For a more detailed explanation of what I am talking about, see Redis' Twitter example: http://redis.io/topics/twitter-clone

Update February 2011: I've got 50 million active activities at the moment and I haven't changed anything. One nice thing about an approach like this is that it uses compact, small rows. I am planning some changes that would involve many more activities and more queries against them, and I will definitely be using Redis to keep things speedy. I'm using Redis in other areas and it works really well for certain kinds of problems.

Update July 2014: We're up to about 700K monthly active users. For the last couple of years, I've been using Redis (as described in the bulleted list) to store the last 1000 activity IDs for each user. There are usually about 100 million activity records in the system; they are still stored in MySQL with the same layout. These records let us get away with less Redis memory, they serve as the record of activity data, and we use them if users need to page further back in time to find something.

This wasn't a clever or especially interesting solution but it has served me well.
