How to structure a DynamoDB database to allow queries for trending posts?


Question

I am planning on using the following formula to calculate "trending" posts:

Trending Score = (p - 1) / (t + 2)^1.5

p = votes (points) from users. t = time since submission in hours.
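
As a minimal sketch, the formula can be written as a small Python function. The function name `trending_score` and its parameter names are illustrative only and do not come from the question.

```python
from datetime import datetime, timezone


def trending_score(points: int, submitted_at: datetime, now: datetime) -> float:
    """Trending Score = (p - 1) / (t + 2)^1.5, with t measured in hours."""
    hours = (now - submitted_at).total_seconds() / 3600.0
    return (points - 1) / (hours + 2) ** 1.5


# Example: a post with 9 points, submitted 2 hours ago.
submitted = datetime(2012, 1, 1, 12, 0, tzinfo=timezone.utc)
now = datetime(2012, 1, 1, 14, 0, tzinfo=timezone.utc)
score = trending_score(points=9, submitted_at=submitted, now=now)

# The same post one day later scores lower, because t grows while p stays fixed.
later = datetime(2012, 1, 2, 14, 0, tzinfo=timezone.utc)
older_score = trending_score(points=9, submitted_at=submitted, now=later)
```

Because the score depends on the current time, it changes even when no votes arrive, which is what makes it awkward to store as a static sort key.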

I am looking for advice on how to structure my database tables so that I can query for trending posts with DynamoDB (a NoSQL database service from Amazon).

DynamoDB requires a Primary Key for each item in a table. The Primary Key can consist of 2 parts: the Hash Attribute (string or number) and the Range Attribute (string or number). The Hash Attribute is required; if no Range Attribute is defined it must be unique for each item, otherwise the combination of Hash and Range Attributes must be unique. The Range Attribute is optional, but if used DynamoDB will build a sorted range index on the Range Attribute.

The structure I have in mind is as follows:

Table Name: Users

HashAttribute:  user_id
RangeAttribute: NONE
OtherFields: first_name, last_name

Table Name: Posts

HashAttribute:  post_id
RangeAttribute: NONE
OtherFields: user_id, title, content, points, categories[]

Table Name: Categories

HashAttribute:  category_name
RangeAttribute: post_id
OtherFields: title, content, points

Table Name: Counters

HashAttribute:  counter_name
RangeAttribute: NONE
OtherFields: counter_value

So here is an example of the types of requests I would make with the table setup above (example: user_id=100):

User Action 1:

User creates a new post and tags the post for 2 categories (baseball,soccer)

Query (1):

Check the current value for counter_name='post_id', increment it by 1, and use the new value as the post_id

Query (2): Insert the following into the Posts table:

post_id=value_from_query_1, user_id=100, title=user_generated, content=user_generated, points=0, categories=['baseball','soccer']

Query (3):

Insert the following into the Categories table:

category_name='baseball', post_id=value_from_query_1, title=user_generated, content=user_generated, points=0

Query (4):

Insert the following into the Categories table:

category_name='soccer', post_id=value_from_query_1, title=user_generated, content=user_generated, points=0
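
Queries (2) through (4) amount to one write to Posts plus one denormalized write per category. The sketch below only builds the items as plain Python dictionaries and leaves the actual DynamoDB calls out. The helper name `build_post_writes` and the sample title and content are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def build_post_writes(
    post_id: int, user_id: int, title: str, content: str, categories: List[str]
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Build the Posts item and one Categories item per tagged category."""
    post_item = {
        "post_id": post_id,
        "user_id": user_id,
        "title": title,
        "content": content,
        "points": 0,
        "categories": list(categories),
    }
    # Each category row duplicates title/content/points so that a query on
    # category_name can return posts without a second lookup in Posts.
    category_items = [
        {
            "category_name": name,
            "post_id": post_id,
            "title": title,
            "content": content,
            "points": 0,
        }
        for name in categories
    ]
    return post_item, category_items


post_item, category_items = build_post_writes(
    post_id=1,
    user_id=100,
    title="Opening day",
    content="Who is going?",
    categories=["baseball", "soccer"],
)
```

One consequence of this layout is that every vote has to update `points` in the Posts item and in each matching Categories item.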




The end goal is to be able to conduct the following types of queries:

1. Query for trending posts

2. Query for posts in a certain category

3. Query for posts with the highest point values

Does anyone have any idea how I could structure my tables so that I could query for trending posts? Or is this something I give up the ability to do by switching to DynamoDB?

Recommended Answer

I'll start with a note on your comment about timestamp vs. post_id.
Since you are going to use DynamoDB as your post_id generator, there is a scalability issue right there. A single incrementing counter is inherently unscalable, and you are better off using a date object. If you need to create posts at a very high rate, you can start by reading about how Twitter does it: http://blog.twitter.com/2010/announcing-snowflake
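
To make the idea concrete, here is a minimal sketch of a time-ordered ID generator that needs no shared counter. It is loosely modeled on the layout described in the Snowflake announcement and is not Twitter's implementation. The class name, the custom epoch, and the 41/10/12 bit split are assumptions made for illustration.

```python
import threading
import time
from typing import Optional

EPOCH_MS = 1325376000000  # assumed custom epoch: 2012-01-01T00:00:00Z


class TimeOrderedIdGenerator:
    """Assumed layout: 41 bits of milliseconds | 10 bits worker | 12 bits sequence."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self, now_ms: Optional[int] = None) -> int:
        with self._lock:
            ms = int(time.time() * 1000) if now_ms is None else now_ms
            if ms < self._last_ms:
                ms = self._last_ms  # clock moved backwards: reuse the last timestamp
            if ms == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF
                if self._sequence == 0:
                    ms += 1  # sequence exhausted within this millisecond
            else:
                self._sequence = 0
            self._last_ms = ms
            return ((ms - EPOCH_MS) << 22) | (self.worker_id << 12) | self._sequence


gen = TimeOrderedIdGenerator(worker_id=1)
first = gen.next_id(EPOCH_MS + 1000)
second = gen.next_id(EPOCH_MS + 1000)  # same millisecond, next sequence number
third = gen.next_id(EPOCH_MS + 1001)
```

Each application server generates IDs locally, so creating a post no longer requires a read and a write against a Counters item, and the IDs still sort by creation time.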

Now let's get back to your trending check:
I believe your scenario is misusing DynamoDB.
Let's say you have one HOT category that holds most of the posts. Basically you would have to scan all of its posts (since the data isn't spread well), look at the points of each one, and do the comparisons on your server. This will either not work or be very expensive, since each time you will probably use up all of your provisioned read capacity.

The DynamoDB approach for this type of trend checking is MapReduce.
Read here how to implement it: http://aws.typepad.com/aws/2012/01/aws-howto-using-amazon-elastic-mapreduce-with-dynamodb.html

I can't specify a time, but I believe you will find this approach scalable - though you won't be able to use it often.
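
The shape of such a batch job can be sketched in plain Python: a map step that scores every post, and a reduce step that keeps only the best N. This is an in-process stand-in for illustration, not an Elastic MapReduce job. The field names `submitted_hours` and `points`, and the function `top_trending`, are assumptions.

```python
import heapq
from typing import Any, Dict, List


def top_trending(
    posts: List[Dict[str, Any]], now_hours: float, limit: int = 10
) -> List[Dict[str, Any]]:
    """Score every post with (p - 1) / (t + 2)^1.5 and keep the top `limit`."""

    def score(post: Dict[str, Any]) -> float:
        age_hours = now_hours - post["submitted_hours"]
        return (post["points"] - 1) / (age_hours + 2) ** 1.5

    # heapq.nlargest keeps only `limit` candidates in memory while scanning.
    return heapq.nlargest(limit, posts, key=score)


posts = [
    {"post_id": 1, "points": 9, "submitted_hours": 0.0},     # 2 hours old
    {"post_id": 2, "points": 100, "submitted_hours": -98.0},  # 100 hours old
    {"post_id": 3, "points": 3, "submitted_hours": 2.0},      # brand new
]
trending = top_trending(posts, now_hours=2.0, limit=2)
trending_ids = [post["post_id"] for post in trending]
```

The job reads every post once per run, which is why it suits an occasional schedule rather than a per-request query.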

On another note, you could keep a list of the "top 10/100" trending questions and update it in "real time" whenever a post is upvoted: fetch the list, check whether it needs to be updated with the newly upvoted question, and save it back to the db if needed.
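
The "get the list, check, save back" step might look like the sketch below. It works on an in-memory list of `(post_id, score)` pairs and leaves loading and saving the list to the caller. The function name `update_top_list` and the tie-breaking rule are assumptions.

```python
from typing import List, Tuple

TopList = List[Tuple[str, float]]


def update_top_list(top: TopList, post_id: str, score: float, limit: int = 10) -> TopList:
    """Return the top list after `post_id` received a new score."""
    entries = dict(top)
    entries[post_id] = score  # insert the post or overwrite its old score
    # Highest score first; ties are broken by post_id to keep the order stable.
    ranked = sorted(entries.items(), key=lambda entry: (-entry[1], entry[0]))
    return ranked[:limit]


current = [("a", 5.0), ("b", 3.0)]
with_new_post = update_top_list(current, "c", 4.0, limit=2)
with_rescored_post = update_top_list(current, "b", 9.0, limit=2)
unchanged = update_top_list(current, "d", 1.0, limit=2)

# Only write the list back when it actually changed.
needs_save = with_new_post != current
```

Since the trending score also decays with time, the stored scores would still need an occasional rescoring pass; upvotes alone never push a stale post off the list.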
