How do I design a MongoDB schema for a Twitter article aggregator?


Question

I'm new to MongoDB and, as an exercise, I'm building an application that extracts links from tweets. The idea is to find the most tweeted articles for a subject. I'm having a hard time designing the schema for this application.

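  • The application harvests tweets and saves them
  • The tweets are parsed for links
  • The links are saved with additional information (title, excerpt, etc.)
  • A tweet can contain more than one link
  • A link can have many tweets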

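How do I:

  • Save these collections? As embedded documents?
  • Get the top ten links sorted by the number of tweets they have?
  • Get the most tweeted link for a specific date?
  • Get the tweets for a link?
  • Get the ten latest tweets?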

I would love to get some input on this.

Recommended answer

Two general tips:

1.) Don't be afraid to duplicate. It is often a good idea to store the same data, formatted differently, in different collections.

2.) If you want to sort and sum things up, it helps to keep count fields everywhere. MongoDB's atomic update operators, combined with upserts, make it easy to increment counters and to add fields to existing documents.

The following is most certainly flawed because it's typed off the top of my head. But better bad examples than no examples, I thought ;)

collection tweets:

{
  tweetid: 123,
  timeTweeted: 123123234,  //exact time in milliseconds
  dayInMillis: 123412343,  //the day of the tweet at 00:00:00 (midnight)
  text: 'a tweet with a http://lin.k and an http://u.rl',
  links: [
     'http://lin.k',
     'http://u.rl' 
  ],
  linkCount: 2
}
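One possible way to derive the `links`, `linkCount` and `dayInMillis` fields before inserting is sketched below. The helper name `buildTweetDoc`, the simple URL regex and the choice of UTC midnight are assumptions made for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper: builds a document shaped like the tweets example above.
const DAY_MS = 24 * 60 * 60 * 1000;

function buildTweetDoc(tweetid, text, timeTweeted) {
  // Naive URL matcher (assumption): takes http/https runs up to the next whitespace.
  const links = text.match(/https?:\/\/\S+/g) || [];
  return {
    tweetid: tweetid,
    timeTweeted: timeTweeted,                          // exact time in milliseconds
    dayInMillis: timeTweeted - (timeTweeted % DAY_MS), // 00:00:00 UTC of the same day
    text: text,
    links: links,
    linkCount: links.length
  };
}
```

Storing the truncated day alongside the exact time is what lets the per-day counters in the links collection use the same key for every tweet from that day.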

collection links: 

{
   url: 'http://lin.k',
   totalCount: 17,
   daycounts: {
      1232345543354: 5, //key: the day of the tweet at 00:00:00 (midnight)
      1234123423442: 2,
      1234354534535: 10
   }
}

Adding a new tweet:

db.x.tweets.insert({...}) //simply insert new document with all fields

//for each found link:
var upsert = true;
var toFind =  { url: '...'};
var updateObj = {'$inc': {'totalCount': 1, 'daycounts.12342342': 1 } }; //12342342 is the day of the tweet
db.x.links.update(toFind, updateObj, upsert);
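Because the day is part of the field name, the update object usually has to be built at runtime instead of being written as a literal. A minimal sketch, assuming the `totalCount` and `daycounts` fields from the links example; `buildLinkUpdate` is a hypothetical name.

```javascript
// Hypothetical helper: builds the $inc update for one link on one day.
function buildLinkUpdate(dayInMillis) {
  const inc = { totalCount: 1 };
  // Dot notation targets the nested per-day counter inside daycounts.
  inc['daycounts.' + dayInMillis] = 1;
  return { $inc: inc };
}

// Possible usage in the shell, assuming the collections from the answer
// and a tweet document with links and dayInMillis fields:
// tweet.links.forEach(function (url) {
//   db.x.links.update({ url: url }, buildLinkUpdate(tweet.dayInMillis), true);
// });
```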

Get the top ten links sorted by number of tweets they have?

db.x.links.find().sort({'totalCount': -1}).limit(10);

Get the most tweeted link for a specific date?

db.x.links.find({'daycounts.123413453': {'$gt': 0}}).sort({'daycounts.123413453': -1}).limit(1); //123413453 is the day you're after
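The same dynamic-key issue applies when querying: the filter and the sort both need the day baked into the field name. A sketch under the assumption that the counters live in the `daycounts` field shown in the links example; `dayQuery` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds the filter and sort documents for one day.
function dayQuery(dayInMillis) {
  const field = 'daycounts.' + dayInMillis;
  const filter = {};
  filter[field] = { $gt: 0 }; // only links tweeted at least once that day
  const sort = {};
  sort[field] = -1;           // highest count first
  return { filter: filter, sort: sort };
}

// Possible usage in the shell:
// var q = dayQuery(123413453);
// db.x.links.find(q.filter).sort(q.sort).limit(1);
```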

Get the tweets for a link?

db.x.tweets.find({'links': 'http://lin.k'});

Get the ten latest tweets?

db.x.tweets.find().sort({'timeTweeted': -1}).limit(10);
