How should I implement this schema in MongoDB?
Problem description
I'm trying to write a tracking script and I'm having trouble with figuring out how the database should work.
In MySQL I'd create tables that look similar to:
User:
  username_name: string
Campaign:
  title: string
  description: string
  link: string
UserCampaign:
  user_id: integer
  camp_id: integer
Click:
  os: text
  referer: text
  camp_id: integer
  user_id: integer
I need to be able to:
- See the information from each click like IP, Referer, OS, etc
- See how often clicks are coming from X IP, X Referer, X OS
- Associate each click with a User and a Campaign
If I do something along the lines of
User {
  Campaigns: [
    {
      Clicks: []
    }
  ]
}
I run into two problems:
- It creates a new campaign object for each user, which is a problem because if I need to update my campaign I'd need to update the object for each user
- I expect the Clicks array to contain a LARGE amount of data, and I feel like having it as part of the User object will make it very slow to query
OK, I think you need to break this out into the basic "varieties".
You have two "entity"-style objects:
- User
- Campaign
You have one "mapping"-style object:
- UserCampaign
You have one "transactional"-style object:
- Click
Step 1: entity
Let's start with the easy ones: User & Campaign. These are truly two separate objects; neither one really depends on the other for its existence. There's also no implicit hierarchy between the two: Users do not belong to Campaigns, nor do Campaigns belong to Users.
When you have two top-level objects like this, they generally earn their own collection. So you'll want a Users collection and a Campaigns collection.
Step 2: mapping
UserCampaign is currently used to represent an N-to-M mapping. Now, in general, when you have an N-to-1 mapping, you can put the N inside of the 1. However, with the N-to-M mapping, you generally have to "pick a side".
In theory, you could do one of the following:
1. Put a list of Campaign IDs inside of each User
2. Put a list of User IDs inside of each Campaign
Personally, I would do #1. You probably have way more users than campaigns, and you probably want to put the array where it will be shorter.
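As a rough sketch of option #1, the documents might look like the following. The field name campaign_ids, the username field, and all sample values are assumptions made for illustration; plain JavaScript objects stand in for MongoDB documents so the snippet runs without a database.

```javascript
// Plain objects standing in for documents in the campaigns and users collections.
// Field names such as campaign_ids are illustrative, not part of the original schema.
const campaigns = [
  { _id: 1, title: "Spring sale", description: "Seasonal promo", link: "http://example.com/spring" },
  { _id: 2, title: "Newsletter", description: "Signup drive", link: "http://example.com/news" },
];

const users = [
  { _id: 10, username: "alice", campaign_ids: [1, 2] },
  { _id: 11, username: "bob", campaign_ids: [2] },
];

// User -> campaigns: follow the IDs stored on the user document.
function campaignsForUser(user) {
  return campaigns.filter((c) => user.campaign_ids.includes(c._id));
}

// Campaign -> users: find users whose array contains the campaign ID.
// In the mongo shell the equivalent filter would be
//   db.users.find({ campaign_ids: campaignId })
function usersForCampaign(campaignId) {
  return users.filter((u) => u.campaign_ids.includes(campaignId));
}

console.log(campaignsForUser(users[0]).map((c) => c.title));
console.log(usersForCampaign(2).map((u) => u.username));
```

Because each campaign lives in exactly one document, editing a campaign's title or link touches a single record, which addresses the update problem raised in the question.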
Step 3: transactional
Clicks are really a completely different beast. In object terms you could think the following: Clicks "belong to" a User, Clicks "belong to" a Campaign. So, in theory, you could just store clicks as part of either of these objects. It's easy to think that Clicks belong under Users or Campaigns.
But if you really dig deeper, the above simplification is really flawed. In your system, Clicks are really a central object. In fact, you might even be able to say that Users & Campaigns are really just "associated with" the click.
Take a look at the questions / queries that you're asking. All of those questions actually center around clicks. Users & Campaigns are not the central object in your data, Clicks are.
Additionally, Clicks are going to be the most plentiful data in your system. You're going to have way more clicks than anything else.
This is the biggest hitch when designing a schema for data like this. Sometimes you need to push off "parent" objects when they're not the most important thing. Imagine building a simple e-commerce system. It's clear that orders would "belong to" users, but orders are so central to the system that they're going to be a "top-level" object.
Wrapping it up
You'll probably want three collections:
- User -> has list of campaign._id
- Campaign
- Clicks -> contains user._id, campaign._id
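To make the third collection concrete, here is one hypothetical shape for a click document. The ip field is assumed from the stated requirements (the original table only lists os and referer), and the makeClick helper and sample values are invented for illustration.

```javascript
// Hypothetical helper that builds a click document referencing a user
// and a campaign by ID rather than nesting inside either of them.
function makeClick(userId, campaignId, details) {
  return {
    user_id: userId,
    camp_id: campaignId,
    ip: details.ip, // assumed field, taken from the "IP, Referer, OS" requirement
    os: details.os,
    referer: details.referer,
  };
}

// An array standing in for the clicks collection.
const clicks = [];
clicks.push(makeClick(10, 1, { ip: "203.0.113.5", os: "Linux", referer: "http://example.org/" }));
clicks.push(makeClick(11, 2, { ip: "203.0.113.9", os: "Windows", referer: "http://example.net/" }));

// Each click is a small standalone document. In the mongo shell it could be
// stored with db.clicks.insertOne(click), leaving user and campaign documents untouched.
console.log(clicks);
```

Recording a click this way only appends to the clicks collection, so user documents stay small no matter how many clicks accumulate.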
This should satisfy all of your query needs:
See the information from each click like IP, Referer, OS, etc
db.clicks.find()
See how often clicks are coming from X IP, X Referer, X OS
db.clicks.group() or run a Map-Reduce.
Associate each click with a User and a Campaign
db.clicks.find({user_id : blah})
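A note for newer servers: db.collection.group() was removed in MongoDB 4.2, and the aggregation pipeline is the usual replacement for this kind of count. The sketch below builds a $group pipeline as plain data and, so that it runs without a server, computes the same counts in memory over sample clicks. The sample data and the function names countByPipeline and countBy are assumptions for illustration.

```javascript
// Sample click documents (made-up values).
const sampleClicks = [
  { user_id: 10, camp_id: 1, ip: "203.0.113.5", os: "Linux", referer: "http://example.org/" },
  { user_id: 11, camp_id: 2, ip: "203.0.113.9", os: "Windows", referer: "http://example.org/" },
  { user_id: 10, camp_id: 2, ip: "203.0.113.5", os: "Linux", referer: "http://example.net/" },
];

// Builds a pipeline that could be passed to db.clicks.aggregate(...)
// to count clicks per distinct value of one field, most frequent first.
function countByPipeline(field) {
  return [
    { $group: { _id: "$" + field, count: { $sum: 1 } } },
    { $sort: { count: -1 } },
  ];
}

// In-memory equivalent of the $group stage, for illustration only.
function countBy(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return counts;
}

console.log(JSON.stringify(countByPipeline("os")));
console.log(countBy(sampleClicks, "os"));
console.log(countBy(sampleClicks, "referer"));
```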
It's also possible to push click IDs into both users and campaigns (if that makes sense).
Please note that if you have lots and lots of clicks, you'll really have to analyze the queries you run most. You can't index on every field, so you'll often want to run Map-Reduces to "roll-up" the data for these queries.
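One way to read "roll-up" is a small summary collection, rebuilt or updated periodically, that holds one row per campaign and dimension value so that frequent reports read a handful of summary documents instead of scanning every click. The in-memory sketch below shows the idea; the (camp_id, os) key layout, the function name, and the sample data are assumptions rather than anything specified above.

```javascript
// Rolls raw clicks up into one summary row per (campaign, os) pair.
function rollUpByCampaignAndOs(clickDocs) {
  const summary = new Map();
  for (const click of clickDocs) {
    const key = click.camp_id + "|" + click.os;
    const row = summary.get(key) || { camp_id: click.camp_id, os: click.os, clicks: 0 };
    row.clicks += 1;
    summary.set(key, row);
  }
  return Array.from(summary.values());
}

// Made-up raw click documents.
const rawClicks = [
  { user_id: 10, camp_id: 1, os: "Linux" },
  { user_id: 11, camp_id: 1, os: "Linux" },
  { user_id: 12, camp_id: 1, os: "Windows" },
  { user_id: 10, camp_id: 2, os: "Linux" },
];

// Four raw clicks collapse into three summary rows.
const summaryRows = rollUpByCampaignAndOs(rawClicks);
console.log(summaryRows);
```

The trade-off is freshness: summary rows are only as current as the last roll-up run, which is often acceptable for reporting queries.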