How to design this NoSQL DB


Question

I am trying to create a simple application for myself that uses DynamoDB. I have never used NoSQL at an advanced level, only to store a value here and there.

The application is a logger. I will log something and Dynamo will log the date and count for the day.

For example, if a user logs multiple things today, the record will just hold today's date and logged_times: 5

I can then have a query to grab a total sum of all the logged_times within the past week / day / month etc.

My question is: how do you structure a NoSQL database to do something like this efficiently?

Solution

A few NoSQL concepts to keep in mind:

  1. Writes should be spread evenly across primary keys.
  2. Reads should be spread evenly across primary keys.

The obvious design that comes to mind for this problem and the DynamoDB schema is:

Use a constant key, logs, as the primary key (pk) and the timestamp as the secondary key (sk). Then, to do an aggregation, use

select * where pk = logs and sk is between x and y

But this violates both concepts: we are always writing to a single pk and always reading from that same pk.
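To make the single-key design concrete, here is a minimal sketch of the request parameters such a range query might use with the low-level DynamoDB Query API. The table name logger, the attribute names pk and sk, and the ISO-8601 timestamp strings are assumptions for illustration, not part of the original answer.

```python
def build_range_query(start_iso: str, end_iso: str, table_name: str = "logger") -> dict:
    """Build Query parameters for the single-key design (pk is always 'logs')."""
    return {
        "TableName": table_name,  # hypothetical table name
        # Placeholders (#pk, #sk) avoid any clash with reserved words.
        "KeyConditionExpression": "#pk = :pk AND #sk BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#pk": "pk", "#sk": "sk"},
        "ExpressionAttributeValues": {
            ":pk": {"S": "logs"},      # every item shares this one key value
            ":start": {"S": start_iso},
            ":end": {"S": end_iso},
        },
    }


params = build_range_query("2018-01-01T00:00:00Z", "2018-01-07T23:59:59Z")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").query(**params)
```

Every request built this way carries the same pk value, which is exactly the hot-key pattern described above.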

For this particular problem, our pk should be random enough (so that there are no hot keys) and deterministic enough (so that we can still query it).

We will have to make some assumptions about the application while designing keys. Let's say we decide to update once per hour, so we can use 7-jan-2018-17 as a key, where 17 means the 17th hour. This key is deterministic, but it is not random enough: every update or read for 7 Jan will mostly go to the same partition. To make the key random, we can compute a hash of it with a hashing algorithm such as md5. Let's say that after hashing, our key becomes 1sdc23sjdnsd. This will not make any sense if you are looking at the table data. But if you want to know the event count for 7-jan-2018-17, you just hash that time and do a get from DynamoDB with the hashed key. If you want to know all the events on 7-jan-2018, you can do 24 gets, one per hour, and aggregate the counts.
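The hashed hourly key can be sketched as follows. This is only an illustration under stated assumptions: a plain dictionary stands in for the DynamoDB table, and the helper names (hour_key, hashed_key, log_event, count_for_hour, count_for_day) are hypothetical.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")

# In-memory stand-in for the DynamoDB table: hashed key -> event count.
table = defaultdict(int)


def hour_key(ts: datetime) -> str:
    """Readable, deterministic key for the hour, e.g. '7-jan-2018-17'."""
    return f"{ts.day}-{MONTHS[ts.month - 1]}-{ts.year}-{ts.hour}"


def hashed_key(ts: datetime) -> str:
    """md5 of the hourly key: same input always gives the same opaque key."""
    return hashlib.md5(hour_key(ts).encode("utf-8")).hexdigest()


def log_event(ts: datetime) -> None:
    """Increment the counter stored under the hashed hourly key."""
    table[hashed_key(ts)] += 1


def count_for_hour(ts: datetime) -> int:
    """One 'get': hash the hour and look it up."""
    return table.get(hashed_key(ts), 0)


def count_for_day(day: datetime) -> int:
    """24 'gets', one per hour of the day, summed on the client."""
    start = datetime(day.year, day.month, day.day)
    return sum(count_for_hour(start + timedelta(hours=h)) for h in range(24))


for minute in (5, 20, 59):
    log_event(datetime(2018, 1, 7, 17, minute))
for minute in (1, 30):
    log_event(datetime(2018, 1, 7, 9, minute))

hour_total = count_for_hour(datetime(2018, 1, 7, 17))
day_total = count_for_day(datetime(2018, 1, 7))
```

Here hour_total counts the three events logged in the 17th hour, and day_total sums all 24 hourly lookups for the day.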

This kind of schema runs into problems in two cases:

  1. You decide to change the granularity from hourly to per-minute.

  2. Most of your queries are ad hoc, run-time queries such as "get me all the data for the last 2, 4, or 6 days". That means too many round trips to the database, which is inefficient in both time and cost.
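The cost of both cases can be estimated by counting how many keys a range read has to fetch. The sketch below assumes one get per bucket; the function name bucket_starts and the example dates are hypothetical.

```python
from datetime import datetime, timedelta


def bucket_starts(end: datetime, days: int, step: timedelta) -> list:
    """Start time of every bucket in the `days` days before `end`."""
    window = timedelta(days=days)
    start = end - window
    return [start + i * step for i in range(window // step)]


end = datetime(2018, 1, 8)

# One get per bucket: the number of keys is the number of round trips.
hourly_2_days = len(bucket_starts(end, 2, timedelta(hours=1)))
hourly_6_days = len(bucket_starts(end, 6, timedelta(hours=1)))
minutely_2_days = len(bucket_starts(end, 2, timedelta(minutes=1)))
```

With hourly keys, two days already means 48 gets and six days means 144; switching to per-minute keys raises the two-day figure to 2880.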

A rule of thumb: use NoSQL when the query patterns are well defined, and store precomputed results for performance. If you find yourself trying to run joins or aggregation-style queries on NoSQL, you are force-fitting the use case to your technology choice.
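One way to read "store the results" for this logger is to keep a per-day total up to date at write time, so that a query for the last N days reads N small items instead of aggregating raw events. The sketch below uses a dictionary in place of the table. The names daily_totals, log_event, and total_for_last_days are hypothetical, and in DynamoDB the increment would typically be an UpdateItem call with an ADD update expression.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Stand-in for a table keyed by day: 'YYYY-MM-DD' -> logged_times.
daily_totals = defaultdict(int)


def log_event(ts: datetime) -> None:
    """Update the precomputed total for the event's day at write time."""
    daily_totals[ts.date().isoformat()] += 1


def total_for_last_days(today: date, days: int) -> int:
    """Sum the stored totals for `today` and the days - 1 days before it."""
    return sum(
        daily_totals.get((today - timedelta(days=i)).isoformat(), 0)
        for i in range(days)
    )


log_event(datetime(2018, 1, 5, 8, 0))
log_event(datetime(2018, 1, 6, 9, 30))
log_event(datetime(2018, 1, 7, 17, 5))
log_event(datetime(2018, 1, 7, 17, 45))

last_day = total_for_last_days(date(2018, 1, 7), 1)
last_week = total_for_last_days(date(2018, 1, 7), 7)
```

The trade-off is that the supported query shapes (per day, summed over a range of days) have to be decided before the data is written.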

You can also look at the AWS recommendation for storing time series data: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-time-series.html
