DynamoDB Query with multiple tags
Question
I am rather new to DynamoDB. We are currently thinking about migrating an existing project to a serverless application using DynamoDB, and we want to adapt the following setup from a relational database (RDBMS):
Tables:
- Projects (ProjectID)
- Files (FileID, ProjectID, Filename)
- Tags (FileID, Tag)
We want to query DynamoDB to fetch all Files for a specific Project (by ProjectID) that have one or more Tags (by Tag). In an RDBMS this query would be simple, something like:
SELECT * FROM Files JOIN Tags ON Tags.FileID = Files.FileID WHERE Files.ProjectID = ?PROJECT AND (Tags.Tag = ?TAG_1 OR Tags.Tag = ?TAG_2 ...)
At the moment, we have the following DynamoDB setup (but it can still be changed):
- Projects (ProjectID [HashKey], ...)
- Files (ProjectID [HashKey], FileID [RangeKey], ...)
Please also consider that the number of projects is large (between 1,000 and 30,000), as is the number of files per project (between 50 and 100,000), and the query should be really fast.
How can this be achieved with a DynamoDB Query, ideally without filter expressions, since they are applied after the data is selected? It would be perfect if the Files table could have a StringSet attribute Tags, but I guess this cannot be used for an efficient DynamoDB Query (that is, without a Scan), since DynamoDB index keys can only be of type String, Binary, or Number, not StringSet? Is this perhaps an applicable use case for a Global Secondary Index (GSI)?
Recommended answer
A bit late; I just saw this question referenced from another one.
I guess you have since solved it with something like this?
DynamoDB tables
- Projects (ProjectID [HashKey], ...)
- Files (ProjectID [HashKey], FileID [RangeKey], ...)
- Tags (Tag [HashKey], FileID [RangeKey], ProjectID [LSI Sort Key])
On the Tags table, you need the FileID to make the primary key unique, but you can add the ProjectID as the sort key of a Local Secondary Index, so you can query on Tag + ProjectID.
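As a rough sketch of that layout, the Python below builds the request parameters for a Tags table with a Local Secondary Index on Tag + ProjectID, and runs one Query per tag, merging the results to get "any of these tags" semantics. The table name `Tags`, the index name `TagProjectIndex`, the string key types, and the helper names are assumptions made for illustration, not part of the original answer. The `run_query` argument is any callable with the shape of the low-level DynamoDB `query` call.

```python
from typing import Any, Callable, Dict, Iterable, Set

TABLE = "Tags"             # assumed table name
INDEX = "TagProjectIndex"  # hypothetical LSI name


def tags_table_definition() -> Dict[str, Any]:
    """Keyword arguments for a CreateTable request (all keys assumed to be strings)."""
    return {
        "TableName": TABLE,
        "AttributeDefinitions": [
            {"AttributeName": "Tag", "AttributeType": "S"},
            {"AttributeName": "FileID", "AttributeType": "S"},
            {"AttributeName": "ProjectID", "AttributeType": "S"},
        ],
        # Primary key: Tag + FileID keeps each (tag, file) pair unique.
        "KeySchema": [
            {"AttributeName": "Tag", "KeyType": "HASH"},
            {"AttributeName": "FileID", "KeyType": "RANGE"},
        ],
        # The LSI shares the Tag hash key and sorts by ProjectID instead.
        "LocalSecondaryIndexes": [
            {
                "IndexName": INDEX,
                "KeySchema": [
                    {"AttributeName": "Tag", "KeyType": "HASH"},
                    {"AttributeName": "ProjectID", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def tag_query(tag: str, project_id: str) -> Dict[str, Any]:
    """Keyword arguments for a Query on the LSI: one tag within one project."""
    return {
        "TableName": TABLE,
        "IndexName": INDEX,
        # "#t" is a name placeholder, used in case "Tag" clashes with a reserved word.
        "KeyConditionExpression": "#t = :tag AND ProjectID = :pid",
        "ExpressionAttributeNames": {"#t": "Tag"},
        "ExpressionAttributeValues": {
            ":tag": {"S": tag},
            ":pid": {"S": project_id},
        },
    }


def files_for_tags(
    run_query: Callable[..., Dict[str, Any]],
    project_id: str,
    tags: Iterable[str],
) -> Set[str]:
    """Return FileIDs in the project that carry at least one of the tags."""
    file_ids: Set[str] = set()
    for tag in tags:
        params = tag_query(tag, project_id)
        while True:
            page = run_query(**params)
            file_ids.update(item["FileID"]["S"] for item in page.get("Items", []))
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break
            # Continue from where the previous page stopped.
            params["ExclusiveStartKey"] = last_key
    return file_ids
```

With boto3, this would presumably be wired up as `client = boto3.client("dynamodb")`, then `client.create_table(**tags_table_definition())` and `files_for_tags(client.query, project_id, tags)`. Note that a Local Secondary Index has to be declared when the table is created, and the matching Files items still need to be fetched separately by ProjectID + FileID.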
This is a form of data denormalization, but that's what it takes to go NoSQL :-( . For example, if a File is moved to another Project, you need to update the ProjectID not only on the File but also on all of its Tags.
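To make that cost concrete, here is a minimal sketch that builds the write actions for moving one file to another project, assuming the table names `Files` and `Tags` and items in DynamoDB's attribute-value format. The helper name `move_file_actions` is hypothetical. Because ProjectID is the hash key of Files, the file item cannot be updated in place: it has to be deleted and re-inserted under the new key, while each Tags item only needs its ProjectID attribute overwritten.

```python
from typing import Any, Dict, List


def move_file_actions(
    file_item: Dict[str, Any],
    tags: List[str],
    new_project_id: str,
) -> List[Dict[str, Any]]:
    """Build TransactWriteItems actions that move a file to another project."""
    file_id = file_item["FileID"]
    old_key = {"ProjectID": file_item["ProjectID"], "FileID": file_id}

    # Copy the item with the new ProjectID; the input dict is left untouched.
    moved = dict(file_item, ProjectID={"S": new_project_id})

    actions: List[Dict[str, Any]] = [
        # ProjectID is part of the Files primary key, so delete + put.
        {"Delete": {"TableName": "Files", "Key": old_key}},
        {"Put": {"TableName": "Files", "Item": moved}},
    ]
    for tag in tags:
        # ProjectID is only an LSI sort key on Tags, so it can be updated in place.
        actions.append(
            {
                "Update": {
                    "TableName": "Tags",
                    "Key": {"Tag": {"S": tag}, "FileID": file_id},
                    "UpdateExpression": "SET ProjectID = :pid",
                    "ExpressionAttributeValues": {":pid": {"S": new_project_id}},
                }
            }
        )
    return actions
```

With boto3, the list could presumably be passed as `client.transact_write_items(TransactItems=actions)` so the move is all-or-nothing. Transactions have a cap on the number of actions, so a file with very many tags may need a different approach, such as batched updates.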