Optimal storage of data structure for fast lookup and persistence


Question

Scenario

I have the following methods:

public void AddItemSecurity(int itemId, int[] userIds)
public int[] GetValidItemIds(int userId)

Initially I'm thinking of storage of the form:

itemId -> userId, userId, userId

userId -> itemId, itemId, itemId

AddItemSecurity is based on how I get data from a third-party API; GetValidItemIds is how I want to use it at runtime.

There are potentially 2,000 users and 10 million items. Item ids are of the form 2007123456, 2010001234 (10 digits, where the first four represent the year).

AddItemSecurity does not have to perform super fast, but GetValidItemIds needs to be sub-second. Also, if there is an update on an existing itemId, I need to remove that itemId for users no longer in the list.
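
As a baseline, a naive in-memory version of the two methods might look like the sketch below (the class name and the dictionary-based layout are illustrative assumptions, not the eventual solution). It keeps both mappings and handles clearing an itemId from users that dropped out of the list:

```csharp
using System.Collections.Generic;
using System.Linq;

// Naive in-memory sketch of the two mappings; illustration only.
class ItemSecurityStore
{
    readonly Dictionary<int, HashSet<int>> _usersByItem = new Dictionary<int, HashSet<int>>();
    readonly Dictionary<int, HashSet<int>> _itemsByUser = new Dictionary<int, HashSet<int>>();

    public void AddItemSecurity(int itemId, int[] userIds)
    {
        HashSet<int> oldUsers;
        // Clear the itemId from users that are no longer in the list.
        if (_usersByItem.TryGetValue(itemId, out oldUsers))
            foreach (int u in oldUsers.Except(userIds))
                _itemsByUser[u].Remove(itemId);

        _usersByItem[itemId] = new HashSet<int>(userIds);
        foreach (int u in userIds)
        {
            HashSet<int> items;
            if (!_itemsByUser.TryGetValue(u, out items))
                _itemsByUser[u] = items = new HashSet<int>();
            items.Add(itemId);
        }
    }

    public int[] GetValidItemIds(int userId)
    {
        HashSet<int> items;
        return _itemsByUser.TryGetValue(userId, out items)
            ? items.ToArray()
            : new int[0];
    }
}
```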

I'm trying to think about how I should store this in an optimal fashion. Preferably on disk (with caching), but I want the code maintainable and clean.

If the item ids had started at 0, I thought about creating a byte array of length MaxItemId / 8 for each user, and setting a true/false bit for whether the item is present or not. That would limit the array length to a little over 1 MB per user and give fast lookups, as well as an easy way to update the list per user. By persisting this as memory-mapped files (the MemoryMappedFile support in the .NET 4 framework) I think I would get decent caching as well (if the machine has enough RAM) without implementing caching logic myself. Parsing the id, stripping out the year, and storing one array per year could be a solution.
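
A minimal sketch of that bitmap idea, assuming the ids have already had the year stripped off (the class name, file layout, and MaxItemId value are made up for illustration):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Sketch: one bitmap per user, backed by a .NET 4 MemoryMappedFile.
// MaxItemId assumes the year has already been stripped off the id;
// the class name and file layout are made up for illustration.
class UserItemBitmap : IDisposable
{
    const long MaxItemId = 11000000;   // ~1.4 MB of bits per user

    readonly MemoryMappedFile _mmf;
    readonly MemoryMappedViewAccessor _view;

    public UserItemBitmap(string path)
    {
        _mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.OpenOrCreate, null, MaxItemId / 8 + 1);
        _view = _mmf.CreateViewAccessor();
    }

    public void Set(int itemId, bool present)
    {
        long byteOffset = itemId / 8;
        byte mask = (byte)(1 << (itemId % 8));
        byte b = _view.ReadByte(byteOffset);
        _view.Write(byteOffset, present ? (byte)(b | mask) : (byte)(b & ~mask));
    }

    public bool IsSet(int itemId)
    {
        return (_view.ReadByte(itemId / 8) & (1 << (itemId % 8))) != 0;
    }

    public void Dispose()
    {
        _view.Dispose();
        _mmf.Dispose();
    }
}
```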

条目标识 - >用户ID []列表中可以直接序列化到磁盘,并阅读,以持续名单/写与正常的FileStream **和diff它时,有修改。

The ItemId -> UserId[] list can be serialized directly to disk and read/write with a normal FileStream in order to persist the list and diff it when there are changes.
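
A hedged sketch of that persistence step (one file per item holding raw int32s is an assumed layout, not something the post specifies):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sketch: persist itemId -> userId[] with a plain FileStream and compute
// the diff on update. The one-file-per-item layout is an assumption.
static class ItemUserListFile
{
    public static int[] Read(string path)
    {
        if (!File.Exists(path)) return new int[0];
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            var ids = new List<int>();
            while (reader.BaseStream.Position < reader.BaseStream.Length)
                ids.Add(reader.ReadInt32());
            return ids.ToArray();
        }
    }

    public static void Write(string path, int[] userIds, out int[] removedUsers)
    {
        // Users present in the old list but not in the new one must have
        // this itemId cleared from their bitmaps.
        removedUsers = Read(path).Except(userIds).ToArray();

        using (var writer = new BinaryWriter(File.Create(path)))
            foreach (int id in userIds)
                writer.Write(id);
    }
}
```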

Each time a new user is added, all the lists have to be updated as well, but this can be done nightly.

Question

Should I continue to try out this approach, or are there other paths that should be explored as well? I'm thinking SQL Server will not perform fast enough, and it would add overhead (at least if it's hosted on a different server), but my assumptions might be wrong. Any thoughts or insights on the matter are appreciated. And I want to try to solve it without adding too much hardware :)

[Update 2010-03-31]

I have now tested with SQL Server 2008 under the following conditions.

  • A table with two columns (UserId, ItemId), both int
  • A clustered index on the two columns
  • Added ~800,000 items for 180 users, a total of 144 million rows
  • 4 GB RAM allocated for SQL Server
  • Dual-core 2.66 GHz laptop
  • SSD disk
  • Using a SqlDataReader to read all itemids into a List (a sketch of this loop follows the list)
  • Looping through all users
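
The per-user query loop would have looked roughly like this (a reconstruction; the table and column names are assumptions):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

// Rough reconstruction of the benchmark's inner loop; the table and
// column names are assumed, not taken from the post.
static List<int> GetItemIdsForUser(SqlConnection conn, int userId)
{
    var ids = new List<int>();
    using (var cmd = new SqlCommand(
        "SELECT ItemId FROM ItemSecurity WHERE UserId = @userId", conn))
    {
        cmd.Parameters.AddWithValue("@userId", userId);
        using (var reader = cmd.ExecuteReader())
            while (reader.Read())
                ids.Add(reader.GetInt32(0));
    }
    return ids;
}
```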

If I run one thread, it averages 0.2 seconds. When I add a second thread it goes up to 0.4 seconds, which is still OK. From there on the results degrade: a third thread brings a lot of the queries up to 2 seconds, a fourth thread up to 4 seconds, and a fifth spikes some of the queries up to 50 seconds.

The CPU is pegged while this is going on, even on one thread. My test app takes some of it due to the tight loop, and SQL Server takes the rest.

Which leads me to the conclusion that it won't scale very well, at least not on my test hardware. Are there ways to optimize the database, say by storing an array of ints per user instead of one record per item? But that makes it harder to remove items.

[Update 2010-03-31 #2]

I did a quick test with the same data, putting it as bits in memory-mapped files. It performs much better: six threads yield access times between 0.02 s and 0.06 s, purely memory bound. The mapped files were mapped by one process and accessed by six others simultaneously. And where the SQL database took 4 GB, the files on disk took 23 MB.
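
For reference, sharing a mapping between processes can be done with a named map; a minimal sketch (the path, map name, and size are made up):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class SharedBitmapDemo
{
    static void Main()
    {
        // Owning process: create a named mapping over the data file.
        // The path, map name, and size are made up for this sketch.
        using (var mmf = MemoryMappedFile.CreateFromFile(
            @"C:\data\user42.bitmap", FileMode.OpenOrCreate,
            "user42-bitmap", 11000000 / 8 + 1))
        using (var view = mmf.CreateViewAccessor())
        {
            // Another process on the same machine would attach with:
            //   var shared = MemoryMappedFile.OpenExisting("user42-bitmap");
            int itemId = 12345;
            bool isSet = (view.ReadByte(itemId / 8) & (1 << (itemId % 8))) != 0;
            Console.WriteLine(isSet);
        }
    }
}
```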

Answer

After much testing I ended up using memory-mapped files, marking them with the sparse bit (NTFS), using code from NTFS Sparse Files with C#.

Wikipedia has an explanation of what a sparse file is.

The benefit of using a sparse file is that I don't have to care about what range my ids are in. If I only write ids between 2006000000 and 2010999999, the file will only allocate 625,000 bytes from offset 250,750,000 in the file. All space up to that offset is unallocated in the file system. Each id is stored as a set bit in the file, which is in effect treated as a bit array. And if the id sequence suddenly changes, the file will allocate in another part.
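
The arithmetic behind those numbers, as a worked example:

```csharp
// Worked example of the sparse-file layout arithmetic above.
static void SparseLayoutMath()
{
    long firstId = 2006000000;
    long lastId  = 2010999999;

    long startOffset = firstId / 8;          // 250,750,000: first allocated byte
    long idCount     = lastId - firstId + 1; // 5,000,000 ids in the range
    long bytesUsed   = idCount / 8;          // 625,000 bytes actually allocated

    // Locating the bit for a single id:
    long id   = 2007123456;
    long pos  = id / 8;                      // byte position in the file
    int  mask = 1 << (int)(id % 8);          // bit within that byte
}
```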

In order to retrieve which ids are set, I can perform an OS call to get the allocated parts of the sparse file, and then check each bit in those sequences. Also, checking whether a particular id is set is very fast: if it falls outside the allocated blocks, it's not there; if it falls within, it's merely one byte read and a bit-mask check to see if the correct bit is set.
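
The OS call in question is the NTFS FSCTL_QUERY_ALLOCATED_RANGES ioctl; a rough P/Invoke sketch follows (error handling and the retry needed when the output buffer is too small are omitted):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Sketch: ask NTFS which parts of a sparse file are actually allocated,
// so only those regions need their bits scanned. Error handling and the
// MORE_DATA retry loop are omitted for brevity.
static class SparseRanges
{
    const uint FSCTL_QUERY_ALLOCATED_RANGES = 0x000940CF;

    [StructLayout(LayoutKind.Sequential)]
    public struct FILE_ALLOCATED_RANGE_BUFFER
    {
        public long FileOffset;   // start of an allocated region
        public long Length;       // length of the region in bytes
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(
        SafeFileHandle hDevice, uint dwIoControlCode,
        ref FILE_ALLOCATED_RANGE_BUFFER lpInBuffer, int nInBufferSize,
        [Out] FILE_ALLOCATED_RANGE_BUFFER[] lpOutBuffer, int nOutBufferSize,
        out int lpBytesReturned, IntPtr lpOverlapped);

    public static FILE_ALLOCATED_RANGE_BUFFER[] Query(FileStream file)
    {
        var input = new FILE_ALLOCATED_RANGE_BUFFER
        {
            FileOffset = 0,
            Length = file.Length
        };
        var output = new FILE_ALLOCATED_RANGE_BUFFER[64];
        int structSize = Marshal.SizeOf(typeof(FILE_ALLOCATED_RANGE_BUFFER));
        int bytesReturned;

        DeviceIoControl(file.SafeFileHandle, FSCTL_QUERY_ALLOCATED_RANGES,
            ref input, structSize, output, output.Length * structSize,
            out bytesReturned, IntPtr.Zero);

        var ranges = new FILE_ALLOCATED_RANGE_BUFFER[bytesReturned / structSize];
        Array.Copy(output, ranges, ranges.Length);
        return ranges;   // only these regions need their bits scanned
    }
}
```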

So for the particular scenario where you have many ids which you want to check with as much speed as possible, this is the most optimal way I've found so far.

And the good part is that the memory-mapped files can be shared with Java as well (which turned out to be something we needed). Java also has support for memory-mapped files on Windows, and implementing the read/write logic is fairly trivial.
