保存一个特技到磁盘 [英] Saving a trie to disk

查看:114
本文介绍了保存一个特技到磁盘的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这听起来像一个简单的问题,但我不知道如何搜索其答案。

This sounds like a simple question, but I don't know how to search for its answer.

我在C#中有一个trie实现,将存储大约80K来自字典文件的单词。加载所有这些单词需要相当长的时间(超过5分钟)。我想知道,持久这些数据的最佳方法是什么,所以我每次启动应用程序时都不需要重新加载所有单词?

I have a trie implementation in C# that will store about 80K words from a dictionary file. It takes quite a while to load all these words (more than 5 mins). I was wondering, what is the best way to "persist" those data so I don't have to reload all words every time I start the application?

谢谢。

推荐答案

像所有其他性能问题一样,理想的解决方案将遵循从您提出的当前解决方案和其他候选解决方案。瓶颈在哪里? I / O?列出文字?形成链接中的链接?不了解您的表现目标,目前使用的线索使用和瓶颈的性质,将难以制定具体的建议。

Like all other performance issues, the ideal solution will follow from profiling your current solution and other candidate solutions that you come up with. Where's the bottleneck? The I/O? Lexing the text? Forming the links in the trie? Will be hard to make a concrete suggestion without knowing your performance goals, the nature of the trie-usage and bottlenecks currently present.

考虑:


  1. 存储格式:文本?二进制?

  2. 持久化的数据:整个结构的trie(如XML)或只是一个单词列表,依靠运行时代码将它们推入数据中的正确位置-结构体?数据比率的标记是什么?解析有多重?

  3. 存储位置:DB / flat-file / ...?

  4. 增量加载:可能? >
  1. Storage format: Text? Binary?
  2. Persisted data: The entire structure of the trie (e.g. as XML) or just a list of words, relying on run-time code to push them into the right location in the data-structure? What's the markup to data ratio? How heavy is it to parse?
  3. Storage location: DB / flat-file / ...?
  4. Incremental loading: Possible?

一种可能的策略:使用最常用的单词的1,000(或更多)创建并保留最常用词字典。在启动时将这些单词加载到trie中,并在另一个线程上生成全字典的加载;读取新的单词时逐渐添加到创建的特里。

One possible strategy: Create and persist a 'most common words' dictionary with the 1,000 (or so) of the most frequently-used words. Load these words into the trie on start-up, and spawn the loading of the full-dictionary on another thread; incrementally adding to the created trie as new words are read.


  • 优点:用户可以看到更快的启动时间。

  • 缺点:线程
    同步,用户将看到
    不完整的trie,直到加载
    完全完成。这可能是也可能不是一个showstopper取决于使用什么trie。

这篇关于保存一个特技到磁盘的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆