高级数据结构在实践中 [英] Advanced data structures in practice

查看:121
本文介绍了高级数据结构在实践中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在编程十年中,我可以计算一下我使用的数据结构数量:数组,链表(我使用这个堆栈和排队)和字典。这并不是真的令人惊讶,因为几乎所有的应用程序都写入了数据/ CRUD类别。



我从来没有需要使用红黑树,跳过列表,双端队列,循环链接列表,优先级队列,堆,图形或过去50年来研究的数十种异国数据结构中的任何一种。我觉得我错过了。



这是一个开放式的问题,但在实践中使用的这些异国情调数据结构在哪里?有谁有任何现实世界的经验,使用这些数据结构来解决一个特定的问题?

解决方案

一些例子。他们是模糊的,因为他们为雇主工作:




  • A ,以获得Google样式搜索中的前N个结果。 (从索引中的候选人开始,直接遍历它们,通过最大大小N的最小堆筛选它们)。这是一个图像搜索原型。


  • 布鲁姆过滤器减少了数百万用户已经看到的某些数据的大小一个适合现有服务器的数量(它们都必须以RAM为速度);原始设计将需要用于该数据库的许多新服务器。


  • A 三角阵列表示将推荐引擎的密集对称数组的大小减半(由于相同的原因,再次为RAM)。


  • 用户必须根据某些关联进行分组; union-find 使这个简单,快速,准确而不是缓慢,黑客和大概。


  • 一个应用程序根据驾驶时间选择零售网站,这个应用程序用于附近的人们使用 Dijkstra最短路径。其他GIS工作利用了四叉树 Morton 索引。




了解数据结构中的内容 - 土地派上用场 - 实验室中的几周可以节省您在图书馆中的时间。盛大过滤器的案例只是因为规模而值得的:如果问题出现在创业公司而不是雅虎,我会使用一个普通的旧的哈希表。我认为其他例子在任何地方都是合理的(虽然现在你不太可能自己编写代码)。


In the 10 years I've been programming, I can count the number of data structures I've used on one hand: arrays, linked lists (I'm lumping stacks and queues in with this), and dictionaries. This isn't really surprising given that nearly all of the applications I've written fall into the forms-over-data / CRUD category.

I've never needed to use a red-black tree, skip list, double-ended queue, circularly linked list, priority queue, heaps, graphs, or any of the dozens of exotic data structures that have been researched in the past 50 years. I feel like I'm missing out.

This is an open-ended question, but where are these "exotic" data structures used in practice? Does anyone have any real-world experience using these data structures to solve a particular problem?

解决方案

Some examples. They're vague because they were work for employers:

  • A heap to get the top N results in a Google-style search. (Starting from candidates in an index, go through them all linearly, sifting them through a min-heap of max size N.) This was for an image-search prototype.

  • Bloom filters cut the size of certain data about what millions of users had seen down to an amount that'd fit in existing servers (it all had to be in RAM for speed); the original design would have needed many new servers just for that database.

  • A triangular array representation halved the size of a dense symmetrical array for a recommendation engine (RAM again for the same reason).

  • Users had to be grouped according to certain associations; union-find made this easy, quick, and exact instead of slow, hacky, and approximate.

  • An app for choosing retail sites according to drive time for people in the neighborhood used Dijkstra shortest-path with priority queues. Other GIS work took advantage of quadtrees and Morton indexes.

Knowing what's out there in data-structures-land comes in handy -- "weeks in the lab can save you hours in the library". The bloom-filter case was only worthwhile because of the scale: if the problem had come up at a startup instead of Yahoo, I'd have used a plain old hashtable. The other examples I think are reasonable anywhere (though nowadays you're less likely to code them yourself).

这篇关于高级数据结构在实践中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆