任何一个优秀的执行贪婪集合覆盖的大型数据集? [英] Any good implementation of greedy set cover for large datasets?
问题描述
这个问题从我的一个相关的问题发表<一个如下href="http://stackoverflow.com/questions/7927787/finding-an-optimal-solution-that-minimizes-a-constraint">here. @mhum建议,我的问题落入的覆盖问题的域名。我试图编码我的问题到最小集合覆盖问题,目前我已经在这个形式的数据集:
This question follows from a related question of mine posted here. @mhum suggested that my problem falls into the covering problem domain. I tried encoding my question into a minimum set cover problem and currently I have a dataset in this form:
Set Cost
(1,2) 1
(1) 1
(1,2,3) 2
(1) 2
(3,4) 2
(4) 3
(1,2) 3
(3,4) 4
(1,2,3,4) 4
我们的目标是找到一个很好的集合覆盖,覆盖所有的数字和一个试图最小化总成本。我的数据集是这样的(面积从5-40元不等的)大至少有30000套。是否有良好的贪婪的实现,以解决这个还是我对我自己来实现这一点?我不是在LP方面的专家,但任何LP-解算器(从numpy的/ SciPy的),可以解决这个也是可以接受的。
The objective is to find a good set cover that covers all numbers and one that attempts to minimize the total cost. My dataset is big with at least 30000 sets (of size varying from 5-40 elements) like this. Are there any good greedy implementations to solve this or am I on my own to implement this? I am not an expert in LP but any LP-solvers (from numpy/scipy) that can solve this are also acceptable.
推荐答案
有一个著名的贪婪近似算法集合覆盖,这也是很容易在你选择的任何一种语言来实现。算法本身此处描述:
There is a well-known greedy approximation algorithm for set cover that is also easy to implement in whatever language of your choice. The algorithm itself is described here:
http://en.wikipedia.org/wiki/Set_cover_problem#Greedy_algorithm
这是如此简单,最容易的事就是要从头开始写。
It is so simple that the easiest thing is just to write it from scratch.
值得注意的是,它也被称为集合覆盖最好的多项式时间近似算法。这意味着,以获得更好的最坏情况下的性能(更紧凑的结果集),你需要有(大集=慢算法)非多项式的运行时间。
Notably, it is also the best polynomial-time approximation algorithm known for set cover. That means that to get better worst-case performance (more compact result set) you would need to have non-polynomial running times (= slow algorithms for large sets).
不幸的Wikipedia条目实际上不包括加权集合覆盖,这是这里的情况。扩展是简单,且例如描述这里:
Unfortunately the Wikipedia entry doesn't actually cover weighted set cover, which is the case here. The extension is simple, and is described e.g. here:
HTTP://pages.cs .wisc.edu /〜术赤/场/ 880-S07 /划线笔记/ lecture03.pdf
一些有用的注意事项:
http://www.cs.ucr.edu/~neal/ non_arxiv / Young08SetCover.pdf http://www.cs.uiuc.edu/class/sp08/cs473/Lectures/lec20.pdf
http://www.cs.ucr.edu/~neal/non_arxiv/Young08SetCover.pdf http://www.cs.uiuc.edu/class/sp08/cs473/Lectures/lec20.pdf
这篇关于任何一个优秀的执行贪婪集合覆盖的大型数据集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!