任何一个优秀的执行贪婪集合覆盖的大型数据集? [英] Any good implementation of greedy set cover for large datasets?

查看:362
本文介绍了任何一个优秀的执行贪婪集合覆盖的大型数据集?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题从我的一个相关的问题发表<一个如下href="http://stackoverflow.com/questions/7927787/finding-an-optimal-solution-that-minimizes-a-constraint">here. @mhum建议,我的问题落入的覆盖问题的域名。我试图编码我的问题到最小集合覆盖问题,目前我已经在这个形式的数据集:

This question follows from a related question of mine posted here. @mhum suggested that my problem falls into the covering problem domain. I tried encoding my question into a minimum set cover problem and currently I have a dataset in this form:

Set        Cost
(1,2)        1
(1)          1
(1,2,3)      2
(1)          2
(3,4)        2
(4)          3
(1,2)        3
(3,4)        4
(1,2,3,4)    4

我们的目标是找到一个很好的集合覆盖,覆盖所有的数字和一个试图最小化总成本。我的数据集是这样的(面积从5-40元不等的)大至少有30000套。是否有良好的贪婪的实现,以解决这个还是我对我自己来实现这一点?我不是在LP方面的专家,但任何LP-解算器(从numpy的/ SciPy的),可以解决这个也是可以接受的。

The objective is to find a good set cover that covers all numbers and one that attempts to minimize the total cost. My dataset is big with at least 30000 sets (of size varying from 5-40 elements) like this. Are there any good greedy implementations to solve this or am I on my own to implement this? I am not an expert in LP but any LP-solvers (from numpy/scipy) that can solve this are also acceptable.

推荐答案

有一个著名的贪婪近似算法集合覆盖,这也是很容易在你选择的任何一种语言来实现。算法本身此处描述:

There is a well-known greedy approximation algorithm for set cover that is also easy to implement in whatever language of your choice. The algorithm itself is described here:

http://en.wikipedia.org/wiki/Set_cover_problem#Greedy_algorithm

这是如此简单,最容易的事就是要从头开始写。

It is so simple that the easiest thing is just to write it from scratch.

值得注意的是,它也被称为集合覆盖最好的多项式时间近似算法。这意味着,以获得更好的最坏情况下的性能(更紧凑的结果集),你需要有(大集=慢算法)非多项式的运行时间。

Notably, it is also the best polynomial-time approximation algorithm known for set cover. That means that to get better worst-case performance (more compact result set) you would need to have non-polynomial running times (= slow algorithms for large sets).

不幸的Wikipedia条目实际上不包括加权集合覆盖,这是这里的情况。扩展是简单,且例如描述这里:

Unfortunately the Wikipedia entry doesn't actually cover weighted set cover, which is the case here. The extension is simple, and is described e.g. here:

HTTP://pages.cs .wisc.edu /〜术赤/场/ 880-S07 /划线笔记/ lecture03.pdf

一些有用的注意事项:

http://www.cs.ucr.edu/~neal/ non_arxiv / Young08SetCover.pdf http://www.cs.uiuc.edu/class/sp08/cs473/Lectures/lec20.pdf

http://www.cs.ucr.edu/~neal/non_arxiv/Young08SetCover.pdf http://www.cs.uiuc.edu/class/sp08/cs473/Lectures/lec20.pdf

这篇关于任何一个优秀的执行贪婪集合覆盖的大型数据集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆