使用动态编程对列表进行分区 [英] partition of a list using dynamic programming

查看:93
本文介绍了使用动态编程对列表进行分区的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在这里发布了一些与我一直在努力的项目相关的信息,我一直遇到设计问题,必须从头开始设计。因此,我想知道我是否可以发布我想做的事情,有人可以帮助我了解如何获得所需的结果。



BackGround :



我是编程和尝试学习的新手。因此,我参加了一个令我感兴趣的项目,该项目基本上涉及列表并仅使用列表中的数字分解每个数字。我知道我可以很轻松地做到这一点(我做到了),但是我也想学习Hbase,Hadoop和并行处理,因此我想以一种可以跨多台机器打破这一过程的方式来做到这一点。我认为实现此目标的方法是使用动态编程和递归来创建可能进一步细分的可能性表。



示例:



如果我提交列表: [1,2,4] 我应该得到 {1:[[1]],2:[[1,1]],4:[[2,2]]} 。它的基本意思是2 + 2 = 4、1 + 1 = 2和1 = 1 ..因此,尝试查看制作4的所有方法,我只需查找此列表(将在数据库中),然后看到2 + 2 = 4,然后分解2 ..依此类推。.我有查找工作,但发生故障时出现问题。使用蛮力不会让我以可以使用hadoop或其他工具扩展规模的方式使用大量数字(例如,一百万,列表中有一千个数字)。以下是一些可能的结果示例:

  [1,2,3,4] = {1:[[1] ],2:2 [[1,1]],3:3 [[1,2]],4:[[1,3],[2,2]]} 
[1,2,3,4 ,6,10] = {1:[[1]],2:2 [[1,1]],3:[[1,2]],4:[[1,3],[2,2]] ,6:[[2,4],[3,3]],10:[[4,6],[2,2,2,2,2]]}
[10,30,50] = 50:[[10,10,30]],30:[[10,10,10]]}

采用这种方法的逻辑是,将不需要花费时间来计算列表中的下一个可能的数据,因此,如果我发送具有一百万个数字的列表,它可以快速完成,甚至可以扩展到hadoop集群。 / p>

我创建的使此代码工作的代码是此处,但问题是如何纠正设计问题。我得到了有关这是分区问题的建议,环顾四周,发现我尝试做的事情要简单得多( activestate ),但它与我要执行的操作不完全相同,因为它会分解数字并且不使用特定的数据集来执行此操作。



问题:



所以希望我清楚地描述了我要做什么。我需要阅读/研究/学习使用动态编程在python中创建列表分区表的内容,以便它可以扩展?它只是一种业余爱好,并且对时间不敏感,但是我觉得我已经从事了3个月以上,每次遇到设计问题时,我都必须从头开始。我该如何正确构建它(要查看我的错误方法,请参见上面的链接)?我已经在Google上搜索并找到了背包问题和分区问题的解决方案,但它们似乎更多用于学校作业,并且不能真正构建为可与大型数据集一起扩展。



希望有人能给我提供见识,但无论谢谢您阅读此书。

解决方案

您必须知道,DP问题本身并不是独立和分布式计算的最佳选择。



当您考虑经典形式的DP算法时,您会发现有一个矩阵/表/数组,您就可以按一定顺序依次计算新值。值的每次计算都需要先创建其他值。因此,您将失去数据独立性,并且可以根据特定的DP算法最大程度地同时计算一定数量的数组字段。例如,许多DP算法可以并行处理整个表列,因为每个字段都依赖于前一列中的字段。但是,由于该列之后所有其余字段的数据相关性,因此这已经是限制。



话虽如此,计算列表中可用数字的总和就是不是DP问题。您根本不解决任何子问题,而只是收集所有可能的总和(如果它们恰好与您的列表项之一匹配)。



因此,我建议以下是一种明显不同的方法:




  • 使用所有可能的总和计算新列表。这是独立于数据的,可以并行化,但是您需要一些上限来终止。示例: [1,2,4] 变为 [[1],[2],[4],[1,1],[1 ,2],[1,4],...] 。您不必显式构造此列表,只需将每个这样的组合传递到下一步。

  • 评估每个计算,即创建总和并检查它是否与来自您的原始清单。同样,您是独立于数据的,并且可以独立执行所有这些计算。

  • 将肯定结果组合到最终数据结构中。



因此,总结一下并回答您的问题:




  • 重新考虑一下,是否要将此问题视为DP

  • 您可能想了解数据并行性。在使用GPU解决此类问题时,这一点尤其重要,因此有关CUDA / OpenCL的相应文献也可能有用。


I have posted a bit here related to a project I have been trying to work on and I keep hitting design problems and have to design from scratch. So I'm wondering if I can post what I'm trying to do and someone can help me understand how I can get the result I want.

BackGround:

I'm new to programming and trying to learn. So I took a project that interested me which involves basically taking list and breaking down each number using only numbers from the list. I know I could easily brute force this(which I did) but I wanted to also learn Hbase, Hadoop, and parallel processing so I wanted do it in a way that I could break the process across various machines. I thought the way to do this was using dynamic programming and recursions to create a table of possibilities that could be broken down even further.

Example:

If I submit the list: [1,2, 4] I should get {1: [[1]], 2: [[1, 1]], 4: [[2, 2]]}. What its basically saying is 2+2=4, 1+1=2, and 1=1..so trying to see all the ways to make 4, I can just look up this list(which will be in a database) and see 2+2=4, then break down 2..and so on.. I have the lookup job working but the breakdown I am having problems with. Using brute force will not let me use large numbers(say a million, with a thousand numbers in the list) in a way that I can use hadoop or some other tool to scale it. Here are some more examples of possible results:

[1,2, 3, 4] = {1: [[1]], 2: [[1, 1]], 3: [[1, 2]], 4: [[1, 3], [2, 2]]}
[1,2, 3, 4, 6, 10] = {1: [[1]], 2: [[1, 1]], 3: [[1, 2]], 4: [[1, 3], [2, 2]], 6: [[2, 4], [3, 3]], 10: [[4, 6], [2, 2, 2, 2, 2]]}
[10, 30, 50] = 50: [[10, 10, 30]], 30: [[10, 10, 10]]}

The logic with this approach is that it will not take time to comput the next possible data in the list, so if I send a list with a million numbers it could do it quickly and even scale to a hadoop cluster.

The code I created to get this to work is here but that question was on how to correct the design problem. I got advice that this is a partition problem and looked around and found much simpler versions of what I was trying to do ( activestate ), but its not exactly what I am trying to do because it breaks down a number and does not use a specific data set to do it.

Question:

So hopefully I clearly described what I am trying to do. What do I need to read/study/learn to create a table of a partition of an list in python using dynamic programming so it can scale? Its just a hobby and not time sensitive but I feel I have been working on this for over 3 months and each time I hit design problems that cause me to have to start from scratch. How can I build this correctly(to see my incorrect way see my link above)? I have googled and found solutions to the knapsack problem and partition problems but they seem to be more for school work and not really built to scale with large datasets.

Hopefully someone can give me insight but regardless thank you for reading this.

解决方案

You have to be aware, that DP problems are not per se optimal for independent and distributed computations.

When you consider the classical forms of DP algorithms, you have a matrix/table/array and you successively compute new values in a certain order. Each computation of a value requires other values that have to be created before. Hence, you lose data independence and can maximally compute a certain number of array fields at the same time, depending on the specific DP algorithms. For example, many DP algorithms can process a whole table column in parallel, as each field relies on fields from a previous column. But that's already the limit due to the data dependency of all remaining fields after that column.

That being said, calculating the sum possibilities of the various numbers available in your list is NOT a DP problem. You do not solve any sub-problems at all, but simply collect all possible sums (if they happen to match one of your list entries).

Hence, I suggest the following sightly different approach:

  • Compute a new list with all possible summations. This is data independent and can be parallelized, but you need some upper bound for termination. Example: [1,2,4] becomes [ [1],[2],[4],[1,1],[1,2],[1,4],...]. You don't have to explicitly construct this list, but just pass each such combination on to the next step.
  • Evaluate each computation, i.e. create the sum and check whether it matches a value from your original list. Again, you are data independent and can perform all these calculations independently.
  • Combine the positive results into the final datastructure.

So to sum it up and answer your questions:

  • Rethink, whether you want to regard this problem as DP at all.
  • You may want to read up on data-parallelism. This is particularly relevant when solving such problems with GPUs, so the corresponding literature on CUDA/OpenCL may be useful, too.

这篇关于使用动态编程对列表进行分区的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆