算法从两个列表中选择一个最佳组合 [英] Algorithm to select a best combination from two list

查看:188
本文介绍了算法从两个列表中选择一个最佳组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个办法航班搜索结果。因此,有两个列表包含航班出发和到达的航班,如:

I have a search result from two way flight. So, there are two lists that contain the departure flights and arrival flights such as:

  • 在离境航班列表中有20个航班。
  • 的到来,航班列表有航班30

所以,我将有600(20 * 30)起飞航班和抵达航班之间的结合。我会打电话给组合列表是在结果列表

So, I will have 600 (20*30) combination between departure flight and arrival flight. I will call the combination list is the result list

不过,我只是想从600组合选择的限制。举例来说,我会选择最好的100航班组合。要结合飞行的标准是价格便宜的出发和到达的航班。

However, I just want to select a limitation from 600 combination. For instance, I will select the best of 100 flight combination. The criteria to combine the flights is the cheap price for departure and arrival flight.

要做到这一点,我会排序的结果列表由出发和到达航班的总价格。我再拿起从结果列表第100个元素来得到我想要的。

To do that, I will sort the result list by the total price of departure and arrival flight. And I then pick up the first 100 elements from result list to get what I want.

但是,如果出发航班列表中有200航班和抵达航班列表中有300个航班,我会在结果列表与60.000元。出于这个原因,我会用排序60.000元素的列表找到最好的100个元素。

But, if the departure flights list has 200 flights and arrival flights list has 300 flights, I will have the result list with 60.000 elements. For that reason, I will sort a list with 60.000 elements to find the best 100 elements.

所以,有任何一种算法来选择最佳的组合,我的情况。

So, there is any an algorithm to select the best combinations as my case.

感谢你了。

推荐答案

不是100%清楚的从你的问题,但我明白,你正在寻找一个更快的算法找出一定数量的出发和到达的最佳/最便宜的组合航班。

Not 100% clear from your question, but I understand that you are looking for a faster algorithm to find a certain number of best / cheapest combinations of departure and arrival flights.

您可以做到这一点的按分类单独出发和到达航班由成本清单,然后使用的的=htt​​p://en.wikipedia.org/wiki/Heap_data_structure 相对=nofollow>堆扩大下一个最佳组合,一个接一个,直到你有足够的。

You can do this much faster by sorting the lists of departure and arrival flights individually by cost and then using a heap for expanding the next-best combinations one-by-one until you have enough.

下面是完整的算法 - 在Python中,但没有使用任何特殊的库,只是标准的数据结构,所以这应该是很容易转移到任何其他语言:

Here's the full algorithm -- in Python, but without using any special libraries, just standard data structures, so this should be easily transferable to any other language:

NUM_FLIGHTS, NUM_BEST = 1000, 100

# create test data: each entry corresponds to just the cost of one flight
from random import randint
dep = sorted([randint(1, 100) for i in range(NUM_FLIGHTS)])
arr = sorted([randint(1, 100) for i in range(NUM_FLIGHTS)])
def is_compatible(i, j): # for checking constraints, e.g. timing of flights
    return True          # but for now, assume no constraints

# get best combination using sorted lists and heap
from heapq import heappush, heappop
heap = [(dep[0] + arr[0], 0, 0)]   # initial: best combination from dep and arr
result = []                        # the result list
visited = set()                    # make sure not to add combinations twice
while heap and len(result) < NUM_BEST:
    cost, i, j = heappop(heap)     # get next-best combination
    if (i, j) in visited: continue # did we see those before? skip
    visited.add((i, j))
    if is_compatible(i, j):        # if 'compatible', add to results
        result.append((cost, dep[i], arr[j]))
    # add 'adjacent' combinations to the heap
    if i < len(dep) - 1:           # next-best departure + same arrival
        heappush(heap, (dep[i+1] + arr[j], i+1, j))
    if j < len(arr) - 1:           # same departure + next-best arrival
        heappush(heap, (dep[i] + arr[j+1], i, j+1))
print result

# just for testing: compare to brute-force (get best from all combinations)
comb = [(d, a) for d in dep for a in arr]
best = sorted((d+a, d, a) for (d, a) in comb)[:NUM_BEST]
print best
print result == best # True -> same results as brute force (just faster)

本作品大致是这样的:

  • 排序两个出发航班 DEP 和到达航班改编通过它们的成本
  • 创建一个堆,放在他们的名单的最佳组合(最好的出发和最佳到达),以及相应的指标到堆:(DEP [0] + ARR [0],0, 0)
  • 重复,直到你有足够的组合或有堆中没有更多的元素:
    • 从堆中弹出极品元(按总费用排序)
    • 如果它满足约束上,将其添加到结果集
    • 确保你不会两次添加到结果集的航班,使用访问设置
    • 添加两个相邻的组合堆,即采取从同一航班出发键,下自改编,接下来从 DEP 和同样来自改编,即(DEP [我+ 1] +改编[J],I + 1,J)(DEP [I] +改编[J + 1],I,J + 1)
    • sort both the departure flights dep and the arrival flights arr by their cost
    • create a heap and put the best combination (best departure and best arrival) as well as the corresponding indices in their lists into the heap: (dep[0] + arr[0], 0, 0)
    • repeat until you have enough combinations or there are no more elements in the heap:
      • pop the best element from the heap (sorted by total cost)
      • if it satisfies the contraints, add it to the result set
      • make sure you do not add flights twice to the result set, using visited set
      • add the two 'adjacent' combinations to the heap, i.e. taking the same flight from dep and the next from arr, and the next from dep and the same from arr, i.e. (dep[i+1] + arr[j], i+1, j) and (dep[i] + arr[j+1], i, j+1)

      下面是一个非常小的工作实例。该轴(成本)在 DEP 改编航班,并在表中的条目形式 N()M ,其中 N 是该条目添加到堆迭代(如果有的话), C 的成本,而 M 是迭代它被添加到前10名结果列表(如果有的话)。

      Here's a very small worked example. The axes are (the costs of) the dep and arr flights, and the entries in the table are in the form n(c)m, where n is the iteration that entry was added to the heap (if it is at all), c is the cost, and m is the iteration it was added to the 'top 10' result list (if any).

      dep\arr     1       3       4       6       7
         2      0(3)1   1(5)4   4(6)8   8(8)-     -
         2      1(3)2   2(5)6   6(6)9   9(8)-     -
         3      2(4)3   3(6)7   7(7)-     -       -
         4      3(5)5   5(7)-     -       -       -
         6      5(7)10    -       -       -       -
      
      Result: (1,2), (1,2), (1,3), (3,2), (1,4), (3,2), (3,3), (2,4), (2,4), (1,6)
      

      请注意如何款项中均列和矩阵的行总是不断增加,因此,最好的结果可始终以稍微三角形区域中的左上找到。现在的想法是,如果你的当前最佳组合(一说是先在堆)是 DEP [I],编曲[I] ,然后有在检查没有用,例如,组合 DEP [I + 2],编曲[I] 前检查 DEP [I + 1],编曲[I] ,其中必须有一个更低的总成本,所以添加 DEP [I + 1],编曲[I] (同​​样地 DEP [我],编曲[I + 1] )堆,并重复从堆中弹出下一个元素。

      Note how the sums in both the columns and the rows of the matrix are always increasing, so the best results can always be found in a somewhat triangular area in the top-left. Now the idea is that if your currently best combination (the one that's first in the heap) is dep[i], arr[i], then there's no use in checking, e.g., combination dep[i+2], arr[i] before checking dep[i+1], arr[i], which must have a lower total cost, so you add dep[i+1], arr[i] (and likewise dep[i], arr[i+1]) to the heap, and repeat with popping the next element from the heap.

      余相比该算法到您的蛮力方法的结果的结果,并将所得的航班是相同的,即算法的工作,和的总是得到最佳的结果的。复杂性应该是的 O(N日志(N))的排序的出发和到达列表( N 的是航班在这些原始列表中的号码),再加上的 0 (M数(M))的用于堆循环( M 的重复使用的<​​em>日志(M)的每次迭代的工作, M 的是在结果列表中的元素)的数

      I compared the results of this algorithm to the results of your brute-force approach, and the resulting flights are the same, i.e. the algorithm works, and always yields the optimal result. Complexity should be O(n log(n)) for sorting the departure and arrival lists (n being the number of flights in those original lists), plus O(m log(m)) for the heap-loop (m iterations with log(m) work per iteration, m being the number of elements in the result list).

      此找到最佳的 1000 的组合的 100,000 的出发和 100,000 的到达航班(总共的 1,000,000,000,000 的在可能的组合)的小于一秒

      This finds the best 1,000 combinations of 100,000 departure and 100,000 arrival flights (for a total of 1,000,000,000,000 possible combinations) in less than one second.

      请注意,这些数字是对你有没有任何附加限制的情况下,即每个离港航班可以到达每一个飞行组合。如果有限制,你可以用勾画在上面code中的 is_compatible 函数来检查这些并跳过配对。这意味着,对于每个不相容对具有低的总成本,回路需要一个附加的迭代。这意味着,在最坏的情况下的,例如,如果有的没有兼容的对在所有的的,或当只适用对那些具有最高总成本,该算法可以其实扩大的所有的组合。

      Note that those numbers are for the case that you have no additional constraints, i.e. each departure flight can be combined with each arrival flight. If there are constraints, you can use the is_compatible function sketched in the above code to check those and to skip that pairing. This means, that for each incompatible pair with low total cost, the loop needs one additional iteration. This means that in the worst case, for example if there are no compatible pairs at all, or when the only compatible pairs are those with the highest total cost, the algorithm could in fact expand all the combination.

      在平均,不过,这不应该是这样的,该算法应而迅速地进行。

      On average, though, this should not be the case, and the algorithm should perform rather quickly.

      这篇关于算法从两个列表中选择一个最佳组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆