从一堆有效地对袜子? [英] Pair socks from a pile efficiently?

查看:252
本文介绍了从一堆有效地对袜子?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

昨天我从干净的衣物配对的袜子和想通了,我是这样做的方式是不是很有效。我在做一个天真的搜索  - 选择其中的一个袜子和迭代桩以便找到它的对。这需要通过迭代N / 2 * N / 4 = N 2 / 8袜子的平均水平。

Yesterday I was pairing the socks from the clean laundry and figured out the way I was doing it is not very efficient. I was doing a naive search — picking one sock and "iterating" the pile in order to find its pair. This requires iterating over n/2 * n/4 = n2/8 socks on average.

作为一名计算机科学家,我想我能做什么?当然,排序(按大小/颜色/ ...)来到介意以达到O(NlogN)的解决方案。

As a computer scientist I was thinking what I could do? Sorting (according to size/color/...) of course came to mind to achieve an O(NlogN) solution.

散列或其他不就地解决方案是不是一种选择,因为我不能够复制我的袜子(虽然它可能如果我能很好)。

Hashing or other not-in-place solutions are not an option, because I am not able to duplicate my socks (though it could be nice if I could).

所以,问题基本上是:

鉴于一堆 N 双袜子,包含 2N 元素(假设每个袜子都有一个确切的匹配对),什么是有效率地配对起来长达数额外空间的最佳方式是什么? (我相信我能记得的,如果需要的信息的数量。)

Given a pile of n pairs of socks, containing 2n elements (assume each sock has exactly one matching pair), what is the best way to pair them up efficiently with up to logarithmic extra space? (I believe I can remember that amount of info if needed.)

我会AP preciate,解决以下几个方面来回答:

I will appreciate an answer that addresses the following aspects:

  • 一般的理论的一个巨大的袜子数量的解决方案。
  • 在袜子的实际数量不是很大,我不相信我的配偶和我有30多双。 (这是相当容易的我的袜子和她之间的区别;这可以被用作阱)
  • 等同于元素清晰度问题呢?
  • A general theoretical solution for a huge number of socks.
  • The actual number of socks is not that large, I don't believe my spouse and I have more than 30 pairs. (And it is fairly easy to distinguish between my socks and hers; can this be used as well?)
  • Is it equivalent to the element distinctness problem?

推荐答案

分拣解决方案已被提出,但排序是有点太多了:我们不需要秩序; 我们只需要平等组

Sorting solutions have been proposed, but sorting is a little too much: We don't need order; we just need equality groups.

所以散列就足够了(快)。

  1. 对于袜子每种颜色,成堆。遍历所有的袜子在你的输入篮下,并分发到颜色桩
  2. 在迭代每根桩和其他一些指标(如图形)分发到第二组桩
  3. 递归应用此方案,直到您已经分布在所有的袜子到非常小的桩,你可以直观地处理立即
  1. For each color of socks, form a pile. Iterate over all socks in your input basket and distribute them onto the color piles.
  2. Iterate over each pile and distribute it by some other metric (e.g. pattern) into a second set of piles
  3. Recursively apply this scheme until you have distributed all socks onto very small piles that you can visually process immediately

这种递归散列分区实际上是由做 SQL Server时,它需要哈希联接或哈希合计超过庞大的数据集。它分配其生成的输入流为多个分区,它是独立的。这项计划扩展到任意数量的数据和多个CPU线性的。

This kind of recursive hash partitioning is actually being done by SQL Server when it needs to hash join or hash aggregate over huge data sets. It distributes its build input stream into many partitions which are independent. This scheme scales to arbitrary amounts of data and multiple CPUs linearly.

您不必递归分割,如果你能找到一个分配密钥(哈希键)的提供了足够的水桶,每个桶是足够小,可以非常快速地处理。不幸的是,我不认为袜子有这样的属性。

You don't need recursive partitioning if you can find a distribution key (hash key) that provides enough buckets that each bucket is small enough to be processed very quickly. Unfortunately, I don't think socks have such a property.

如果每个袜子有所谓的PairID整人可以轻松地将它们按 PairID%10 分配到10桶​​(最后一位)。

If each sock had an integer called "PairID" one could easily distribute them into 10 buckets according to PairID % 10 (the last digit).

最好的现实世界的划分我能想到的是创建成堆的矩形:一个维度是彩色的,另一种是模式。为什么一个矩形?因为我们需要O(1)随机访问桩。 (三维长方体也将工作,但不是很实用。)

The best real-world partitioning I can think of is creating a rectangle of piles: one dimension is color, the other is pattern. Why a rectangle? Because we need O(1) random-access to piles. (A 3D cuboid would also work, but that is not very practical.)

更新:

什么并行?可以在多个人体相匹配的袜子快?

What about parallelism? Can multiple humans match the socks faster?

  1. 在最简单的parallization策略是有多个工人需要从输入篮筐,把袜子到桩。这只会大大加快了这么多 - 想象100人争夺10桩。 同步成本(表现自己是手工碰撞和人际交往)的破坏效率和加速(见的通用可扩展性法律!)。这是容易的死锁?没有,因为每个工人仅需要访问一个桩的时间。只需一个锁定不能有死锁。 活锁这取决于人类如何协调访问桩可能的。他们可能只是使用随机退避像网卡做,在物质层面,以确定哪些卡可以完全访问网络丝。如果它适合网卡,它应该为人类的正常工作。
  2. 在这可以近似无限,如果每个工人都有自己的一套桩的。那么工人可以采取的袜子大块从输入篮(非常小的争论,因为他们正在做的很少),并在所有分配的袜子时,他们并不需要同步(因为他们有线程局部桩)。最后,所有的工人需要工会的堆积集。我认为,可以在O完成(日志(工人数*人均桩))如果工人形成的融合树
  1. The simplest parallization strategy is to have multiple workers take from the input basket and put the socks onto the piles. This only scales up so much - imagine 100 people fighting over 10 piles. The synchronization costs (manifesting themselves as hand-collisions and human communication) destroy efficiency and speed-up (see the Universal Scalability Law!). Is this prone to deadlocks? No, because each worker only needs to access one pile at a time. With just one "lock" there cannot be a deadlock. Livelocks might be possible depending on how the humans coordinate access to piles. They might just use random backoff like network cards do that on a physical level to determine what card can exclusively access the network wire. If it works for NICs, it should work for humans as well.
  2. It scales nearly indefinitely if each worker has its own set of piles. Workers can then take big chunks of socks from the input basket (very little contention as they are doing it rarely) and they do not need to synchronise when distributing the socks at all (because they have thread-local piles). At the end, all workers need to union their pile-sets. I believe that can be done in O(log (worker count * piles per worker)) if the workers form an aggregation tree.

怎么样元素清晰度问题?正如文章指出,元素明显问题,可以在解决O(N)。这是相同的袜子问题(也 O(N),如果你只需要一个分布步骤(我提出的多个步骤因为人类不擅长的计算 - 一步是不够的,如果你发布的 MD5(颜色,长度,样式,...),即完美散列所有属性))。

What about the element distinctness problem? As the article states, the element distinctness problem can be solved in O(N). This is the same for the socks problem (also O(N), if you need only one distribution step (I proposed multiple steps only because humans are bad at calculations - one step is enough if you distribute on md5(color, length, pattern, ...), i.e. a perfect hash of all attributes)).

显然,我们不能去比 O(N)更快,所以我们已经达到了最佳下限

Clearly, one cannot go faster than O(N), so we have reached the optimal lower bound.

虽然输出并不完全相同(在一种情况下,只是一个布尔值。在其他情况下,双袜子),渐近复杂性是一样的。

Although the outputs are not exactly the same (in one case, just a boolean. In the other case, the pairs of socks), the asymptotic complexities are the same.

这篇关于从一堆有效地对袜子?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆