如何有效地从一堆袜子中配对? [英] How can I pair socks from a pile efficiently?

查看:22
本文介绍了如何有效地从一堆袜子中配对?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

昨天我在把干净的洗衣店里的袜子配对,发现我这样做的方式效率不高.我正在做一个天真的搜索 - 挑选一只袜子并迭代"一堆以找到它的一双.这需要平均迭代 n/2 * n/4 = n2/8 个袜子.

Yesterday I was pairing the socks from the clean laundry and figured out the way I was doing it is not very efficient. I was doing a naive search — picking one sock and "iterating" the pile in order to find its pair. This requires iterating over n/2 * n/4 = n2/8 socks on average.

作为一名计算机科学家,我在想我能做什么?排序(根据大小/颜色/...)当然会想到实现 O(NlogN) 解决方案.

As a computer scientist I was thinking what I could do? Sorting (according to size/color/...) of course came to mind to achieve an O(NlogN) solution.

散列或其他非就地解决方案不是一种选择,因为我无法复制我的袜子(尽管如果可以的话会很好).

Hashing or other not-in-place solutions are not an option, because I am not able to duplicate my socks (though it could be nice if I could).

所以,问题基本上是:

给定一堆 n 双袜子,包含 2n 个元素(假设每只袜子正好有一对匹配),有效配对它们的最佳方法是什么最多对数的额外空间?(如果需要,我相信我能记住那么多信息.)

Given a pile of n pairs of socks, containing 2n elements (assume each sock has exactly one matching pair), what is the best way to pair them up efficiently with up to logarithmic extra space? (I believe I can remember that amount of info if needed.)

我将感谢解决以下方面的答案:

I will appreciate an answer that addresses the following aspects:

  • 适用于大量袜子的一般理论解决方案.
  • 袜子的实际数量并不多,我不相信我和我的配偶有 30 多双.(而且很容易区分我的袜子和她的袜子;这也可以使用吗?)
  • 它是否等同于元素差异性问题?
  • A general theoretical solution for a huge number of socks.
  • The actual number of socks is not that large, I don't believe my spouse and I have more than 30 pairs. (And it is fairly easy to distinguish between my socks and hers; can this be used as well?)
  • Is it equivalent to the element distinctness problem?

推荐答案

排序方案已经提出,但是排序有点过分:我们不需要排序;我们只需要平等团体.

Sorting solutions have been proposed, but sorting is a little too much: We don't need order; we just need equality groups.

所以散列就足够了(而且速度更快).

So hashing would be enough (and faster).

  1. 对于每种颜色的袜子,形成一堆.遍历输入篮中的所有袜子并将它们分配到颜色堆.
  2. 迭代每一堆并通过其他一些指标(例如模式)将其分配到第二组堆中
  3. 递归应用此方案,直到您将所有袜子分配到可以立即进行视觉处理的非常小的堆
  1. For each color of socks, form a pile. Iterate over all socks in your input basket and distribute them onto the color piles.
  2. Iterate over each pile and distribute it by some other metric (e.g. pattern) into the second set of piles
  3. Recursively apply this scheme until you have distributed all socks onto very small piles that you can visually process immediately

这种递归散列分区实际上是由 SQL Server 在需要散列时完成的在庞大的数据集上加入或散列聚合.它将其构建输入流分发到许多独立的分区中.该方案线性扩展到任意数量的数据和多个 CPU.

This kind of recursive hash partitioning is actually being done by SQL Server when it needs to hash join or hash aggregate over huge data sets. It distributes its build input stream into many partitions which are independent. This scheme scales to arbitrary amounts of data and multiple CPUs linearly.

如果您能找到一个分布键(散列键),它提供足够的存储桶,每个存储桶都足够小,可以非常快速地处理,那么您就不需要递归分区.不幸的是,我不认为袜子有这样的属性.

You don't need recursive partitioning if you can find a distribution key (hash key) that provides enough buckets that each bucket is small enough to be processed very quickly. Unfortunately, I don't think socks have such a property.

如果每只袜子都有一个名为PairID"的整数,则可以根据PairID % 10(最后一位数字)轻松地将它们分配到 10 个桶中.

If each sock had an integer called "PairID" one could easily distribute them into 10 buckets according to PairID % 10 (the last digit).

我能想到的最好的现实世界分区是创建一个矩形堆:一个维度是颜色,另一个维度是图案.为什么是长方形?因为我们需要 O(1) 随机访问堆.(一个 3D cuboid 也可以使用,但这不是很实用.)

The best real-world partitioning I can think of is creating a rectangle of piles: one dimension is color, the other is the pattern. Why a rectangle? Because we need O(1) random-access to piles. (A 3D cuboid would also work, but that is not very practical.)

更新:

并行性怎么样?多人可以更快地匹配袜子吗?

What about parallelism? Can multiple humans match the socks faster?

  1. 最简单的并行化策略是让多个工作人员从输入篮中取出袜子并将袜子放在堆上.这只会放大这么多 - 想象一下 100 人在 10 堆上战斗.同步成本(表现为手部碰撞和人类交流)破坏效率和加速(请参阅通用可扩展性法则!).这是否容易死锁?不,因为每个工人一次只需要访问一堆.只有一个锁"就不会出现死锁.活锁可能取决于人类如何协调对桩的访问.他们可能只是使用随机退避,就像网卡在物理层面上那样做,以确定哪些卡可以独占访问网线.如果它适用于 NIC,它也应该适用于人类.
  2. 如果每个工人都有自己的一组桩,它几乎可以无限扩展.然后工作人员可以从输入篮中取出大块袜子(很少有争用,因为他们很少这样做)并且他们在分发袜子时根本不需要同步(因为他们有线程本地堆).最后,所有工人都需要联合他们的桩组.我相信如果工作人员形成聚合树,则可以在 O(日志(工作人员数量 * 每个工作人员的桩数))内完成.
  1. The simplest parallelization strategy is to have multiple workers take from the input basket and put the socks onto the piles. This only scales up so much - imagine 100 people fighting over 10 piles. The synchronization costs (manifesting themselves as hand-collisions and human communication) destroy efficiency and speed-up (see the Universal Scalability Law!). Is this prone to deadlocks? No, because each worker only needs to access one pile at a time. With just one "lock" there cannot be a deadlock. Livelocks might be possible depending on how the humans coordinate access to piles. They might just use random backoff like network cards do that on a physical level to determine what card can exclusively access the network wire. If it works for NICs, it should work for humans as well.
  2. It scales nearly indefinitely if each worker has its own set of piles. Workers can then take big chunks of socks from the input basket (very little contention as they are doing it rarely) and they do not need to synchronise when distributing the socks at all (because they have thread-local piles). At the end, all workers need to union their pile-sets. I believe that can be done in O(log (worker count * piles per worker)) if the workers form an aggregation tree.

元素差异性问题怎么样?正如文章所述,元素差异性问题可以在 O(N) 中解决.这对于袜子问题也是一样的(也是 O(N),如果你只需要一个分布步骤(我提出多个步骤只是因为人类不擅长计算 - 如果你分布在一个步骤就足够了)md5(color, length, pattern, ...),即所有属性的完美散列)).

What about the element distinctness problem? As the article states, the element distinctness problem can be solved in O(N). This is the same for the socks problem (also O(N), if you need only one distribution step (I proposed multiple steps only because humans are bad at calculations - one step is enough if you distribute on md5(color, length, pattern, ...), i.e. a perfect hash of all attributes)).

显然,不能比O(N)更快,所以我们已经达到了最优下限.

Clearly, one cannot go faster than O(N), so we have reached the optimal lower bound.

虽然输出并不完全相同(在一种情况下,只是一个布尔值.在另一种情况下,袜子对),渐近复杂性是相同的.

Although the outputs are not exactly the same (in one case, just a boolean. In the other case, the pairs of socks), the asymptotic complexities are the same.

这篇关于如何有效地从一堆袜子中配对?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆