找到成对元素的索引 [英] Finding index of pairwise elements

查看:126
本文介绍了找到成对元素的索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

鉴于目标('b','a')和输入:

x0 = ('b', 'a', 'z', 'z')
x1 = ('b', 'a', 'z', 'z')
x2 = ('z', 'z', 'a', 'a')
x3 = ('z', 'b', 'a', 'a')

目的是找到连续的位置('b','a')元素并获得输出:

The aim is to find the location of the continuous ('b', 'a') element and get the output:

>>> find_ba(x0)
0
>>> find_ba(x1)
0
>>> find_ba(x2)
None
>>> find_ba(x3)
1

使用成对食谱:

from itertools import tee
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

我可以这样做以获得所需的输出:

I could do this to get the desired output:

def find_ba(x, target=('b', 'a')):
    try:
        return next(i for i, pair in enumerate(pairwise(x)) if pair == target)
    except StopIteration:
        return None

但这需要我循环遍历所有字符对,直到找到第一个实例。 有没有办法找到成对元素的索引而不循环所有字符?

But that would require me to loop through all pairs of characters till I find the first instance. Is there a way to find index of pairwise elements without looping all the characters?

回答@ MatthiasFripp在评论中的问题:

Answering @MatthiasFripp's question in the comments:


您的元素是列表或类型(如图所示)还是生成器(例如从文件中读取)处理)?

Are your elements in lists or types (as shown) or in a generator (e.g. reading from a file handle)?

x *是字符串的所有元组。所以他们可以通过索引访问。但如果答案/解决方案适用于元组和生成器,那就太棒了!

The x* are all tuples of strings. So they can be access through the index. But if the answer/solution can work for tuples and generator, that'll be great!


你能说出你有多少列表吗?搜索以及他们有多长时间?这有助于建议搜索策略。

Can you say about how many lists you have to search and about how long they are? That would help for suggesting a search strategy.

元组的长度不固定。它们的大小可以> 2.

The lengths of the tuples are not fixed. They can be of size > 2.

推荐答案

最快的一般搜索算法将具有 O(n)平均表现(称为线性搜索),这意味着除了处理每个元素之外你没有其他选择(除了可能是一个常数因子)。

The fastest general search algorithm will have O(n) average performance (called linear search), that means you have no alternative (except maybe for a constant factor) than to process each element.

给出你的问题:


有没有办法找到成对元素的索引而不循环所有字符?

Is there a way to finding index of pairwise elements without looping all the characters?

这是可能的(虽然它仍然是 O(n))每秒项目:

That's possible (it's still O(n) though) by only looking at each second item:

from itertools import count

def find_ab(tup):
    for idx in count(start=1, step=2):
        try:
            if tup[idx] == 'b':
                if tup[idx+1] == 'a':
                    return idx
            elif tup[idx] == 'a':
                if tup[idx-1] == 'b':
                    return idx-1
        except IndexError:
            break

在最坏的情况下,它仍会比较所有项目,但它将跳过每个奇数索引项目的一项,该项目不是'b''a'

In the worst case it will still compare all items but it will skip one item for every odd-indexed item that isn't 'b' or 'a'.

这有点像作弊所以让我解释一下为什么常见的替代品在你的情况下是不可能的:

That's a bit like cheating so let me explain why common alternatives aren't possible in your case:

二进制搜索只需要比较 log(n)项目但它需要对序列进行排序。您的示例未排序,因此对它们进行排序将需要 O(n * log(n))操作 - 这不会只处理每个项目一次,它会处理一些他们好几次。并不是说我知道一种明智的方法来对相邻元素进行排序。

Binary search only needs to compare log(n) items but it requires the sequence to be sorted. Your examples aren't sorted so sorting them would require O(n*log(n)) operations - which wouldn't just process each item once, it would process some of them several times. Not that I know a sensible way to sort adjacent elements anyway.

你有元组所以创建一个哈希表(一个 dict )是没有意义的,因为为了创建该结构,你需要处理每个元素。

You have tuples so creating a hashtable (a dict) doesn't make sense because in order to create that structure you would need to process each element.

但是如果你计划对这些对进行多次搜索,你可以创建一次字典( O(n))并在之后做很多搜索在 O(1)

But if you plan to do several of these searches for pairs you could create the dictionary once (O(n)) and do many searches afterwards in O(1):

d = {}
for idx, pair in enumerate(pairwise(x0)):
    if pair not in d:    # keep only the first index for each pair
        d[pair] = idx

>>> d.get(('b', 'a'), None)
0

但是,如果您只想搜索一个对,那么这种方法要慢得多,因为您丢失了短路行为(一旦找到匹配就会停止)并且您在创建时处理所有元素字典。

However that approach is far slower if you only want to search for one pair, because you lose the "short-circuit behaviour" (stops as soon as a match is found) and you process all elements when creating the dictionary.

除了一般方法:


  • O(n)线性搜索

  • O( log(n))二进制搜索(用于排序数据)

  • O(1) lookups(for可以在一些桶中搜索的可清除查找或其他搜索问题

  • O(n) linear search
  • O(log(n)) binary search (for sorted data)
  • O(1) lookups (for hashable lookups or other search problems that just need to search in some "bucket"s)

您通常可以利用任何结构或知识您的数据可减少您需要处理的项目数量。问题主要在于(可能)没有已经存在的数据结构,而且自制实现通常比天真的处理所有元素方法慢几个数量级。但是,如果您有关于序列的任何元信息,那么您可以利用它。

You can generally exploit any structure or knowledge about your data to reduce the amount of items you need to process. The problem is mostly that there are (probably) no already existing data structures for these and homemade implementations often end up orders of magnitude slower than naive "process all elements" approaches. But if you have any meta-informations about your sequences then you can exploit it.

成对的配方实际上相当不错,但你也可以使用 iteration_utilities.successive 1 。最后我检查它比配方快大约1.5至2倍。即使您不改变方法并接受在最坏的情况下您需要处理所有(或几乎所有)元素,可能更快!

The recipe for pairwise is actually quite nice, but you could also use iteration_utilities.successive1. Last I checked it was roughly 1.5 to 2 times faster than the recipe. Even if you don't change the approach and accept that you need to process all (or almost all) elements in the worst case it may be faster!

可能已生成该数据。也许在创作过程中实际搜索元素是值得的。这样你根本不需要对数据进行额外的传递。或者您可以在创建数据集时创建 dict (之后允许 O(1)查找)。如果有某种方式可以提取信息,有时候查看生成/下载/获取数据集的过程是个好主意。

That data was probably generated. Maybe it's worthwhile to actually "search" for the elements during creation. That way you don't need to do the extra pass over the data at all. Or you could create the dict while creating the dataset (that allows for the O(1) lookups afterwards). Sometimes it's a good idea to look at the process that generated/downloaded/fetched the dataset if there is some way the information can be extracted.

现在,写完所有这些之后文字我需要说清楚:

Now, after writing all this text I need to state the obvious:

你的方法非常好。即使它需要在最坏的情况下处理所有元素,它对于手头的问题使用完美的拟合(成对 -recipe)并且它实际上应该非常快速地工作投入。对于包含100万'z'的元组,它在我的计算机上只需要200ms。因此,您每秒可以处理数百万个元素(即使是像我这样的老式和慢速计算机)。对于大数据而言,这可能不够快,但是纯python并不是处理大数据的好语言(通常你需要编写C扩展,使用Cython或一些NumPy,Pandas或衍生方法)。此外,生成器上的 next 函数是惰性的(假设您在python2上使用 itertools.izip 而不是 zip )因此您只需处理每个元组,直到找到匹配项。

Your approach is really good. Even if it needs to process all elements in the worst case it uses a perfect fit (pairwise-recipe) for the problem at hand and it should actually work really fast even for long inputs. For a tuple containing 1 million 'z' it only needs 200ms on my computer. So you can process several million elements per second (even on old and slow computers like mine). That's probably not fast enough for big-data but then pure-python isn't a good language for processing big-data (typically you would need to write a C extension, use Cython or some NumPy, Pandas or derivative approach). Also the next function on a generator is lazy (assuming you use itertools.izip on python2 instead of zip) so you only process each tuple until you find a match.

就个人而言,我只会使用您原来的方法。或者,如果我必须找到几对,那么我只需创建我之前提到的字典(甚至可以序列化它)并在其中进行查找。

Personally, I would simply use your original approach. Or if I had to find several pairs then I would just create the dictionary I mentioned earlier (maybe even serialize it) and do the lookups therein.

赏金理由明确要求可信和/或官方来源。对Fortunatly搜索算法进行了深入研究,因此您可以在算法的基础教科书中找到每种提到的方法的解释。例如:

The bounty reason explicitly wants "credible and/or official sources". Fortunatly "search algorithms" are well studied so you can find explanations for each of the mentioned approaches in basic text books on algorithms. For example:


  • Cormen,et。 al - 算法简介

  • Sedgewick& Wayne - 算法

  • Cormen, et. al - Introduction to Algorithms
  • Sedgewick & Wayne - Algorithms

维基百科:线性搜索

还有一个python wiki中python类型的时间复杂性的小概述:TimeComplexity。对于查找,您必须选中获取项目或输入。

There's also a small overview of time complexities of python types in the python wiki: "TimeComplexity". For lookups you have to check "Get Item " or "in".

1 披露:我是第三方图书馆的作者。

1 Disclosure: I'm the author of that 3rd party library.

这篇关于找到成对元素的索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆