排序字符串以匹配第二个字符串的最快方法-仅允许相邻交换 [英] Fastest way to sort string to match second string - only adjacent swaps allowed

查看:234
本文介绍了排序字符串以匹配第二个字符串的最快方法-仅允许相邻交换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想获得转换一个字符串以匹配第二个字符串所需的最少字母交换次数。



输入为:字符串,string_1,string_2的长度



一些示例:

 长度|字串1 |字串2 |输出
------- + ---------- + ---------- + -------
3 | ABC | BCA | 2
7 | AABCDDD | DDDBCAA | 16
7 | ZZZAAAA | ZAAZAAZ | 6

这是我的代码:

 自定义字母(数字,单词_1,单词_2):

结果= 0

而单词_1!=单词_2:
index_of_letter = word_1 .find(word_2 [0])
结果+ = index_of_letter
word_1 = word_1.replace(word_2 [0],'',1)
word_2 = word_2 [1:]

返回结果

它给出正确的结果,但计算应保持在20秒以内。



这里有两组输入数据(1 000 000个长字符串):



其中的数字节点反映了左侧节点(包括自身)的数量。它们存储在 numLeft 列表中。另一个列表 parent 会预先计算父母所在的索引。



实际代码如下所示:

 自定义字母(word_1,word_2):
size = len(word_1)#无需将size作为参数$ b传递$ b#为word_1创建一个二叉树,按顺序以列表
#的形式组织,其值等于
#范围内(包括以下)的不匹配字母的数量当前索引:
treesize =(1< size.bit_length())-1
numLeft = [(i> 1 ^((i + 1)> 1) )+ 1为range(0,树大小)中的i]
#跟踪这棵树中的父母(可能更简单,我欢迎评论)。
parent = [(i&〜((i ^(i + 1))+ 1))| | (((((i ^(i + 1))+ 1)>>>>>>>>>>>> 1)表示范围(0,树大小)中的i。]
#为每个不同的字符
创建一个链接列表next = [ -1] *大小
head = {}
表示我在range(len(word_1)-1,-1,-1):#向后
c = word_1 [i]
#如果c在头,则在此字符
的链表的开头添加索引:
next [i] = head [c]
head [c] = i
#主循环计算每个字母
结果所需的掉期次数= 0
枚举(word_2)中的i,c:
#从链接列表
j中提取该字母的下一个匹配项= head [c]
head [c] = next [j]
#通过二叉树查找获取前面的字符数
p = j
index_of_letter = 0
而p <树木大小:如果p> = j,则
:#正确还是正确?
numLeft [p]-= 1#如果p< = j,则记录左侧
的字母已被删除:#在左侧还是左侧?
index_of_letter + = numLeft [p]#添加左侧字母的数量
p = parent [p]#走到树上
结果+ = index_of_letter
返回结果

此命令在 O(nlogn)中运行,其中 logn

我在数千个随机输入上进行了测试,以上代码在所有情况下均产生与您的代码相同的结果。但是...在较大的输入上运行速度要快得多。


I want to get the minimum number of letter-swaps needed to convert one string to match a second string. Only adjacent swaps are allowed.

Inputs are: length of strings, string_1, string_2

Some examples:

Length | String 1 | String 2 | Output
-------+----------+----------+-------
   3   | ABC      | BCA      |   2 
   7   | AABCDDD  | DDDBCAA  |  16
   7   | ZZZAAAA  | ZAAZAAZ  |   6

Here's my code:

def letters(number, word_1, word_2):

    result = 0

    while word_1 != word_2:
        index_of_letter = word_1.find(word_2[0])
        result += index_of_letter
        word_1 = word_1.replace(word_2[0], '', 1)
        word_2 = word_2[1:]

    return result

It gives the correct results, but the calculation should stay under 20 seconds.

Here are two sets of input data (1 000 000 characters long strings): https://ufile.io/8hp46 and https://ufile.io/athxu.

On my setup the first one is executed in around 40 seconds and the second in 4 minutes.

How to calculate the result in less than 20 seconds?

解决方案

Your algorithm runs in O(n2) time:

  • The find() call will take O(n) time
  • The replace() call will create a complete new string which takes O(n) time
  • The outer loop executes O(n) times

As others have stated, this can be solved by counting inversions using merge sort, but in this answer I try to stay close to your algorithm, keeping the outer loop and result += index_of_letter, but changing the way index_of_letter is calculated.

The improvement can be done as follows:

  • preprocess the word_1 string and note the first position of each distinct letter in word_1 in a dict keyed by these letters. Link each letter with its next occurrence. I think it is most efficient to create one list for this, having the size of word_1, where at each index you store the index of the next occurrence of the same letter. This way you have a linked list for each distinct letter. This preprocessing can be done in O(n) time, and with it you can replace the find call with a O(1) lookup. Every time you do this, you remove the matched letter from the linked list, i.e. the index in the dict moves to the index of the next occurrence.
  • The previous change will give the absolute index, not taking into account the removals of letters that you have in your algorithm, so this will give wrong results. To solve that, you can build a binary tree (also preprocessing), where each node represents an index in word_1, and which gives the actual number of non-deleted letters preceding a given index (including itself as well if not deleted yet). The nodes in the binary tree never get deleted (that might be an idea for a variant solution), but the counts get adjusted to reflect a deletion of a character. At most O(logn) nodes need to get a decremented value upon such a deletion. But apart from that no string would be rebuilt like with replace. This binary tree could be represented as a list, corresponding to nodes in in-order sequence. The values in the list would be the numbers of non-deleted letters preceding that node (including itself).

The initial binary tree could be depicted as follows:

The numbers in the nodes reflect the number of nodes at their left side, including themselves. They are stored in the numLeft list. Another list parent precalculates at which indexes the parents are located.

The actual code could look like this:

def letters(word_1, word_2):
    size = len(word_1) # No need to pass size as argument
    # Create a binary tree for word_1, organised as a list
    #   in in-order sequence, and with the values equal to the number of
    #   non-matched letters in the range up to and including the current index:
    treesize = (1<<size.bit_length()) - 1
    numLeft = [(i >> 1 ^ ((i + 1) >> 1)) + 1 for i in range(0, treesize)]
    # Keep track of parents in this tree (could probably be simpler, I welcome comments).
    parent = [(i & ~((i^(i+1)) + 1)) | (((i ^ (i+1))+1) >> 1) for i in range(0, treesize)]
    # Create a linked list for each distinct character
    next = [-1] * size
    head = {}
    for i in range(len(word_1)-1, -1, -1): # go backwards
        c = word_1[i]
        # Add index at front of the linked list for this character
        if c in head:
            next[i] = head[c]
        head[c] = i
    # Main loop counting number of swaps needed for each letter
    result = 0
    for i, c in enumerate(word_2):
        # Extract next occurrence of this letter from linked list
        j = head[c]
        head[c] = next[j]
        # Get number of preceding characters with a binary tree lookup
        p = j
        index_of_letter = 0
        while p < treesize:
            if p >= j:  # On or at right?
                numLeft[p] -= 1  # Register that a letter has been removed at left side
            if p <= j:  # On or at left?
                index_of_letter += numLeft[p] # Add the number of left-side letters
            p = parent[p] # Walk up the tree
        result += index_of_letter
    return result

This runs in O(nlogn) where the logn factor is provided by the upwards walk in the binary tree.

I tested on thousands of random inputs, and the above code produces the same results as your code in all cases. But... it runs a lot faster on the larger inputs.

这篇关于排序字符串以匹配第二个字符串的最快方法-仅允许相邻交换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆