Is there a fast algorithm to remove repeated substrings in a string?

Views: 103  Posted: 2020/6/3 20:36:28  Tags: python string algorithm


Problem description

I have a string like

dxabcabcyyyydxycxcxz

and I want to merge it into

dxabcydxycxz

Other examples:
ddxddx -> dxdx, abbab -> abab.

The rules are:

if (adjacent and same): merge

# For example, the two 'abc' are identical and adjacent, so I delete one of them.
# Although the two 'dx' are identical, they are not adjacent, so I delete neither.
# Once a character has been deleted, no substring that includes it takes part in another merge.

I implemented this in Python, but it is slow on long strings.

# original string
mystr = "dxabcabcyyyydxycxcxz"
str_len = len(mystr)
vis = [1] * str_len  # vis[k] == 0 marks the character at index k as deleted

# enumerate the size of the substring
for i in range(1, str_len):
    # enumerate the start of the substring
    for j in range(0, str_len):
        offset = 2  # number of copies covered so far, counting the candidate repeat
        current_sub_str = mystr[j:j+i]
        s_begin = j + i * (offset - 1)
        s_end = j + i * offset
        # delete every adjacent repeat, skipping substrings that touch a deleted character
        while (s_end <= str_len and current_sub_str == mystr[s_begin:s_end]
               and 0 not in vis[s_begin:s_end] and 0 not in vis[j:j+i]):
            vis[s_begin:s_end] = [0] * (s_end - s_begin)  # mark the repeat as deleted
            offset += 1
            s_begin = j + i * (offset - 1)
            s_end = j + i * offset

# keep only the characters that were not marked as deleted
res = []
for i in range(0, str_len):
    if vis[i] != 0:
        res.append(mystr[i])

print("".join(res))

Is there a faster way to solve it?
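For comparison, one compact alternative to the nested loops is a backreference regular expression. This is only a minimal sketch, not a confirmed answer: the helper name `collapse_repeats` is hypothetical, the pattern scans left to right instead of trying the shortest substring length first, and backtracking can still be slow on adversarial input.

```python
import re

# Hypothetical helper, not part of the original code.
# (.+?) is the shortest possible block, \1+ is one or more adjacent copies of it.
# re.DOTALL lets a block span newlines as well.
_REPEAT = re.compile(r"(.+?)\1+", flags=re.DOTALL)


def collapse_repeats(text):
    """Replace each run of adjacent identical blocks with a single block."""
    return _REPEAT.sub(r"\1", text)


for sample in ("dxabcabcyyyydxycxcxz", "ddxddx", "abbab"):
    print(sample, "->", collapse_repeats(sample))
```

On the three examples from the question this gives dxabcydxycxz, dxdx and abab. It does not always agree with the loop above: for ababb the loop produces ab, while this sketch produces abb, because the regex commits to the leftmost match first.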

Update (April 29, 2017)

Sorry, this looks like an XY problem. On the other hand, maybe it is not. Here is the background.

I was writing a web spider and collected many tag paths like these:

ul/li/a
ul/li/div/div/div/a/span
ul/li/div/div/div/a/span 
ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a

As you can see, some tag paths repeat in the same pattern, so I wanted to collapse them to check whether any other tag paths share the same structure. After collapsing, I get tag paths like this:

ul/li/a
ul/li/div/div/div/a/span
ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
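For this particular list, collapsing runs of identical consecutive lines is enough to reproduce the result above. A minimal sketch using `itertools.groupby`; the function name `collapse_runs` is hypothetical, and it only handles repeats of a single tag path, not repeated groups of several paths.

```python
from itertools import groupby


def collapse_runs(paths):
    """Keep one entry for each run of identical consecutive tag paths."""
    return [path for path, _ in groupby(paths)]


# The tag paths listed above, written as runs to keep the example short.
paths = (
    ["ul/li/a"]
    + ["ul/li/div/div/div/a/span"] * 2
    + ["ul/li/a"]
    + ["ul/li/ul/li/a"] * 8
    + ["ul/li/a"]
    + ["ul/li/ul/li/a"] * 5
    + ["ul/li/a"]
    + ["ul/li/ul/li/a"] * 6
)

for path in collapse_runs(paths):
    print(path)
```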

This was only my idea, and I did not know whether it was a suitable approach. (After trying it, I chose another way.)

However, it is still an interesting question, much like an ACM problem.

So I simplified each tag path to a single character and asked for help, because I could not come up with a fast method myself. Actually, the question has many corner cases that I do not care about. Thanks to everyone who helped me work through it.
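The simplification from tag paths to characters could look roughly like this. It is a sketch under assumptions: the name `encode_paths` is hypothetical, and it supports at most 26 distinct tag paths because it assigns lowercase letters in order of first appearance.

```python
import string


def encode_paths(paths):
    """Map each distinct tag path to one letter and return (encoded string, mapping)."""
    symbol_of = {}
    encoded = []
    for path in paths:
        if path not in symbol_of:
            if len(symbol_of) >= len(string.ascii_lowercase):
                raise ValueError("more than 26 distinct tag paths")
            # Assign the next unused letter to a tag path seen for the first time.
            symbol_of[path] = string.ascii_lowercase[len(symbol_of)]
        encoded.append(symbol_of[path])
    return "".join(encoded), symbol_of


encoded, mapping = encode_paths([
    "ul/li/a",
    "ul/li/div/div/div/a/span",
    "ul/li/div/div/div/a/span",
    "ul/li/a",
])
print(encoded)
```

The encoded string can then be passed to any string-based collapsing routine, and the mapping inverted to recover the tag paths.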

Thanks.
