优化想法 - 最长的公共子字符串 [英] Optimisation ideas - Longest common substring

查看:84
本文介绍了优化想法 - 最长的公共子字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这个程序,它应该找到多个字符串的最长的公共子字符串。它的作用,但如果字符串非常长(即> 8000个字符长),它工作缓慢(1.5秒)。
有任何优化方法吗?

I have this program which is supposed to find the Longest Common Substring of a number of strings. Which it does, but if the strings are very long (i.e. >8000 characters long), it works slowly (1.5 seconds). Is there any way to optimise that?

程序是这样:

//#include "stdafx.h"
#include <iostream>
#include <string>
#include <vector>
#include <cassert>


using namespace std;

const unsigned short MAX_STRINGS = 10;
const unsigned int  MAX_SIZE=10000;
vector<string> strings;
unsigned int len;

string GetLongestCommonSubstring( string string1, string string2 );
inline void readNumberSubstrings();
inline const string getMaxSubstring();

void readNumberSubstrings()
{
    cin >> len;

    assert(len > 1 && len <=MAX_STRINGS);

    strings.resize(len);

    for(register unsigned int i=0; i<len;i++)
        strings[i]=string(MAX_SIZE,0);

    for(register unsigned int i=0; i<len; i++)
        cin>>strings[i];
}

 const string getMaxSubstring()
{
    string maxSubstring=strings[0];
    for(register unsigned int i=1; i < len; i++)
        maxSubstring=GetLongestCommonSubstring(maxSubstring, strings[i]);
    return maxSubstring;
}

string GetLongestCommonSubstring( string string1, string string2 ) 
{

    const int solution_size = string2.length()+ 1;

    int *x=new int[solution_size]();
    int *y= new int[solution_size]();

    int **previous = &x;
    int **current = &y;

    int max_length = 0;
    int result_index = 0;

    int j;
    int length;
    int M=string2.length() - 1;

    for(register int i = string1.length() - 1; i >= 0; i--)
    {
        for(register int j = M; j >= 0; j--) 
        {
            if(string1[i] != string2[j]) 
                (*current)[j] = 0;
            else 
            {
                length = 1 + (*previous)[j + 1];
                if (length > max_length)
                {
                    max_length = length;
                    result_index = i;
                }

                (*current)[j] = length;
            }
        }

        swap(previous, current);
    }
    string1[max_length+result_index]='\0';
    return &(string1[result_index]);
}

int main()
{
    readNumberSubstrings();
    cout << getMaxSubstring() << endl;
    return 0;
}

注意:不写代码,可以解决这个问题的后缀树(他们很大)。

Note: there is a reason why I didn't write code that would solve this problem with suffix trees (they're large).

推荐答案

通常,当涉及到优化时,不同的方法可能是你唯一的真正选择,而不是试图逐步改进当前实现。这是我的想法:

Often when it comes to optimization, a different approach might be your only true option rather than trying to incrementally improve the current implementation. Here's my idea:


  • 创建可能出现在最长公共子字符串中的有效字符

将每个字符串分隔为多个字符串,这些字符串包含只有有效字符

separate each string into multiple strings containing only valid characters

,才能创建每个可能的子字符串并将其添加到列表

for every such string, create every possible substring and add it to the list as well

过滤(与字符一样)所有字符串。

filter (as with the characters) all strings, that don't show up in all lists.

这种复杂性很大程度上取决于无效字符的数量。如果它是零,这种方法根本没有帮助。

The complexity of this obviously depends largely on the number of invalid characters. if it's zero, this approach doesn't help at all.

您的代码的一些评论:不要试图过于聪明。编译器会优化这么多,真的没有必要在你的代码中放入 register 。第二,你分配字符串,然后覆盖它们(在 readNumberSubstrings ),这是完全不必要的。第三,通过const引用如果可以。第四,不要使用原始指针,特别是如果你从不 delete [] 你的 new [] d对象。使用 std :: vector s,它的行为与异常(你可能会遇到,你使用字符串很多!)。

Some remarks on your code: Don't try to be overly clever. The compiler will optimize so much, there's really no need for you to put register in your code. Second, your allocating strings and then overwrite them (in readNumberSubstrings), that's totally unnecessary. Third, pass by const reference if you can. Fourth, don't use raw pointers, especially if you never delete [] your new []d objects. Use std::vectors instead, it behaves well with exceptions (which you might encounter, you're using strings a lot!).

这篇关于优化想法 - 最长的公共子字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆