Duplicate text-finding

Problem description

My main problem is finding a suitable way to automatically turn this, for example:

d+c+d+f+d+c+d+f+d+c+d+f+d+c+d+f+

into this:

[d+c+d+f+]4

i.e. finding duplicates next to each other, then making a shorter "loop" out of these duplicates. So far I have found no suitable solution to this, and I look forward to a response.

P.S. To avoid confusion, the sample above is not the only thing that needs "looping"; it differs from file to file. This is intended for a C++ or C# program (either is fine), though I'm open to other suggestions as well. The main idea is that all the work is done by the program itself, with no user input other than the file.

Here is the full file for reference; my apologies for the stretched page:

#0 @16 v225 y10 w250 t76

l16 $ED $EF $A9 p20,20 >ecegb>d<bgbgecgec<g >d+<b>d+f+a+>c+<a+f+a+f+d+<b>f+d+<bf+ >c<a>cegbgegec<a>ec<ae > d+c+d+f+d+c+d+f+d+c+d+f+d+c+d+f+ r1^1

/ l8 r1r1r1r1 f+<a+>f+g+cg+r4 a+c+a+g+cg+r4f+<a+>f+g+cg+r4 a+c+a+g+cg+r4f+<a+>f+g+cg+r4 a+c+a+g+cg+r4 f+<a+>f+g+cg+r4 a+c+a+g+r4g+16f16c+ a+2^g+f+g+4 f+ff+4fd+f4 d+c+d+4c+c<a+2^4 >c4d+ <g+2^4r4^ a+>c+d+4g+4a+4 r1^2^4^a+2^g+f+g+4 f+ff+4fd+f4 d+c+d+4c+c<a+2^4 >c4d+ <g+2^4r4^ a+>c+d+4g+4a+4 r1^2^4^ r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1

#4 @22 v250 y10

l8 o3 rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+rg+ / r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1

#2 @4 v155 y10

l8 $ED $F8 $8F o4 r1r1r1 d+4f4f+4g+4 a+4r1^4^2 / d+4^fr2 f+4^fr2d+4^fr2 f+4^fr2d+4^fr2 f+4^fr2d+4^fr2 f+4^fr2 > d+4^fr2 f+4^fr2d+4^fr2 f+4^fr2 < f+4^g+r2 f+4^fr2f+4^g+r2 f+4^fr2f+4^g+r2 f+4^fr2f+4^g+r2 f+4^fr2f+4^g+r2 f+4^fr2f+4^g+r2 f+4^fr2f+4^g+r2 f+4^fr2f+4^g+r2 f+4^fr2 > a+4^g+r2 f+1a+4^g+r2 f+1 f+4^fr2 d+1 f+4^fr2 d+2^d+4^ r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1

#3 @10 v210 y10

r1^1 o3 c8r8d8r8 c8r8c8r8c8r8c8r8c8r8c8r8c8r8c8r8c8r8c8r8c8r8 c8 @10d16d16@21 c8 @10d16d16@21 c8 @10d16d16@21 / c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8 c4@10d8@21c8<b8> @10d16d16d16d16d16r16 c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8c4@10d8@21c8<b8>c8@10d8@21c8 c4@10d8@21c8 @10b16b16>c16c16<b16b16a16a16

#7 @16 v230 y10

l16 $ED $EF $A9 cceeggbbggeeccee <bb>d+d+f+f+a+a+f+f+d+d+<bb>d+d+ <aa>cceeggeecc<aa>cc <g+g+bb>d+d+ffd+d+<bbg+g+bb / r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1

#5 @4 v155 y10

l8 $ED $F8 $8F o4 r1r1r1r1 d+4r1^2^4 / <a+4^>cr2 c+4^cr2<a+4^>cr2 c+4^cr2<a+4^>cr2 c+4^cr2<a+4^>cr2 c+4^cr2 a+4^>cr2 c+4^cr2 <a+4^>cr2 c+4^c r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1 r2 f+4^fr2 d+1f+4^fr2 d+1 c+4^cr2 <a+1 >c+4^cr2 <a+2^a+4^ r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1r1

Solution

Not sure if this is what you are looking for.

I took the string "testtesttesttest4notaduped+c+d+f+d+c+d+f+d+c+d+f+d+c+d+f+testtesttest" and converted it to "[test]4 4notadupe[d+c+d+f+]4 [test]3 "

I'm sure someone will come up with a better, more efficient solution, as this one is a bit slow when processing your full file. I look forward to other answers.

        string stringValue = "testtesttesttest4notaduped+c+d+f+d+c+d+f+d+c+d+f+d+c+d+f+testtesttest";

        for(int i = 0; i < stringValue.Length; i++)
        {
            for (int k = 1; (k*2) + i <= stringValue.Length; k++)
            {
                int count = 1;

                string compare1 = stringValue.Substring(i,k);
                string compare2 = stringValue.Substring(i + k, k);

                //Count how many times compare1 repeats back to back
                while (compare1 == compare2) 
                {
                    count++;
                    k += compare1.Length;
                    if (i + k + compare1.Length > stringValue.Length)
                        break;

                    compare2 = stringValue.Substring(i + k, compare1.Length);
                } 

                if (count > 1)
                {
                    //A trailing space is appended so that a token followed by a digit
                    //(e.g. [test]4 then 4) does not read as the invalid count [test]44.
                    string addString = "[" + compare1 + "]" + count + " ";

                    //Only add code if we are saving space
                    if (addString.Length < compare1.Length * count)
                    {
                        stringValue = stringValue.Remove(i, count * compare1.Length);
                        stringValue = stringValue.Insert(i, addString);
                        i = i + addString.Length - 1;
                    }
                    break;
                }
            }
        }
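
Since the question allows C++ as well, here is a rough C++ sketch of the same left-to-right idea, not a definitive implementation. The names `collapseRuns` and `expandRuns` are made up for illustration. It builds a new output string instead of calling Remove/Insert on the input, and compares ranges in place rather than allocating substrings, which may help with the slowness mentioned above. It assumes the input contains no `[` or `]` characters and that the trailing space after each count is harmless in the target format.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: collapse adjacent repeats into "[unit]count " tokens.
// Like the C# code above, only the first (shortest) repeating period found at
// each position is considered, and a token is emitted only if it is shorter
// than the run it replaces.
std::string collapseRuns(const std::string& s) {
    std::string out;
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        bool replaced = false;
        for (std::size_t k = 1; i + 2 * k <= n; ++k) {
            // Count back-to-back copies of s[i, i+k) without allocating.
            std::size_t count = 1;
            while (i + (count + 1) * k <= n &&
                   s.compare(i, k, s, i + count * k, k) == 0) {
                ++count;
            }
            if (count > 1) {
                // Trailing space keeps a following digit from merging with the count.
                std::string token =
                    "[" + s.substr(i, k) + "]" + std::to_string(count) + " ";
                if (token.size() < k * count) {
                    out += token;
                    i += k * count;
                    replaced = true;
                }
                break;
            }
        }
        if (!replaced) {
            out += s[i];
            ++i;
        }
    }
    return out;
}

// Hypothetical inverse, useful for checking that nothing was lost.
// Assumes tokens are not nested and that '[' / ']' never occur in plain text.
std::string expandRuns(const std::string& s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t close =
            (s[i] == '[') ? s.find(']', i) : std::string::npos;
        if (close == std::string::npos) {
            out += s[i++];  // ordinary character (or an unmatched '[')
            continue;
        }
        const std::string unit = s.substr(i + 1, close - i - 1);
        std::size_t j = close + 1;
        std::size_t count = 0;
        while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j]))) {
            count = count * 10 + static_cast<std::size_t>(s[j] - '0');
            ++j;
        }
        if (j < s.size() && s[j] == ' ') ++j;  // skip the delimiter space
        for (std::size_t r = 0; r < count; ++r) out += unit;
        i = j;
    }
    return out;
}
```

On the test string from this answer, `collapseRuns` gives `[test]4 4notadupe[d+c+d+f+]4 [test]3 `, and `expandRuns` turns that back into the original text. Short runs such as `aaaa` are left alone because the token would not be shorter than the run.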
