Fastest way to replace multiple strings in a huge string


Question

I'm looking for the fastest way to replace multiple (~500) substrings of a big (~1 MB) string. Of everything I have tried, String.Replace seems to be the fastest way of doing it.

I only care about the fastest possible way, not code readability, maintainability, etc. I don't care if I need to use unsafe code or pre-process the original string either.

Edit: after the comments, I have added some more details:

Each replace iteration will replace ABC in the string with some other string (different per replace iteration). The string to replace will ALWAYS be the same: ABC will always be ABC, never ABD. So if there are 400,000 replace iterations, the same string, ABC, will be replaced with some other (different) string each time.

I control what ABC is. I can make it super-short or super-long as long as it doesn't affect the results. Clearly ABC can't be hello, because hello will exist as a word in most of the input strings.

Example input: ABCDABCABCDABCABCDABCABCDABCD

Example string to replace: BC

Example replacement strings: AA, BB, CC, DD, EE (5 iterations)

Example output:

AAADAAAAAADAAAAAADAAAAAADAAAD
ABBDABBABBDABBABBDABBABBDABBD
ACCDACCACCDACCACCDACCACCDACCD
ADDDADDADDDADDADDDADDADDDADDD
AEEDAEEAEEDAEEAEEDAEEAEEDAEED
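
For reference, the example above corresponds to the plain String.Replace baseline the question mentions: one full pass over the input per replacement string. The following is a minimal sketch; the class and method names (`BaselineReplace`, `ReplaceAll`) are made up for illustration and are not from the original post.

```csharp
using System;

// Hypothetical helper illustrating the String.Replace baseline.
static class BaselineReplace
{
    // Produces one output string per replacement value.
    // Each call to string.Replace rescans the whole input for the token.
    public static string[] ReplaceAll(string input, string token, string[] replacements)
    {
        var results = new string[replacements.Length];
        for (int i = 0; i < replacements.Length; i++)
        {
            // string.Replace(string, string) performs an ordinal (culture-insensitive) search.
            results[i] = input.Replace(token, replacements[i]);
        }
        return results;
    }
}
```

Calling `BaselineReplace.ReplaceAll("ABCDABCABCDABCABCDABCABCDABCD", "BC", new[] { "AA", "BB", "CC", "DD", "EE" })` should yield the five output lines listed above, in the same order.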

Average case: the input string is 100-200 KB with 40,000 replace iterations.
Worst case: the input string is 1-2 MB with 400,000 replace iterations.

I can do ANYTHING: do it in parallel, do it unsafe, etc. It doesn't matter how I do it. What matters is that it is as fast as it can get.

Thanks.

Recommended answer

As I was mildly interested in this problem, I crafted a few solutions. With hardcore optimizations it's possible to bring the times down even more.

For the latest source: https://github.com/ChrisEelmaa/StackOverflow/blob/master/FastReplacer.cs

And the output:


-------------------------------------------------------
| Implementation       | Average | Separate runs      |
|----------------------+---------+--------------------|
| Simple               |    3485 | 9002, 4497, 443, 0 |
| SimpleParallel       |    1298 | 3440, 1606, 146, 0 |
| ParallelSubstring    |     470 | 1259, 558, 64, 0   |
| Fredou unsafe        |     356 | 953, 431, 41, 0    |
| Unsafe+unmanaged_mem |      92 | 229, 114, 18, 8    |
-------------------------------------------------------

You probably won't beat the .NET guys by crafting your own replace method; it's most likely already using unsafe code. I do believe you can get it down by a factor of two if you write it completely in C.

My implementations might be buggy, but you can get the general idea.
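
Because the question states that the search token never changes between iterations, one possible form of pre-processing is to locate the token once and reuse the resulting segments for every replacement. The sketch below illustrates that idea only; it is not taken from the linked FastReplacer.cs, the names (`PreSplitReplacer`, `Replace`, `ReplaceMany`) are hypothetical, and whether it beats String.Replace for a given workload would need to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: splits the input on the token once, then joins per replacement.
sealed class PreSplitReplacer
{
    // Text between consecutive occurrences of the token (may contain empty strings).
    private readonly string[] _segments;

    public PreSplitReplacer(string input, string token)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (string.IsNullOrEmpty(token))
            throw new ArgumentException("token must be non-empty", nameof(token));

        // The only scan for the token; Split matches ordinally, left to right,
        // without overlaps, which is the same matching order String.Replace uses.
        _segments = input.Split(new[] { token }, StringSplitOptions.None);
    }

    // Builds one output by placing the replacement between the cached segments.
    public string Replace(string replacement)
    {
        return string.Join(replacement, _segments);
    }

    // Independent outputs can be built in parallel because _segments is read-only.
    public string[] ReplaceMany(IReadOnlyList<string> replacements)
    {
        var results = new string[replacements.Count];
        Parallel.For(0, replacements.Count, i => results[i] = Replace(replacements[i]));
        return results;
    }
}
```

For instance, `new PreSplitReplacer("ABCDABCABCDABCABCDABCABCDABCD", "BC").Replace("AA")` gives the same string as the first line of the question's example output. `ReplaceMany` keeps every result in memory, which is only reasonable for small batches; at the sizes described in the question, each output would more likely be consumed or written out as soon as it is produced.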
