How to replace characters in an array quickly

Question

I am using an XML text reader on an XML file that may contain characters that are invalid for the reader. My initial thought was to create my own version of the stream reader and clean out the bad characters, but it is severely slowing down my program.

public class ClensingStream : StreamReader
{
        private static char[] badChars = { '\x00', '\x09', '\x0A', '\x10' };
    //snip
        public override int Read(char[] buffer, int index, int count)
        {
            var tmp = base.Read(buffer, index, count);

            for (int i = 0; i < buffer.Length; ++i)
            {
                //check the element in the buffer to see if it is one of the bad characters.
                if(badChars.Contains(buffer[i]))
                    buffer[i] = ' ';
            }

            return tmp;
        }
}

According to my profiler, the code spends 88% of its time in if(badChars.Contains(buffer[i])). What is the correct way to do this so that I am not causing horrible slowness?

Recommended answer

That line takes so much time because the Contains method loops through the array to look for the character, and it does so once for every element of the buffer.

Put the characters in a HashSet<char> instead:

private static HashSet<char> badChars =
  new HashSet<char>(new char[] { '\x00', '\x09', '\x0A', '\x10' });

The code that checks whether the set contains the character looks the same as when searching the array, but the set uses the hash code of the character to find it instead of looping through all the items.
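To show how the HashSet<char> fits into the reader, here is a minimal sketch of the whole class. The class name CleansingReader and its single constructor are hypothetical, since the original snippet omits that part. The sketch also assumes that only the characters returned by the current base.Read call need checking, so the loop runs from index to index + read rather than over the entire buffer.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical name; the class in the question is called ClensingStream.
public class CleansingReader : StreamReader
{
    // Same four characters as in the question, stored in a hash set.
    private static readonly HashSet<char> BadChars =
        new HashSet<char> { '\x00', '\x09', '\x0A', '\x10' };

    // Assumed constructor; the question snips this part out.
    public CleansingReader(Stream stream) : base(stream) { }

    public override int Read(char[] buffer, int index, int count)
    {
        int read = base.Read(buffer, index, count);

        // Assumption: only buffer[index] .. buffer[index + read - 1]
        // was filled by this call, so only that range is scanned.
        for (int i = index; i < index + read; ++i)
        {
            if (BadChars.Contains(buffer[i]))
                buffer[i] = ' ';
        }

        return read;
    }
}
```

Limiting the scan to the characters actually read is a separate saving from the faster lookup: with a large buffer and a short read, the loop in the question would still visit every element.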

Alternatively, you could put the characters in a switch, so that the compiler generates an efficient comparison:

switch (buffer[i]) {
  case '\x00':
  case '\x09':
  case '\x0A':
  case '\x10': buffer[i] = ' '; break;
}

If you have more characters (five or six, IIRC), the compiler will actually create a hash table to look up the cases, so that would be similar to using a HashSet.
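The switch approach can also be wrapped in a small helper so the check is reusable outside the reader. This is only a sketch: CharCleaner, IsBadChar and Clean are hypothetical names that do not appear in the original code, and the character list is copied from the question.

```csharp
// Hypothetical helper class illustrating the switch-based check.
public static class CharCleaner
{
    // Returns true for the four characters listed in the question.
    public static bool IsBadChar(char c)
    {
        switch (c)
        {
            case '\x00':
            case '\x09':
            case '\x0A':
            case '\x10':
                return true;
            default:
                return false;
        }
    }

    // Replaces bad characters with a space in buffer[index .. index + count - 1].
    public static void Clean(char[] buffer, int index, int count)
    {
        for (int i = index; i < index + count; ++i)
        {
            if (IsBadChar(buffer[i]))
                buffer[i] = ' ';
        }
    }
}
```

Inside a Read override, this would be called as CharCleaner.Clean(buffer, index, tmp), where tmp is the value returned by base.Read.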
