Most efficient way to remove special characters from string

Question

I want to remove all special characters from a string. Allowed characters are A-Z (uppercase or lowercase), numbers (0-9), underscore (_), or the dot sign (.).

I have the following, it works but I suspect (I know!) it's not very efficient:

    public static string RemoveSpecialCharacters(string str)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.Length; i++)
        {
            if ((str[i] >= '0' && str[i] <= '9')
                || (str[i] >= 'A' && str[i] <= 'z'
                    || (str[i] == '.' || str[i] == '_')))
                {
                    sb.Append(str[i]);
                }
        }

        return sb.ToString();
    }

What is the most efficient way to do this? What would a regular expression look like, and how does it compare with normal string manipulation?

The strings that will be cleaned will be rather short, usually between 10 and 30 characters in length.
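For reference, a regular expression for this job could look roughly like the sketch below. The class name `RegexCleaner` and the exact pattern are assumptions made for illustration; the thread itself does not show a regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCleaner
{
    // Matches every run of characters that is NOT a letter (A-Z, a-z),
    // a digit, an underscore or a dot. Inside a character class the dot is literal.
    // RegexOptions.Compiled trades a higher start-up cost for faster repeated matching.
    private static readonly Regex Disallowed =
        new Regex("[^a-zA-Z0-9_.]+", RegexOptions.Compiled);

    public static string RemoveSpecialCharacters(string str)
    {
        // Replace each disallowed run with nothing
        return Disallowed.Replace(str, string.Empty);
    }
}
```

For example, `RegexCleaner.RemoveSpecialCharacters("he!!o_w0rld.txt?")` returns `"heo_w0rld.txt"`.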

Recommended answer

Why do you think that your method is not efficient? It's actually one of the most efficient ways that you can do it.

You should of course read the character into a local variable or use an enumerator to reduce the number of array accesses:

    public static string RemoveSpecialCharacters(string str) {
       StringBuilder sb = new StringBuilder();
       foreach (char c in str) {
          if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '.' || c == '_') {
             sb.Append(c);
          }
       }
       return sb.ToString();
    }

One thing that makes a method like this efficient is that it scales well. The execution time is proportional to the length of the string. There are no nasty surprises if you use it on a large string.

Edit:
I made a quick performance test, running each function a million times with a 24 character string. These are the results:

Original function: 54.5 ms.
My suggested change: 47.1 ms.
Mine with setting StringBuilder capacity: 43.3 ms.
Regular expression: 294.4 ms.
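The variant "with setting StringBuilder capacity" is not shown in the answer. A minimal sketch of what it presumably looks like follows, together with a simple timing helper in the spirit of the test described above. The names `CapacityVariant` and `TimeMilliseconds` are hypothetical, and this is not the harness that produced the numbers listed.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class CapacityVariant
{
    public static string RemoveSpecialCharacters(string str)
    {
        // The result can never be longer than the input, so reserving
        // str.Length characters up front avoids regrowing the buffer.
        StringBuilder sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '.' || c == '_')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Hypothetical helper: runs a cleaner repeatedly and returns the elapsed milliseconds.
    public static double TimeMilliseconds(Func<string, string> cleaner, string input, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            cleaner(input);
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```

A call such as `CapacityVariant.TimeMilliseconds(CapacityVariant.RemoveSpecialCharacters, "some_24.character!string", 1000000)` mirrors the setup of a million runs over a 24 character string. Absolute timings depend on the machine and runtime, so only the relative ordering of the approaches is worth comparing.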

Edit 2: I added the distinction between A-Z and a-z in the code above. (I reran the performance test, and there is no noticeable difference.)
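The distinction matters because a single range check from 'A' to 'z', as in the question's code, also accepts the punctuation that sits between 'Z' and 'a' in ASCII. The small sketch below (the class name `RangeGap` is made up for illustration) collects those characters:

```csharp
using System;
using System.Text;

public static class RangeGap
{
    // Returns the characters that pass a single 'A'..'z' range check
    // but are neither uppercase nor lowercase letters.
    public static string CharactersBetweenUpperAndLower()
    {
        StringBuilder sb = new StringBuilder();
        for (char c = 'A'; c <= 'z'; c++)
        {
            bool isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!isLetter)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

The method returns the six characters ``[\]^_` `` (code points 91 to 96). The underscore is allowed anyway, so the other five are the ones that would slip through.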

Edit 3:
I tested the lookup+char[] solution, and it runs in about 13 ms.

The price to pay is, of course, the initialization of the huge lookup table and keeping it in memory. Well, it's not that much data, but it's a lot for such a trivial function...

    private static bool[] _lookup;

    static Program() {
       _lookup = new bool[65536];
       for (char c = '0'; c <= '9'; c++) _lookup[c] = true;
       for (char c = 'A'; c <= 'Z'; c++) _lookup[c] = true;
       for (char c = 'a'; c <= 'z'; c++) _lookup[c] = true;
       _lookup['.'] = true;
       _lookup['_'] = true;
    }

    public static string RemoveSpecialCharacters(string str) {
       char[] buffer = new char[str.Length];
       int index = 0;
       foreach (char c in str) {
          if (_lookup[c]) {
             buffer[index] = c;
             index++;
          }
       }
       return new string(buffer, 0, index);
    }
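If the memory cost of the 65536-entry table is a concern, one possible variation is a 128-entry table plus a range check, since every allowed character is ASCII. This is only a sketch: the class name `AsciiLookupCleaner` is hypothetical, and this variant was not part of the performance test above, so the extra comparison may give back some of the speed.

```csharp
using System;

public static class AsciiLookupCleaner
{
    // 128 entries are enough because all allowed characters are ASCII
    private static readonly bool[] Lookup = BuildLookup();

    private static bool[] BuildLookup()
    {
        bool[] table = new bool[128];
        for (char c = '0'; c <= '9'; c++) table[c] = true;
        for (char c = 'A'; c <= 'Z'; c++) table[c] = true;
        for (char c = 'a'; c <= 'z'; c++) table[c] = true;
        table['.'] = true;
        table['_'] = true;
        return table;
    }

    public static string RemoveSpecialCharacters(string str)
    {
        char[] buffer = new char[str.Length];
        int index = 0;
        foreach (char c in str)
        {
            // The range check keeps non-ASCII characters from indexing past the table
            if (c < Lookup.Length && Lookup[c])
            {
                buffer[index] = c;
                index++;
            }
        }
        return new string(buffer, 0, index);
    }
}
```

For example, `AsciiLookupCleaner.RemoveSpecialCharacters("héllo wörld_1.0")` returns `"hllowrld_1.0"`.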
