Match non-printable/non-ASCII characters and remove them from text


Question

My JavaScript is quite rusty, so any help with this would be great. I need to detect non-printable characters (control characters such as SOH, BS, etc.) as well as extended ASCII characters such as Ž in a string and remove them, but I am not sure how to write the code.

Can anyone point me in the right direction? This is what I have so far:

$(document).ready(function() {
    $('.jsTextArea').blur(function() {
        var pattern = /[^\000-\031]+/gi;
        var val = $(this).val();
        if (pattern.test(val)) {
            for (var i = 0; i < val.length; i++) {
                var res = val.charAt([i]);
                alert("Character " + [i] + " " + res);
            }
        }
        else {
            alert("It failed");
        }
    });
});
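
For readers comparing against the pattern in the snippet above: the leading `^` inside the brackets negates the class, so it matches everything except the listed range, and `\031` is read as an octal escape (decimal 25, U+0019) rather than decimal 31. The following is a minimal sketch of the difference, assuming the intent was the control range U+0000 to U+001F plus DEL; the names `original` and `controlChars` are illustrative only.

```javascript
// Pattern from the question: a negated class with octal escapes (\031 is U+0019).
const original = /[^\000-\031]+/;
// Assumed intent: match control characters U+0000-U+001F plus DEL (U+007F).
const controlChars = /[\x00-\x1F\x7F]/;

const sample = 'abc';      // no control characters at all
const withSoh = 'a\x01b';  // contains SOH (U+0001)

console.log(original.test(sample));      // true: it matches the printable text
console.log(controlChars.test(sample));  // false
console.log(controlChars.test(withSoh)); // true
```

The flags are left off on purpose: with the `g` flag, repeated calls to `test` on the same regex object resume from `lastIndex`, which can give surprising results.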


Recommended answer

For those who have this problem and are looking for a "fix all" solution, this is how I eventually fixed it:

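using System.Text.RegularExpressions;

public static string RemoveTroublesomeCharacters(string inString)
{
    if (inString == null)
    {
        return null;
    }

    // Matches any character outside the 7-bit ASCII range, such as Ž.
    // IgnoreCase is not needed for a range check, so it is omitted.
    Regex regex = new Regex(@"[^\u0000-\u007F]");
    Match charMatch = regex.Match(inString);

    // Pass 1: remove control characters (SOH, BS, and so on).
    for (int i = 0; i < inString.Length; i++)
    {
        char ch = inString[i];
        if (char.IsControl(ch))
        {
            string matchedChar = ch.ToString();
            inString = inString.Replace(matchedChar, string.Empty);
            // The string is now shorter, so re-check the character that
            // moved into position i instead of skipping over it.
            i--;
        }
    }

    // Pass 2: remove each non-ASCII character found by the regex.
    while (charMatch.Success)
    {
        string matchedChar = charMatch.ToString();
        inString = inString.Replace(matchedChar, string.Empty);
        charMatch = charMatch.NextMatch();
    }

    return inString;
}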

I'll break it down in a bit more detail for those less experienced:


  1. We first loop through every character of the string and use the IsControl method of char to determine whether it is a control character.

  2. If a control character is found, we copy it to a string and use the Replace method to replace every occurrence of it with an empty string. Rinse and repeat for the rest of the string.

  3. Once we have looped through the entire string, we use the regex defined earlier, which matches any character outside the standard ASCII range (U+0000 to U+007F), and again replace each matched character with an empty string. Doing this in a while loop means that for as long as charMatch.Success is true, the matched character is replaced and NextMatch moves on to the next match.

  4. Finally, once all the unwanted characters have been removed, we return inString.
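
Since the question asks about JavaScript, the two passes described above can be folded into a single replace call there. The following is only a sketch under assumptions: the helper name `removeTroublesomeCharacters` is borrowed from the C# method for illustration, and "printable" is taken to mean U+0020 to U+007E, so tab and newline characters are stripped as well (as they are by char.IsControl in the C# version).

```javascript
// Hypothetical helper mirroring the C# method above (the name is illustrative).
// Keeps only printable ASCII (space through tilde); everything else is removed,
// including control characters, DEL, and non-ASCII characters such as Ž.
function removeTroublesomeCharacters(input) {
  if (input == null) {
    return null;
  }
  return String(input).replace(/[^\x20-\x7E]/g, '');
}

console.log(removeTroublesomeCharacters('a\x01b\x08cŽd')); // "abcd"
```

If line breaks or tabs should survive, the class could be widened to something like `[^\x20-\x7E\t\r\n]`; which characters count as acceptable is a decision for the application.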

(Note: I have still not managed to figure out how to repopulate the TextBox with the modified inString value, so if anyone can point out how that is done, that would be great.)
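
On the note above: for the client-side handler shown in the question, one possible approach is to write the cleaned text straight back to the field. The sketch below assumes a DOM-style element, meaning any object with a `value` property, so that it runs without a browser; `cleanField`, `fakeTextArea`, and the inline pattern are illustrative and not part of the original answer.

```javascript
// Illustrative only: strips non-printable/non-ASCII characters from a field
// and writes the result back. `field` is any object exposing a `value`
// property, such as a <textarea> element.
function cleanField(field) {
  field.value = field.value.replace(/[^\x20-\x7E]/g, '');
  return field.value;
}

// Stand-in for a real textarea so the sketch runs outside a browser.
const fakeTextArea = { value: 'Caf\u00E9 \x07bell' };
cleanField(fakeTextArea);
console.log(fakeTextArea.value); // "Caf bell"
```

Inside the jQuery blur handler from the question, the equivalent would be the setter form of `.val()`, for example `$(this).val(cleaned)`, where `cleaned` is a hypothetical variable holding the filtered string.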
