Simplest way to get rid of zero-width space in a C# string

Question

I am parsing emails with a regex in a C# VSTO project. Once in a while, the regex does not seem to work (although if I paste the text and the regex into RegexBuddy, the regex correctly matches the text). If I look at the email in Gmail, I see

=E2=80=8B

at the beginning and end of some lines (which I understand is the UTF-8 zero-width space); this appears to be what is messing up the regex. It seems to be the only such sequence showing up.

What is the easiest way to get rid of this exact sequence? I cannot do the obvious

MailItem.Body.Replace("=E2=80=8B", "")

because those characters don't show up in the C# string.

I have also tried

byte[] bytes = Encoding.Default.GetBytes(MailItem.TextBody);
string myString = Encoding.UTF8.GetString(bytes);

But the zero-width spaces just show up as ?. I suppose I could go through the bytes array and remove the bytes comprising the zero-width space, but I don't know what the bytes would look like (it does not seem as simple as converting E2 80 8B to decimal and searching for that).

Recommended answer

Strings in C# hold Unicode text as UTF-16 code units, not UTF-8 bytes, so the zero-width space appears in the string as the single character U+200B rather than as the three bytes E2 80 8B. The following might do the trick:

// Replace returns a new string, so keep the result.
string body = MailItem.Body.Replace("\u200B", "");
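
To make the relationship between the bytes and the character concrete, here is a minimal sketch. It assumes nothing about Outlook or VSTO: the class name `ZeroWidthSpaceDemo`, the helper `StripZeroWidthSpace`, and the sample text are hypothetical and only stand in for `MailItem.Body`. It shows that the UTF-8 bytes E2 80 8B (written as `=E2=80=8B` in quoted-printable mail source) decode to the single character U+200B, and that replacing that character removes it from the string.

```csharp
using System;
using System.Text;

public static class ZeroWidthSpaceDemo
{
    // Hypothetical helper (not from the original answer): removes every
    // zero-width space (U+200B) from the given text.
    public static string StripZeroWidthSpace(string text)
    {
        // Strings are immutable, so Replace returns a new string.
        return text.Replace("\u200B", "");
    }

    public static void Main()
    {
        // E2 80 8B is the UTF-8 encoding of U+200B.
        byte[] utf8 = { 0xE2, 0x80, 0x8B };
        string decoded = Encoding.UTF8.GetString(utf8);
        Console.WriteLine(decoded.Length);       // 1
        Console.WriteLine(decoded == "\u200B");  // True

        // Made-up sample standing in for a line of an email body.
        string body = "\u200BOrder: 12345\u200B";
        string cleaned = StripZeroWidthSpace(body);
        Console.WriteLine(body.Length);          // 14
        Console.WriteLine(cleaned.Length);       // 12
    }
}
```

This also suggests why the attempt in the question produced `?`: `Encoding.Default.GetBytes` on .NET Framework encodes to the system ANSI code page, which typically has no representation for U+200B, so the character is likely replaced by a question mark before the UTF-8 decoding step ever sees it.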
