Valid Characters


Problem description

I'm trying to ensure that all the characters in my XML document are
characters specified in this document:
http://www.w3.org/TR/2000/REC-xml-20001006#charsets

Would a function like this work:

private static string formatXMLString(string n)
{
    if (string.IsNullOrEmpty(n)) return n;
    System.Text.StringBuilder sb = new System.Text.StringBuilder();
    char[] chrs = n.ToCharArray();
    char c;
    int x, j = chrs.Length;
    for (x = 0; x < j; x++)
    {
        c = chrs[x];
        if (c == 0x9 || c == 0xA || c == 0xD ||
            (c > 0x20 && c < 0xd7ff) ||
            (c > 0xe000 && c < 0xffd) ||
            (c > 0x10000 && c < 0x10ffff))
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

I've never compared characters like this (0x9, 0xffd, etc.).
I'm not trying to be lazy and avoid testing it myself; I just don't know if this
type of character comparison is the correct logic for the results I'm
looking for.

Any input?

Recommended answer

Sure. Don't be lazy.

And a char is a 2-byte type, so your literals should all be 2-byte literals,
and should be cast to char for comparison.

e.g.
char space = (char)0x0020;

David
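
For what it is worth, the C# compiler implicitly converts a char to int when it is compared with an integer literal, so the comparison compiles and behaves the same with or without the cast. A minimal sketch of both forms (the class and method names are made up for illustration):

```csharp
using System;

public static class CharCompareSketch
{
    // Hypothetical helper: compares a char with a plain int literal.
    // The char operand is implicitly converted to int for the comparison.
    public static bool IsTabImplicit(char c)
    {
        return c == 0x9;
    }

    // Same check with the literal cast to char first, as suggested above.
    public static bool IsTabExplicit(char c)
    {
        return c == (char)0x0009;
    }
}
```

Either style gives the same result; the cast mainly documents the intent.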


Hello,

The data type "char" in C# represents a 16-bit Unicode character, and its range
is from U+0000 to U+FFFF. Therefore, the following line may not be
necessary in your code:

(c > 0x10000 && c < 0x10ffff))

That range is beyond what a C# char can hold, so a char will never have such a
value in a C# application.

When you load a string or file into an XmlDocument, the characters are
validated, and an exception is thrown if any of them is invalid.
Your function checks the string before that happens. I think this is a good
approach, since you control the validation. However, could some data
be lost if you simply remove the invalid characters? How about throwing an
exception instead?

Sincerely,

Luke Zhang

Microsoft Online Community Support
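
One way to report invalid characters instead of silently dropping them is sketched below. It assumes a framework version that ships `XmlConvert.VerifyXmlChars` (it may not exist in older versions), and the `XmlCharGuard` wrapper and its `TryVerify` method are hypothetical names used only for illustration:

```csharp
using System;
using System.Xml;

public static class XmlCharGuard
{
    // Hypothetical wrapper: returns true when every character in s is a
    // legal XML character; otherwise returns false and reports the
    // message of the exception thrown by XmlConvert.VerifyXmlChars.
    public static bool TryVerify(string s, out string error)
    {
        try
        {
            XmlConvert.VerifyXmlChars(s); // throws XmlException on the first invalid character
            error = string.Empty;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

A caller can then decide whether to reject the input, log it, or fall back to stripping, rather than losing data without notice.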


C# uses UTF-16, so it can cover the entire Unicode range (up to U+10FFFF) using
surrogate pairs.

See http://www.unicode.org/faq/utf_bom.html#UTF16
and http://mailman.ic.ac.uk/pipermail/xm...er/014933.html

--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
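
Putting the points from this thread together, a surrogate-aware version of the filter might look like the sketch below. It is not the original poster's code: the class and method names are made up, and the bounds are taken as inclusive from the XML 1.0 Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]).

```csharp
using System;
using System.Text;

public static class XmlCharFilter
{
    // Hypothetical helper: keeps only characters allowed by the
    // XML 1.0 Char production and drops everything else.
    public static string StripInvalidXmlChars(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;

        var sb = new StringBuilder(s.Length);
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c == 0x9 || c == 0xA || c == 0xD ||
                (c >= 0x20 && c <= 0xD7FF) ||
                (c >= 0xE000 && c <= 0xFFFD))
            {
                // Valid character that fits in a single UTF-16 code unit.
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < s.Length &&
                     char.IsLowSurrogate(s[i + 1]))
            {
                // A well-formed surrogate pair encodes U+10000..U+10FFFF,
                // all of which the Char production allows.
                sb.Append(c).Append(s[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            // Anything else (other control characters, lone surrogates,
            // U+FFFE, U+FFFF) is dropped.
        }
        return sb.ToString();
    }
}
```

The range U+10000 to U+10FFFF can never match a single char, which is why the sketch checks for a high surrogate followed by a low surrogate instead of comparing against 0x10000.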

