.NET DataSet.GetXml() - what's the default encoding?

Problem description

An existing app passes XML to a sproc in SQL Server 2000; the input parameter's data type is TEXT. The XML is derived from DataSet.GetXml(). But I notice it doesn't specify an encoding.

So when the user sneaks an inappropriate character into the dataset, specifically ASCII 146 (which appears to be an apostrophe) instead of ASCII 39 (single quote), the sproc fails.

One approach is to prefix the result of GetXML with

<?xml version="1.0" encoding="ISO-8859-1"?>

It works in this case, but what would be a more correct approach to ensure the sproc does not crash (if other unforeseen characters pop up)?

PS. I suspect the user is typing text into MS-Word or similar editor, and copy & pasting into the input fields of the app; I would probably want to allow the user to continue working this way, just need to prevent the crashes.

EDIT: I am looking for answers that confirm or deny a few aspects, for example:
- as per the title, what's the default encoding if none is specified in the XML?
- Is the encoding ISO-8859-1 the right one to use?
- is there a better encoding that would encompass more characters in the English-speaking world and thus be less likely to cause an error in the sproc?
- would you filter at the app's UI level for standard ASCII (0 to 127 only), and not allow extended ASCII?
- any other pertinent details.

Recommended answer

DataSet.GetXml() returns a string. In .NET, strings are internally encoded using UTF-16, but that is not really relevant here.

The reason why there's no <?xml encoding=...> declaration in the string is because that declaration is only useful or needed to parse XML in a byte stream. A .NET string is not a byte stream, it's just text with well-defined codepoint semantics (which is Unicode), so it is not needed there.
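To illustrate the point about byte streams, here is a minimal sketch in which the declaration only appears once the XML is written out as bytes. It uses a plain XmlWriter over a MemoryStream rather than a DataSet to stay self-contained; the class name, the method name, and the "note" element are made up for this example.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class DeclarationDemo
{
    // Serializes a tiny one-element document to UTF-8 bytes (without a BOM).
    // "note" is a hypothetical element name used only for illustration.
    public static byte[] WriteAsUtf8Bytes(string value)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false) // false = do not emit a BOM
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                // The declaration, including the encoding name, is written here
                // because the output is a byte stream with a concrete encoding.
                writer.WriteStartDocument();
                writer.WriteElementString("note", value);
                writer.WriteEndDocument();
            }
            return stream.ToArray();
        }
    }
}
```

Decoding the returned bytes as UTF-8 should show a declaration naming utf-8 in front of the element. DataSet.WriteXml has an overload that accepts an XmlWriter, so a similar approach is probably possible with the real dataset, though that is not shown here.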

If there is no XML encoding declaration and no BOM, the XML parser is to assume UTF-8. In your case, however, that is also entirely irrelevant, since the problem is not with an XML parser (SQL Server doesn't parse the XML when it's stored in a TEXT column). The problem is that your XML contains some Unicode characters, and TEXT is a non-Unicode SQL type.
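A quick way to see which characters are at risk is to test whether the string survives a strict conversion to a given single-byte encoding. The sketch below assumes the pasted character is U+2019 (the right single quotation mark, which sits at byte 146 in Windows-1252); the class and method names are hypothetical, and ISO-8859-1 is only a stand-in, since the code page SQL Server actually uses for TEXT depends on the column's collation.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Returns true when every character of 'text' can be represented in the
    // named encoding. The exception fallbacks make an unmappable character
    // throw instead of being silently replaced with '?' or a look-alike.
    public static bool CanEncode(string text, string encodingName)
    {
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

With this helper, EncodingCheck.CanEncode("it\u2019s", "iso-8859-1") returns false while the same call with "utf-8" returns true, which suggests that declaring ISO-8859-1 in the XML does not by itself make the curly apostrophe representable.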

You can encode a string to any encoding using the Encoding.GetBytes() method.
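As a rough sketch of that last step, the helper below prepends a declaration that matches the chosen encoding and then converts the whole text to bytes with Encoding.GetBytes(). The class and method names are invented for this example, and it assumes the input string carries no declaration of its own, as the question reports for the GetXml() output. How the resulting bytes are passed to the sproc is outside the scope of this sketch.

```csharp
using System.Text;

public static class XmlBytes
{
    // Builds a declaration naming 'encoding' and encodes declaration + XML
    // with that same encoding, so the label and the bytes agree.
    // Assumes 'xml' does not already start with an <?xml ...?> declaration.
    public static byte[] ToDeclaredBytes(string xml, Encoding encoding)
    {
        string declaration =
            "<?xml version=\"1.0\" encoding=\"" + encoding.WebName + "\"?>";
        return encoding.GetBytes(declaration + xml);
    }
}
```

For example, XmlBytes.ToDeclaredBytes(xml, Encoding.UTF8) yields UTF-8 bytes labeled utf-8, in which characters such as U+2019 are preserved rather than replaced.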
