How do I correctly parse a URI query string into a name-value collection in C#?

Question

I'm using .NET 4.5 and I'm trying to parse a URI query string into a NameValueCollection. The right way seems to be to use HttpUtility.ParseQueryString(string query), which takes the string obtained from Uri.Query and returns a NameValueCollection. Uri.Query returns a string that is escaped according to RFC 2396, and HttpUtility.ParseQueryString(string query) expects a string that is URL-encoded. Assuming RFC 2396 escaping and URL-encoding are the same thing, this should work fine.
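For reference, the basic call being described might look like the sketch below. The URL and variable names are made up for illustration, and the snippet uses C# 9 top-level statements for brevity; on .NET Framework 4.5 the same statements would go inside a Main method, with a reference to System.Web.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

// Hypothetical URL, used only for illustration.
var uri = new Uri("http://example.com/search?q=hello%20world&lang=en");

// Uri.Query includes the leading '?'; ParseQueryString accepts it as-is.
NameValueCollection query = HttpUtility.ParseQueryString(uri.Query);

// Values come back percent-decoded.
Console.WriteLine(query["q"]);    // hello world
Console.WriteLine(query["lang"]); // en
```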

However, the documentation for ParseQueryString claims that it "uses UTF8 format to parse the query string". There is also an overloaded method which takes a System.Text.Encoding and then uses that instead of UTF8.

My question is: what does it mean to use UTF8 as the encoding? The input is a string, which by definition (in C#) is UTF-16. How is that interpreted as UTF-8? What is the difference between using UTF8 and UTF16 as the encoding in this case? My concern is that since I'm accepting arbitrary user input, there might be some security risk if I botch the encoding (i.e. the user might be able to slip through some script exploit).

There is a previous question on this topic (How to parse a query string into a NameValueCollection in .NET), but it doesn't specifically address the encoding problem.

Recommended answer

When parsing encoded values, it treats those values as UTF-8. Take the character ¢, for example. The UTF-8 encoding is C2 A2. So if it were in a query string, it would be encoded as %C2%A2.
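The ¢ example can be checked with a short sketch. Uri.EscapeDataString is used here as one way to produce the percent-encoded form (the source does not say how the query string was built), and the variable names are illustrative. The code is written as C# 9 top-level statements.

```csharp
using System;
using System.Text;

// U+00A2 is the cent sign.
string cent = "\u00A2";

// Its UTF-8 encoding is the two bytes C2 A2.
byte[] bytes = Encoding.UTF8.GetBytes(cent);
string hex = BitConverter.ToString(bytes);      // "C2-A2"

// Percent-encoding writes each of those bytes as %XX.
string escaped = Uri.EscapeDataString(cent);    // "%C2%A2"

Console.WriteLine(hex);
Console.WriteLine(escaped);
```

So the percent-escapes in a query string stand for bytes, not for UTF-16 code units, which is why the parser needs to be told which byte encoding to assume.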

Now, when ParseQueryString is decoding, it needs to know what encoding to use. The default is UTF-8, meaning that the character would be decoded correctly. But perhaps the user was using Microsoft's Cyrillic code page (Windows-1251), where C2 and A2 are two different characters. In that case, interpreting it as UTF-8 would be an error.
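A minimal sketch of the difference the encoding argument makes is shown below. It substitutes ISO-8859-1 for Windows-1251, because ISO-8859-1 is available everywhere without extra setup (on .NET Core and later, Windows-1251 typically needs a code-pages encoding provider registered first). The effect is the same in kind: the two bytes are read as two separate characters instead of one. The parameter name "symbol" is made up for illustration.

```csharp
using System;
using System.Text;
using System.Web;

// Hypothetical query string carrying the UTF-8 bytes of the cent sign.
const string rawQuery = "symbol=%C2%A2";

// Default overload: the escaped bytes are decoded as UTF-8.
var asUtf8 = HttpUtility.ParseQueryString(rawQuery)["symbol"];

// Overload taking an Encoding: the same bytes decoded as a single-byte code page.
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
var asLatin1 = HttpUtility.ParseQueryString(rawQuery, latin1)["symbol"];

// asUtf8 is "\u00A2" (one character, the cent sign);
// asLatin1 is "\u00C2\u00A2" (two characters, the classic mojibake).
Console.WriteLine(asUtf8 == "\u00A2");
Console.WriteLine(asLatin1 == "\u00C2\u00A2");
```

In both cases the result is an ordinary UTF-16 .NET string; the encoding only controls how the percent-escaped bytes are turned into characters.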

If this is a user interface application (i.e. the user is entering data directly), then you probably want to use whatever encoding is defined for the current UI culture. If you're getting this information from Web pages, then you'll want to use whatever encoding the page uses. And if you're writing a Web service then you can tell the users that their input has to be UTF-8 encoded.
