Query string encoding/decoding


Problem description


I've run a few simple tests looking at how query string encoding/decoding gets handled in asp.net, and it seems like the situation is even messier than it was in asp... Can't say I think much of the "improvements", but maybe someone here can point me in the right direction...

First, it looks like asp.net will automatically read and recognize query strings encoded in utf-8 and in 16-bit unicode, except that the latter is a mutant, non-standard encoding mechanism that only works in IIS (%u00f1, for example). These look like the *only* ways it will decode querystrings. Too bad, 'cause browsers out there can encode them all kinds of different ways, and the way most will do it by default is windows-1252. At least in old asp, the lazy defaults for putting together your forms and the default behavior of most browsers fit together well. Seems like there's more active attention required in asp.net.

Second, it no longer appears to use the page's declared output encoding as a means of interpreting the input (both good and bad, I guess). This means that if it runs into a character in the querystring that's *not* encoded in utf-8 or the mutant form, it just drops that character out of the input. Period. No way to handle it. Accented Spanish characters, for example, that most browsers are going to encode in 1252 (i.e. %f1 for ñ) just vanish from the asp.net environment.
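To make the second point concrete, here is a minimal, language-neutral sketch in Python (not ASP.NET code) of what happens to the byte %f1 under the two decodings. The sample value is taken from the post. The use of Python's `urllib.parse.unquote` is only an illustration, and the replacement-character behaviour shown is Python's, whereas the post reports that ASP.NET drops the character entirely.

```python
from urllib.parse import unquote

raw = "a%f1"  # "a" plus n-tilde, as a windows-1252 browser would send it

# Decoded with the code page the browser actually used, the character survives.
as_cp1252 = unquote(raw, encoding="cp1252")

# Decoded as UTF-8, the lone byte 0xF1 is not a valid sequence. Python puts
# U+FFFD in its place; ASP.NET, per the post above, drops the character.
as_utf8 = unquote(raw, encoding="utf-8", errors="replace")

print(repr(as_cp1252))
print(repr(as_utf8))
```

Either way, the original character cannot be recovered once the wrong decoding has been applied, which is why the decoder's choice of code page matters.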

Third, in asp, when you have more than one value for a query string variable name, referencing the Request object gives you a collection. That collection has a toString method that makes a comma-separated list of the values, but you *can* refer to each of the different values separately. In asp.net, the NameValueCollection mashes multiple values into a single comma-separated string, so if your input has commas in it, well, too bad.
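The comma problem in the third point can be shown with a short Python sketch (an illustration only; the query string below is a made-up example, not one from the thread). Once several values are joined with commas, a value that itself contains a comma can no longer be told apart from two separate values.

```python
from urllib.parse import parse_qs

raw = "values=10%2C5&values=20"  # hypothetical input: first value is "10,5"

values = parse_qs(raw)["values"]  # one list entry per occurrence of the key
joined = ",".join(values)         # what a comma-joined accessor would return

print(values)
print(joined)
print(joined.split(","))  # splitting the joined form no longer gives two values
```

Reading the values as a list, rather than splitting the joined string, avoids the ambiguity.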

Fourth, in asp, Request.QueryString gives you the original urlencoded bytes of the querystring, i.e. what you were sent. In asp.net, it's actually a re-urlencoding of the post-interpreted values, so you can't get out what you got in... This is most annoying when you get a querystring encoded in utf-8; in asp.net, referencing Request.QueryString returns you a value encoded in the mutant %uxxxx syntax instead.
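As a rough illustration of the fourth point, the Python sketch below decodes both the standard UTF-8 percent-encoding and the non-standard %uXXXX form. The helper name `decode_percent_u` is hypothetical, and the two sample strings are the ones quoted later in this thread. It shows that the two forms carry the same text but are different strings on the wire, so a re-encoded value is not "what you were sent".

```python
import re
from urllib.parse import unquote


def decode_percent_u(text):
    """Expand non-standard %uXXXX escapes, then decode ordinary %XX escapes as UTF-8."""
    expanded = re.sub(
        r"%u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )
    return unquote(expanded, encoding="utf-8")


sent = "query=a%C3%B1"    # what the client put on the wire
echoed = "query=a%u00f1"  # what the thread reports getting back

print(decode_percent_u(sent) == decode_percent_u(echoed))  # same text once decoded
print(sent == echoed)                                      # different raw strings
```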

I'm just getting my feet wet in asp.net, coming from an asp environment. Any pointers on how to handle query string issues better than what appears to be the default in asp.net? Seems like there are some steps backwards in asp.net.

Thanks
-Mark

Solution

Hi Mark,

Thanks for posting in the community!
From your description, you're wondering about the way ASP.NET treats the
Request's querystring, which seems quite different from classic ASP's.
As for the first two points you mentioned, I think they arise because the
querystring is encoded based on the client browser's codepage and then posted
to the server side. The server side will decode the querystring via the page's
codepage. If none is specified (on either the client browser or the server-side
page), each takes its default codepage, and if the two are not equal, the
result we get may be incorrect.

As for the #3 you mentioned, I've searched MSDN and found that in
ASP.NET, if you want to get a multi-value querystring item, you need to
use the QueryString.GetValues method. Here is the description in MSDN:
--------------------------------------------
If the item you are accessing contains exactly one value for the specified
key, you do not need to modify your code. However, if there are multiple
values for a given key, you need to use a different method to return the
collection of values. Also, note that collections in Visual Basic .NET are
zero-based, whereas the collections in VBScript are one-based.

For example, in ASP the individual query string values from a request to
http://localhost/myweb/valuetest.asp...s=10&values=20 would be accessed
as follows:

<%
'This will output "10"
Response.Write Request.QueryString("values")(1)

'This will output "20"
Response.Write Request.QueryString("values")(2)
%>

In ASP.NET, the QueryString property is a NameValueCollection object from
which you would need to retrieve the Values collection before retrieving
the actual item you want. Again, note the first item in the collection is
retrieved by using an index of zero rather than one:

<%
'This will output "10"
Response.Write (Request.QueryString.GetValues("values")(0))

'This will output "20"
Response.Write (Request.QueryString.GetValues("values")(1))
%>

In the case of both ASP and ASP.NET, the following code will behave
identically:

<%
'This will output "10", "20"
Response.Write (Request.QueryString("values"))
%>

----------------------------------------------------------

As for point #4, I think things such as %uxxxx appear because the
querystring is part of the url, and a url can only contain ISO-8859-1 charset
characters, so if it contains unicode, it'll first be encoded and then
urlencoded (replacing some particular characters). I think we're certain to
get the values in the querystring as they were input at the client, as long as
we map the correct codepage between the client and server side.

In addition, here are some tech articles on migrating from ASP to ASP.NET:
#New ASP.NET Page Directives
http://msdn.microsoft.com/library/en...pagedirectives.asp?frame=true

#Migrating to ASP.NET: Key Considerations
http://msdn.microsoft.com/library/en...issues.asp?frame=true

#Migrating a Commerce Server Site from ASP to ASP.NET
http://msdn.microsoft.com/library/en...snetmig.asp?frame=true

#Converting ASP to ASP.NET
http://msdn.microsoft.com/library/en...sptoaspnet.asp?frame=true

Hope these help.
Regards,

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)

Get Preview at ASP.NET whidbey
http://msdn.microsoft.com/asp.net/whidbey/default.aspx


Hi Steve..

First off, thanks for the pointer about GetValues(). That seems much closer to the old method than the comma-joined NameValueCollection value and makes it possible to work with querystrings more effectively.

> As for the first two points you mentioned, I think they arise because the
> querystring is encoded based on the client browser's codepage and then posted
> to the server side. The server side will decode the querystring via the page's
> codepage. If none is specified (on either the client browser or the server-side
> page), each takes its default codepage, and if the two are not equal, the
> result we get may be incorrect.

Is the default code page in .aspx different than in .asp? Do you set the codepage differently in .aspx? I have a little sample .aspx page with the header

<%@Language="Jscript" CodePage=1252 EnableSessionState="False"%>

and yet asp.net does *not* seem to be decoding using the declared codepage. It only decodes utf-8 and the mutant utf-16 declarations. Period. Characters encoded in the querystring using the declared codepage just vaporize because asp.net is not decoding them properly.

This does pose some real practical problems, since most legacy pages don't go out of their way to declare codepages on either the client or the server side. In .asp, the default codepage is based on a system setting, usually windows-1252. This lazy default matches up well with 99% of the browsers out there, which are also set up to default to windows-1252, so the two can play together. Since asp.net seems to decode only utf-8 (and the non-standard mutant form), extra care seems to be necessary to get the client and server to play together reliably... An unnecessary potential gotcha for going to asp.net, it seems.

Having the client and server play well by default is especially important because:
a) there is no way to communicate what the codepage is for url encoding on GET
b) IE doesn't seem to send the client's codepage back up for the ride by default, even on POSTs. I wrote a couple of little sample posts. The form gets the charset both from the Content-Type header and from a <Meta http-equiv="Content-type"> header in the html, and it does encode the form values in that codepage, but the http request on the form submission does *not* include any info to tell the server what codepage the post data is in (seems like a deficiency in IE, but probably not uncommon among browsers)

The fact that asp.net doesn't follow/respect any of the common settings like asp did seems to create unnecessary openings for bugs.

On point #3, do you know offhand if asp.net works the same way as asp, in that the QueryString collection doesn't get chopped up until there's a reference to it? Or is asp.net going to interpret the querystring whether or not the code references it?

> As for point #4, I think things such as %uxxxx appear because the
> querystring is part of the url, and a url can only contain ISO-8859-1 charset
> characters, so if it contains unicode, it'll first be encoded and then
> urlencoded (replacing some particular characters). I think we're certain to
> get the values in the querystring as they were input at the client, as long as
> we map the correct codepage between the client and server side.



I think you missed the point of #4. The point was that no, you *don't* get back what the client put in, and that seems undesirable and arbitrary. If the user input is

http://foo.com/test.aspx?query=a%C3%B1

(the ñ is encoded with the utf-8 codepage), then

<% Response.Write (Request.QueryString); %>

outputs

query=a%u00f1

(the post-interpreted value re-encoded using the non-standard mutant form instead of the original encoding). This is kinda like the xml assertion that one equivalent representation is just as good as another, but that's codified in the xml standard. The change from asp to asp.net just seems arbitrary and annoying (especially since it uses a syntax that is non-standard and only makes sense to certain versions of IE and IIS and nobody else). I guess I can still get at the real user input by looking at the RawUrl property instead. I haven't tried that yet.

I know this encoding stuff is difficult. That's why we had to write our own COM objects to interpret the querystring in asp. Mostly we wrote them for two reasons:
1) we wanted, like other websites do, to let the page handle the querystring encoding based on user input (like google with a form value) so that people could encode things any which way and we'd still be able to read them.

2) I haven't tried this in asp.net yet, but at least in asp, there was also the unfortunate side effect that Server.UrlEncode would only encode things in the page's output encoding. If you have a multi-tier system where you need to construct urls to call the next tier, you can't always assume the next tier will take urls in the encoding the last tier uses. So we needed the extra flexibility to be able to urlencode in any codepage.
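The two needs listed above can be sketched in a few lines of Python. This is a language-neutral illustration, not the COM objects described in the post. The function name `decode_with_hint`, the `ie` parameter name (modelled loosely on the "form value" idea in point 1), and the cp1252 fallback are all assumptions made for the example.

```python
from urllib.parse import parse_qs, quote


def decode_with_hint(raw_query, default="cp1252"):
    """Decode a raw query string using the charset named in its own 'ie' field."""
    # First pass: the hint is plain ASCII, so a latin-1 decode finds it safely.
    hint = parse_qs(raw_query, encoding="latin-1").get("ie", [default])[0]
    # Second pass: decode every value with the charset the sender declared.
    return parse_qs(raw_query, encoding=hint)


from_1252 = decode_with_hint("q=a%F1&ie=windows-1252")
from_utf8 = decode_with_hint("q=a%C3%B1&ie=utf-8")
print(from_1252["q"], from_utf8["q"])

# Encoding for the next tier in whichever code page that tier expects.
print(quote("a\u00f1", encoding="cp1252"))
print(quote("a\u00f1", encoding="utf-8"))
```

A real implementation would also need to validate the charset name, since an unknown value in the hint field would raise a lookup error here.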

For the most part, as I said, the codepage handling seemed to work pretty well by default in asp. There were limitations, but by default most pages worked pretty well with the world at large. Your default page would cover English and all Latinate languages without extra effort. Asp.net, it seems, requires extra non-default effort to correctly handle anything that's not 7-bit ascii, and that seems like a step backwards. That's all I'm saying. Pardon me for saying so, but it doesn't seem like you've actually tried any of this stuff past the standard ascii input. If you want, I can send you my sample page and some sample queries to demonstrate what I'm saying.

I'll read through the reference pages, but after the first few there still doesn't appear to be anything to correct the impression I've gotten that asp.net is worse than asp on codepage issues.

Thanks
-Mark


Hi Mark,

Thank you for the response. Regarding this issue, I'll consult some
further experts and will update you as soon as possible. Also, I
think it would be helpful if you attached some sample pages for the issues
you've mentioned. Thanks.
Regards,

Steven Cheng
Microsoft Online Support


