Does HttpUtility.UrlEncode match the spec for 'x-www-form-urlencoded'?


Problem description

Per MSDN:

URLEncode converts characters as follows:

  • Spaces ( ) are converted to plus signs (+).
  • Non-alphanumeric characters are escaped to their hexadecimal representation.

Which is similar to, but not exactly the same as, the W3C definition:

application/x-www-form-urlencoded

This is the default content type. Forms submitted with this content type must be encoded as follows:

  1. Control names and values are escaped. Space characters are replaced by '+', and then reserved characters are escaped as described in RFC1738, section 2.2: Non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., '%0D%0A').

  2. The control names/values are listed in the order they appear in the document. The name is separated from the value by '=' and name/value pairs are separated from each other by '&'.

 

My question is, has anyone done the work to determine whether URLEncode produces valid x-www-form-urlencoded data?

Solution

Well, the documentation you linked to is for IIS 6 Server.UrlEncode, but your title seems to ask about .NET System.Web.HttpUtility.UrlEncode. Using a tool like Reflector, we can see the implementation of the latter and determine if it meets the W3C spec.

Here is the encoding routine that is ultimately called (note, it is defined for an array of bytes, and other overloads that take strings eventually convert those strings to byte arrays and call this method). You would call this for each control name and value (to avoid escaping the reserved characters = & used as separators).

protected internal virtual byte[] UrlEncode(byte[] bytes, int offset, int count)
{
    if (!ValidateUrlEncodingParameters(bytes, offset, count))
    {
        return null;
    }
    int num = 0;
    int num2 = 0;
    for (int i = 0; i < count; i++)
    {
        char ch = (char) bytes[offset + i];
        if (ch == ' ')
        {
            num++;
        }
        else if (!HttpEncoderUtility.IsUrlSafeChar(ch))
        {
            num2++;
        }
    }
    if ((num == 0) && (num2 == 0))
    {
        return bytes;
    }
    byte[] buffer = new byte[count + (num2 * 2)];
    int num4 = 0;
    for (int j = 0; j < count; j++)
    {
        byte num6 = bytes[offset + j];
        char ch2 = (char) num6;
        if (HttpEncoderUtility.IsUrlSafeChar(ch2))
        {
            buffer[num4++] = num6;
        }
        else if (ch2 == ' ')
        {
            buffer[num4++] = 0x2b;
        }
        else
        {
            buffer[num4++] = 0x25;
            buffer[num4++] = (byte) HttpEncoderUtility.IntToHex((num6 >> 4) & 15);
            buffer[num4++] = (byte) HttpEncoderUtility.IntToHex(num6 & 15);
        }
    }
    return buffer;
}

public static bool IsUrlSafeChar(char ch)
{
    if ((((ch >= 'a') && (ch <= 'z')) || ((ch >= 'A') && (ch <= 'Z'))) || ((ch >= '0') && (ch <= '9')))
    {
        return true;
    }
    switch (ch)
    {
        case '(':
        case ')':
        case '*':
        case '-':
        case '.':
        case '_':
        case '!':
            return true;
    }
    return false;
}

The first part of the routine counts the characters that need to be replaced (spaces and non-URL-safe characters). The second part allocates a new buffer and performs the replacements:

  1. URL-safe characters are kept as is: a-z A-Z 0-9 ( ) * - . _ !
  2. Spaces are converted to plus signs
  3. All other characters are converted to %HH
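
The three rules above can be sketched as a small standalone program. This is an illustrative reimplementation, not the framework source: the names IsUrlSafe and EncodeComponent are hypothetical, the input is assumed to be encoded as UTF-8, and lowercase hex digits are an assumption because the IntToHex helper is not shown in the listing.

```csharp
using System;
using System.Text;

// Rule 1: the characters the listing treats as URL-safe.
static bool IsUrlSafe(byte b) =>
    (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9') ||
    b == '(' || b == ')' || b == '*' || b == '-' || b == '.' || b == '_' || b == '!';

// Hypothetical helper that applies the three rules to the UTF-8 bytes of a string.
static string EncodeComponent(string value)
{
    var sb = new StringBuilder();
    foreach (byte b in Encoding.UTF8.GetBytes(value))
    {
        if (IsUrlSafe(b))
            sb.Append((char)b);                        // rule 1: kept as is
        else if (b == (byte)' ')
            sb.Append('+');                            // rule 2: space becomes '+'
        else
            sb.Append('%').Append(b.ToString("x2"));   // rule 3: %HH (lowercase assumed)
    }
    return sb.ToString();
}

Console.WriteLine(EncodeComponent("a b"));         // a+b
Console.WriteLine(EncodeComponent("$1,000"));      // %241%2c000
Console.WriteLine(EncodeComponent("x-y_z.(!)*"));  // x-y_z.(!)*
```

The second call shows the point made about RFC1738: $ and , are escaped even though the RFC lists them as characters that may appear unencoded.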

RFC1738 states (emphasis mine):

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

On the other hand, characters that are not required to be encoded
(including alphanumerics) may be encoded within the scheme-specific
part of a URL, as long as they are not being used for a reserved
purpose.

The set of URL-safe characters allowed by UrlEncode is a subset of the special characters defined in RFC1738. Comparing the two lists, the characters $ + ' , are missing and will be encoded by UrlEncode even though the spec says they are safe. (A literal + has to be encoded in form data anyway, because an unencoded + stands for a space.) Since these characters "may" be used unencoded (rather than "must"), encoding them still meets the spec, and the second quoted paragraph says so explicitly.

With respect to line breaks, if the input contains a CR LF sequence, it is escaped as %0D%0A. However, if the input contains only LF, it is escaped as %0A, so this routine does not normalize line breaks.
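
Since the routine does not normalize line breaks, a caller could do so before encoding. The following is a minimal sketch under that assumption; NormalizeLineBreaks is a hypothetical helper, not part of the framework, and the sample input is made up.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Web;

// Hypothetical helper: rewrite a bare CR or bare LF as CR LF.
// Existing CR LF pairs are matched first and left unchanged.
static string NormalizeLineBreaks(string value) =>
    Regex.Replace(value, "\r\n|\r|\n", "\r\n");

string raw = "line one\nline two";   // input with a bare LF
string encoded = HttpUtility.UrlEncode(NormalizeLineBreaks(raw));
Console.WriteLine(encoded);
```

Normalizing first means every line break reaches the encoder as a CR LF pair, which is the form the W3C text asks for.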

Bottom line: It meets the specification while additionally encoding $ and , (along with + and '), and the caller is responsible for providing suitably normalized line breaks in the input.
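
As a rough illustration of encoding each control name and value separately and then joining them with = and &, a form body might be assembled as follows. The field names and values are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

// Hypothetical form fields, listed in document order.
var fields = new List<KeyValuePair<string, string>>
{
    new("first name", "Tom & Jerry"),
    new("discount", "100%"),
};

// Encode names and values individually so that the '=' and '&'
// separators added here are never escaped themselves.
string body = string.Join("&",
    fields.Select(f => HttpUtility.UrlEncode(f.Key) + "=" + HttpUtility.UrlEncode(f.Value)));

Console.WriteLine(body);   // first+name=Tom+%26+Jerry&discount=100%25
```

Passing the whole body through UrlEncode in one call would instead escape the separators, which is why the encoding is applied per name and per value.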
