Handling filename* parameters with spaces via RFC 5987 results in '+' in filenames

Question

I am dealing with some legacy code (so no, I can't just use a URL with an encoded filename component) that lets a user download a file from our website. Since our filenames are often in many different languages, they are all stored as UTF-8. I wrote some code to handle the RFC 5987 conversion to a proper filename* parameter. This works great until I have a filename with both non-ASCII characters and spaces. Per the RFC, the space character is not part of attr_char, so it gets encoded as %20. I have recent versions of Chrome and Firefox, and both convert the %20 to + on download. I have tried not encoding the space and putting the encoded filename in quotes, and I get the same result. I have sniffed the response coming from the server to verify that the servlet container wasn't mucking with my headers, and they look correct to me. The RFC even has examples that contain %20. Am I missing something, or do all of these browsers have a bug related to this?

Many thanks in advance. The code I use to encode the filename is below.

Peter

public static boolean bcsrch(final char[] chars, final char c) {
    final int len = chars.length;
    int base = 0;
    int last = len - 1; /* Last element in table */
    int p;

    while (last >= base) {
        p = base + ((last - base) >> 1);

        if (c == chars[p])
            return true; /* Key found */
        else if (c < chars[p])
            last = p - 1;
        else
            base = p + 1;
    }

    return false; /* Key not found */
}

public static String rfc5987_encode(final String s) {
    final int len = s.length();
    final StringBuilder sb = new StringBuilder(len << 1);
    final char[] digits = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
    final char[] attr_char = {'!','#','$','&','\'','+','-','.','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','^','_','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','|', '~'};
    for (int i = 0; i < len; ++i) {
        final char c = s.charAt(i);
        if (bcsrch(attr_char, c))
            sb.append(c);
        else {
            final char[] encoded = {'%', 0, 0};
            encoded[1] = digits[0x0f & (c >>> 4)];
            encoded[2] = digits[c & 0x0f];
            sb.append(encoded);
        }
    }

    return sb.toString();
}

Update

Here is a screenshot of the download dialog I get for a file whose name contains Chinese characters and spaces, as mentioned in my comment.

Recommended answer

As Julian pointed out in the comments, I made a rookie Java error and forgot to do my character-to-byte conversion: I encoded each character's code point instead of the bytes of its UTF-8 representation, so the encoding was completely incorrect. RFC 5987 clearly states this as a requirement. Once the encoding is correct, the browser recognizes the filename* parameter properly and the filename used for the download is correct.
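To make the difference concrete, here is a minimal sketch (not the poster's code) that escapes the single character 博 (U+535A) both ways. The class and method names are hypothetical, chosen only for illustration. The first helper mirrors the original mistake, where the hex lookup keeps only the low 8 bits of the 16-bit `char`; the second escapes each UTF-8 byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative and not from the original post.
class CodepointVsBytes {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Mirrors the original mistake: only the low 8 bits of the UTF-16 char survive.
    static String escapeLowByteOfChar(char c) {
        return "%" + HEX[0x0f & (c >>> 4)] + HEX[c & 0x0f];
    }

    // Percent-encodes every UTF-8 byte of the string (no attr-char pass-through here).
    static String escapeUtf8Bytes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append('%').append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char c = '\u535A'; // the character 博
        System.out.println(escapeLowByteOfChar(c));             // %5A
        System.out.println(escapeUtf8Bytes(String.valueOf(c))); // %E5%8D%9A
    }
}
```

The first result, `%5A`, is the escape for the ASCII letter `Z`, so the Chinese character is lost entirely. The second matches the `%E5%8D%9A` sequence in the correctly encoded header.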

Below is the corrected escaping code, which operates on the UTF-8 bytes of the string. The filename that was giving me trouble, now properly encoded, looks like this:

Content-Disposition:attachment; filename*=UTF-8''Museum%20%E5%8D%9A%E7%89%A9%E9%A6%86.jpg

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public static String rfc5987_encode(final String s) throws UnsupportedEncodingException {
    // Work on the UTF-8 bytes of the string, not on its UTF-16 chars.
    final byte[] s_bytes = s.getBytes("UTF-8");
    final int len = s_bytes.length;
    final StringBuilder sb = new StringBuilder(len << 1);
    final char[] digits = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
    // attr-char set from RFC 5987, kept in ascending order so binarySearch works.
    final byte[] attr_char = {'!','#','$','&','+','-','.','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','^','_','`','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','|','~'};
    for (int i = 0; i < len; ++i) {
        final byte b = s_bytes[i];
        if (Arrays.binarySearch(attr_char, b) >= 0)
            sb.append((char) b); // allowed character, emitted as-is
        else {
            // Percent-encode the byte; the 0x0f mask discards sign-extension bits.
            sb.append('%');
            sb.append(digits[0x0f & (b >>> 4)]);
            sb.append(digits[b & 0x0f]);
        }
    }

    return sb.toString();
}
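As a usage illustration, the sketch below assembles the complete header value for the problem filename. It is a self-contained variant written for this example, not the poster's code: the class name, `isAttrChar`, `encode`, and `attachmentHeaderValue` are all hypothetical, and it uses `StandardCharsets.UTF_8` with range checks in place of the sorted lookup table. The encoding rule is the same as in the corrected code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative and not from the original post.
class ContentDispositionDemo {
    // Punctuation allowed by attr-char in RFC 5987, besides letters and digits.
    private static final String ATTR_CHAR_PUNCT = "!#$&+-.^_`|~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // b is an unsigned byte value in the range 0..255.
    static boolean isAttrChar(int b) {
        return (b >= '0' && b <= '9') || (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || ATTR_CHAR_PUNCT.indexOf(b) >= 0;
    }

    // Percent-encodes every UTF-8 byte that is not an attr-char.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xff; // treat the byte as unsigned
            if (isAttrChar(b)) {
                sb.append((char) b);
            } else {
                sb.append('%').append(HEX[b >> 4]).append(HEX[b & 0x0f]);
            }
        }
        return sb.toString();
    }

    // Builds the value of a Content-Disposition header using only filename*.
    static String attachmentHeaderValue(String fileName) {
        return "attachment; filename*=UTF-8''" + encode(fileName);
    }

    public static void main(String[] args) {
        // "Museum 博物馆.jpg", written with Unicode escapes to avoid source-encoding issues.
        System.out.println(attachmentHeaderValue("Museum \u535A\u7269\u9986.jpg"));
    }
}
```

Run on its own, this prints `attachment; filename*=UTF-8''Museum%20%E5%8D%9A%E7%89%A9%E9%A6%86.jpg`, which matches the header shown in the answer.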