System.Uri drops Unicode RLM (Right-to-Left Mark; U+200F) character in .NET 4.5+

Question

using System;

namespace UnicodeRlm
{
    class Program
    {
        static void Main(string[] args)
        {
            var uri = new Uri(
                "https://example.com/attachments/The title is \"مفتاح معايير الويب!‏\" in Arabic.pdf");
            Console.WriteLine(uri.AbsolutePath);
            Console.WriteLine(uri.AbsolutePath.Length);
        }
    }
}

Under .NET 4.0, this produces:

/attachments/The%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8!%E2%80%8F%22%20in%20Arabic.pdf
168

Under .NET 4.5+, this produces:

/attachments/The%20title%20is%20%22%D9%85%D9%81%D8%AA%D8%A7%D8%AD%20%D9%85%D8%B9%D8%A7%D9%8A%D9%8A%D8%B1%20%D8%A7%D9%84%D9%88%D9%8A%D8%A8!%22%20in%20Arabic.pdf
159

.NET 4.5 drops the %E2%80%8F part, which is the percent-encoded RLM character (first line: .NET 4.0, second line: .NET 4.5+):

...!%E2%80%8F%22%20in%20Arabic.pdf
...!%22%20in%20Arabic.pdf
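
To confirm that %E2%80%8F really is the RLM and that it accounts for the whole length difference (168 versus 159), the UTF-8 bytes of U+200F can be percent-encoded by hand. This is only a minimal sketch; the class and method names (RlmEncoding, PercentEncodeUtf8) are made up for illustration and are not part of System.Uri.

```csharp
using System;
using System.Text;

static class RlmEncoding
{
    // Hypothetical helper: percent-encodes every UTF-8 byte of the input.
    public static string PercentEncodeUtf8(string text)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            // Two uppercase hex digits per byte, prefixed with '%'.
            sb.Append('%').Append(b.ToString("X2"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        string encoded = PercentEncodeUtf8("\u200F"); // U+200F is the RLM
        Console.WriteLine(encoded);        // %E2%80%8F
        Console.WriteLine(encoded.Length); // 9, i.e. 168 - 159
    }
}
```

So the nine missing characters are exactly the three percent-encoded UTF-8 bytes of the RLM; nothing else in the path changes.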

My hypothesis is that this is caused by System.Uri escaping now supporting RFC 3986, but my RFC-fu and Unicode-fu are failing me: I can't tell whether the RFC requires the RLM to be dropped, or whether this RLM character is placed correctly in the original string at all.

I'm not entirely sure whether this is the correct behavior standards-wise, but for me it certainly isn't, since in .NET 4.5 I cannot download a file with an RLM character in its name with either WebClient or HttpWebRequest.

Is there any way to work around this quirk?

Recommended answer

In .NET 4.5, International Resource Identifier (IRI) support was enabled by default. When targeting .NET 4.7.2, the right-to-left mark seems to be honored again, which could indicate that the earlier behavior was a bug.

If the project needs to target .NET 4.5, the method ToggleIDNIRISupport in this post can help to overcome the issue.

Call the method like this:

ToggleIDNIRISupport(false);

When a URI is constructed after this method call, it contains the right-to-left mark.
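
The body of ToggleIDNIRISupport is not reproduced here, so the following is only a rough sketch of what such a method might look like, not the code from the referenced post. It assumes that System.Uri in .NET Framework keeps its IRI/IDN settings in private static fields named s_IriParsing and s_IdnScope. Those names are internal implementation details and an assumption on my part; they may be absent or behave differently on other runtimes, in which case the sketch returns false and changes nothing. The class name UriIriToggle is also made up.

```csharp
using System;
using System.Reflection;

static class UriIriToggle
{
    // Sketch only: flips Uri's private IRI/IDN switches via reflection.
    // Returns false if the assumed private fields are not found.
    public static bool ToggleIDNIRISupport(bool enable)
    {
        // Touch Uri once so its static configuration is initialized
        // before we overwrite it.
        Uri.IsWellFormedUriString("http://example.com/query=\u00FC", UriKind.Absolute);

        const BindingFlags flags = BindingFlags.Static | BindingFlags.NonPublic;

        // ASSUMPTION: these private field names are implementation details.
        FieldInfo iriParsing = typeof(Uri).GetField("s_IriParsing", flags);
        FieldInfo idnScope = typeof(Uri).GetField("s_IdnScope", flags);
        if (iriParsing == null || idnScope == null)
        {
            return false; // this runtime does not expose the assumed fields
        }

        iriParsing.SetValue(null, enable);
        idnScope.SetValue(null, enable ? UriIdnScope.All : UriIdnScope.None);
        return true;
    }
}

static class Demo
{
    static void Main()
    {
        bool applied = UriIriToggle.ToggleIDNIRISupport(false);
        Console.WriteLine("Toggle applied: " + applied);

        // The path below contains U+200F right after the '!'.
        var uri = new Uri("https://example.com/attachments/title!\u200F.pdf");
        Console.WriteLine(uri.AbsolutePath);
    }
}
```

Because this writes to private state shared by the whole process, it should run once at startup, before any Uri that matters is constructed, and it is worth re-testing after every framework update.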
