Preserve UTF-8 BOM in Browser Downloads

Question

I have a JAX-RS REST service that produces a CSV file and streams it back to the browser. Everything is set to UTF-8, so the file I download via the browser is a valid UTF-8 file (without a BOM) that shows valid, readable umlauts and other non-ASCII characters in Notepad++, Sublime, etc.

Opening such a file in Excel, though, leads to unreadable umlauts, since Excel apparently tries to open it with another charset (CP-1252, I guess, but that doesn't really matter).

Saving the file with a BOM via Notepad++ and re-opening it in Excel works nicely. It seems that a BOM is the only thing Excel uses to detect UTF-8. Anyway, I thought that adding a BOM on the server side could help...

I did that, but the result was the same. After a while, I figured out that the BOM gets removed under some circumstances: if I added any character right before the BOM, I could see the BOM in my hex editor. Once I removed that character, the BOM was no longer there.
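For checks like the hex-editor inspection above, a small helper can report whether a downloaded file's bytes begin with the UTF-8 BOM (0xEF 0xBB 0xBF). This is only a minimal sketch: the class name `BomCheck` and the method `startsWithUtf8Bom` are made up for illustration and are not part of the original service.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {

    // Returns true if the data begins with the three UTF-8 BOM bytes EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // U+FEFF encoded as UTF-8 is exactly the BOM byte sequence.
        byte[] withBom = "\uFEFFName;City\n".getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "Name;City\n".getBytes(StandardCharsets.UTF_8);

        System.out.println(startsWithUtf8Bom(withBom));
        System.out.println(startsWithUtf8Bom(withoutBom));
    }
}
```

To inspect a real download, the same method could be applied to the bytes returned by `java.nio.file.Files.readAllBytes` for the saved file.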

When I went on and downloaded the file via cURL, I was really surprised: the BOM was there! Up until then I had thought it might have to do with my application, Content-Types, encodings, HTTP headers, etc., but all of them seem to be fine.

Now, after hours of trying out different things: any ideas on how I can tell the browser to keep the BOM? Is there any HTTP header I could set? Since Chrome, Internet Explorer, Edge, and Firefox all remove the BOM, this sounds a bit like a browser convention to me...

Many thanks for your help!

Thanks to sideshowbarker's answer, I found a workaround: prepending two BOMs to the content, so that one BOM remains after the browser removes the first.
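The double-BOM workaround could look roughly like the following. It is a sketch under assumptions, not the original service code: the class name `DoubleBomCsv` and the method `withDoubleBom` are hypothetical, and in a real JAX-RS resource the resulting bytes would be written to the response stream instead of being returned as an array.

```java
import java.nio.charset.StandardCharsets;

public class DoubleBomCsv {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Encodes the CSV text as UTF-8 and prepends two BOMs.
    // The first BOM is the one the browser is expected to strip;
    // the second is the one that should remain for Excel to detect.
    static byte[] withDoubleBom(String csv) {
        byte[] body = csv.getBytes(StandardCharsets.UTF_8);
        byte[] result = new byte[2 * UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, result, 0, UTF8_BOM.length);
        System.arraycopy(UTF8_BOM, 0, result, UTF8_BOM.length, UTF8_BOM.length);
        System.arraycopy(body, 0, result, 2 * UTF8_BOM.length, body.length);
        return result;
    }

    public static void main(String[] args) {
        byte[] payload = withDoubleBom("Name;City\nJ\u00FCrgen;K\u00F6ln\n");
        // Print the first six bytes in hex to show both BOMs.
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 6; i++) {
            hex.append(String.format("%02X ", payload[i] & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

One trade-off to keep in mind: clients that do not strip the BOM, such as cURL in the test described above, would receive both BOMs.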

Recommended answer

I think this might be because the relevant specs require the BOM to be stripped out, and that's what browsers do. That is, browsers conform to the requirements of the UTF-8 decode algorithm in the Encoding spec, which is this:

To UTF-8 decode a byte stream stream, run these steps:

  1. Let buffer be an empty byte sequence.
  2. Read three bytes from stream into buffer.
  3. If buffer does not match 0xEF 0xBB 0xBF, prepend buffer to stream.
  4. Let output be a code point stream.
  5. Run UTF-8's decoder with stream and output.
  6. Return output.

Step 3 is what causes the BOM to be stripped: the three bytes are put back onto the stream only if they are not a BOM, so a leading BOM is silently dropped.
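The effect of the quoted steps can be illustrated with a small Java sketch. It mirrors the logic of the algorithm on a byte array and is not browser code; the class name `Utf8DecodeSketch` and the method `utf8Decode` are illustrative only. Java's own UTF-8 decoder leaves a BOM in place as U+FEFF, so the stripping is done explicitly here.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeSketch {

    // Mirrors the quoted steps: look at the first three bytes and drop them
    // only if they are EF BB BF, then decode the rest as UTF-8.
    static String utf8Decode(byte[] stream) {
        int offset = 0;
        boolean startsWithBom = stream.length >= 3
                && (stream[0] & 0xFF) == 0xEF
                && (stream[1] & 0xFF) == 0xBB
                && (stream[2] & 0xFF) == 0xBF;
        if (startsWithBom) {
            offset = 3; // the BOM is not put back, so it never reaches the decoder
        }
        return new String(stream, offset, stream.length - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] oneBom = "\uFEFFabc".getBytes(StandardCharsets.UTF_8);
        byte[] twoBoms = "\uFEFF\uFEFFabc".getBytes(StandardCharsets.UTF_8);

        // With one BOM, the decoded text has no leading U+FEFF.
        System.out.println(utf8Decode(oneBom).equals("abc"));
        // With two BOMs, only the first is dropped; the second survives as U+FEFF.
        System.out.println(utf8Decode(twoBoms).equals("\uFEFFabc"));
    }
}
```

Only one leading BOM is removed, which is consistent with the asker's observation that a second BOM survives.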

Given that the Encoding spec requires this, I think there's no way to tell browsers to keep the BOM.
