How do I embed an uploaded binary file (ASCII-8BIT) in an XML document (UTF-8)?

Question

I have a file that is uploaded via a regular form_for; this gives me an ActionDispatch::Http::UploadedFile object in the params hash, on which I can call .read to get the content. I now need to embed the file in an XML document. For now I'm using a regular Ruby string to construct the XML. The default encoding for a Rails string is UTF-8.

Therefore I get the error Encoding::UndefinedConversionError, "\x89" from ASCII-8BIT to UTF-8.

This happens for the following files:

what-matters-now-1.pdf: application/octet-stream; charset=binary
example.csv: text/plain; charset=utf-8
investigations.png: image/png; charset=binary

It does not happen for:

my_test.txt: text/plain; charset=us-ascii

I have tried changing the encoding, but I get the same error:

params[:file].read.encode('utf-8')

Solution

First, you cannot embed a binary file in an XML document without some sort of conversion to text. At least the PDF document and the PNG image need to be encoded somehow - probably Base64 - before you start trying to treat their contents as strings of characters instead of sequences of bytes.
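A minimal sketch of the Base64 approach, assuming the uploaded bytes are already in a string (in the question they come from params[:file].read; here a hypothetical pdf_bytes with made-up sample bytes stands in for them) and that the XML is built by string interpolation as in the question. The attachment element and its attributes are hypothetical, chosen only for illustration.

```ruby
require 'base64'

# Hypothetical stand-in for params[:file].read: raw bytes tagged ASCII-8BIT,
# including bytes outside the ASCII range (made-up sample data).
pdf_bytes = "%PDF-1.4\n\x89\xE2\xE3\xCF\xD3".b

# strict_encode64 turns arbitrary bytes into plain ASCII (A-Z, a-z, 0-9, +, /, =)
# with no line breaks, so the result is safe inside an XML text node.
encoded = Base64.strict_encode64(pdf_bytes)

# The XML is built as an ordinary UTF-8 string, as in the question. The Base64
# text is pure ASCII, so interpolating it raises no encoding error.
# The element and attribute names here are hypothetical.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <attachment filename="what-matters-now-1.pdf" encoding="base64">#{encoded}</attachment>
XML

puts xml

# The receiving side reverses the process and gets the original bytes back.
decoded = Base64.strict_decode64(encoded)
puts decoded == pdf_bytes  # => true
```

The receiver needs to know the payload is Base64; a marker such as the hypothetical encoding="base64" attribute is one way to signal that, but the actual schema is up to whoever consumes the XML.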

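The UndefinedConversionError indicates that you're trying to convert text into UTF-8 from a string that Ruby has tagged as ASCII-8BIT, which is Ruby's label for raw binary data (it is also available as Encoding::BINARY). Only bytes in the ASCII range (0x00 to 0x7F) have a defined mapping from it, but the source includes a byte whose value is 0x89 (137 decimal), which is outside that range. That is not at all unexpected if the source file is a binary file (every PNG file, for instance, begins with the byte 0x89), and base64-encoding it will fix that problem.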
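The failure can be reproduced without Rails. A sketch, assuming only the standard 8-byte PNG signature as input in place of a real upload: a string tagged ASCII-8BIT has no character mapping for bytes above 0x7F, so encode('UTF-8') raises, while ASCII-only content converts without complaint, which matches the my_test.txt case in the question.

```ruby
# The 8-byte PNG signature; .b tags the string ASCII-8BIT, the same
# encoding an uploaded file's #read returns.
png_header = "\x89PNG\r\n\x1A\n".b
ascii_text = "plain ASCII text".b

puts png_header.encoding == Encoding::BINARY  # => true

begin
  png_header.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  # 0x89 is above 0x7F, so Ruby has no UTF-8 equivalent for it.
  puts "#{e.class}: #{e.message}"
end

# Bytes 0x00..0x7F do convert, which is why the us-ascii file works.
converted = ascii_text.encode('UTF-8')
puts converted.encoding == Encoding::UTF_8  # => true
```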

If, however, the source file generating that error is already text, then you need to determine and specify what character set it is actually using. The 0x89 indicates it is neither ASCII nor UTF-8, so the most likely options are Latin-1 or Windows-1252.
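For a file that really is text, the bytes first have to be labeled with their actual charset and only then converted. A sketch, assuming the charset has already been determined somehow (the helper name to_utf8 and its source_charset argument are hypothetical): force_encoding relabels the bytes without changing them, and encode then transcodes to UTF-8. If the bytes turn out to be valid UTF-8 already, relabeling alone is enough.

```ruby
# Hypothetical helper: relabel raw upload bytes with their real charset,
# then transcode to UTF-8. dup avoids mutating the caller's string.
def to_utf8(raw_bytes, source_charset)
  text = raw_bytes.dup.force_encoding(source_charset)
  raise ArgumentError, "bytes are not valid #{source_charset}" unless text.valid_encoding?

  text.encode('UTF-8')
end

# "caf" followed by 0xE9, which is "é" in both Latin-1 and Windows-1252.
puts to_utf8("caf\xE9".b, 'ISO-8859-1')   # prints "café"

# In Windows-1252, 0x89 is the per-mille sign; in Latin-1 it is a C1 control code.
puts to_utf8("50\x89".b, 'Windows-1252')  # prints "50‰"

# Bytes that are already valid UTF-8 only need relabeling, not transcoding.
puts to_utf8("caf\u00E9".b, 'UTF-8')      # prints "café"
```

Guessing between Latin-1 and Windows-1252 matters mainly for bytes 0x80 to 0x9F, as the 0x89 example shows; for the rest of the upper range the two agree.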