Azure function saving temp pdf file from inputstream is corrupted

Question

I have uploaded a PDF to blob storage. When downloaded through MS Azure Explorer, the file is absolutely fine.

I have an Azure function that gets triggered by a queue and also has an input binding to a blob, which is named in the queue message.

When I write the incoming blob to disk, the size is doubled. The PDF is also corrupt and can't be opened in a PDF reader. When opened in Notepad, the characters differ from those in the original file. It looks like an encoding issue, but we are dealing with bytes and not text, so I'm not sure why this is happening.

Here is my code (using Python 3):

import azure.functions as func
import tempfile
import os.path

def main(msg: func.QueueMessage, inputblob: func.InputStream, outputTable: func.Out[str]) -> None:

    with tempfile.TemporaryDirectory() as td:
        f_name1 = os.path.join(td, "old.pdf")
        with open(f_name1, 'wb') as fh:
            fh.write(inputblob.read())

Recommended answer

Yes, this looks broken: the first few bytes are altered, maybe more (marvin3.jpg is the source image in blob storage).

As a workaround, just add this to your function.json blob input binding:

"dataType": "binary"

Like so:

{
  "name": "inputBlob",
  "type": "blob",
  "dataType": "binary",
  "direction": "in",
  "path": "images/input_image.jpg",
  "connection": "AzureWebJobsStorage"
}

You shouldn't need to add that (it is only needed for the JavaScript worker), but I guess there is a bug somewhere in the SDK that prevents the right type from being inferred.

Full example:

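import azure.functions as func


def main(req: func.HttpRequest, inputBlob: func.InputStream) -> func.HttpResponse:
    # With "dataType": "binary" set on the binding, read() returns the raw blob bytes.
    blob = inputBlob.read()

    # Write the bytes unchanged; "wb" avoids any text encoding on the way out.
    with open("out.jpg", "wb") as outfile:
        outfile.write(blob)

    return func.HttpResponse(
        "Done. Binary data written to out.jpg",
        status_code=200
    )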
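
If you would rather have the function fail loudly than write a corrupted file, a small guard can help. The sketch below is not part of the original answer: `write_pdf_checked` is a hypothetical helper, and the check on the `%PDF-` magic bytes assumes the blob is a PDF, as in the question. It accepts any object with a `read()` method, so `func.InputStream` should work; `io.BytesIO` stands in for it here.

```python
import io
import os
import tempfile


def write_pdf_checked(stream, path):
    """Write a binary stream to path, refusing payloads that are not PDF bytes."""
    data = stream.read()
    # A text payload here means the binding delivered a string, not raw bytes.
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError(f"expected bytes, got {type(data).__name__}")
    # PDF files begin with the magic bytes "%PDF-".
    if not data.startswith(b"%PDF-"):
        raise ValueError("payload does not start with %PDF-; it may have been re-encoded")
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)


if __name__ == "__main__":
    # io.BytesIO stands in for func.InputStream in this local check.
    sample = io.BytesIO(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    with tempfile.TemporaryDirectory() as td:
        target = os.path.join(td, "old.pdf")
        written = write_pdf_checked(sample, target)
        print(written, os.path.getsize(target))
```

Inside the question's function this would replace the bare `fh.write(inputblob.read())` call, for example `write_pdf_checked(inputblob, f_name1)`.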

This end-to-end test in the Python worker repo also seems to suggest that "dataType": "binary" should be present when using blob input bindings (no matter the file type, you should get bytes).
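
To spot affected bindings quickly, you could scan a function.json for blob inputs that lack the setting. This is only a sketch: `blob_inputs_missing_binary` is a hypothetical helper, and the example configuration (queue name `incoming`, container `pdfs`, binding names) is made up to resemble the queue-triggered function in the question.

```python
import json


def blob_inputs_missing_binary(function_json_text):
    """Return names of blob input bindings that do not set dataType to binary."""
    config = json.loads(function_json_text)
    missing = []
    for binding in config.get("bindings", []):
        is_blob_input = binding.get("type") == "blob" and binding.get("direction") == "in"
        if is_blob_input and binding.get("dataType") != "binary":
            missing.append(binding.get("name"))
    return missing


# Hypothetical function.json for a queue-triggered function with a blob input.
EXAMPLE = """
{
  "scriptFile": "__init__.py",
  "bindings": [
    {"name": "msg", "type": "queueTrigger", "direction": "in",
     "queueName": "incoming", "connection": "AzureWebJobsStorage"},
    {"name": "inputblob", "type": "blob", "direction": "in",
     "path": "pdfs/{queueTrigger}", "connection": "AzureWebJobsStorage"}
  ]
}
"""

if __name__ == "__main__":
    # Lists the blob input bindings that still need "dataType": "binary".
    print(blob_inputs_missing_binary(EXAMPLE))
```

Adding `"dataType": "binary"` to the `inputblob` entry makes the helper return an empty list.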

If you annotate the input blob as inputBlob: bytes instead of inputBlob: func.InputStream, the problem becomes more apparent when dataType is not specified:

Exception: TypeError: a bytes-like object is required, not 'str'

The Python worker gives you back a string instead of bytes.
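
To see why a string in place of bytes can corrupt a file and inflate its size, here is a minimal sketch of a lossy text round trip. It assumes the payload was decoded as UTF-8 with replacement characters somewhere before reaching your code; that is a guess consistent with the symptoms in the question, not confirmed behavior of the worker, and the sample bytes are made up.

```python
def lossy_round_trip(data: bytes) -> bytes:
    """Decode bytes as UTF-8 text (replacing invalid sequences), then encode again."""
    as_text = data.decode("utf-8", errors="replace")
    return as_text.encode("utf-8")


if __name__ == "__main__":
    # Made-up sample: a PDF-style header followed by bytes that are not valid UTF-8.
    original = b"%PDF-1.4\n" + bytes(range(128, 256))
    damaged = lossy_round_trip(original)

    print("original size:", len(original))
    print("after round trip:", len(damaged))
    print("identical:", damaged == original)
```

ASCII-only content survives this round trip unchanged, so only the binary portions of a file would be altered; each invalid byte is replaced by a multi-byte character, which is why the output grows.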

I have opened an issue here for the docs to be updated.
