Most practical way to read an Azure Blob (PDF) in the Cloud?


Question

I'm somewhat of a beginner and have never dealt with cloud-based solutions before.

My program uses the PDFBox library to extract data from PDFs and rename each file based on that data. It all runs locally at the moment, but it will eventually need to be deployed as an Azure Function. The PDFs will be stored in an Azure Blob container; the Azure Blob Storage trigger for Azure Functions is an important reason for this choice.

Of course I can download the blob locally and read it, but the program should run solely in the Cloud. I've tried reading the blobs directly using Java, but this produced gibberish data that PDFBox could not process. My plan for now is to temporarily store the files elsewhere in the Cloud (e.g. OneDrive, Azure File Storage) and try opening them from there. However, this seems like it could quickly turn into an overly messy solution. My questions:

(1) Is there any way to open a blob as a File rather than as a CloudBlockBlob, so that this additional step isn't needed?

(2) If not, what temporary storage would be recommended in this case?

(3) Are there any alternative ways to approach this issue?

Recommended answer

Since you are planning to deploy this as an Azure Function, you can use a blob trigger/binding to receive the blob's bytes directly. You can then pass those bytes to PDFBox's PDDocument.load method to build the document object: PDDocument.load(content). (This is the PDFBox 2.x API; in PDFBox 3.x the equivalent call is Loader.loadPDF.) You won't need any temporary storage to load the file.

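import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.BindingName;
import com.microsoft.azure.functions.annotation.BlobTrigger;
import com.microsoft.azure.functions.annotation.FunctionName;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.IOException;

@FunctionName("blobprocessor")
public void run(
  // dataType = "binary" delivers the blob as raw bytes rather than text
  @BlobTrigger(name = "file",
               dataType = "binary",
               path = "myblob/{name}",
               connection = "MyStorageAccountAppSetting") byte[] content,
  @BindingName("name") String filename,
  final ExecutionContext context
) throws IOException {
  context.getLogger().info("Name: " + filename + " Size: " + content.length + " bytes");
  // PDDocument.load(byte[]) is the PDFBox 2.x call; try-with-resources closes the document
  try (PDDocument doc = PDDocument.load(content)) {
    // process the document here
  }
}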
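
The dataType = "binary" setting in the binding above matters because a PDF is binary data. One plausible cause of the "gibberish" described in the question (an assumption, since the question does not show the code that read the blob) is that the bytes were decoded as text somewhere along the way. The following standard-library sketch illustrates the effect. The class name PdfBytesCheck and both helper methods are hypothetical names chosen for illustration, and the byte array stands in for a real blob payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfBytesCheck {

    // A PDF file is expected to begin with the ASCII marker "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the payload begins with the PDF header marker. */
    public static boolean looksLikePdf(byte[] content) {
        if (content == null || content.length < PDF_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(content, PDF_MAGIC.length), PDF_MAGIC);
    }

    /** Decodes bytes as UTF-8 text and re-encodes them; lossy for binary data. */
    public static byte[] roundTripThroughString(byte[] content) {
        return new String(content, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for a blob payload: a PDF header followed by non-text bytes.
        byte[] content = {'%', 'P', 'D', 'F', '-', '1', '.', '7', '\n',
                          (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};

        System.out.println("Looks like PDF: " + looksLikePdf(content));
        // Invalid UTF-8 sequences are replaced during decoding, so the bytes change.
        System.out.println("Survives String round trip: "
                + Arrays.equals(content, roundTripThroughString(content)));
    }
}
```

A check like looksLikePdf could be called on the content array at the top of the function to log and skip blobs that are not PDFs before handing them to PDFBox. Whether to skip or fail on such blobs is a design choice the original answer does not address.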
