Reading data from Azure Blob with Spark


Question

I am having an issue reading data from Azure blobs via Spark Streaming.

JavaDStream<String> lines = ssc.textFileStream("hdfs://ip:8020/directory");

Code like the above works for HDFS, but it cannot read files from Azure Blob storage:

https://blobstorage.blob.core.windows.net/containerid/folder1/

The path above is the one shown in the Azure UI, but it doesn't work. Am I missing something? How can we access it?

I know Event Hubs is the ideal choice for streaming data, but my current situation demands storage rather than queues.

Recommended answer

To read data from blob storage, two things need to be done. First, you need to tell Spark which native file system to use in the underlying Hadoop configuration. This means the Hadoop-Azure JAR must also be available on your classpath (note that there may be runtime requirements for more JARs from the Hadoop family):

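import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaSparkContext;

JavaSparkContext ct = new JavaSparkContext();
Configuration config = ct.hadoopConfiguration();
config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
// Replace "youraccount" and "yourkey" with your storage account name and access key.
config.set("fs.azure.account.key.youraccount.blob.core.windows.net", "yourkey");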
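The account key property name embeds the storage account name, so it is easy to mistype. A minimal sketch of building both settings from the account name, using only the Java standard library; the class name AzureBlobSettings and the method forAccount are hypothetical helpers, not part of Spark or Hadoop:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Spark or Hadoop API): collects the two Hadoop
// properties used in the answer for a given storage account.
final class AzureBlobSettings {
    private AzureBlobSettings() {}

    static Map<String, String> forAccount(String accountName, String accountKey) {
        Map<String, String> settings = new LinkedHashMap<>();
        // File system class named in the answer.
        settings.put("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
        // Per-account key: fs.azure.account.key.<account>.blob.core.windows.net
        settings.put("fs.azure.account.key." + accountName + ".blob.core.windows.net", accountKey);
        return settings;
    }
}
```

Each entry of the returned map could then be applied with config.set(entry.getKey(), entry.getValue()).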

Second, refer to the file using the wasb:// prefix (the [s] denotes the optional secure connection, wasbs://):

ssc.textFileStream("wasb[s]://<BlobStorageContainerName>@<StorageAccountName>.blob.core.windows.net/<path>");
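The question starts from the https:// path shown in the Azure UI, while the pattern above puts the container before the account host. A minimal sketch of that rewrite, assuming the URL has the form https://<account>.blob.core.windows.net/<container>/<path>; the class name WasbUri and the method fromHttpsUrl are hypothetical, and only java.net.URI from the standard library is used:

```java
import java.net.URI;

// Hypothetical helper (not a Spark or Hadoop API): rewrites the https URL shown
// in the Azure UI into the wasb[s]://<container>@<account-host>/<path> form.
final class WasbUri {
    private WasbUri() {}

    static String fromHttpsUrl(String httpsUrl, boolean secure) {
        URI uri = URI.create(httpsUrl);
        String host = uri.getHost();
        String path = uri.getPath();
        if (host == null || path == null || path.length() < 2) {
            throw new IllegalArgumentException("Expected a URL with a container in its path: " + httpsUrl);
        }
        // The first path segment is the container; the remainder is the blob path.
        String trimmed = path.substring(1);
        int slash = trimmed.indexOf('/');
        String container = slash < 0 ? trimmed : trimmed.substring(0, slash);
        String rest = slash < 0 ? "/" : trimmed.substring(slash);
        return (secure ? "wasbs" : "wasb") + "://" + container + "@" + host + rest;
    }
}
```

For the path in the question, WasbUri.fromHttpsUrl("https://blobstorage.blob.core.windows.net/containerid/folder1/", false) yields wasb://containerid@blobstorage.blob.core.windows.net/folder1/, which can be passed to ssc.textFileStream.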

It goes without saying that the location making the query needs proper permissions to access blob storage.
