Spark: How to generate file paths to read from S3 with Scala

Views: 297 | Posted: 2017/11/6 21:55:44 | Tags: json scala apache-spark amazon-s3 filesystems


Problem description

How do I generate and load multiple S3 file paths in Scala so that I can use:
sqlContext.read.json("s3://..../*/*/*")
I know I can use wildcards to read multiple files, but is there a way to generate the paths? For example, my file structure looks like this: BucketName/year/month/day/files
s3://testBucket/2016/10/16/part00000
These files are all JSON. The issue is that I need to load only the files for a specific duration. For example, with a duration of 16 days and a start day of Oct 16, I need to load the files for Oct 1 to Oct 16.

With a 28-day duration for the same start day, I would like to read from Sep 18.

Can someone tell me a way to do this?
Solution
You can take a look at this answer. You can specify whole directories, use wildcards, and even pass a comma-separated list of directories and wildcards. E.g.:
sc.textFile("/my/dir1,/my/paths/part-00[0-5]*,/another/dir,/a/specific/file")
Or you can use the AWS API to get the list of file locations and read those files using Spark.

You can look into this answer on AWS S3 file search.
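
The date window from the question can be combined with the comma-separated path list above. The following is only a minimal sketch, not a definitive implementation. The object name S3DatePaths and the helper dayPaths are hypothetical. It assumes that month and day directories are zero-padded to two digits (the question only shows 2016/10/16, so this is a guess) and that the window includes the start day, so 16 days ending on Oct 16 covers Oct 1 to Oct 16. Under that counting, a 28-day window begins on Sep 19; pass 29 if Sep 18 should be included.

```scala
import java.time.LocalDate

object S3DatePaths {
  // Hypothetical helper: one wildcard path per day, for a window of `days`
  // days ending on (and including) `endDay`.
  // Assumed layout: s3://bucket/yyyy/MM/dd/ with zero-padded month and day.
  def dayPaths(bucket: String, endDay: LocalDate, days: Int): Seq[String] =
    (days - 1 to 0 by -1).map { offset =>
      val d = endDay.minusDays(offset.toLong)
      f"s3://$bucket/${d.getYear}/${d.getMonthValue}%02d/${d.getDayOfMonth}%02d/*"
    }
}

// Usage (bucket name taken from the question's example path):
val paths = S3DatePaths.dayPaths("testBucket", LocalDate.of(2016, 10, 16), 16)

// With a SparkContext `sc` in scope, the comma-joined form matches the answer:
//   sc.textFile(paths.mkString(","))
// Recent versions of DataFrameReader also accept several paths as varargs:
//   sqlContext.read.json(paths: _*)
```

Generating explicit day directories keeps the read limited to the requested window, whereas a wildcard such as 2016/*/* would scan every day in the year. Days that have no directory in the bucket may cause the read to fail, so filtering the list against an S3 listing first may be necessary.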
