TextIO. Read multiple files from GCS using pattern {}


Question

I tried using the following:

TextIO.Read.from("gs://xyz.abc/xxx_{2017-06-06,2017-06-06}.csv")

That pattern didn't work, as I get:

java.lang.IllegalStateException: Unable to find any files matching StaticValueProvider{value=gs://xyz.abc/xxx_{2017-06-06,2017-06-06}.csv}

Even though those 2 files do exist. I also tried a similar expression with a local file:

TextIO.Read.from("somefolder/xxx_{2017-06-06,2017-06-06}.csv")

And that did work just fine.

I would've thought there would be support for all kinds of globs for files in GCS, but nope. Why is that? Is there a way to accomplish what I'm looking for?

Recommended answer

This may be another option, in addition to Scott's suggestion and your comment on his answer:

You can define a list with the paths you want to read and then iterate over it, creating a number of PCollections in the usual way:

PCollection<String> events1 = p.apply(TextIO.Read.from(path1));
PCollection<String> events2 = p.apply(TextIO.Read.from(path2));

Then create a PCollectionList:

PCollectionList<String> eventsList = PCollectionList.of(events1).and(events2);

And then flatten this list into a single PCollection to use as your main input:

PCollection<String> events = eventsList.apply(Flatten.pCollections());
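
The list of paths mentioned above does not have to be typed out by hand. As a minimal sketch, the brace group could be expanded in plain Java before any read is applied. The class BracePaths and its expand method are hypothetical names, not part of the SDK, and the second date in the sample pattern is made up for illustration. Nested brace groups are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Dataflow/Beam SDK.
final class BracePaths {

    // Expands "{a,b,...}" groups in the pattern into one explicit path per
    // alternative. Several groups in a row are supported through recursion;
    // nested groups are not.
    static List<String> expand(String pattern) {
        List<String> result = new ArrayList<>();
        int open = pattern.indexOf('{');
        int close = pattern.indexOf('}', open + 1);
        if (open < 0 || close < 0) {
            // No complete brace group left: the pattern is already a plain path.
            result.add(pattern);
            return result;
        }
        String prefix = pattern.substring(0, open);
        String suffix = pattern.substring(close + 1);
        for (String alternative : pattern.substring(open + 1, close).split(",", -1)) {
            result.addAll(expand(prefix + alternative + suffix));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample pattern; the second date is an assumption for illustration.
        for (String path : expand("gs://xyz.abc/xxx_{2017-06-06,2017-06-07}.csv")) {
            System.out.println(path);
        }
    }
}
```

Each returned path could then be passed to TextIO.Read.from as in the answer, with the resulting PCollections collected into the PCollectionList and flattened. If the same alternative appears twice in the braces, the same path is returned twice, so that file would be read twice unless duplicates are removed first.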
