TextIO.Read multiple files from GCS using pattern {}


Question

I tried using the following:

TextIO.Read.from("gs://xyz.abc/xxx_{2017-06-06,2017-06-06}.csv")

That pattern didn't work; I get:

java.lang.IllegalStateException: Unable to find any files matching StaticValueProvider{value=gs://xyz.abc/xxx_{2017-06-06,2017-06-06}.csv}

This happens even though those two files do exist. I also tried a local file with a similar expression:

TextIO.Read.from("somefolder/xxx_{2017-06-06,2017-06-06}.csv")

That did work fine.

I would have thought all kinds of globs would be supported for files in GCS, but apparently not. Why is that? Is there a way to accomplish what I'm looking for?

Recommended answer

This may be another option, in addition to Scott's suggestion and your comment on his answer:

You can define a list of the paths you want to read and then iterate over it, creating one PCollection per path in the usual way:

PCollection<String> events1 = p.apply(TextIO.Read.from(path1));
PCollection<String> events2 = p.apply(TextIO.Read.from(path2));

Then create a PCollectionList:

PCollectionList<String> eventsList = PCollectionList.of(events1).and(events2);

And then flatten this list into a single PCollection to use as your main input:

PCollection<String> events = eventsList.apply(Flatten.pCollections());
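If the paths differ only in a {a,b} group, as in the question, the list of paths could be generated instead of typed out by hand. Below is a minimal sketch in plain Java; BracePaths and expand are hypothetical names for illustration and are not part of the SDK. It assumes brace groups are not nested, and the example pattern uses two distinct dates, the first of which is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Beam/Dataflow SDK.
final class BracePaths {
    private BracePaths() {}

    /**
     * Expands every {a,b,...} group in the pattern into explicit paths.
     * Assumes groups are not nested; a pattern without a complete group
     * is returned unchanged as a single-element list.
     */
    static List<String> expand(String pattern) {
        int open = pattern.indexOf('{');
        int close = pattern.indexOf('}', open + 1);
        List<String> result = new ArrayList<>();
        if (open < 0 || close < 0) {
            result.add(pattern);
            return result;
        }
        String prefix = pattern.substring(0, open);
        String suffix = pattern.substring(close + 1);
        // Substitute each alternative, then expand any remaining groups.
        for (String alternative : pattern.substring(open + 1, close).split(",", -1)) {
            result.addAll(expand(prefix + alternative + suffix));
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative pattern; the date 2017-06-05 is an assumption.
        for (String path : expand("gs://xyz.abc/xxx_{2017-06-05,2017-06-06}.csv")) {
            System.out.println(path);
        }
    }
}
```

Each expanded path could then be passed to TextIO.Read.from(...) inside a loop, adding every resulting PCollection to the PCollectionList with and(...) before flattening as shown above.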
