How can I stage additional files using Google Cloud Dataflow?
Question
I am reading a bunch of configuration files in my Google Dataflow program and wonder what the best way to stage them is. Currently I do it this way, and the system cannot find them:
FileReader filereader1 = new FileReader("config_1.csv");
FileReader filereader2 = new FileReader("config_2.csv");
config_1.csv and config_2.csv are stored in ./target/classes/org/model/examples/
My running script looks like this:
mvn compile exec:java -Dexec.mainClass=org.model.examples.MyPipeline \
-Dexec.args="--runner=DataflowRunner \
--project=mortgage-data-warehouse \
--gcpTempLocation=gs://my-project-bucket/tmp \
--inputFile=gs://my-project-bucket/Data/input.txt \
--filesToStage=./target/classes/org/datamodel/examples/config_1.csv, ./target/classes/org/datamodel/examples/config_2.csv" \
-Pdataflow-runner
I get the following error:
java.io.FileNotFoundException: config_1.csv (The system cannot find the file specified)
I wonder if this is the proper way to set --filesToStage.
Recommended answer
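For small configuration files, it is better to read them from the resources folder (as described in this link) and avoid the complication of using --filesToStage. A FileReader opened on a relative path such as "config_1.csv" resolves against the process's working directory, not against the classpath, which is why the file is not found even though it sits under ./target/classes.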
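A minimal sketch of the resource-based approach, not the asker's actual code. It assumes the CSV files are kept under src/main/resources/org/model/examples/ so that Maven copies them to target/classes and they end up on the classpath. The names ConfigLoader, readLines and loadResource are hypothetical and only used for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: loads small config files from the classpath
// instead of from a path relative to the working directory.
class ConfigLoader {

    // Reads every line from an already-open stream and closes it afterwards.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Looks the file up on the classpath. A leading "/" makes the path
    // absolute from the classpath root, e.g. "/org/model/examples/config_1.csv".
    static List<String> loadResource(String resourcePath) throws IOException {
        InputStream in = ConfigLoader.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + resourcePath);
        }
        return readLines(in);
    }
}
```

Under these assumptions, the two FileReader calls in the question could be replaced with something like ConfigLoader.loadResource("/org/model/examples/config_1.csv"), and the --filesToStage argument could be dropped from the run script.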