DataflowRunner exits with "No files to stage has been found."


Problem description

I want to run the WordCount Java example from https://beam.apache.org/get-started/quickstart-java/, but I get an error saying that the ClasspathScanningResourcesDetector found no files to stage. I run the example exactly as described on the website:

 mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--runner=DataflowRunner --project=<your-gcp-project> \
                  --gcpTempLocation=gs://<your-gcs-bucket>/tmp \
                  --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://<your-gcs-bucket>/counts" \
     -Pdataflow-runner

which produces:

Caused by: java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:214)
    ... 5 more
Caused by: java.lang.IllegalArgumentException: No files to stage has been found.
    at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:281)
    ... 10 more

I am using the latest Beam version:

<beam.version>2.19.0</beam.version>

Do you know how to fix this?

This is a bug in 2.19.0. It works in 2.18.0.

I am using Red Hat OpenJDK 8 on Windows.

Also, some unit tests from the standard WordCount example are failing.

DebuggingWordCountTest fails with:

org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.io.FileNotFoundException: No files matched spec: /Users/<redacted>/AppData/Local/Temp/junit7907687962995108435/junit2682353785908929665.tmp

    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:321)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:301)

Recommended answer

  • When you run the pipeline on Dataflow, the runner tries to find and upload the dependencies.
  • I assume you are getting the error "No files to stage has been found" because of a classpath issue.
  • Try using the --filesToStage option to manually provide the jars or classes to stage.

Below are sample logs from a run that successfully staged 114 files, so you can compare them with your complete logs to understand the issue:

      Mar 08, 2020 7:37:41 PM org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory create
      INFO: No stagingLocation provided, falling back to gcpTempLocation
      Mar 08, 2020 7:37:42 PM org.apache.beam.runners.dataflow.DataflowRunner fromOptions
      INFO: PipelineOptions.filesToStage was not specified. Defaulting to files from the classpath: will stage 114 files. Enable logging at DEBUG level to see which files will be staged.
      Mar 08, 2020 7:37:43 PM org.apache.beam.runners.dataflow.DataflowRunner run
      INFO: Executing pipeline on the Dataflow Service, which will have billing implications related to Google Compute Engine usage and other Google Cloud Services.
      Mar 08, 2020 7:37:43 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
      INFO: Uploading 114 files from PipelineOptions.filesToStage to staging location to prepare for execution.
      Mar 08, 2020 7:37:48 PM org.apache.beam.runners.dataflow.util.PackageUtil stageClasspathElements
      INFO: Staging files complete: 114 files cached, 0 files newly uploaded
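
As a rough illustration of the --filesToStage suggestion above, the following sketch builds a value for that option from the entries of a classpath string, using only the JDK. The class name `FilesToStageArg` and its helper methods are hypothetical, and the sketch assumes the option accepts a comma-separated list of paths. It is not how Beam detects files internally.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Beam: turns a classpath string into a
// --filesToStage argument (assumed to be a comma-separated list of paths).
public class FilesToStageArg {

    // Keeps only the classpath entries that exist on disk, as absolute paths.
    static List<String> existingClasspathEntries(String classpath) {
        List<String> entries = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue;
            }
            File file = new File(entry);
            if (file.exists()) {
                entries.add(file.getAbsolutePath());
            }
        }
        return entries;
    }

    // Joins the entries into a single command-line argument.
    static String toFilesToStageArg(List<String> entries) {
        return "--filesToStage=" + String.join(",", entries);
    }

    public static void main(String[] args) {
        List<String> entries =
                existingClasspathEntries(System.getProperty("java.class.path"));
        System.out.println("Found " + entries.size() + " classpath entries");
        System.out.println(toFilesToStageArg(entries));
    }
}
```

If the reported count is zero or unexpectedly small, that points to the same kind of classpath problem the error describes. Note that when a program is launched through `mvn exec:java`, the `java.class.path` property may not list the project's dependencies, in which case the jar list would have to be assembled another way.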
      

You can also try the commands below to generate the example source code from scratch, then run the pipeline again so the dependencies are staged.

      mvn archetype:generate \
            -DarchetypeGroupId=org.apache.beam \
            -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
            -DarchetypeVersion=2.8.0 \
            -DgroupId=org.example \
            -DartifactId=first-dataflow \
            -Dversion="0.1" \
            -Dpackage=org.apache.beam.examples \
            -DinteractiveMode=false
      

You can also try it for free in Qwiklabs: https://google.qwiklabs.com/focuses/7974?parent=catalog
