Google Cloud Dataflow: Specifying TempLocation via Command Line Argument


Question

I am trying to specify my GCS temp location by passing it as a command-line option, as shown below.

java -jar pipeline-0.0.1-SNAPSHOT.jar --runner=DataflowRunner --project=<my_project> --tempLocation=gs://<my_bucket>/<my_folder>

However, I keep receiving the following error:

java.nio.file.InvalidPathException: Illegal char <:> at index 2: gs://<my_bucket>/<my_folder>

I am following this documentation:

https://cloud.google.com/dataflow/pipelines/specifying-exec-params

I read the arguments from the command line like this:

DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

Updated with the full stack trace, as requested below:

org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.nio.file.InvalidPathException: Illegal char <:> at index 2: gs://pipeline-az/staging
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:342)
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:312)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:206)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:62)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
        at com.autozone.google.pipeline.PipelinePeople.main(PipelinePeople.java:97)
Caused by: java.nio.file.InvalidPathException: Illegal char <:> at index 2: gs://pipeline-az/staging
        at sun.nio.fs.WindowsPathParser.normalize(Unknown Source)
        at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
        at sun.nio.fs.WindowsPathParser.parse(Unknown Source)
        at sun.nio.fs.WindowsPath.parse(Unknown Source)
        at sun.nio.fs.WindowsFileSystem.getPath(Unknown Source)
        at java.nio.file.Paths.get(Unknown Source)
        at org.apache.beam.sdk.io.LocalFileSystem.matchNewResource(LocalFileSystem.java:196)
        at org.apache.beam.sdk.io.LocalFileSystem.matchNewResource(LocalFileSystem.java:78)
        at org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:544)
        at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:325)
        at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$4.getTempFilePrefix(BatchLoads.java:381)
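
To make the error message concrete: the stack trace above shows the gs:// location being handed to LocalFileSystem and from there to the Windows path parser, which rejects the colon at index 2. The minimal sketch below does not use Beam and does not reproduce Beam's file-system lookup. It only shows that the same string is a well-formed URI whose scheme is gs. The class name GsPathDemo and the bucket and folder names are made up for illustration.

```java
import java.net.URI;

public class GsPathDemo {
    /** Returns the URI scheme of a location string, or null if it has none. */
    public static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    public static void main(String[] args) {
        // Hypothetical bucket and folder, standing in for the placeholders in the question.
        String tempLocation = "gs://my-bucket/my-folder";

        // The character the exception complains about: index 2 is the colon after "gs".
        System.out.println("char at index 2: " + tempLocation.charAt(2));

        // Read as a URI, the same string carries the scheme "gs".
        System.out.println("scheme: " + schemeOf(tempLocation));
    }
}
```

In other words, the command-line value itself looks fine; the trace suggests that it was treated as a local path rather than as a GCS location.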

My pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.hendpro.google</groupId>
  <artifactId>pipeline</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>pipeline</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

<build>
  <plugins>
    <plugin>
      <!-- Build an executable JAR -->
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <version>3.0.2</version>
      <configuration>
        <archive>
          <manifest>
            <addClasspath>true</addClasspath>
            <classpathPrefix>lib/</classpathPrefix>
            <mainClass>com.hendpro.google.pipeline.PipelinePeople</mainClass>
          </manifest>
        </archive>
      </configuration>
    </plugin>
    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
    </plugin>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
</build>

  <dependencies>
  <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
   <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-core</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-io-jdbc</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-direct-java</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>42.1.1</version>
    </dependency>
  </dependencies>
</project>

Update:

I have also tried both the direct runner and the Dataflow runner, with and without each of the following:

.as(DataflowPipelineOptions.class);
.as(DirectOptions.class);

Regardless of the runner choice or the options declaration, the error persists.

Adding the list of artifacts included in the shaded JAR:

[INFO] --- maven-shade-plugin:3.1.0:shade (default) @ pipeline ---
[INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-core:jar:2.3.0 in the shaded jar.
[INFO] Including com.google.code.findbugs:jsr305:jar:3.0.1 in the shaded jar.
[INFO] Including com.github.stephenc.findbugs:findbugs-annotations:jar:1.3.9-1 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-core:jar:2.8.9 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-annotations:jar:2.8.9 in the shaded jar.
[INFO] Including com.fasterxml.jackson.core:jackson-databind:jar:2.8.9 in the shaded jar.
[INFO] Including org.slf4j:slf4j-api:jar:1.7.25 in the shaded jar.
[INFO] Including org.apache.avro:avro:jar:1.8.2 in the shaded jar.
[INFO] Including org.codehaus.jackson:jackson-core-asl:jar:1.9.13 in the shaded jar.
[INFO] Including org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13 in the shaded jar.
[INFO] Including com.thoughtworks.paranamer:paranamer:jar:2.7 in the shaded jar.
[INFO] Including org.apache.commons:commons-compress:jar:1.8.1 in the shaded jar.
[INFO] Including org.tukaani:xz:jar:1.5 in the shaded jar.
[INFO] Including org.xerial.snappy:snappy-java:jar:1.1.4 in the shaded jar.
[INFO] Including joda-time:joda-time:jar:2.4 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-io-jdbc:jar:2.3.0 in the shaded jar.
[INFO] Including org.apache.commons:commons-dbcp2:jar:2.1.1 in the shaded jar.
[INFO] Including org.apache.commons:commons-pool2:jar:2.4.2 in the shaded jar.
[INFO] Including commons-logging:commons-logging:jar:1.2 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-io-google-cloud-platform:jar:2.3.0 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:2.3.0 in the shaded jar.
[INFO] Including com.google.cloud.bigdataoss:gcsio:jar:1.4.5 in the shaded jar.
[INFO] Including com.google.apis:google-api-services-cloudresourcemanager:jar:v1-rev6-1.22.0 in the shaded jar.
[INFO] Including com.google.apis:google-api-services-storage:jar:v1-rev71-1.22.0 in the shaded jar.
[INFO] Including org.apache.beam:beam-sdks-java-extensions-protobuf:jar:2.3.0 in the shaded jar.
[INFO] Including io.grpc:grpc-core:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.errorprone:error_prone_annotations:jar:2.0.11 in the shaded jar.
[INFO] Including io.grpc:grpc-context:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.instrumentation:instrumentation-api:jar:0.3.0 in the shaded jar.
[INFO] Including com.google.apis:google-api-services-bigquery:jar:v2-rev355-1.22.0 in the shaded jar.
[INFO] Including com.google.api:gax-grpc:jar:0.20.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.api:api-common:jar:1.1.0 in the shaded jar.
[INFO] Including com.google.auto.value:auto-value:jar:1.2 in the shaded jar.
[INFO] Including com.google.api:gax:jar:1.3.1 in the shaded jar.
[INFO] Including org.threeten:threetenbp:jar:1.3.3 in the shaded jar.
[INFO] Including com.google.cloud:google-cloud-core-grpc:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java-util:jar:3.2.0 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:2.7 in the shaded jar.
[INFO] Including com.google.apis:google-api-services-pubsub:jar:v1-rev10-1.22.0 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-cloud-pubsub-v1:jar:0.1.18 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-cloud-pubsub-v1:jar:0.1.18 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-iam-v1:jar:0.1.18 in the shaded jar.
[INFO] Including com.google.cloud.bigdataoss:util:jar:1.4.5 in the shaded jar.
[INFO] Including com.google.api-client:google-api-client-java6:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.api-client:google-api-client-jackson2:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.oauth-client:google-oauth-client:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.oauth-client:google-oauth-client-java6:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.cloud.datastore:datastore-v1-proto-client:jar:1.4.0 in the shaded jar.
[INFO] Including com.google.http-client:google-http-client-protobuf:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.http-client:google-http-client-jackson:jar:1.20.0 in the shaded jar.
[INFO] Including com.google.cloud.datastore:datastore-v1-protos:jar:1.3.0 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-common-protos:jar:0.1.0 in the shaded jar.
[INFO] Including io.grpc:grpc-auth:jar:1.2.0 in the shaded jar.
[INFO] Including io.grpc:grpc-netty:jar:1.2.0 in the shaded jar.
[INFO] Including io.netty:netty-codec-http2:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-http:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-handler-proxy:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-codec-socks:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-handler:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-buffer:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-common:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-transport:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-resolver:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.netty:netty-codec:jar:4.1.8.Final in the shaded jar.
[INFO] Including io.grpc:grpc-stub:jar:1.2.0 in the shaded jar.
[INFO] Including io.grpc:grpc-all:jar:1.2.0 in the shaded jar.
[INFO] Including io.grpc:grpc-okhttp:jar:1.2.0 in the shaded jar.
[INFO] Including com.squareup.okhttp:okhttp:jar:2.5.0 in the shaded jar.
[INFO] Including com.squareup.okio:okio:jar:1.6.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf-lite:jar:1.2.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf-nano:jar:1.2.0 in the shaded jar.
[INFO] Including com.google.protobuf.nano:protobuf-javanano:jar:3.0.0-alpha-5 in the shaded jar.
[INFO] Including com.google.cloud:google-cloud-core:jar:1.0.2 in the shaded jar.
[INFO] Including org.json:json:jar:20160810 in the shaded jar.
[INFO] Including com.google.cloud:google-cloud-spanner:jar:0.20.0-beta in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-cloud-spanner-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-cloud-spanner-admin-instance-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-cloud-spanner-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-cloud-spanner-admin-database-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-cloud-spanner-admin-instance-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:grpc-google-longrunning-v1:jar:0.1.11 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-longrunning-v1:jar:0.1.11 in the shaded jar.
[INFO] Including junit:junit:jar:4.12 in the shaded jar.
[INFO] Including org.hamcrest:hamcrest-core:jar:1.3 in the shaded jar.
[INFO] Including com.google.cloud.bigtable:bigtable-protos:jar:1.0.0-pre3 in the shaded jar.
[INFO] Including com.google.cloud.bigtable:bigtable-client-core:jar:1.0.0 in the shaded jar.
[INFO] Including com.google.auth:google-auth-library-appengine:jar:0.7.0 in the shaded jar.
[INFO] Including io.opencensus:opencensus-contrib-grpc-util:jar:0.7.0 in the shaded jar.
[INFO] Including io.opencensus:opencensus-api:jar:0.7.0 in the shaded jar.
[INFO] Including io.dropwizard.metrics:metrics-core:jar:3.1.2 in the shaded jar.
[INFO] Including com.google.api-client:google-api-client:jar:1.22.0 in the shaded jar.
[INFO] Including com.google.http-client:google-http-client:jar:1.22.0 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpclient:jar:4.0.1 in the shaded jar.
[INFO] Including org.apache.httpcomponents:httpcore:jar:4.0.1 in the shaded jar.
[INFO] Including commons-codec:commons-codec:jar:1.3 in the shaded jar.
[INFO] Including com.google.http-client:google-http-client-jackson2:jar:1.22.0 in the shaded jar.
[INFO] Including com.google.auth:google-auth-library-credentials:jar:0.7.1 in the shaded jar.
[INFO] Including com.google.auth:google-auth-library-oauth2-http:jar:0.7.1 in the shaded jar.
[INFO] Including com.google.guava:guava:jar:20.0 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java:jar:3.2.0 in the shaded jar.
[INFO] Including io.netty:netty-tcnative-boringssl-static:jar:1.1.33.Fork26 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1:jar:0.1.9 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-common-protos:jar:0.1.9 in the shaded jar.
[INFO] Including org.apache.beam:beam-runners-direct-java:jar:2.3.0 in the shaded jar.
[INFO] Including org.apache.beam:beam-runners-local-java-core:jar:2.3.0 in the shaded jar.
[INFO] Including org.postgresql:postgresql:jar:42.1.1 in the shaded jar.
[WARNING] grpc-google-common-protos-0.1.0.jar, proto-google-common-protos-0.1.9.jar, proto-google-longrunning-v1-0.1.11.jar define 28 overlapping classes:
[WARNING]   - com.google.longrunning.ListOperationsRequestOrBuilder
[WARNING]   - com.google.longrunning.ListOperationsRequest$Builder
[WARNING]   - com.google.longrunning.OperationsProto$1
[WARNING]   - com.google.longrunning.OperationOrBuilder
[WARNING]   - com.google.longrunning.ListOperationsResponseOrBuilder
[WARNING]   - com.google.longrunning.DeleteOperationRequestOrBuilder
[WARNING]   - com.google.longrunning.DeleteOperationRequest$1
[WARNING]   - com.google.longrunning.CancelOperationRequest$1
[WARNING]   - com.google.longrunning.GetOperationRequest
[WARNING]   - com.google.longrunning.Operation$2
[WARNING]   - 18 more...
[WARNING] grpc-google-common-protos-0.1.0.jar, proto-google-common-protos-0.1.9.jar define 352 overlapping classes:
[WARNING]   - com.google.api.Logging
[WARNING]   - com.google.api.Usage$1
[WARNING]   - com.google.rpc.ResourceInfoOrBuilder
[WARNING]   - com.google.api.AuthProvider$1
[WARNING]   - com.google.api.ProjectProperties$Builder
[WARNING]   - com.google.api.DocumentationProto
[WARNING]   - com.google.type.TimeOfDayOrBuilder
[WARNING]   - com.google.api.MonitoringOrBuilder
[WARNING]   - com.google.api.Authentication$Builder
[WARNING]   - com.google.api.Monitoring
[WARNING]   - 342 more...
[WARNING] beam-sdks-java-core-2.3.0.jar, beam-sdks-java-extensions-google-cloud-platform-core-2.3.0.jar define 3 overlapping classes:
[WARNING]   - org.apache.beam.sdk.util.AutoValue_DoFnAndMainOutput
[WARNING]   - org.apache.beam.sdk.util.package-info
[WARNING]   - org.apache.beam.sdk.util.AutoValue_ReleaseInfo
[WARNING] grpc-google-common-protos-0.1.0.jar, grpc-google-longrunning-v1-0.1.11.jar define 7 overlapping classes:
[WARNING]   - com.google.longrunning.OperationsGrpc$OperationsStub
[WARNING]   - com.google.longrunning.OperationsGrpc$1
[WARNING]   - com.google.longrunning.OperationsGrpc$OperationsFutureStub
[WARNING]   - com.google.longrunning.OperationsGrpc$OperationsImplBase
[WARNING]   - com.google.longrunning.OperationsGrpc$OperationsBlockingStub
[WARNING]   - com.google.longrunning.OperationsGrpc
[WARNING]   - com.google.longrunning.OperationsGrpc$MethodHandlers
[WARNING] maven-shade-plugin has detected that some class files are
[WARNING] present in two or more JARs. When this happens, only one
[WARNING] single version of the class is copied to the uber jar.
[WARNING] Usually this is not harmful and you can skip these warnings,
[WARNING] otherwise try to manually exclude artifacts based on
[WARNING] mvn dependency:tree -Ddetail=true and the above output.
[WARNING] See http://maven.apache.org/plugins/maven-shade-plugin/

Recommended answer

As explained in this answer, when you use the Maven Shade Plugin together with ServiceLoader-based dependency injection, you should add ServicesResourceTransformer to the <transformers> section of the shade plugin's <configuration> in your pom.xml file:

<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>

Even if the plugin is relocating classes, this ensures that every service file under META-INF/services in your dependencies is merged, without the need to declare them all.
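
Why merging matters can be sketched with a simplified model. This is not the plugin's actual implementation. Assume two dependencies each ship a service file with the same name, each listing one provider. The two registrar class names below are used as sample entries and are an assumption about what those files contain. Without merging, only one copy of the file ends up in the shaded JAR (modelled here as the first one), so ServiceLoader would see only one provider.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ServiceFileMergeDemo {
    /** Concatenates the provider lines of same-named service files, skipping blanks, comments, and duplicates. */
    public static List<String> merge(List<List<String>> serviceFiles) {
        Set<String> providers = new LinkedHashSet<>();
        for (List<String> file : serviceFiles) {
            for (String line : file) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                    providers.add(trimmed);
                }
            }
        }
        return new ArrayList<>(providers);
    }

    /** Models a build without merging: a single copy survives (assumed here to be the first). */
    public static List<String> keepOne(List<List<String>> serviceFiles) {
        return serviceFiles.isEmpty()
                ? new ArrayList<>()
                : new ArrayList<>(serviceFiles.get(0));
    }

    public static void main(String[] args) {
        // Sample contents of two same-named service files from different JARs (assumed entries).
        List<String> fromCore =
                Arrays.asList("org.apache.beam.sdk.io.LocalFileSystemRegistrar");
        List<String> fromGcp =
                Arrays.asList("org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar");
        List<List<String>> files = Arrays.asList(fromCore, fromGcp);

        System.out.println("without merging: " + keepOne(files));
        System.out.println("with merging:    " + merge(files));
    }
}
```

With merging, both providers are listed, which is the situation the transformer is meant to produce in the shaded JAR.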

Note: I am posting this as a community wiki answer for now, but I will gladly delete it if @jkff posts his comment as an answer instead. All credit to @Tunaki and @jkff.
