Import/Export DataFusion pipelines


Question

Does anyone know whether it is possible to programmatically import/export DataFusion pipelines (deployed or in draft status)?

The idea is to write a script that drops and recreates a DataFusion instance, to avoid being billed when it is not in use. With the gcloud command line it is possible to provision a DataFusion cluster and destroy it, but it would be useful to automatically export and import all my pipelines too.

Unfortunately, the official documentation didn't help me...

Thanks!

Recommended answer

You can do this with the REST API, though you will probably need a script that automates it given the instance URL. You should be able to get the pipeline config starting from the application list API. In your case, first get the list of pipelines, then iterate through them and get the details of each pipeline. The details include a property called configuration, which holds the pipeline config JSON. You still have to build a new JSON document with the name, description, and artifact information, plus a config property containing the configuration JSON you received from the backend.

An example is shown below.


  1. In the cluster you are about to destroy, call the GET API to list the apps, with artifactName=cdap-data-pipeline,cdap-data-streams as the query parameter



/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams




  1. Parse the response, iterate through the individual apps, and GET each app's details



/namespaces/default/apps/<app-name>

For each app, take the configuration property from the response and form your final JSON, something like:


{   
  "name": "Pipeline_1",
  "description": "Pipeline to do taskX",
  "artifact": {
    "name": "cdap-data-pipeline",
    "version": "6.1.0-SNAPSHOT",
    "scope": "USER"
  },
  "config": JSON.parse(<configuration-from-app-detailed-api>) 
} 
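The two steps above can be sketched in Python. This is a minimal sketch rather than a definitive implementation: `fetch_json` is a hypothetical callable (not part of any CDAP or Google library) that performs an authenticated GET for a path on the instance and returns the decoded JSON. It also assumes that each entry in the app list carries a `name`, and that the app detail response carries `name`, `description`, `artifact`, and a `configuration` string, as the JSON shape above suggests.

```python
import json
from urllib.parse import quote

# Path from step 1; the namespace "default" is taken from the example above.
LIST_PATH = "/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams"


def build_pipeline_spec(detail):
    """Turn one app-detail response into the JSON used to redeploy the pipeline."""
    return {
        "name": detail["name"],
        "description": detail.get("description", ""),
        "artifact": detail["artifact"],
        # Assumption: "configuration" arrives as a JSON string, so decode it.
        "config": json.loads(detail["configuration"]),
    }


def export_pipelines(fetch_json):
    """Return {pipeline name: deployable spec} for every listed pipeline."""
    specs = {}
    for app in fetch_json(LIST_PATH):
        detail_path = "/namespaces/default/apps/" + quote(app["name"], safe="")
        detail = fetch_json(detail_path)
        specs[detail["name"]] = build_pipeline_spec(detail)
    return specs
```

Writing each spec to its own file with `json.dump` before destroying the instance would keep the export around for the later import.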




  1. Then, in the new cluster you create, simply deploy the pipelines using the JSON obtained in the previous step.
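The answer does not name the call used to deploy. The sketch below assumes the deployment is a PUT of the JSON document to the same `/namespaces/default/apps/<app-name>` path on the new instance, which is how CDAP's application lifecycle API is generally described; treat the method and path as assumptions to check against your instance. `build_deploy_request` is a hypothetical helper name, and it only builds the request so it can be inspected before anything is sent.

```python
import json
import urllib.request
from urllib.parse import quote


def build_deploy_request(instance_url, token, spec):
    """Build (but do not send) the request that deploys one pipeline spec."""
    url = (
        instance_url.rstrip("/")
        + "/namespaces/default/apps/"
        + quote(spec["name"], safe="")
    )
    return urllib.request.Request(
        url,
        data=json.dumps(spec).encode("utf-8"),
        method="PUT",  # assumption: the app is created or updated via PUT
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; a non-2xx response raises `urllib.error.HTTPError`.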

One thing to note: if you have set up schedules or triggers for pipelines in the old cluster, those won't be created in the new cluster. The rest of the pipeline should just work if you are only deploying and running it.

Hope this helps.

Just realized there are docs on accessing the REST API for Data Fusion. However, they don't say much about HOW to make the REST API call. Here is an example of how to do it:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -w "\n" -X GET "<instance-url>/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams"

Here we use gcloud to get the access token used to call that specific instance. A prerequisite is to be signed in with the gcloud SDK. Once authentication succeeds, this should return the list of apps in your instance.
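For a script, the same call can be made from Python with only the standard library. This is a rough equivalent of the curl command, under the same prerequisite of being signed in with gcloud. The function names are hypothetical, and `instance_url` stands for the `<instance-url>` placeholder above.

```python
import json
import subprocess
import urllib.request


def gcloud_access_token():
    """Ask the gcloud CLI for an access token (requires a prior gcloud login)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_get_request(instance_url, path, token):
    """Build the authenticated GET request that the curl command sends."""
    return urllib.request.Request(
        instance_url.rstrip("/") + path,
        headers={"Authorization": "Bearer " + token},
    )


def get_json(instance_url, path, token):
    """Send the GET request and decode the JSON response body."""
    request = build_get_request(instance_url, path, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `get_json(instance_url, "/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams", gcloud_access_token())` would then correspond to the curl command above.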
