Import/Export DataFusion pipelines


Question

Does anyone know whether it is possible to programmatically import/export DataFusion pipelines (deployed or in draft status)?

The idea is to write a script that drops and recreates a DataFusion instance, to avoid being billed while it is not in use. The gcloud command line can provision a DataFusion cluster and destroy it, but it would be useful to automatically export and import all my pipelines too.

Unfortunately, the official documentation didn't help me...

Thanks!

Recommended answer

You could use the REST API to do this. However, you would probably need a script that does it automatically, given the instance URL. You should be able to get the pipeline config from the application list API. In your case, you first need to get the list of pipelines, then iterate through all of them and get the details of each one. The details include a property called configuration, which holds the pipeline config JSON. You still have to create a new JSON with the name, description, and artifact information, along with a config property holding the configuration JSON you received from the backend.

An example is shown below:

  1. In the cluster you are about to destroy, call the GET API to list the apps, with artifactName=cdap-data-pipeline,cdap-data-streams as the query parameter:

/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams

  2. Parse the response, iterate through the individual apps, and get the details of each app:

/namespaces/default/apps/<app-name>

For each app, take the configuration property from the response and form your final JSON as something like:


{   
  "name": "Pipeline_1",
  "description": "Pipeline to do taskX",
  "artifact": {
    "name": "cdap-data-pipeline",
    "version": "6.1.0-SNAPSHOT",
    "scope": "USER"
  },
  "config": JSON.parse(<configuration-from-app-detailed-api>) 
} 

  3. Then, in the new cluster you create, deploy each pipeline using the JSON you built in the previous step.
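The reshaping described in the steps above could be sketched in Python as follows. This is a minimal sketch, not the answerer's code: the function name build_pipeline_export is hypothetical, and it assumes the app-detail response carries name, description, and artifact fields alongside a configuration property that arrives as a JSON-encoded string (which is what the JSON.parse in the example suggests).

```python
import json


def build_pipeline_export(app_detail):
    """Turn one app-detail response into a deployable pipeline JSON (hypothetical helper)."""
    config = app_detail["configuration"]
    # Assumption: `configuration` is a JSON-encoded string, so decode it
    # into a nested object before embedding it under "config".
    if isinstance(config, str):
        config = json.loads(config)

    artifact = app_detail["artifact"]
    return {
        "name": app_detail["name"],
        "description": app_detail.get("description", ""),
        "artifact": {
            "name": artifact["name"],
            "version": artifact["version"],
            "scope": artifact["scope"],
        },
        "config": config,
    }


# Usage with a made-up detail response (all values are illustrative only).
sample_detail = {
    "name": "Pipeline_1",
    "description": "Pipeline to do taskX",
    "artifact": {"name": "cdap-data-pipeline", "version": "6.1.0-SNAPSHOT", "scope": "USER"},
    "configuration": '{"stages": [], "connections": []}',
}
export_json = json.dumps(build_pipeline_export(sample_detail), indent=2)
```

Writing each export_json to its own file before destroying the instance would give the import script something to read back later; the file layout is left open here because the original answer does not prescribe one.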

One thing to note: if you have set up schedules or triggers for pipelines in the old cluster, they won't be created in the new cluster. The rest of the pipeline should just work if you are only deploying and running it.

Hope this helps.

Just realized that there are docs on accessing the REST API for DataFusion. However, they don't say much about how to actually make the REST API call. Here is an example of how to do it:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" -w "\n" -X GET "<instance-url>/namespaces/default/apps?artifactName=cdap-data-pipeline,cdap-data-streams"

Here we use gcloud to get the access token used to call that specific instance. A prerequisite is being signed in with the gcloud SDK. Once authentication succeeds, this should return the list of apps in your instance.
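For a script, the same call could be made from Python instead of curl. The following is a minimal sketch under stated assumptions: the helper names (get_access_token, build_list_request, list_pipelines) are hypothetical, instance_url stands for the same <instance-url> placeholder used above, and the token still comes from a logged-in gcloud SDK.

```python
import json
import subprocess
import urllib.request

# Query value taken from the curl example above.
PIPELINE_ARTIFACTS = "cdap-data-pipeline,cdap-data-streams"


def get_access_token():
    """Ask the gcloud SDK for an access token (requires a prior gcloud login)."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_list_request(instance_url, token):
    """Build, without sending, the GET request that lists pipeline apps."""
    url = (
        f"{instance_url.rstrip('/')}/namespaces/default/apps"
        f"?artifactName={PIPELINE_ARTIFACTS}"
    )
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_pipelines(instance_url):
    """Send the request and return the decoded JSON response."""
    request = build_list_request(instance_url, get_access_token())
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers without contacting a live instance.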
