How to mount S3 bucket on Kubernetes container/pods?


Question

I am trying to run my Spark job on an Amazon EKS cluster. My Spark job requires some static data (reference data) on each data node/worker/executor, and this reference data is available in S3.

Can somebody kindly help me find a clean and performant solution to mount an S3 bucket on pods?

The S3 API is an option, and I am using it for my input records and output results. But the reference data is static, so I don't want to download it on every run/execution of my Spark job. On the first run the job would download the data, and subsequent jobs would check whether the data is already available locally so there is no need to download it again.

Answer

We recently open-sourced a project that aims to automate these steps for you: https://github.com/IBM/dataset-lifecycle-framework

Basically you can create a dataset:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  name: example-dataset
spec:
  local:
    type: "COS"
    accessKeyID: "iQkv3FABR0eywcEeyJAQ"
    secretAccessKey: "MIK3FPER+YQgb2ug26osxP/c8htr/05TVNJYuwmy"
    endpoint: "http://192.168.39.245:31772"
    bucket: "my-bucket-d4078283-dc35-4f12-a1a3-6f32571b0d62"
    region: "" #it can be empty

And then you will get a PVC that you can mount in your pods.
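For example, assuming the framework creates a PVC named after the dataset (the PVC name `example-dataset` and the image/path below are assumptions based on the manifest above), mounting it might look like:

```yaml
# Sketch of a pod mounting the PVC produced for the dataset.
apiVersion: v1
kind: Pod
metadata:
  name: spark-worker
spec:
  containers:
    - name: spark
      image: my-spark-image:latest   # illustrative image name
      volumeMounts:
        - name: reference-data
          mountPath: /mnt/reference-data
          readOnly: true
  volumes:
    - name: reference-data
      persistentVolumeClaim:
        claimName: example-dataset   # assumed to match the Dataset name
```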

