How do I specify disk space (volume size) of a master node when spinning up a cluster?


Problem description


This documentation shows the default volume sizes based on the instance size: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-storage.html

My question is: how do I specify a larger volume size when starting up the cluster?

Currently, I change it manually from the EMR page after the cluster is up and running.

Solution

You can do this by specifying a VolumeSpecification JSON inside the EbsConfiguration of an instance type. I have not tried this for the master node. I have used it for core and task nodes, but I believe the same approach extends to the master node as well.

The fields inside the VolumeSpecification JSON are self-explanatory, so I am not explaining them here. You can read about them in the VolumeSpecification explanation.
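For readers who want the fields spelled out, here is a minimal sketch of a helper that builds the EbsConfiguration dictionary used in this answer. The helper name build_ebs_configuration and its parameters are hypothetical, introduced only for illustration. SizeInGB, VolumeType, Iops and VolumesPerInstance are field names from the EMR API, but check the accepted values against the current documentation before relying on them.

```python
def build_ebs_configuration(size_gb, volume_type="gp2",
                            volumes_per_instance=1, iops=None):
    """Return an EbsConfiguration dict for an EMR instance type config.

    Hypothetical helper, not part of boto3. It only assembles the nested
    dictionary that run_job_flow expects; it does not call AWS.
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be a positive number of GiB")

    # VolumeSpecification describes one EBS volume: its size and type.
    volume_spec = {"SizeInGB": size_gb, "VolumeType": volume_type}
    if iops is not None:
        # Iops is only meaningful for provisioned-IOPS volume types.
        volume_spec["Iops"] = iops

    return {
        "EbsBlockDeviceConfigs": [
            {
                "VolumeSpecification": volume_spec,
                # How many volumes of this spec to attach to each instance.
                "VolumesPerInstance": volumes_per_instance,
            }
        ]
    }


if __name__ == "__main__":
    print(build_ebs_configuration(64))
```

The returned dictionary can be placed under the "EbsConfiguration" key of any entry in InstanceTypeConfigs.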

Here is a code snippet that shows exactly how to use this configuration. It uses the standard boto3 library. I have a Lambda function that spawns the EMR cluster, but using a Lambda function is not required; you can choose your own alternative. The code snippet is:

from datetime import datetime
import boto3
'''
    This code snippet is used to create an EMR cluster.
'''

def create_emr_cluster(event, context):
    conn = boto3.client("emr")
    today = datetime.today().strftime('%Y-%m-%d')
    cluster_id = conn.run_job_flow(
        Name='Your_EMR_name',
        ServiceRole='EMR_DefaultRole',
        JobFlowRole='EMR_EC2_DefaultRole',
        VisibleToAllUsers=True,
        LogUri='s3://your-s3-path-where-you-want-cluster-logs/%s/' % today,
        ReleaseLabel='emr-5.17.0',
        ScaleDownBehavior='TERMINATE_AT_TASK_COMPLETION',
        Applications=[{'Name': 'Spark'},
                      {'Name': 'Hadoop'},
                      {'Name': 'Hive'},
                      {'Name': 'Hue'}],
        Instances={
            'KeepJobFlowAliveWhenNoSteps': False,
            'Ec2KeyName': 'your-key-name-here',
            'Ec2SubnetId': 'your-subnet-id',
            'InstanceFleets': [
                {'Name': 'Master Node',
                 'InstanceFleetType': 'MASTER',
                 'TargetOnDemandCapacity': 1,
                 'InstanceTypeConfigs': [{
                     'InstanceType': 'c4.xlarge'
                 }]
                 }, {
                    'Name': 'Core Node',
                    'InstanceFleetType': 'CORE',
                    'TargetOnDemandCapacity': 1,
                    'InstanceTypeConfigs': [{
                        'InstanceType': 'r5.2xlarge',
                        "EbsConfiguration": {
                            "EbsBlockDeviceConfigs": [
                                {
                                    "VolumeSpecification": {
                                        "SizeInGB": 64,
                                        "VolumeType": "gp2"
                                    },
                                    "VolumesPerInstance": 1
                                }
                            ]
                        }
                    }]
                }, {
                    'Name': 'Task Nodes',
                    'InstanceFleetType': 'TASK',
                    'TargetSpotCapacity': 100,
                    'InstanceTypeConfigs': [{
                        'InstanceType': 'r5.2xlarge',
                        'BidPriceAsPercentageOfOnDemandPrice': 50,
                        'WeightedCapacity': 16,
                        "EbsConfiguration": {
                            "EbsBlockDeviceConfigs": [
                                {
                                    "VolumeSpecification": {
                                        "SizeInGB": 32,
                                        "VolumeType": "gp2"
                                    },
                                    "VolumesPerInstance": 1
                                }
                            ]
                        }
                    }, {
                        'InstanceType': 'r5.4xlarge',
                        'BidPriceAsPercentageOfOnDemandPrice': 50,
                        'WeightedCapacity': 40,
                        "EbsConfiguration": {
                            "EbsBlockDeviceConfigs": [
                                {
                                    "VolumeSpecification": {
                                        "SizeInGB": 64,
                                        "VolumeType": "gp2"
                                    },
                                    "VolumesPerInstance": 1
                                }
                            ]
                        }
                    }]

                }]
        }
    )
    return cluster_id['JobFlowId']
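
The snippet above sets EbsConfiguration only on the core and task fleets. As an untested sketch of the extension to the master node suggested in this answer, the master fleet entry could carry the same block. The function name master_fleet_config and the 100 GB default are assumptions made for illustration; the dictionary keys are the same ones used in the snippet above.

```python
def master_fleet_config(instance_type="c4.xlarge", size_gb=100,
                        volume_type="gp2"):
    """Build a MASTER instance fleet entry with an explicit EBS volume size.

    Hypothetical helper: it only returns a dictionary and makes no AWS
    calls. The size_gb default of 100 is an arbitrary illustrative value.
    """
    return {
        "Name": "Master Node",
        "InstanceFleetType": "MASTER",
        "TargetOnDemandCapacity": 1,
        "InstanceTypeConfigs": [{
            "InstanceType": instance_type,
            # Same structure as on the core and task fleets.
            "EbsConfiguration": {
                "EbsBlockDeviceConfigs": [{
                    "VolumeSpecification": {
                        "SizeInGB": size_gb,
                        "VolumeType": volume_type,
                    },
                    "VolumesPerInstance": 1,
                }]
            },
        }],
    }


if __name__ == "__main__":
    print(master_fleet_config())
```

To try it, use the returned dictionary in place of the 'Master Node' entry in the 'InstanceFleets' list passed to run_job_flow.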
