How do I specify disk space (volume size) of a master node when spinning up a cluster?
Problem description
This documentation shows the default volume sizes based on the instance size: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-storage.html
My question is: how do I specify a larger volume size when starting up the cluster?
Currently, I change it manually from the EMR page after the cluster is up and running.
You can do this by specifying the VolumeSpecification JSON. I have not tried this for the master node. I have used it for the core and task nodes, but I believe the same concept can be extended to the master node as well.
The fields inside the VolumeSpecification JSON are self-explanatory, so I am not explaining them here. You can read about them in the VolumeSpecification explanation.
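As a quick orientation, here is a minimal sketch of how those fields nest inside an EbsConfiguration block. The helper name `ebs_config` and its defaults are hypothetical and only for illustration; the dictionary keys are the ones used in the snippet in this answer.

```python
def ebs_config(size_in_gb, volume_type="gp2", volumes_per_instance=1):
    """Build an EbsConfiguration dict (hypothetical helper, not part of boto3)."""
    if size_in_gb <= 0:
        raise ValueError("size_in_gb must be positive")
    return {
        "EbsBlockDeviceConfigs": [
            {
                # VolumeSpecification describes a single EBS volume.
                "VolumeSpecification": {
                    "SizeInGB": size_in_gb,     # size of each volume, in GB
                    "VolumeType": volume_type,  # EBS volume type, e.g. "gp2"
                },
                # How many such volumes to attach to every instance.
                "VolumesPerInstance": volumes_per_instance,
            }
        ]
    }


# Example: one 64 GB gp2 volume per instance, as used for the core fleet.
core_ebs = ebs_config(64)
```

The resulting dictionary is what goes under the "EbsConfiguration" key of an entry in InstanceTypeConfigs.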
Here is a code snippet that shows exactly how to use this configuration.
I am using the standard boto3 library in my code. I have a Lambda function that spawns the EMR cluster, but using a Lambda function is not a must; you can choose your own alternative.
The code snippet is:
from datetime import datetime

import boto3


def create_emr_cluster(event, context):
    '''
    This code snippet is used to create an EMR cluster.
    '''
    conn = boto3.client("emr")
    today = datetime.today().strftime('%Y-%m-%d')
    cluster_id = conn.run_job_flow(
        Name='Your_EMR_name',
        ServiceRole='EMR_DefaultRole',
        JobFlowRole='EMR_EC2_DefaultRole',
        VisibleToAllUsers=True,
        LogUri='s3://your-s3-path-where-you-want-cluster-logs/%s/' % today,
        ReleaseLabel='emr-5.17.0',
        ScaleDownBehavior='TERMINATE_AT_TASK_COMPLETION',
        Applications=[{'Name': 'Spark'},
                      {'Name': 'Hadoop'},
                      {'Name': 'Hive'},
                      {'Name': 'Hue'}],
        Instances={
            'KeepJobFlowAliveWhenNoSteps': False,
            'Ec2KeyName': 'your-key-name-here',
            'Ec2SubnetId': 'your-subnet-id',
            'InstanceFleets': [
                # Master fleet: no EbsConfiguration, so the default volume size applies.
                {'Name': 'Master Node',
                 'InstanceFleetType': 'MASTER',
                 'TargetOnDemandCapacity': 1,
                 'InstanceTypeConfigs': [{
                     'InstanceType': 'c4.xlarge'
                 }]
                 }, {
                    # Core fleet: one 64 GB gp2 volume per instance.
                    'Name': 'Core Node',
                    'InstanceFleetType': 'CORE',
                    'TargetOnDemandCapacity': 1,
                    'InstanceTypeConfigs': [{
                        'InstanceType': 'r5.2xlarge',
                        "EbsConfiguration": {
                            "EbsBlockDeviceConfigs": [
                                {
                                    "VolumeSpecification": {
                                        "SizeInGB": 64,
                                        "VolumeType": "gp2"
                                    },
                                    "VolumesPerInstance": 1
                                }
                            ]
                        }
                    }]
                }, {
                    # Task fleet: spot instances, each type with its own volume size.
                    'Name': 'Task Nodes',
                    'InstanceFleetType': 'TASK',
                    'TargetSpotCapacity': 100,
                    'InstanceTypeConfigs': [{
                        'InstanceType': 'r5.2xlarge',
                        'BidPriceAsPercentageOfOnDemandPrice': 50,
                        'WeightedCapacity': 16,
                        "EbsConfiguration": {
                            "EbsBlockDeviceConfigs": [
                                {
                                    "VolumeSpecification": {
                                        "SizeInGB": 32,
                                        "VolumeType": "gp2"
                                    },
                                    "VolumesPerInstance": 1
                                }
                            ]
                        }
                    }, {
                        'InstanceType': 'r5.4xlarge',
                        'BidPriceAsPercentageOfOnDemandPrice': 50,
                        'WeightedCapacity': 40,
                        "EbsConfiguration": {
                            "EbsBlockDeviceConfigs": [
                                {
                                    "VolumeSpecification": {
                                        "SizeInGB": 64,
                                        "VolumeType": "gp2"
                                    },
                                    "VolumesPerInstance": 1
                                }
                            ]
                        }
                    }]
                }]
        }
    )
    return cluster_id['JobFlowId']
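The snippet above leaves the master fleet on its default volume size. Assuming the master fleet accepts the same EbsConfiguration block as the core and task fleets (which, as noted, was not tried in this answer), the master entry might be built roughly like this. The helper name `master_fleet` is hypothetical, and the 100 GB size is an arbitrary example value.

```python
def master_fleet(instance_type="c4.xlarge", size_in_gb=100, volume_type="gp2"):
    """Build a MASTER entry for Instances['InstanceFleets'] (hypothetical helper)."""
    return {
        "Name": "Master Node",
        "InstanceFleetType": "MASTER",
        "TargetOnDemandCapacity": 1,
        "InstanceTypeConfigs": [
            {
                "InstanceType": instance_type,
                # Same structure as the core/task fleets in the snippet above;
                # assumed, not verified here, to work for the master fleet too.
                "EbsConfiguration": {
                    "EbsBlockDeviceConfigs": [
                        {
                            "VolumeSpecification": {
                                "SizeInGB": size_in_gb,
                                "VolumeType": volume_type,
                            },
                            "VolumesPerInstance": 1,
                        }
                    ]
                },
            }
        ],
    }


# Example: a master node with one 100 GB gp2 volume.
fleet = master_fleet(size_in_gb=100)
```

To try it, this dictionary would replace the first element of the 'InstanceFleets' list passed to run_job_flow.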