How to set up the AWS CloudWatch agent on Ubuntu to get (correct) custom metrics like CPU, memory and disk usage %


Question

I'm running an AWS EC2 m5.large (a non-burstable instance). I have set up one of AWS CloudWatch's default metrics (CPU %) plus some custom metrics (memory + disk usage) in my dashboard.

But when I compare the numbers CloudWatch reports to me, they are pretty far from the actual usage of the Ubuntu 20.04 server when I log in to it...

Actual usage:

CPU: ~ 35 %
Memory: ~ 33 %

CloudWatch report:

CPU: ~ 10 %
Memory: ~ 50-55 %

https://www.screencast.com/t/o1nAnOFjVZW
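
Different tools define "used" memory differently (for example, whether buffers and cache count as used), which may be one reason two readings disagree. As a rough cross-check on the server itself, the sketch below computes a used-memory percentage straight from /proc/meminfo. The function name mem_used_percent and the definition "MemTotal minus MemAvailable" are assumptions chosen for illustration, not necessarily what CloudWatch or the monitoring scripts use.

```shell
#!/usr/bin/env bash
# Print the memory-used percentage from a meminfo-style file.
# Assumption: "used" = MemTotal - MemAvailable. Other tools may
# subtract free/buffers/cache instead and report a different figure.
mem_used_percent() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total > 0) printf "%.1f\n", (total - avail) * 100 / total
      else exit 1
    }
  ' "$meminfo"
}

# Only read the live file where it exists (Linux).
if [ -r /proc/meminfo ]; then
  mem_used_percent
fi
```

Comparing this figure with both the dashboard value and the output of an interactive tool shows which definition each one is closer to.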

I have followed AWS's own instructions to add the metrics for memory and disk usage (because CloudWatch doesn't have access to OS-level data out of the box): https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html

When the numbers are this far apart, it is impossible to set up useful alarms and notifications. I can't believe that is what AWS wants to provide to people who chose to follow their original instructions. The only thing that matches exactly is the disk usage %.

Recommended answer

How to install the AWS agent on Ubuntu 20.04 (the new way, replacing the old scripts: "CloudWatchMonitoringScripts")

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/download-cloudwatch-agent-commandline.html

1. sudo wget https://s3.amazonaws.com/amazoncloudwatch-agent/debian/amd64/latest/amazon-cloudwatch-agent.deb
2. sudo dpkg -i -E ./amazon-cloudwatch-agent.deb
3. sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
4. Go through all the steps in the wizard (The result is saved here: /opt/aws/amazon-cloudwatch-agent/bin/config.json)

Tip: my answers in the wizard:

 - Accept the default for most questions; otherwise:
 - NO  --> Do you want to store the config in the SSM parameter store? (When I answered YES, a later step failed with a permission issue that I didn't know how to resolve, and I don't think SSM is needed for this setup.)
 - YES --> Do you want to turn on StatsD daemon?
 - YES --> Do you want to monitor metrics from CollectD?
 - NO  --> Do you have any existing CloudWatch Log Agent?
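
For orientation, a hand-written configuration limited to the three metrics from the question (CPU, memory, disk) might look roughly like the sketch below. It is not the wizard's actual output: the file name config.sample.json and the CONFIG_OUT variable are hypothetical, and the keys reflect my understanding of the agent's JSON schema, so compare them with the file the wizard produced before relying on them.

```shell
#!/usr/bin/env bash
# Write a minimal sample agent config to a scratch file (hypothetical name),
# deliberately NOT to the agent's real config path.
CONFIG_OUT="${CONFIG_OUT:-./config.sample.json}"

# The quoted 'EOF' keeps ${aws:InstanceId} literal instead of expanding it.
cat > "$CONFIG_OUT" <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 60
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "cpu": {
        "measurement": ["cpu_usage_idle", "cpu_usage_user", "cpu_usage_system"],
        "totalcpu": true
      },
      "mem": {
        "measurement": ["mem_used_percent"]
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["/"]
      }
    }
  }
}
EOF

echo "Sample config written to $CONFIG_OUT"
```

A smaller config like this keeps the dashboard to a handful of metrics instead of everything the "Advanced" preset collects.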

Now, to prevent this error: "Error parsing /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml, open /usr/share/collectd/types.db: no such file or directory" (see https://github.com/awsdocs/amazon-cloudwatch-user-guide/issues/1), create the missing file before starting the agent:

5. sudo mkdir -p /usr/share/collectd/
6. sudo touch /usr/share/collectd/types.db
7. sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
8. /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status

{
  "status": "running",
  "starttime": "2020-06-07T10:04:41+00:00",
  "version": "1.245315.0"
}
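
The status output above is easy to check from a script, for instance in a provisioning step. The helper below is a minimal sketch: agent_is_running is a hypothetical name, and it simply pattern-matches the "status" field rather than parsing the JSON properly.

```shell
#!/usr/bin/env bash
# Succeed if the status JSON read from stdin reports "running".
# A plain pattern match is used here instead of a real JSON parser.
agent_is_running() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Path taken from step 8 above; skipped when the agent is not installed.
CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl
if [ -x "$CTL" ]; then
  if "$CTL" -m ec2 -a status | agent_is_running; then
    echo "CloudWatch agent is running"
  else
    echo "CloudWatch agent is NOT running"
  fi
fi
```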

  1. https://www.screencast.com/t/42VWgoS88Z (Create an IAM role, add policies and make it the server's default role)
  2. https://www.screencast.com/t/fAUUHCPe (CloudWatch new custom metrics)
  3. https://www.screencast.com/t/8J0Saw0co (data matches OK now)
  4. https://www.screencast.com/t/x0PxOa799 (data matches OK now)

I realized that the second I log in to the machine, CPU usage goes up from 10 % to 30 % and stays there (some increase was to be expected, but not that much in my opinion), which in my case explains the big difference seen earlier. I honestly don't know if this way is more precise than the older scripts, but it should be the right way to do it in 2020 :-) You also get access to 179 custom metrics when selecting "Advanced" in the wizard (even though only a few would be valuable to most people).
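
Since logging in changes the load, one way to compare like with like is to sample CPU usage non-interactively (for example from a scheduled job) and compare that with the dashboard. The sketch below derives a busy percentage from two readings of the aggregate "cpu" line in /proc/stat; the function name cpu_busy_percent and the choice to count idle plus iowait as "not busy" are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Compute busy CPU % between two aggregate "cpu" lines from /proc/stat.
# Columns 2-9 are user nice system idle iowait irq softirq steal;
# guest columns are skipped because they are already included in user/nice.
cpu_busy_percent() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      t[NR]  = total
      id[NR] = $5 + $6          # idle + iowait
    }
    END {
      dt = t[2] - t[1]; di = id[2] - id[1]
      if (dt > 0) printf "%.1f\n", (dt - di) * 100 / dt
      else exit 1
    }'
}

# Take two samples one second apart where /proc/stat exists (Linux).
if [ -r /proc/stat ]; then
  first=$(grep '^cpu ' /proc/stat)
  sleep 1
  second=$(grep '^cpu ' /proc/stat)
  cpu_busy_percent "$first" "$second"
fi
```

A one-second window is noisy; a longer interval would be closer to the averaged values a dashboard typically shows.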
