Google Compute instance won't mount persistent disk, maintains ~100% CPU


Question

During some routine use of my web server (saving posts via WordPress), my instance suddenly jumped up to 400% CPU usage and wouldn't come back down below 100%. Restarting and stopping/starting the instance didn't change anything.

Looking at the last bit of my serial output:

[    0.678602] md: Waiting for all devices to be available before autodetect
[    0.679518] md: If you don't use raid, use raid=noautodetect
[    0.680548] md: Autodetecting RAID arrays.
[    0.681284] md: Scanned 0 and added 0 devices.
[    0.682173] md: autorun ...
[    0.682765] md: ... autorun DONE.
[    0.683716] VFS: Cannot open root device "sda1" or unknown-block(0,0): error -6
[    0.685298] Please append a correct "root=" boot option; here are the available partitions:
[    0.686676] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    0.688489] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-30-generic #34~14.04.1-Ubuntu
[    0.689287] Hardware name: Google Google, BIOS Google 01/01/2011
[    0.689287]  ffffea00008ae400 ffff880024ee7db8 ffffffff817af477 000000000000111e
[    0.689287]  ffffffff81a7c6c0 ffff880024ee7e38 ffffffff817a9338 ffff880024ee7dd8
[    0.689287]  ffffffff00000010 ffff880024ee7e48 ffff880024ee7de8 ffff880024ee7e38
[    0.689287] Call Trace:
[    0.689287]  [<ffffffff817af477>] dump_stack+0x45/0x57
[    0.689287]  [<ffffffff817a9338>] panic+0xc1/0x1f5
[    0.689287]  [<ffffffff81d3e5f3>] mount_block_root+0x210/0x2a9
[    0.689287]  [<ffffffff81d3e822>] mount_root+0x54/0x58
[    0.689287]  [<ffffffff81d3e993>] prepare_namespace+0x16d/0x1a6
[    0.689287]  [<ffffffff81d3e304>] kernel_init_freeable+0x1f6/0x20b
[    0.689287]  [<ffffffff81d3d9a7>] ? initcall_blacklist+0xc0/0xc0
[    0.689287]  [<ffffffff8179fab0>] ? rest_init+0x80/0x80
[    0.689287]  [<ffffffff8179fabe>] kernel_init+0xe/0xf0
[    0.689287]  [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
[    0.689287]  [<ffffffff8179fab0>] ? rest_init+0x80/0x80
[    0.689287] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    0.689287] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

(Not sure if it's obvious from that, but I'm using the standard Ubuntu 14.04 image)

I've tried taking snapshots and mounting them on new instances, and now I've even deleted the instance and mounted the disk on to a new one, still the same issue and exactly the same serial output.

I really hope my data has not been hopelessly corrupted. Not sure if anyone has any suggestions on recovering data from a persistent disk?

Note that the accepted answer to "Google Compute Engine VM instance: VFS: Unable to mount root fs on unknown-block" did not work for me.

Recommended answer

I posted this on another question, but this question is worded better, so I'll re-post it here.

That is the million dollar question. After inspecting my GCE VM, I found 14 different kernels installed, taking up several hundred MB of space. Most of the kernels didn't have a corresponding initrd.img file and were therefore not bootable (including 3.19.0-39-generic).

I certainly never went around trying to install random kernels, and once removed, they no longer appear as available upgrades, so I'm not sure what happened. Seriously, what happened?

New response from Google Cloud Support

I received another disconcerting response. This may explain the additional, errant kernels.

"On rare occasions, a VM needs to be migrated from one physical host to another. In such case, a kernel upgrade and security patches might be applied by Google."

How to recover your instance...

After several back-and-forth emails, I finally received a response from support that allowed me to resolve the issue. Be mindful, you will have to change things to match your unique VM.

  1. Take a snapshot of the disk first, in case you need to roll back any of the changes below.

  2. Edit the properties of the broken instance to disable this option: "Delete boot disk when instance is deleted".

  3. Delete the broken instance.

  4. Start up a new temporary instance.

  5. Attach the broken disk to the temporary instance (its root partition should appear as /dev/sdb1).

  6. When the temporary instance has booted, run the following in the temporary instance:

# Run fsck to fix any disk corruption issues
$ sudo fsck.ext4 -a /dev/sdb1

# Mount the disk from the broken vm
$ sudo mkdir /mnt/sdb
$ sudo mount /dev/sdb1 /mnt/sdb/ -t ext4

# Find out the UUID of the broken disk. In this case, the uuid of sdb1 is d9cae47b-328f-482a-a202-d0ba41926661
$ ls -alt /dev/disk/by-uuid/
lrwxrwxrwx. 1 root root 10 Jan 6 07:43 d9cae47b-328f-482a-a202-d0ba41926661 -> ../../sdb1
lrwxrwxrwx. 1 root root 10 Jan 6 05:39 a8cf6ab7-92fb-42c6-b95f-d437f94aaf98 -> ../../sda1

# Update the UUID in grub.cfg (if necessary)
$ sudo vim /mnt/sdb/boot/grub/grub.cfg

Note: This ^^^ is where I deviated from the support instructions.

Instead of modifying all the boot entries to set root=UUID=[uuid character string], I looked for all the entries that set root=/dev/sda1 and deleted them. I also deleted every entry that didn't set an initrd.img file. The top boot entry with correct parameters in my case ended up being 3.19.0-31-generic. But yours may be different.
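Before deleting entries by hand, it can help to list what each boot entry actually sets. The following is only a rough sketch and was not part of the support instructions: audit_grub_cfg is a hypothetical helper name, and it assumes the single-quoted menuentry titles that Ubuntu's generated grub.cfg normally uses.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print one line per
# menuentry in a grub.cfg, showing its root= value and whether it has an
# initrd line. Assumes menuentry titles are wrapped in single quotes.
audit_grub_cfg() {
  awk -v q="'" '
    function report() {
      if (title != "")
        printf "%s | root=%s | initrd=%s\n", title, root, (has_initrd ? "yes" : "MISSING")
    }
    /^[ \t]*menuentry[ \t]/ {
      report()                 # flush the previous entry, if any
      split($0, parts, q)      # parts[2] is the text between the first two quotes
      title = parts[2]
      root = "?"
      has_initrd = 0
      next
    }
    /^[ \t]*linux[ \t]/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^root=/) root = substr($i, 6)
    }
    /^[ \t]*initrd[ \t]/ { has_initrd = 1 }
    END { report() }
  ' "$1"
}

# Example (path taken from the steps above):
#   audit_grub_cfg /mnt/sdb/boot/grub/grub.cfg
```

Entries reported with root=/dev/sda1 or initrd=MISSING are the ones described above as candidates for removal. The helper only reads the file, so it is safe to run before making any edits.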

# Flush all changes to disk
$ sudo sync

# Shut down the temporary instance
$ sudo shutdown -h now

Finally, detach the repaired disk from the temporary instance, and create a new instance based on it. It will hopefully boot.
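The console steps in this answer (snapshot, disable auto-delete, delete, attach, detach, recreate) can presumably also be done with the gcloud CLI. The sketch below was not part of the support instructions and only prints the commands by default. Every instance, disk, snapshot, and zone name in it is a placeholder, and the subcommands and flags should be checked against gcloud's built-in help for your SDK version before running anything for real.

```shell
#!/bin/sh
# Placeholder names (assumptions, not from the original answer).
ZONE="us-central1-a"
DISK="broken-disk"
BROKEN_VM="broken-vm"
TEMP_VM="temp-rescue-vm"
FIXED_VM="fixed-vm"

# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps before the repair: snapshot, keep the disk, delete the broken VM,
# then create a temporary VM and attach the broken disk to it.
prepare_plan() {
  run gcloud compute disks snapshot "$DISK" --zone "$ZONE" --snapshot-names "$DISK-before-repair"
  run gcloud compute instances set-disk-auto-delete "$BROKEN_VM" --zone "$ZONE" --disk "$DISK" --no-auto-delete
  run gcloud compute instances delete "$BROKEN_VM" --zone "$ZONE"
  run gcloud compute instances create "$TEMP_VM" --zone "$ZONE"
  run gcloud compute instances attach-disk "$TEMP_VM" --zone "$ZONE" --disk "$DISK"
}

# Steps after the repair: detach the disk and boot a new VM from it.
finish_plan() {
  run gcloud compute instances detach-disk "$TEMP_VM" --zone "$ZONE" --disk "$DISK"
  run gcloud compute instances create "$FIXED_VM" --zone "$ZONE" --disk "name=$DISK,boot=yes"
}

# Usage: call prepare_plan, repair the disk on the temporary VM as
# described above, then call finish_plan.
```

Keeping the dry-run default makes it easy to review the exact commands first. Setting DRY_RUN=0 would execute them, which includes deleting the broken instance, so the snapshot step should be confirmed to have succeeded before going further.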

Assuming it does boot, you have a lot of work to do. If you have half as many unused kernels as me, then you might want to purge the unused ones (especially since some are likely missing a corresponding initrd.img file).
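To see which installed kernels lack an initrd.img, something like the following may be enough. It is only a sketch: kernels_missing_initrd is a hypothetical helper name, and it assumes the usual Ubuntu layout of vmlinuz-VERSION and initrd.img-VERSION files in the boot directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): print the version of
# every vmlinuz-* file in a boot directory that has no matching
# initrd.img-* file next to it.
kernels_missing_initrd() {
  boot_dir="$1"
  for kernel in "$boot_dir"/vmlinuz-*; do
    [ -e "$kernel" ] || continue            # glob matched nothing
    version="${kernel##*/vmlinuz-}"         # strip the path and the vmlinuz- prefix
    [ -e "$boot_dir/initrd.img-$version" ] || echo "$version"
  done
}

# Example: kernels_missing_initrd /boot
# (or /mnt/sdb/boot while the disk is still mounted on the temporary instance)
```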

I used the second answer (the terminal-based one) in this askubuntu question to purge the other kernels.

Note: Make sure you don't purge the kernel you booted in with!
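One way to guard against that is to build the purge list from the installed packages and filter out the running kernel first. This is a sketch rather than the method from the askubuntu answer: purge_candidates is a hypothetical helper name, and it only looks at versioned linux-image packages.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): read package names on
# stdin, keep only versioned linux-image packages (including the
# linux-image-extra variants), and drop any that match the given kernel version.
purge_candidates() {
  running="$1"
  grep -E '^linux-image-(extra-)?[0-9]' | grep -v -F -- "$running"
}

# Example: list installed kernel packages other than the running one.
#   dpkg -l 'linux-image-*' | awk '/^ii/ {print $2}' | purge_candidates "$(uname -r)"
# Review the list before passing anything to: sudo apt-get purge <package>
```

The helper only prints names and removes nothing itself, so the list can be checked by eye before any package is purged.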