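AWS EC2 instances are virtualized on physical hosts. When the host behind an instance develops a problem or is due to be retired, AWS sends an email with the subject [Retirement Notification] Amazon EC2 Instance scheduled for retirement to the email address registered for the AWS account.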

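I had received this kind of email many times before, but the instance IDs it named always belonged to EC2 instances running in an AutoScaling group. Since AutoScaling replaces instances automatically without affecting the service, I never took any action on them.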

This time the notification said that the underlying hardware hosting an instance that runs background jobs had degraded. This post records what I observed and how I resolved it.

Symptoms:

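  1. The registered mailbox receives an email with the subject [Retirement Notification] Amazon EC2 Instance scheduled for retirement. It says that the underlying hardware of an instance is going to be retired and asks you to take appropriate action (a sample is included at the end of this post).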

  2. Follow the Events page link in the email to log in to the EC2 Console. The link looks like https://console.aws.amazon.com/ec2/v2/home?region=us-west-2#Events

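  3. At this point the affected instance is in one of two situations: either it is still working normally, or it is no longer working. Unfortunately, in this incident the Instance State was still running, but I could no longer log in to the instance. The Console showed the following message, indicating that the instance was no longer usable.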
    ec2_retirement_event_description.jpeg

  4. In addition, both the System Status Checks and the Instance Status Checks of the instance were in the Failed state.
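
The same symptoms can also be checked from a script rather than the Console. The following is a minimal sketch, assuming a boto3 EC2 client is passed in as `ec2`; the function name `retirement_report` and the shape of the returned dictionary are my own choices for illustration, not part of any AWS API.

```python
def retirement_report(ec2, instance_id):
    """Summarise state, status checks and retirement events for one instance.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns None if the API reports nothing for the given instance ID.
    """
    resp = ec2.describe_instance_status(
        InstanceIds=[instance_id],
        IncludeAllInstances=True,  # also report instances that are not running
    )
    statuses = resp.get("InstanceStatuses", [])
    if not statuses:
        return None

    status = statuses[0]
    # Scheduled retirement shows up as an event with this code.
    retirement = [
        event for event in status.get("Events", [])
        if event.get("Code") == "instance-retirement"
    ]
    return {
        "state": status["InstanceState"]["Name"],
        "system_status": status["SystemStatus"]["Status"],
        "instance_status": status["InstanceStatus"]["Status"],
        "retirement_scheduled": bool(retirement),
        "not_before": [event.get("NotBefore") for event in retirement],
    }


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   print(retirement_report(ec2, "i-aaaaaaaaaaaaaaaaa"))
```

In the incident described here, such a report would have shown a state of running together with failing system and instance status checks.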

Solution

When the root device of the instance is EBS

If the root device of the instance is EBS, the following steps work:

  1. Simply Stop the instance and then Start it again. In most cases this migrates the instance to a new underlying host. The official documentation puts it this way:

    In most cases, the instance is migrated to a new underlying host computer when it’s started.

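  2. If the instance uses an Elastic IP, no further action is needed. If it has an auto-assigned Public IP, note that the Public IP changes after a stop and start. Any script or program that references the old Public IP directly has to be updated accordingly.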

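  3. In practice, I stopped the instance and started it again, and it became usable. The only surprise was that the stop took much longer than usual. Compared with a normal stop there may be some extra data synchronization involved, though it may also just have been an isolated case on my side.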
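
The stop and start can be scripted as well. Below is a rough sketch, again assuming `ec2` is a boto3 EC2 client; the helper names `public_ip` and `stop_then_start` and the default of 80 waiter attempts are assumptions of mine. The larger attempt count is only there because the stop was unusually slow in my case. The function returns the Public IP before and after, so a change is easy to spot.

```python
def public_ip(ec2, instance_id):
    """Return the instance's current Public IP, or None if it has none."""
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    instance = resp["Reservations"][0]["Instances"][0]
    return instance.get("PublicIpAddress")


def stop_then_start(ec2, instance_id, max_attempts=80):
    """Stop an EBS-backed instance, start it again, and report the IP change.

    `ec2` is assumed to be a boto3 EC2 client. `max_attempts` is an
    illustrative value: with a 15 second delay it allows about 20 minutes
    per waiter, since the stop may take longer than usual.
    """
    waiter_config = {"Delay": 15, "MaxAttempts": max_attempts}
    before = public_ip(ec2, instance_id)

    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(
        InstanceIds=[instance_id], WaiterConfig=waiter_config
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(
        InstanceIds=[instance_id], WaiterConfig=waiter_config
    )

    after = public_ip(ec2, instance_id)
    return before, after


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   old_ip, new_ip = stop_then_start(ec2, "i-aaaaaaaaaaaaaaaaa")
#   if old_ip != new_ip:
#       print("Public IP changed:", old_ip, "->", new_ip)
```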

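When the root device of the instance is Instance store

If the root device of the instance is Instance store, the following steps work:

  1. Create an AMI from the existing instance.
  2. Launch a new instance from that AMI.
  3. Both the private IP and the Public IP change, so anything that references them has to be updated.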
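
Which of the two procedures applies depends on the root device type, which can be looked up before touching the instance. This is a small sketch assuming `ec2` is a boto3 EC2 client; the function name `recovery_plan` and the two labels it returns are made up for this post.

```python
def recovery_plan(ec2, instance_id):
    """Pick the recovery procedure based on the instance's root device type.

    `ec2` is assumed to be a boto3 EC2 client. The returned labels are
    illustrative names for the two procedures described in this post.
    """
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    instance = resp["Reservations"][0]["Instances"][0]
    root_type = instance["RootDeviceType"]  # "ebs" or "instance-store"

    if root_type == "ebs":
        return "stop-start"    # stop the instance, then start it again
    return "ami-relaunch"      # create an AMI and launch a new instance


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   print(recovery_plan(ec2, "i-aaaaaaaaaaaaaaaaa"))
```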

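Recommendation

Unless recovering the instance is extremely urgent, I recommend creating an AMI first even when the root device is EBS, and only then doing the stop and start. It is the safer option.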
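
For an EBS-backed instance, taking that AMI first could look roughly like this. It is a sketch under assumptions: `ec2` is a boto3 EC2 client, and the function name `backup_ami` and its `no_reboot` parameter are my own. I have not verified how image creation behaves on an instance whose hardware has already degraded.

```python
def backup_ami(ec2, instance_id, name, no_reboot=False):
    """Create an AMI from an EBS-backed instance and wait until it is usable.

    `ec2` is assumed to be a boto3 EC2 client. With no_reboot=False (the API
    default) EC2 reboots the instance while creating the image; with
    no_reboot=True the instance is left running, but file system integrity
    of the image is not guaranteed.
    """
    resp = ec2.create_image(
        InstanceId=instance_id,
        Name=name,            # AMI names must be unique within the account and region
        NoReboot=no_reboot,
    )
    image_id = resp["ImageId"]
    # Block until the AMI reaches the "available" state.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    return image_id


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   ami = backup_ami(ec2, "i-aaaaaaaaaaaaaaaaa", "pre-retirement-backup")
#   print("Backup AMI:", ami)
```

Only once the AMI is available would the stop and start be run.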

Miscellaneous

Sample email

The email with the subject [Retirement Notification] Amazon EC2 Instance scheduled for retirement reads roughly as follows:

Dear Amazon EC2 Customer,

We have important news about your account (AWS Account ID: 888888888888). EC2 has detected degradation of the underlying hardware hosting your Amazon EC2 instance (instance-ID: i-aaaaaaaaaaaaaaaaa) in the us-west-2 region. Due to this degradation, your instance could already be unreachable. After 2018-03-26 14:00 UTC your instance, which has an EBS volume as the root device, will be stopped.

You can see more information on your instances that are scheduled for retirement in the AWS Management Console (https://console.aws.amazon.com/ec2/v2/home?region=us-west-2#Events)

* How does this affect you?

Your instance will be stopped after the specified retirement date, but you can start it again at any time. Note that if you have EC2 instance store volumes attached to the instance, any data on these volumes will be lost when the instance is stopped or terminated as these volumes are physically attached to the host computer

* What do you need to do?

You can wait for the scheduled retirement date - when the instance is stopped - or stop the instance yourself any time before then. Once the instances has been stopped, you can start the instance again at any time. For more information about stopping and starting your instance, and what to expect when your instance is stopped, such as the effect on public, private and Elastic IP addresses associated with your instance, see Stop and Start Your Instance in the EC2 User Guide (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html).

Reference

  1. AWS forums - Degraded hardware, EC2 instance scheduled for retirement - options
  2. AWS forums - [Retirement Notification] Amazon EC2 Instance scheduled for retirement.
  3. AWS Doc - Instance Retirement
  4. AWS Doc - Stop and Start Your Instance
