Edge Azwan

Friday, June 4, 2010

VMware High Availability (HA)

Insufficient resources to satisfy configured failover level for HA

The above is a common error on ESX clusters with HA enabled. It means ESX refuses to power on a VM because doing so would violate the availability constraints; in other words, the current failover level would drop below the configured failover level.

When strict admission control is used (i.e. the checkbox that says "Do not power on virtual machines if they violate availability constraints" is ticked), HA works out the current failover level as follows (from vi3_35_25_resource_mgmt.pdf):
  • HA calculates the maximum memory and CPU reservations needed for any currently powered on virtual machine and calls this a slot. A slot is the amount of CPU and memory resources that will be sufficient for any currently powered on virtual machine (powered off or suspended virtual machines are not considered when calculating the current failover level).
  • HA determines how many slots can “fit” into each host based on the host’s CPU and memory capacity.
  • HA then determines how many hosts could fail with the cluster still having at least as many slots as powered on virtual machines. This number is the current failover level.
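
The three steps above can be sketched in a few lines of Python. This is only my reading of the documentation, not VMware's actual code: the `VM` and `Host` classes and all function names are made up for illustration, I assume every slot dimension is greater than zero, and I assume a host's slot count is limited by whichever resource runs out first.

```python
from dataclasses import dataclass

@dataclass
class VM:
    cpu_ghz: float   # CPU reservation (hypothetical field name)
    mem_gb: float    # memory reservation (hypothetical field name)

@dataclass
class Host:
    cpu_ghz: float   # total CPU capacity
    mem_gb: float    # total memory capacity

def slot_size(powered_on):
    """Step 1: largest CPU and largest memory reservation among powered-on VMs."""
    return (max(vm.cpu_ghz for vm in powered_on),
            max(vm.mem_gb for vm in powered_on))

def slots_per_host(host, slot):
    """Step 2: whole slots that fit on a host; the scarcer resource decides."""
    slot_cpu, slot_mem = slot  # assumed to be > 0
    return min(int(host.cpu_ghz // slot_cpu), int(host.mem_gb // slot_mem))

def current_failover_level(hosts, powered_on):
    """Step 3: number of hosts that can fail with slots >= powered-on VMs."""
    slot = slot_size(powered_on)
    counts = sorted((slots_per_host(h, slot) for h in hosts), reverse=True)
    level = 0
    # Worst case: the hosts with the most slots fail first.
    for failed in range(1, len(counts)):
        if sum(counts[failed:]) >= len(powered_on):
            level = failed
        else:
            break
    return level
```

Powered off and suspended VMs are simply left out of the `powered_on` list, which matches the note in the first bullet.
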
As an example, say you have a cluster of 2 servers, each with 2 x 2GHz quad-core CPUs (16GHz in total) and 32GB RAM. The two heaviest VMs are configured as follows:

VM1 - 2GHz CPU reservation and 1GB memory reservation
VM2 - 1GHz CPU reservation and 2GB memory reservation

ESX would define your slot as a 2GHz CPU reservation and a 2GB memory reservation, taking the largest value of each across the VMs. So each ESX host in the cluster is able to support 16/2=8 CPU slots and 32/2=16 memory slots. (Note: apparently if a VM uses virtual SMP, ESX multiplies the highest CPU reservation found in the cluster by the number of vCPUs, say 4!)
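
Here is the same arithmetic as a quick Python check. The variable names are mine, and the last line assumes that a slot needs both its CPU and its memory share, so the smaller of the two counts is what a host can really hold.

```python
# Reservations of the two heaviest VMs: (CPU in GHz, memory in GB)
reservations = {"VM1": (2, 1), "VM2": (1, 2)}

# The slot takes the largest CPU and the largest memory reservation,
# even though they come from different VMs.
slot_cpu = max(cpu for cpu, _ in reservations.values())
slot_mem = max(mem for _, mem in reservations.values())

host_cpu_ghz = 2 * 4 * 2   # 2 sockets x 4 cores x 2GHz
host_mem_gb = 32

cpu_slots = host_cpu_ghz // slot_cpu
mem_slots = host_mem_gb // slot_mem
usable_slots = min(cpu_slots, mem_slots)   # assumption: scarcer resource wins

print(slot_cpu, slot_mem, cpu_slots, mem_slots, usable_slots)
```
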

With the failover capacity set to "1" (default), the host with the largest number of possible slots is dropped from the calculation, and you're left with the total number of slots available (the sum of the slots on each remaining node). In our example the nodes have identical hardware, so it makes no difference which one is dropped: the single remaining host provides 8 CPU slots and 16 memory slots. That is not a lot, and turning on strict admission control results in significant resource wastage if reservations are used incorrectly.
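
To make the power-on check concrete, here is a rough Python sketch of how I picture it. It is a simplification under my own assumptions: the function names are hypothetical, each host is described by a single slot count, and the new VM is assumed not to change the slot size.

```python
def slots_after_failures(slots_per_host, failover_capacity):
    """Slots left once the hosts with the most slots are dropped."""
    keep = max(0, len(slots_per_host) - failover_capacity)
    # Sort ascending and keep the smallest hosts, i.e. drop the largest ones.
    return sum(sorted(slots_per_host)[:keep])

def can_power_on(slots_per_host, failover_capacity, powered_on_vms):
    """True if one more VM still fits in the slots that survive the failures."""
    remaining = slots_after_failures(slots_per_host, failover_capacity)
    return powered_on_vms + 1 <= remaining
```

With two hosts of 8 slots each and a failover capacity of 1, `can_power_on([8, 8], 1, 7)` is true, but `can_power_on([8, 8], 1, 8)` is false, which is the point where the error in the title shows up.
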

Some tips I picked up from work:
  • Use a Resource Pool and set the reservation to 0, effectively letting ESX worry about the allocation of memory and CPU, which it does pretty well dynamically. Micro-managing VMs by defining reservations individually is a bad idea, as it gets tedious with a large number of VMs.
  • For strict admission control to work properly, and to avoid the common error mentioned in the subject line, the reservation value for each VM must be correctly assigned. In most cases opting for a Resource Pool instead is the easiest and often the more effective solution.
  • Don't use virtual SMP (i.e. more than 1 vCPU) unless you have to!

just my 2cents at 9:56 PM |
