Configuration Considerations (ESX)
Latest revision as of 09:32, 14 May 2012

Configuration Considerations

Hardware

CPU

Feature | Set to | Intel name | AMD name
Node Interleaving | Disabled (allows NUMA operation) | |
Execute Protection | Enabled | eXecute Disable (XD) | No-Execute Page-Protection
Virtualisation assist | Enabled | Intel VT | AMD-V

CPU Power vs Performance

If in doubt, set the server BIOS to maximum performance. This ensures that ESX can get the most out of the hardware, whereas allowing the BIOS to balance power or use low-power modes may impact VM performance. ESX hosts are expected to work hard (that's how they save you money), so they should be set up to perform. In theory, allowing the motherboard to throttle back the CPUs under low load shouldn't cause a problem, but in practice processor power management settings can hurt VM performance.

When using ESX 4.1 or higher, set the BIOS to allow the OS (ie ESX) to control CPU performance, if the setting is available. This lets ESX control CPU performance dynamically as it manages VM load, and makes the behaviour configurable through the VI Client.

See VM KB 1018206 (Poor virtual machine application performance may be caused by processor power management settings) for further info.

HP ASR

Should be disabled.

VMware don’t recommend using the HP ASR feature (designed to restart a server in the case of an OS hang), as they’ve come across occasions where an ESX host under load suddenly restarts due to ASR time-outs. See VM KB 1010842 (HP Automatic Server Recovery in a VMware ESX Environment) for further info.

Networking

Beacon Probing

Should only be used when there are 3 or more physical NICs assigned to the vSwitch and uplinked to the network switch.

This enables ESX to properly determine the state of the network during a fault. If there are only two uplinks and a beacon is lost between the two NICs, ESX can't tell which uplink is faulty, only that there is a fault.

See VM KB 1005577 (What is beacon probing?) for further info.
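To make the two-versus-three uplink argument concrete, here's a deliberately simplified Python model, not VMware's actual implementation. It assumes a failed uplink can neither send nor receive beacons, and flags an uplink as faulty when it hears nothing from any of its peers; the helper names and vmnic names are just hypothetical example inputs.

```python
from typing import Optional


def beacons_heard(uplinks: list[str], healthy: set[str]) -> dict[str, set[str]]:
    """Simplified model: a beacon from one uplink reaches another only if both are healthy."""
    return {
        receiver: {
            sender
            for sender in uplinks
            if sender != receiver and sender in healthy and receiver in healthy
        }
        for receiver in uplinks
    }


def identify_failed(uplinks: list[str], healthy: set[str]) -> Optional[set[str]]:
    """Return the uplinks that can be blamed, an empty set if no fault is seen,
    or None if a fault exists but can't be pinned to a specific uplink."""
    heard = beacons_heard(uplinks, healthy)
    # An uplink that hears no beacons from any of its peers is a suspect.
    silent = {u for u in uplinks if not heard[u]}
    if not silent:
        return set()  # every uplink hears at least one peer: no fault detected
    if silent == set(uplinks):
        return None  # every uplink is silent: there's a fault, but which uplink?
    return silent


if __name__ == "__main__":
    # Two uplinks, vmnic0 fails: both go silent, so the culprit is unknown.
    print(identify_failed(["vmnic0", "vmnic1"], {"vmnic1"}))
    # Three uplinks, vmnic0 fails: vmnic1 and vmnic2 still hear each other.
    print(identify_failed(["vmnic0", "vmnic1", "vmnic2"], {"vmnic1", "vmnic2"}))
```

With two uplinks and one failure the model returns None (fault detected, culprit unknown); with three, the two healthy NICs still hear each other and the odd one out is identified.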

Storage

ESX Installation Sizing

See VM KB 1026500 (Recommended disk or LUN sizes for VMware ESX/ESXi installations) for further info.

SCSI Resets

When accessing centralised storage via SCSI, VMware recommends the following configuration (only disabling SCSI device resets is a change from the default). These settings are intended to limit the scope of SCSI resets (a LUN reset affects only the LUN concerned, whereas a device reset affects the whole target), and so reduce contention and overlapping of SCSI commands from different hosts accessing the same storage system.

  • Disk.UseLunReset set to 1
  • Disk.UseDeviceReset set to 0
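If you'd rather script this than click through the VI Client's advanced settings, something like the following Python sketch could generate the commands. It assumes the classic ESX service console `esxcfg-advcfg` tool (`-s` to set a value, `-g` to read one back); the `set_commands` and `drift` helpers are hypothetical, and the "current" values in the example are made up for illustration.

```python
from typing import Optional

# Recommended values from the list above, keyed by advanced option path.
RECOMMENDED = {
    "/Disk/UseLunReset": 1,
    "/Disk/UseDeviceReset": 0,
}


def set_commands(recommended: dict[str, int]) -> list[str]:
    """Build one esxcfg-advcfg command per option to apply the recommended value."""
    return [f"esxcfg-advcfg -s {value} {option}" for option, value in recommended.items()]


def drift(current: dict[str, int], recommended: dict[str, int]) -> dict[str, tuple[Optional[int], int]]:
    """Report options whose current value differs from the recommendation,
    as {option: (current, recommended)}; a missing option counts as drift."""
    return {
        option: (current.get(option), wanted)
        for option, wanted in recommended.items()
        if current.get(option) != wanted
    }


if __name__ == "__main__":
    for cmd in set_commands(RECOMMENDED):
        print(cmd)
    # Hypothetical readings (eg from "esxcfg-advcfg -g <option>") on an unchanged host.
    print(drift({"/Disk/UseLunReset": 1, "/Disk/UseDeviceReset": 1}, RECOMMENDED))
```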

Path Selection Policy (PSP)

  • Active-Active (AA) - Storage array allows access to LUNs through all paths simultaneously.
  • Active-Passive (AP) - Storage array allows access to LUNs through one storage processor at a time.
  • Asymmetric (ALUA) - Storage array prioritises paths available to access a LUN (See http://www.yellow-bricks.com/2009/09/29/whats-that-alua-exactly/)
Policy | For arrays | Description
Most Recently Used (VMW_PSP_MRU) | All (default for AP arrays) | ESX uses whichever path is available, initially defaulting to the last used or first detected path at start-up
Fixed (VMW_PSP_FIXED) | Active-Active (not for AP) | ESX uses the preferred path unless it's not available. Can cause path thrashing with AP arrays
Fixed AP (VMW_PSP_FIXED_AP) | All (though really for ALUA) | As for Fixed, but ESX picks the preferred path itself and uses a path-thrashing avoidance algorithm
Round Robin (VMW_PSP_RR) | All | ESX uses all available paths (will be limited by AP arrays)
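The practical difference between Fixed and MRU shows up when a failed path comes back. The Python sketch below is a simplified model: the helper names and path names are hypothetical, and falling back to the first available path is an assumption. Fixed returns to its preferred path as soon as it's available again, whereas MRU stays on whatever path it's currently using.

```python
from typing import Optional


def fixed_next_path(preferred: str, current: Optional[str], available: list[str]) -> Optional[str]:
    """Fixed: use the preferred path whenever it is available, failing back to it
    once it recovers; otherwise keep the current path, else take the first available."""
    if preferred in available:
        return preferred
    if current in available:
        return current
    return available[0] if available else None


def mru_next_path(current: Optional[str], available: list[str]) -> Optional[str]:
    """MRU: keep using the most recently used path while it works; only move when
    it fails (to the first available path here), and never fail back."""
    if current in available:
        return current
    return available[0] if available else None


if __name__ == "__main__":
    # Hypothetical paths: path_a is Fixed's preferred path.
    both, only_b = ["path_a", "path_b"], ["path_b"]
    # path_a fails: both policies move to path_b.
    print(fixed_next_path("path_a", "path_a", only_b), mru_next_path("path_a", only_b))
    # path_a recovers: Fixed fails back to path_a, MRU stays on path_b.
    print(fixed_next_path("path_a", "path_b", both), mru_next_path("path_b", both))
```

On an AP array, that automatic failback is what can lead to path thrashing: if hosts prefer paths through different storage processors, each failback can force the LUN to move back and forth between processors.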

Round Robin IOPS Load Balancing

Round Robin IOPS: the number of IOs that an ESX host sends down a path before switching to an alternate path, in order to balance the load.
(In this context, IOPS means IO operations rather than the usual IOs per second.)
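As a rough illustration of what the setting controls, this Python sketch assumes IOs are handed out in strict rotation across the paths, moving on after every `iops_limit` IOs. The function and path names are hypothetical, and real NMP behaviour has more to it (eg which paths are eligible on ALUA or AP arrays).

```python
def path_for_io(io_index: int, paths: list[str], iops_limit: int) -> str:
    """Path used for the io_index-th IO (0-based) when Round Robin moves on
    to the next path after every iops_limit IOs."""
    if iops_limit < 1:
        raise ValueError("iops_limit must be at least 1")
    return paths[(io_index // iops_limit) % len(paths)]


def path_switches(total_ios: int, paths: list[str], iops_limit: int) -> int:
    """Count how many times the path changes over a run of total_ios IOs."""
    switches = 0
    for i in range(1, total_ios):
        if path_for_io(i, paths, iops_limit) != path_for_io(i - 1, paths, iops_limit):
            switches += 1
    return switches


if __name__ == "__main__":
    paths = ["path_a", "path_b"]  # hypothetical two-path LUN
    for limit in (1000, 1):  # the default, and the IOPS=1 setting discussed below
        print(limit, path_switches(10_000, paths, limit))
```

Over the same 10,000 IOs across two paths, the default of 1000 gives 9 path switches while IOPS=1 gives 9,999, which is where the extra (small) CPU overhead comes from.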

Whether or not to change this is a contentious issue. The out-of-the-box default is 1000 IOs per path. HP state that when using their EVA storage systems you should set IOPS to 1 [1], and some other vendors appear to use IOPS=1 in their own testing. My personal feeling is never to change any setting from its default unless you have good reason to. The more you change, the further you move from an expected configuration, the more chance you have of exposing unexpected flaws and bugs, and the less chance a VMware support guy or gal will have of helping you resolve your problem quickly.

I'm not convinced that the published test results show a significant performance improvement from changing the value, and where they do, you need to remember that these are isolated tests and may not be representative of what you'd experience in your environment. Increasing the rate of path switching increases ESX CPU usage (albeit to a small degree), so it will come at the expense of other performance metrics. So I would not change the setting as part of a default ESX build configuration, but if you're experiencing storage performance problems it's worth considering.
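Should you decide to try it, the change is made per device. The Python helper below is a hypothetical sketch that only builds the command strings, assuming the ESX 4.x `esxcli nmp` syntax (ESXi 5.x moved these under `esxcli storage nmp`, so check against your version and your array vendor's guidance). The device ID in the example is a placeholder for one from your own host.

```python
def rr_commands(device_id: str, iops: int) -> list[str]:
    """Build ESX 4.x esxcli commands that put one device on Round Robin and set
    how many IOs are sent down a path before moving to the next one."""
    if not device_id or any(ch.isspace() for ch in device_id):
        raise ValueError(f"unexpected device ID: {device_id!r}")
    if iops < 1:
        raise ValueError("iops must be at least 1")
    return [
        # Switch the device's path selection policy to Round Robin.
        f"esxcli nmp device setpolicy --device {device_id} --psp VMW_PSP_RR",
        # Move to the next path after every `iops` IOs (the default is 1000).
        f"esxcli nmp roundrobin setconfig --device {device_id} --iops {iops} --type iops",
    ]


if __name__ == "__main__":
    # Hypothetical device ID; substitute the ID listed for the LUN on your host.
    for cmd in rr_commands("naa.600000000000000000000000000000aa", 1):
        print(cmd)
```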

References and further reading...