ESX Patching (xTF)



ESX Baseline Roll-outs

Ad-hoc (probably twice-yearly) rollouts of all applicable updates to ESX hosts. Baselines are used to ensure a uniform rollout across test and production.

  1. Before any rollout, create a baseline called something like ESX baseline (dd Mmm yy).
  2. Attach the baseline to the root Hosts and Clusters folder in VI Client.
  3. Use a scheduled task to scan all ESX hosts against the baseline.
  4. Roll out to test ESX hosts at the earliest opportunity to validate the updates.
  5. Roll out to all other non-Production ESX hosts on a convenient weekend.
  6. Roll out to Production ESX hosts in a staggered fashion once happy with the updates.
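
A consistent baseline name makes it easier to tell rollouts apart. The sketch below builds a name in the suggested "ESX baseline (dd Mmm yy)" form. The function name `baseline_name` is hypothetical, and the month abbreviations are hard-coded so the result does not depend on the machine's locale.

```python
from datetime import date

# English month abbreviations, hard-coded to avoid locale-dependent output.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def baseline_name(day: date) -> str:
    """Return a baseline name in the form 'ESX baseline (dd Mmm yy)'."""
    return "ESX baseline ({:02d} {} {:02d})".format(
        day.day, MONTHS[day.month - 1], day.year % 100)


print(baseline_name(date.today()))
```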

Jun 08 Baseline Rollout

  • Upgrade to ESX v3.5 update 1 plus patches
  • Upgrade of Open Manage to v5.4
  • All current DELL firmware upgrades

Dec 08 Baseline Rollout

  • Upgrade to ESX v3.5 update 3 plus patches

ESX v3.5 onwards

There are two methods of installing patches with ESX v3.5 onwards. Update Manager is automated but has had serious bugs, whereas the more manual Patch Depot method allows the greatest level of control.

Update Manager

Update Manager downloads available patches from VMware on a weekly basis (see Scheduled Tasks in VI Client). Once downloaded they can be attached to a baseline, and the baseline attached to a container within the VI Client. Another scheduled task can then scan all ESX hosts against the baseline to find applicable updates.

Due to a fault with esxupdate, baselines are ignored when rolling out to ESX hosts prior to v3.5 update 2. ANY baseline rollout will apply all available patches to the ESX regardless of what's set in the baseline config.

Patch Depot

A manual alternative to Update Manager; see vi3_35_25_esxupdate.pdf

  1. Download all required patches to the FTP server folder on vcentre
    1. Download the zips from the VMware patch download page
    2. Verify the zips are valid
      • If you have the WinZip command line extensions installed, use wzunzip -t *.zip | find /V "OK"
    3. Extract the zips into the FTP folder
      • For example .\ftproot\esx350-Jun08 for a Jun 08 rollout of ESX v3.5 patches
    4. Verify the patches from a test ESX
      • Allow FTP client access through the firewall: esxcfg-firewall -e ftpClient
      • Run a test update: esxupdate -d ftp://vcentre/esx350-Jun08 --test update
      • Block FTP client access again: esxcfg-firewall -d ftpClient
  2. Apply patches to ESX server
    1. Ensure ESX is in Maintenance Mode
    2. Enable FTP client access to depot
      • esxcfg-firewall -e ftpClient
    3. Use esxupdate to apply all patches from the depot
  3. Final clear up
    1. Disable FTP client access
      • esxcfg-firewall -d ftpClient
    2. Flush the local ESX FTP cache
      • For example esxupdate -d ftp://vcentre/esx350-Jun08 --flushcache scan
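
The service console commands used by the Patch Depot procedure can be collected per phase for a given depot. The sketch below only prints the commands as a checklist and runs nothing on an ESX. The function name `depot_commands` is hypothetical, and the "apply" command is an assumption: it is the documented test command with --test removed, so check it against vi3_35_25_esxupdate.pdf before use.

```python
from typing import Dict, List

ENABLE_FTP = "esxcfg-firewall -e ftpClient"
DISABLE_FTP = "esxcfg-firewall -d ftpClient"


def depot_commands(depot_url: str) -> Dict[str, List[str]]:
    """Return the service console commands for each phase, in order."""
    return {
        # Dry run from a test ESX, opening and closing FTP client access.
        "verify": [ENABLE_FTP,
                   "esxupdate -d {} --test update".format(depot_url),
                   DISABLE_FTP],
        # Assumption: the real run is the test command without --test.
        "apply": [ENABLE_FTP,
                  "esxupdate -d {} update".format(depot_url)],
        # Same order as the final clear up steps above.
        "cleanup": [DISABLE_FTP,
                    "esxupdate -d {} --flushcache scan".format(depot_url)],
    }


for phase, commands in depot_commands("ftp://vcentre/esx350-Jun08").items():
    print("# " + phase)
    for command in commands:
        print(command)
```

The ESX still has to be in Maintenance Mode before the "apply" commands are run; that step is done outside the service console commands listed here.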

Prior to ESX v3.5

Automated Script Patch Deployment

Patches are applied to the ESX by running a script. This script connects to an FTP server on vCentre and installs the patches for the relevant ESX version, as dictated by the contents of the patchlist.txt file in the relevant directory (eg C:\inetpub\ftproot\3.0.1\patchlist.txt for ESX v3.0.1)

Prep for roll-out

  1. Download the patches to the relevant ftproot folder
  2. Edit the patchlist.txt to set-up the install order (the order is important)
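
Because the install order matters, it can help to check patchlist.txt against the downloaded files before a rollout. The format of patchlist.txt is not described here, so the sketch below assumes one patch name per line, and assumes each downloaded patch is a file or folder whose name starts with that patch name. Both are assumptions, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def read_patchlist(patch_dir: Path) -> List[str]:
    """Return patch names in install order (assumed: one name per line)."""
    lines = (patch_dir / "patchlist.txt").read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def check_patchlist(patch_dir: Path) -> Tuple[List[str], List[str]]:
    """Return (install order, listed patches with no matching download)."""
    order = read_patchlist(patch_dir)
    downloaded = [p.name for p in patch_dir.iterdir()
                  if p.name != "patchlist.txt"]
    missing = [name for name in order
               if not any(item.startswith(name) for item in downloaded)]
    return order, missing
```

For example, check_patchlist(Path(r"C:\inetpub\ftproot\3.0.1")) would return the install order for ESX v3.0.1 together with any listed patches that have not been downloaded yet.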

Install patches on an ESX

  1. Copy the update script to the ESX server, eg
    • pscp C:\Software_Repository\esx-autopatch.pl user@esx:/home/user
  2. Put the ESX into maintenance mode (patches cannot be installed otherwise). If the ESX is not connected to the Infrastructure Client (VCP), use the Service Console Commands to put it into maintenance mode.
  3. Give the file execute permissions
    • chmod +x esx-autopatch.pl
  4. Run the update script
    • ./esx-autopatch.pl
  5. Reboot the ESX
    • init 6

The ESX has to be rebooted for patches to take effect. Once this has been done, re-running the script shouldn't re-install any patches that have already been installed successfully.

md5 errors occur intermittently, probably due to corruption when the update is transferred over FTP. Retry the update and it should succeed.

Manual Patch Deployment

Used to apply one-off or non-standard patches.

Prep for installation

  1. Download the patch to the NFS_Share on the Virtual Centre server (note down the md5 hash)
  2. Copy the patch to the ESX server, eg
    • pscp c:\NFS_Share\esx-upgrade-from-esx3-3.0.2-61618.tar.gz user@esx:/home/user/
  3. From the ESX server, confirm the file hasn't been corrupted by checking its md5 hash matches that displayed on the VMware download page, eg
    • md5sum esx-upgrade-from-esx3-3.0.2-61618.tar.gz
    • 43b3617c401e71622c72b10cfcdbc5fe esx-upgrade-from-esx3-3.0.2-61618.tar.gz
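
The same check can be made on the Virtual Centre server before the patch is copied across, which separates a bad download from corruption in transit. The sketch below assumes a Python 3 interpreter is available on that machine (it is not meant for the ESX service console), and the function names are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the md5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path: str, expected: str) -> bool:
    """Compare a file's md5 with the hash noted from the download page."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, md5_matches(r"c:\NFS_Share\esx-upgrade-from-esx3-3.0.2-61618.tar.gz", "43b3617c401e71622c72b10cfcdbc5fe") should be True if the download is intact.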

Install patch

  1. Extract the patch, eg
    • tar -xzf esx-upgrade-from-esx3-3.0.2-61618.tar.gz
  2. Change to the directory the update was extracted to, and run the update
    • esxupdate -n update
  3. The patch should proceed to install, and complete with a success notice
    • Install succeeded - please come again.
  4. Restart the ESX
    • init 6
  5. Especially if the update was large, delete the update files / folders
    • rm -r -f 60618 (deletes folder 60618 and its contents)
    • rm esx-upgrade-from-esx3-3.0.2-61618.tar.gz


Patching Info

ESX Server 3.0.1

ESX-2158032

Date      Type      md5                               Status
30/11/06  Critical  c688275383addb789af1885ef4632b5f  Deployed
  • Customers using VMotion to hot migrate VMs among servers with AMD Opteron processors might experience a VMkernel crash (PSOD) on the source server after a virtual machine is migrated between servers with AMD Opteron processors.
  • This patch fixes an extreme case in migration where the destination server is sending a checkpoint reply to the source, but the source is not expecting one. The source's tcp receive buffer will slowly fill with unread data, slowing down migrations and eventually causing the VMotion migration to fail.


ESX-1410076

Date      Type      md5                               Status
30/11/06  Critical  7208b58046546b11593a38e5ce9f23b8  Deployed
  • Virtual machines running Red Hat Enterprise Linux 3, Update 5 (RHEL3 U5) may hang and become unresponsive, requiring a PID kill of the virtual machine.
  • Installation of 64-bit guest operating systems may hang before completion of the installation process.
  • Some virtual machines running 64-bit guest operating systems may experience unexpected panics during operation


ESX-1006511

Date      Type      md5                               Status
30/11/06  Critical  efa86b4e30e7700e186c8040fde93381  Deployed
  • Some Ethernet controllers using the e1000 driver may experience a tx hang when used with ESX Server 3.0.1.
  • After the Intel Pro1000 PT Quad Port adapter is installed, the ESX Server host does not detect the network adapter. The network adapter appears as "Intel Corporation: Unknown device 10a4" at the lspci command output in the service console. When viewing network adapters in VI Client, the Intel Pro1000 PT Quad Port adapter does not appear in the list of network adapters attached to the ESX Server host even though it is connected


ESX-8173580

Date      Type      md5                               Status
28/12/06  General   1a4f3e57d1b950dec8401074aaa92490  Deployed
  • An issue which can cause a service console crash when running with Dell OpenManage 5 with a QLogic Fiber Channel controller.
  • An issue where an overflow of a statistic in the TCP/IP stack can cause an ESX Server host crash when using volumes with NFS.


ESX-2066306

Date      Type      md5                               Status
28/12/06  Critical  2b8a9a6d9beb82476e1a7e8eafbb18d7  Deployed
  • Virtual machines experiencing high cpu load during a VMotion migration can hang after the migration is complete.
  • Virtual machines can crash during a NUMA migration due to memory allocation failures.
  • Kernel memory can become corrupted, resulting in a kernel crash when using 64-bit guest operating systems in virtual machines on ESX Server hosts with AMD processors.
  • Adds support for using Microsoft Clustering Server (MSCS) with Windows 2003 Service Pack 1 (SP1) and R2 Guest Operating Systems.


ESX-3199476

Date      Type      md5                               Status
05/03/07  Critical  a12c77bd49c65f7a333dd8e70a6ec729  Deployed
  • This patch fixes an issue where the vmxnet driver may cause a Windows virtual machine to crash.
  • This patch is mandatory for using Microsoft Clustering Service (MSCS) with Windows 2003 Service Pack 1(SP1) and R2 Guest Operating Systems.


ESX-2257739

Date      Type      md5                               Status
29/03/07  Critical  e49ae9be1c51fef5db641e0654a43117  Deployed
  • This patch fixes an issue where setting an invalid MAC address within a virtual machine's guest operating system can crash the ESX Server host and possibly cause a denial of service on the ESX Server host.


ESX-1541239

Date      Type      md5                               Status
29/03/07  Critical  e8e83b996daedd967ed033f1c1292a3d  Deployed
  • This patch fixes an issue where some storage targets might not be discovered properly in a host with an Emulex HBA. The patch also fixes a delay during Emulex driver loading.


ESX-7557441

Date      Type      md5                               Status
15/05/07  Critical  2a9b7ea008d51a9ac746b3c32ea36ccf  Deployed
  • This patch fixes an issue where restarting the mgmt-vmware service can cause an unexpected reboot of virtual machines that are configured to automatically start or stop.


ESX-7302867

Date      Type      md5                               Status
15/05/07  Critical  a3449ef90ed8f146596c9dac27f88d41  Deployed
  • An issue where an ESX Server panic can occur during a vm-support command after removing a USB drive from the host.
  • Updates the aacraid_esx30 driver to fix a condition that may cause an ESX Server console operating system panic when the corresponding device's /proc node is accessed. One example of such operation is during "vm-support" log collection.


ESX-4825991

Date      Type      md5                               Status
15/05/07  Critical  083290f1b0753a70cff81553e7cfd069  Deployed
  • A fix for a race condition upon virtual machine world removal (typically happens during virtual machine power off) and creation (typically happens during virtual machine power on) which causes the ESX Server host to panic.
  • Enhancements to improve the performance of ESX Server console operating system network interface (vswif).


ESX-1000070

Date      Type      md5                               Status
09/07/07  Critical  56931d15ac3c186f80c1b20d855fa943  Deployed
  • This patch resolves time loss issue in Microsoft Windows 2000 SMP (multi-processor) virtual machines running applications that use the Windows Multimedia Timer service. Java applications typically fall into the category of applications that experience the time loss without this patch. The fix does not affect single-processor virtual machines.
  • A fix for an issue where using the 32-bit version of VMware tools in a 64-bit Solaris 10 virtual machine causes high CPU usage. Users of this patch on ESX Server hosts with Solaris 10 virtual machines with 64-bit VMware tools will still experience an error stating "vmware-toolbox-gtk: 1816): Gtk-WARNING **" since no 64-bit GTK is available, but will not experience the high CPU usage. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6456279 for more information on Solaris 10 and the GTK for 64-bit applications.
  • Pre-built modules (PBMs) to enable VMware tools to be installed in Red Hat Enterprise Linux (RHEL) 3 AS 64-bit virtual machines. Without this patch, the error "None of the pre-built vmmemctl modules for VMwaretools are suitable for your running kernel." is seen when attempting to install VMware tools in RHEL 3 AS 64-bit virtual machines.
  • Pre-built modules (PBMs) to enable VMware tools to be installed in SUSE Linux Enterprise Server (SLES) 8 Kernel v2.4.21-306 and higher virtual machines.


ESX-1000039

Date      Type      md5                               Status
09/07/07  Critical  a47b942424bebe15a2ffab2df5f6a854  Deployed
  • Fixes an issue where an ESX Server host experiences boot failures because the software doesn't correctly handle failed block driver I/Os.
  • Fixes an issue where extraneous warnings are logged in /var/log/messages of ESX Server hosts running HP Insight Manager version 7.7. The warning message is the following: vmkernel: 0:00:10:43.873 cpu0:1024)VMNIX: VmkDev 3293: dev=vmnic2 cmd=0x8946 ifr=dedc1eb4 status=0x2bacffa1.
  • Fixes an issue where some ESX Server hosts stop responding for between two and five minutes during a rescan operation and on occasion, the host will completely stop responding. The diagnosis of this issue is described in detail in KB 10229. Please refer to that knowledgebase article for more information on the diagnosis of the problem and for workarounds if you do not plan to apply this patch to fix the issue.
  • Fixes an issue where NetApp's sanlun utility fails to get the required information from NetApp filers.
  • Fixes an issue found on HP Proliant DL 380 G5 systems where valid I/O requests were marked as attempts to overwrite the partition table. This caused the file system to become "read-only" while the ext3 file-system is configured on block devices that have extended partitions.


ESX Server 3.0.2

ESX-1002424

Date      Type      md5                               Status
15/11/07  Critical  6b666d525062b5ccc8bbb5b09fbcebfb  Download
  • Fixes an issue where a malformed IP packet might cause the ESX Server host to stop responding. This fix checks IP header information, and rejects malformed IP packets.
  • Fixes an issue where the ESX Server host might stop responding due to vmklinux heap allocation failure. vmklinux is a module loaded on top of vmkernel that creates its own heap for memory allocation/free requests from within the module. During module creation, the kernel can be set up for minimum and maximum heap sizes of 256KB and 20MB. Based on the type of heap this module creates (low memory heap), fragmentation of large pages (physical pages of 2MB) might prevent vmkernel from increasing the heap size. This in turn might result in memory allocation requests failing within vmklinux module. The fix avoids these failures by making vmkernel grow the vmklinux heap to 10MB early on during boot time, when there shouldn't be any large page fragmentation.
  • Fixes an issue of incorrect display of the hardware system manufacturer and product name on the guest operating system. Adding SMBIOS.reflectHost=TRUE option in the virtual machine configuration file should display the host system manufacturer and product name on the guest operating system as well. Instead, the guest operating system might display "VMware."
  • Fixes an issue where some firmware versions of IBM SVC storage array return a status of Target Not Ready during failover, whenever IBM SVC LUN(s) are un-presented. A flaw in handling the return status by the ESX Server host's IBM SVC storage array specific multi-pathing code might cause the ESX Server host environment to become unresponsive. This fix modifies the multi-pathing code to update the Target Not Ready time for a path whenever the status is received, and marks the path as closed.
  • Fixes an issue where a Reverse Address Resolution Protocol (RARP) broadcast might be sent to multiple vmnics. The ESX Server host normally registers a virtual machine with a switch through a transmission of a RARP broadcast packet that contains the MAC address of the virtual machine. In a topology of NIC teams connected to two or more switches, the virtual machine does not register with the switch. For this reason, the RARP packets are sent through multiple interfaces. This works for a single switch, as a Content-Addressable Memory (CAM) table can only update one entry at a time. However, when ESX Server NIC teams are connected to a cascade switch, a race condition leads to an attempt to update the CAM table of both switches with conflicting entries, with the result that both entries are discarded, leaving the virtual machine unregistered with the switch.
  • Provides a new mechanism to support OEM Windows Server 2003 System Locked Preinstallation (SLP). Virtual machines running an OEM version of Microsoft Windows Server 2003 might prompt for reactivating the operating system when the virtual machine is migrated through a VMotion operation to a different ESX Server host with or without a different release version. This fix does away with the requirement to reactivate the operating system.
  • Fixes an issue where vmkping and ramcheck commands fail in ESX Server 3.0.2 Update 1 or ESX Server 3.0.2 with patch bundle ESX-1001907 installed.


ESX-1002425

Date      Type      md5                               Status
15/11/07  General   0837108e05a45f07245c65a3059bb26d  Download
  • Fixes an issue where the ESX Server host in maintenance mode cannot be powered off using the Virtual Infrastructure Client. Shutting down the ESX Server host in maintenance mode on the VI Client brings the host to a halt state.
  • Fixes an issue where an incomplete SOAP message sent to the Web services API might result in dramatically increased CPU usage in ESX Server host or VirtualCenter Server.


ESX-1002435

Date      Type      md5                               Status
03/12/07  General   0b48742e713e8ee86d1e81adfc06984a  Download
  • Fixes an issue where setting the appropriate NIC teaming policy might result in an error message being displayed.
  • Fixes an issue where the Customization Specification Manager wizard in the VI Client might fail if a non-default timezone is configured without the INDEX key in the Windows registry.
  • Fixes an issue where the Cost column in the VirtualCenter License Source page might display only "1/" for the virtual machine license feature. The cost unit Virtual Machine might not be displayed.
  • Fixes an issue where answering the Redo Log question might close any additional instance of the same dialog box.


ESX-1002426

Date      Type      md5                               Status
03/12/07  General   41f6a6790b448026274b84412cde917c  Download
  • This patch fixes an issue where the VMware vdf utility displays incorrect results on the service console for the volume name, when the symbolic link to VMFS volumes has a space in it. This fix corrects the regular expression to print the correct volume names with space or without space.


ESX-1002430

Date      Type      md5                               Status
03/12/07  Critical  0dca6bb53703fe42c709d4849d8194bc  Download
  • Fixes an issue where pcnet32 or vlance drivers are loaded instead of vmxnet drivers after VMware Tools is installed.
  • Fixes an issue where compilation failed when including GuestSDK header files, as the file includeCheck.h referenced by GuestSDK headers was missing from 64-bit Windows Tools installation. This fix adds includeCheck.h to the GuestSDK install.


ESX-1002974

Date      Type      md5                               Status
01/02/07  General   caee04b0fcf1aefebc6134a95db8082f  Download
  • A SCSI reservation conflict issue when synchronous commands such as SCSI Reserve/Release are retried by the VMkernel storage stack. If a failover happens when SCSI Reserve/Release are retried, SCSI Reserve/Release will be issued on the failed path, and I/Os issued on the active path will complete with SCSI Reservation conflicts.
  • An issue where the ESX Server host stops responding while attempting to transmit pending packets from a deferred queue.
  • An issue where the ESX Server host stops responding when EMC Invista Brocade is rebooted.
  • When booting from SAN, where there are multiple paths to the boot LUN, if the boot LUN disappears from the primary path (for example, due to a hardware failure) the service console might not fail-over to the secondary path and goes read only.


ESX-1xxxxx

Date Type md5 Status
xx/xx/07 Gxxxl xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Download