Virtual Machines: Difference between revisions

Revision as of 13:18, 30 September 2010

Basic Virtual Machine Tasks

Start / Stop / Bounce a VM

  1. Log into the Virtual Infrastructure - Management Access
  2. Under the Inventory button, ensure Hosts and Clusters is ticked
  3. Highlight the VM you want to affect
  4. Either right-click or use the commands in the right hand pane to Power off, Power on, Reset as required

This is the same as using the Power or Reset buttons on the front of a physical server. To shut down or restart the guest OS gracefully instead, right-click the VM and select Shut Down Guest or Restart Guest as appropriate. This tells VM Tools to attempt the required action, though open applications etc can prevent the OS from shutting down successfully.


Remote Console (KVM like) Access

If possible, it's preferable to use normal remote access software (eg RDP or VNC). This ensures that load caused by remote access is contained within the VM, rather than falling on the ESX.

  1. Log into the Virtual Infrastructure - Management Access
  2. Under the Inventory button, ensure Hosts and Clusters is ticked
  3. Highlight the VM you want and either right click Open Console or use the Open Console command in the right hand pane


CD-ROM Access

There are essentially two ways to present a CD-ROM to a VM. Using an ISO image is by far the most flexible; even if you only have a physical CD and expect to use it once, it's still recommended that you create an ISO image from the CD and use that instead. The alternative is to put the physical media into the ESX hosting the VM (use Host Device when adding the CD to the VM).

To present an ISO image to a VM

  1. If it's not already there, copy the ISO image to an NFS share or other ESX accessible datastore
  2. Log into the Virtual Infrastructure - Management Access
  3. Under the Inventory button, ensure Hosts and Clusters is ticked
  4. Highlight the VM you want to attach the ISO image to
  5. Right-click and select Edit Settings...
  6. Highlight the CD/DVD Drive, and select the Datastore ISO file
  7. Hit Browse and go into the appropriate datastore
  8. Select the required ISO file
  9. Tick the Connected check box
  10. Hit OK, the ISO will be attached to the VM's CDROM drive as if you'd inserted a CD into a physical drive
  • Once you've finished using the ISO, go back into the VM's settings and untick the Connected check box
  • To boot a VM to a CDROM ISO, check the "Connected at power on" checkbox and restart the VM's OS

To create an ISO image

You'll need to download an ISO creator. There are many freeware utilities available; one that's tried and tested is ISORecorder. Generally you can create ISO images from either a physical CD or the contents of a folder (if you have ISORecorder installed, right-click over the disk or folder and select "Create ISO image").


Change Network Connection

Just as a network cable can be swapped over on a physical server, the network connection of a virtual machine can be changed on the fly.

  1. Log into the Virtual Infrastructure - Management Access
  2. Under the Inventory button, ensure Hosts and Clusters is ticked
  3. Highlight the VM you want to change the network connection on
  4. Right-click and select Edit Settings...
  5. Highlight the appropriate Network Adapter, and select the new Network Connection
  6. Change takes effect as soon as OK is hit


Add an Additional Network Connection

When adding additional network connections to any system you must consider network security. For example, no system should ever be given access to both Private and Public networks.

  1. Shut down the Application and OS of the virtual machine
  2. Log into the Virtual Infrastructure - Management Access
  3. Under the Inventory button, ensure Hosts and Clusters is ticked
  4. Highlight the VM you want to add the network connection to
  5. Right-click and select Edit Settings...
  6. Hit the Add... button and select Ethernet Adapter, and hit Next
  7. Select the appropriate network connection and hit Next, and then Finish
  8. Power on the virtual machine


Change Physical Memory / CPU's Allocation

  1. Shut down the Application and OS of the virtual machine
  2. Log into the Virtual Infrastructure - Management Access
  3. Under the Inventory button, ensure Hosts and Clusters is ticked
  4. Highlight the VM you want to change the memory / CPU allocation on
  5. Right-click and select Edit Settings...
  6. Highlight the appropriate setting, Memory or CPUs, and edit as required.
  7. Apply the change by hitting OK
  8. Power on the virtual machine

Config Settings

Disable Shutdown Event Tracker

If the ESX servers are running as an HA cluster then VM's MUST be able to start up fully and automatically after a reboot. The Windows Shutdown Event Tracker asks why you're shutting down or rebooting a system and, following an unexpected shutdown, halts start-up pending information from the user. This isn't a problem for servers where all applications run as a service, but where (GUI) applications need to start, it prevents systems from being restarted fully and so stops VMware HA operating effectively.

To disable...

  1. Start Group Policy Object Editor (Start | Run | gpedit.msc)
  2. Go to Computer Configuration\Administrative Templates\System
  3. Set Display Shutdown Event Tracker to Disabled

Set Low Risk File Types

If mapped drives are being used, .bat and .exe files need to be declared as low risk file types to stop Open file - Security Warning prompts being displayed when trying to run from mapped drives. This is particularly a problem if software is set to auto-start by placing shortcuts in the StartUp directory, as the software won't auto start.

To disable the prompts...

  1. Start Group Policy Object Editor (Start | Run | gpedit.msc)
  2. Go to User Configuration\Administrative Templates\Windows Components\Attachment Manager
  3. Set the "Default risk level for file types" to Enabled
  4. Specify the low extensions as .bat;.exe

Files Information

File     | Purpose                 | Notes
*.vmx    | VM config file          | Contains the full config of the virtual machine
*.vmsd   |                         |
*.vmxf   |                         |
*.vmdk   | Virtual hard-drive file |
*.nvram  | vBIOS file              | Can be deleted, gets recreated on VM start (BIOS settings will be defaulted)
*.vswp   | VM memory swap file     | Can be deleted(?)

Increase Disk Size

Increasing the virtual disk size provided to a VM is straightforward (though be aware that snapshots need to be deleted first, if any exist)...

  1. Go into the VM's settings
  2. Increase the size of the disk and apply
  3. Within the VM's OS, rescan the disk, and the new space will be visible

The trick is to extend the logical partition within the OS. Depending on the original partition type and the OS, the options vary.

In case of problems, see Can't Increase a VM's Disk

Increase Logical Partition

Generally boot or system disks cannot be extended whilst the OS is up. Normal data disks can be in later OS's, but this is still not ideal. It's generally most reliable to plan for system downtime, and use a utility to extend the partition whilst it's offline. Especially in a virtual environment, there is no excuse for not making a backup of the partition first.

For Windows 2008 machines this isn't a problem.

For Windows 2003 machines...

Partition | Type    | Options
System    | Either  | Cannot be extended
Data      | Basic   | Cannot be extended, can convert to Dynamic, but this will require a brief IO interruption.
Data      | Dynamic | Can be extended on the fly, but a new volume is tagged onto the end of the existing partition to create a larger one made up of two volumes

Download a copy of the GParted Live CD - http://gparted.sourceforge.net/livecd.php. The VM will need to be booted from it.

  • Note: There is a bug in some recent versions of GParted (v0.5.0-3 and v0.5.1-1 are known to have issues), whereby the boot fails with the following error. v0.4.6-1 is known to work.
    • Unable to find a medium containing a live file system
  1. Increase the relevant VMDK size through the VM's options
  2. Start snapshotting (or take a full backup of the machine)
  3. Attach GParted ISO to VM and restart
    • If VM doesn't boot to the ISO, force the VM to boot to BIOS (Options | Advanced | Boot Options in VM Settings) and change the VM's boot order
  4. Boot into GParted Live (accepting the default options, except setting language to English UK)
  5. Once in GParted, use the interface to resize the partition, then apply the changes to action them
  6. Restart VM and verify all is good
  7. Turn off snapshotting

VM's With Lots Of Disks

It can be very difficult to identify the correct disk within VMware to increase when a VM has a large number of VMDK's.

  • Disk numbering behaves differently, with Windows starting at Disk 0, and VMware starting at Disk 1
  • SCSI ID's will match, but Windows SCSI bus numbers are normally 0, whereas VMware bus numbers will increment (so VM disk 35 (Win disk 34), could be 2:4 in VMware, but 0:4 within the OS)
  • Disk size can be a useful method of validation (if differing disk sizes are used)
  • Windows drive letters are useless, never assume D: is disk 2 for example
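
As a minimal sketch of the numbering offset described above: vmware_to_windows_disk is a hypothetical helper name, and it assumes Windows enumerates the disks in the same order as VMware, which should still be validated by disk size or SCSI ID.

```shell
#!/bin/sh
# Hypothetical helper: VMware labels disks from "Hard disk 1",
# Windows counts from "Disk 0", so subtract one.
vmware_to_windows_disk() {
    echo $(( $1 - 1 ))
}

vmware_to_windows_disk 35   # prints 34
```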

Rename a VM

Renaming a virtual machine just by right-clicking over the machine and renaming does not alter the underlying file and folder names. To ensure that these changes take place you must move the VM to another datastore, ie

  1. Shutdown the VM
  2. Rename the VM in vCenter
  3. Migrate the VM and move it to another Datastore
  4. Restart the VM

If you can't move the VM to another datastore then it gets much more complicated, requiring faffing around in the service console.

  1. Shutdown the VM
  2. vmware-cmd -s unregister /vmfs/volumes/datastore/vm-old/vm-old.vmx
  3. mv /vmfs/volumes/datastore/vm-old /vmfs/volumes/datastore/vm-new
  4. cd /vmfs/volumes/datastore/vm-new
  5. vmkfstools -E vm-old.vmdk vm-new.vmdk
  6. find . -name '*.vmx*' -print -exec sed -i -e 's/vm-old/vm-new/g' {} \;
  7. Rename every file that still carries the old name (.vmx, .vmsd etc.), eg mv vm-old.vmx vm-new.vmx
  8. vmware-cmd -s register /vmfs/volumes/datastore/vm-new/vm-new.vmx

The above was taken from http://www.yellow-bricks.com/2008/02/10/howto-rename-a-vm/
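
The name substitution inside the config files (step 6) can also be sketched as a small helper that keeps a backup copy of each file before rewriting it. This is an illustration only: rename_in_config is a hypothetical function, not a VMware tool, and vm-old / vm-new are the placeholder names used in the steps above.

```shell
#!/bin/sh
# Sketch: replace the old VM name inside every *.vmx* config file
# in a VM folder, leaving a .bak copy of each original alongside.
# The names are used as sed patterns, so they must not contain "/".
rename_in_config() {
    dir=$1; old=$2; new=$3
    for f in "$dir"/*.vmx*; do
        [ -f "$f" ] || continue          # glob matched nothing
        case $f in *.bak) continue ;; esac   # don't rewrite earlier backups
        cp "$f" "$f.bak"                 # keep the original
        sed "s/$old/$new/g" "$f.bak" > "$f"
    done
}
```

For example, `rename_in_config /vmfs/volumes/datastore/vm-new vm-old vm-new` would be run after the folder has been moved. Check the rewritten files before re-registering the VM, and remove the .bak copies once you're happy.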

Clone a VM

This can be done as

  • Hot clone - Source VM is left running, its disks are quiesced, and cloned. Can cause problems as new machine behaves as if it was ungracefully shutdown when first started, but normally successful. Source machine needs to be relatively quiet.
  • Cold clone - Source VM is shut down first; preferable to a hot clone if possible.

Snapshots and Cloning

Snapshots are deleted during a clone, in that cloning a machine that has existing snapshots results in the post-snapshot changes being merged into the new machine.

In order to retain the snapshots, the virtual machine needs to be cloned manually (untested procedure!!)...

  1. Copy all of the VMs files into a new directory (using vmkfstools --nosparse option).
  2. Correct the .vmx file to match new paths, update VM name, and delete the UUID line (VMware will prompt to generate a new one when the VM is started).
  3. Register the new VM in vCentre and double check the VM is as expected.
  4. Power on (you'll get an IP conflict if it's on the same portgroup as the original)

Shutdown VM via Service Console

  • To determine the state of a Virtual Machine running from the local ESX
    • vmware-cmd /vmfs/volumes/SAN1/ServerA/ServerA.vmx getstate
    • getstate() = on
  • Shutdown a Virtual Machine running from the local ESX forcefully
    • vmware-cmd /vmfs/volumes/SAN1/ServerA/ServerA.vmx stop hard
    • stop(hard) = 1
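
The two commands above can be combined into a small wrapper. This is a sketch only: parse_state and stop_if_on are hypothetical helper names, VMWARE_CMD is an assumed override variable (handy for dry runs), and the output format is taken from the getstate example above.

```shell
#!/bin/sh
# Assumed override hook; defaults to the real service console tool.
VMWARE_CMD=${VMWARE_CMD:-vmware-cmd}

# Extract the state word from a line such as "getstate() = on".
parse_state() {
    echo "$1" | sed -n 's/^getstate() = //p'
}

# Hard-stop the VM only if it currently reports "on".
stop_if_on() {
    vmx=$1
    state=$(parse_state "$("$VMWARE_CMD" "$vmx" getstate)")
    if [ "$state" = "on" ]; then
        "$VMWARE_CMD" "$vmx" stop hard
    else
        echo "VM not on (state: ${state:-unknown}), nothing to do"
    fi
}
```

Usage would be along the lines of `stop_if_on /vmfs/volumes/SAN1/ServerA/ServerA.vmx`. A hard stop is the equivalent of pulling the power, so it remains a last resort.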

Upgrade ESX3 to ESX4

Preparation

  • Clean up the VM
    1. Stop any snapshots, and ensure there's no remnant snapshot files (*.vmsd, *-0000x.vmdk, *-delta.vmdk)
    2. No CD/floppy file attached
  • Clean up the guest OS
    1. Delete unnecessary files
    2. Ensure VM Tools is up to date
    3. Perform a reboot (without any changes)
    4. Check logs to ensure machine started without any significant errors
  • Record IP settings (they will get lost!)
    1. ipconfig /all
    2. route print if there might be static/persistent routes
  • Ensure you know the machine's admin account (inc domain if on domain)
  • Shut the VM down
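
The check for remnant snapshot disk files in the "Clean up the VM" step could be sketched as below, run from the service console against the VM's folder. find_snapshot_remnants is a hypothetical helper; the six-digit pattern is an assumption based on the MyVM-000001.vmdk naming shown under Snapshot Still Active?, and the sketch deliberately only looks at disk files, leaving any .vmsd file to be checked by hand.

```shell
#!/bin/sh
# Sketch: list leftover snapshot disk files in a VM folder.
# Prints nothing if the folder is clean.
find_snapshot_remnants() {
    for f in "$1"/*-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk "$1"/*-delta.vmdk; do
        [ -f "$f" ] && echo "$f"     # skip patterns that matched nothing
    done
    return 0
}
```

For example, `find_snapshot_remnants /vmfs/volumes/SAN1/ServerA` (path borrowed from the service console example earlier) should print nothing before the VM is exported.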

Procedure

Procedure assumes you're migrating machines from a VI3 infrastructure to a new VI4/vSphere infrastructure.

  1. Export machine as a Virtual Appliance from VI3 infrastructure
  2. Import machine into new vSphere infrastructure
    • In the VI Client, select the VM and go to File | Deploy OVF Template..., and select the appropriate options in the resulting wizard
  3. Take a snapshot (if you make an irreversible mistake it's quicker to revert to snapshot than reimport)
  4. Check the VM's settings, particularly Guest OS (which sometimes gets set to Other)
  5. Start the virtual machine, update VM Tools then shutdown
  6. Upgrade the virtual hardware
    • Right-click and select Upgrade Virtual Hardware
  7. Upgrade the network adapter to VMXNET3
    • Remove existing network adapters (note the networks they're connected to!), then add the same quota of VMXNET3 adapters (connected to the same networks in the same order)
  8. Upgrade the SCSI controller - part 1 (if required)
    • Add a new temporary disk, on the next bus (eg SCSI node 1:x), and change the type to VMware Paravirtual
  9. Restore network config
    • Restart VM, and re-apply recorded network config (answer Yes when asked whether to remove duplicate config on non-existent adapter)
  10. Upgrade the SCSI controller - part 2 (if required)
    • Shutdown the VM, and remove the temporary disk added, and change the original SCSI controller to VMware Paravirtual. Restart the machine.
  11. Delete/Commit the snapshot

Troubleshooting

See also Virtual Centre Troubleshooting

Can't Connect to VM Console

Error connecting: Cannot connect to host...

  • This is caused by a TCP connection failure to the ESX server the VM is hosted on. Using telnet or a port test utility, confirm you can connect on both TCP 902 and 903 from your machine to the ESX server.

Can't Deploy VM

The VirtualCenter server is unable to decrypt passwords stored in the customization specification

  • Bizarrely caused by the Virtual Centre running out of disk space; free up some space and all will be well.

A general system error occurred: Failed to create journal file provider

  • Check ESX disks are not full

Can't Start VM

HA Admission Control

  • Can't start VM as doing so wouldn't leave enough failover capacity in order to be able to restart failed VM's should an ESX fail. Options are to
    • Reduce resource usage of VM's that are already running
    • Increase cluster capacity
    • Reduce the cluster's failover capacity, or allow constraints violations
  • If no VM's have been recently added to the cluster, it's likely that the HA agent on one of the ESX's has stopped functioning, in which case one of the ESX's within the cluster will have a red warning/exclamation triangle. If so you can restart HA on that ESX:
    1. Highlight this ESX; on the Summary tab you should see a notice regarding HA problems
    2. Run the Reconfigure for HA command, this will re-install the HA agent on the ESX

Failed to relocate virtual machine

  • DRS is attempting to relocate a VM at power up, and this relocation is failing
    • Reattempt to power on machine
    • Manually migrate to a less loaded ESX and reattempt power on

Access to VMFS storage

  • ESX may have lost connectivity to VMFS partition on which VM resides

VMFS full

  • If VMFS is full, the ESX won't be able to write to the VM's logs when it starts it up, causing VM start-up to fail

ESX licensing

  • Either ESX isn't licensed, or has lost contact with the license server (VI3) for a long period of time

Waiting for question to be answered

  • Generally after changes (such as cold migrations or new deployments), a VM may need to have a question answered before it can continue to power on

Could not power on VM: No swap file. Failed to power on VM

  • The ESX you're starting the VM up on can't get proper access to the VM's files, either because
    • The VM is already powered up on another ESX
    • The VM's files have been corrupted / locked

If there have been no ESX failures, then the VM's files are probably corrupted. The VM can be re-registered by removing and re-adding it to the inventory.

If the ESX the virtual is/was on has failed then it's possible that only the ESX's network connections have failed, and the virtual machines are still running on the ESX, but are isolated from the network.

  • To cause a full HA failover, pull the power cables out of the ESX to kill it completely
  • Alternatively, attempt to restore network connectivity to allow the VM's to be reachable again

For further info see VMware KB10051 - Virtual machine does not power on because of missing or locked files (http://kb.vmware.com/kb/10051)

Can't VMotion a VM

VM network doesn't exist at destination

  • VM is using a particular port group which doesn’t exist on the destination ESX

ESX / network too busy

  • VMotion can't copy across the VM's memory contents/changes quickly enough. An alternative is to use a Low Priority VMotion, which is more likely to succeed, but may result in the VM experiencing temporary freezes (avoids full OS downtime, but not without impact to hosted applications)

ESXs can't communicate

  • ESXs need to be able to communicate via VMotion network. DNS problems and FQDN inaccuracies can also cause problems

VM is connected to CD-ROM/ISO

  • VM's CD-ROM is connected to an ISO file via the host ESX, tying it to that ESX

Can't Increase a VM's Disk

A general system error occurred: Internal error

  • Can be caused by existing snapshots running on a VM
  • Check the ESX logs / available disk space etc

Can't Snapshot

Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine

Can't Commit Snapshot

If snapshot files are large then patience is of the essence. If possible, shut the VM down first, or at the very least limit activity on the VM. To commit a snapshot in a running VM, first a new snapshot is started, then the original redo files are merged with the base disk(s), then the extra redo file is merged.

Operation timed-out

  • Not unusual for large (>10GB) redo files; the process continues and it's just vCentre reporting it as a time-out
    • Check the VM's files for any activity (changes in disk sizes/timestamps). Speed is dependent on redo size, storage speed, ESX load and VM activity (if possible shut the VM down before removing the snapshot)
    • Also see Snapshot Still Active?

No Snapshots Exist in Snapshot Manager (but still exist)

  • Can happen if a snapshot Delete (All) fails to complete properly (eg ESX pseudo-hangs and you restart the management agents)
    1. Backup and then delete the VM's VMSD file
    2. Start a new snapshot
    3. In snapshot manager use Delete All (not Delete!)
  • If this fails, check the ESX log to see what went wrong

Snapshot Still Active?

  1. Check Snapshot Manager; if there are snapshots listed then there are still active snapshots
  2. Open up Datastore Browser to the VM's folder and see if any snapshot files exist; if not then there are no active snapshots
  3. Check the VM's VMX file; each VMDK filename will be either a snapshot file or a normal base disk file
    • EG scsi0:0.fileName = "MyVM-000001.vmdk" ← Snapshot file (snapshot running)
    • EG scsi0:0.fileName = "MyVM-000001-delta.vmdk" ← Snapshot file (snapshot running)
    • EG scsi0:0.fileName = "MyVM.vmdk" ← Base disk file (no snapshot running)
    • EG scsi0:0.fileName = "MyVM-flat.vmdk" ← Base disk file (no snapshot running)
  4. If there are no snapshots running, but snapshot files exist, then the files can be deleted (if you're sure!)
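
The VMX check in step 3 can be sketched as a one-line filter. vmx_snapshot_disks is a hypothetical helper, and the pattern assumes the six-digit naming shown in the examples above; treat it as a quick indicator rather than a definitive test.

```shell
#!/bin/sh
# Sketch: print the SCSI disk entries in a VMX file that point at a
# snapshot file (NAME-000001.vmdk or NAME-000001-delta.vmdk).
# Exit status is 0 if at least one such entry exists, non-zero otherwise.
vmx_snapshot_disks() {
    grep -E '^scsi[0-9]+:[0-9]+\.fileName *= *".*-[0-9]{6}(-delta)?\.vmdk"' "$1"
}
```

For example, `vmx_snapshot_disks MyVM.vmx || echo "no snapshot disks referenced"`. Only delete snapshot files once this, Snapshot Manager and the Datastore Browser all agree.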

Can't Customise

Windows setup could not configure Windows to run on this computer's hardware
Windows could not complete the installation. To install Windows on this computer, restart the installation.

  • The guest customisation is failing because either
    • The virtual hardware has changed (especially disk type) since the original machine was created
    • Sysprep can't customise the machine because it doesn't have administrator rights; this can occur where a DC's users have been offloaded to LDS