ESX 3.5 Update 4 issue: reinstall GRUB

Use a Live CD to boot your ESX host and fix the host from a chroot environment.

Boot from the Rescue CD.

Check device names for / and /boot filesystems.

For example, on an internal RAID controller, /boot can be /dev/cciss/c0d0p1 and / can be /dev/cciss/c0d0p7.

Run the following commands to mount the / filesystem and chroot to it:

mkdir /mnt/root

mount /dev/cciss/c0d0p7 /mnt/root

chroot /mnt/root

Run the following command to mount the /boot filesystem on the /boot mount point:

mount /dev/cciss/c0d0p1 /boot

Ensure that /boot contains the kernel, the initrd, and a grub/ subdirectory with the stage* files, grub.conf, and menu.lst, which is a symlink to grub.conf.

Replace anything listed in the previous step that is missing.
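
The check described above can be scripted. The following is a minimal sketch, not part of the original procedure: check_boot_files is a hypothetical helper name, and the vmlinuz-* and initrd-* patterns assume the kernel and initrd follow the naming used in the example grub.conf in this article.

```shell
# Hypothetical helper: list expected /boot items that are missing.
# Usage: check_boot_files [BOOT_DIR]   (defaults to /boot)
check_boot_files() {
    boot_dir=${1:-/boot}
    missing=0
    # Kernel and initrd names vary by build, so match them with globs.
    for pattern in 'vmlinuz-*' 'initrd-*' 'grub/stage*' 'grub/grub.conf'; do
        # Leave $pattern unquoted so the glob expands; if nothing matches,
        # $1 keeps the literal pattern and the -e test fails.
        set -- "$boot_dir"/$pattern
        if [ ! -e "$1" ]; then
            echo "missing: $pattern"
            missing=1
        fi
    done
    # menu.lst is expected to be a symlink to grub.conf.
    if [ ! -L "$boot_dir/grub/menu.lst" ]; then
        echo "missing symlink: grub/menu.lst"
        missing=1
    fi
    return "$missing"
}
```

Inside the chroot, running check_boot_files /boot prints one line per missing item and returns a non-zero status if anything is missing.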

Run the following command if any of the stage files are missing:

cp /usr/share/grub/i386-redhat/* /boot/grub/

This copies all the files from /usr/share/grub/i386-redhat/ to /boot/grub/.

If grub.conf is missing, you have to create a new one or take a copy from another server.

An example of /boot/grub/grub.conf is:

vmware:configversion 1

# grub.conf generated by anaconda

#

# Note that you do not have to rerun grub after making changes to this file

# NOTICE: You have a /boot partition. This means that

# all kernel and initrd paths are relative to /boot/, eg.

# root (hd0,0)

# kernel /vmlinuz-version ro root=/dev/sdc2

# initrd /initrd-version.img

#boot=/dev/sdc

timeout=10

default=0

title VMware ESX Server

#vmware:autogenerated esx

root (hd0,0)

uppermem 277504

kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=/dev/cciss/c0d0p7 mem=272M

initrd /initrd-2.4.21-47.0.1.ELvmnix.img

title VMware ESX Server (debug mode)

#vmware:autogenerated esx

root (hd0,0)

uppermem 277504

kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=/dev/cciss/c0d0p7 mem=272M console=ttyS0,115200 console=tty0 debug

initrd /initrd-2.4.21-47.0.1.ELvmnix.img-dbg

title Service Console only (troubleshooting mode)

#vmware:autogenerated esx

root (hd0,0)

uppermem 277504

kernel --no-mem-option /vmlinuz-2.4.21-47.0.1.ELvmnix ro root=/dev/cciss/c0d0p7 mem=272M tblsht

initrd /initrd-2.4.21-47.0.1.ELvmnix.img-sc

If the server has multiple drives, LUNs, etc., it may be useful to create/edit a /boot/grub/device.map file, with the following content:

(hd0) /dev/cciss/c0d0p1

Here the device name in /dev/ is the boot partition device. Using a device.map file significantly speeds up the process, because GRUB does not have to autodetect devices.

If you are using a device.map file, start the GRUB shell with it:

/sbin/grub --device-map=/boot/grub/device.map

Run the following command in the GRUB shell:

root (hd0,0)

Run the following command in the GRUB shell:

setup --stage2=stage2 --prefix=/grub (hd0)

Note: This applies to a setup where /boot is on (hd0). If this does not work, try:

setup (hd0)

Run the quit command to exit the GRUB shell.
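
The GRUB shell commands above can also be fed to GRUB non-interactively. The following is a rough sketch, not part of the original procedure: grub_commands is a hypothetical helper name, it uses the simpler setup (hd0) form, and it assumes the GRUB build on the host accepts the --batch option of GRUB Legacy. The line that invokes /sbin/grub is left commented out so that the commands can be reviewed first.

```shell
# Hypothetical helper: print the GRUB shell commands from the steps above.
grub_commands() {
    printf '%s\n' 'root (hd0,0)' 'setup (hd0)' 'quit'
}

# Review the commands before running them.
grub_commands

# On the host, inside the chroot, they could then be piped to GRUB
# (assumes --batch is supported by this GRUB build):
# grub_commands | /sbin/grub --batch --device-map=/boot/grub/device.map
```

Entering the commands by hand, as described above, remains the safer choice when it is unclear which disk GRUB sees as (hd0).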

Run the following command:

sync

Remove the Rescue CD and reboot the server.

Command-line option to power off a VM on an ESX host

  1. In the VMware Infrastructure Client, right-click the virtual machine and choose Power off.
  2. If this does not work, you must use the command line method.
  3. From the Service Console of the ESX host, run these commands:

    vmware-cmd <cfg> stop
    vmware-cmd <cfg> stop hard

    Where <cfg> is the complete path to the configuration file, which can be determined by running:

    vmware-cmd -l

  4. Run the following command to check the state of the virtual machine:

    vmware-cmd <cfg> getstate

  5. If none of the above suggestions for stopping the virtual machine work, get the virtual machine’s process ID using the following command:

    ps -auxwww | grep -i <vm name>

  6. Kill the process ID (PID) for the virtual machine (number in the second column of the previous step) using the following command:

    kill PID

  7. After issuing the kill command, wait 30 seconds and run the following command to check for the process:

    ps -auxwww | grep -i <vm name>

  8. If the process is still present, run the following command to stop the process:

    kill -9 PID

  9. Wait 30 seconds and check for the process again.
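
The kill sequence in steps 6 to 9 can be wrapped in a small function. The following is a minimal sketch, not part of the original procedure: stop_pid is a hypothetical helper name, and the wait time is a parameter that defaults to the 30 seconds used above.

```shell
# Hypothetical helper: send TERM, wait, then send KILL if still running.
# Usage: stop_pid PID [WAIT_SECONDS]
stop_pid() {
    pid=$1
    wait_secs=${2:-30}
    # Step 6: ask the process to terminate.
    kill "$pid" 2>/dev/null
    # Step 7: wait, then check whether the process still exists.
    sleep "$wait_secs"
    if kill -0 "$pid" 2>/dev/null; then
        # Step 8: force the process to stop.
        kill -9 "$pid" 2>/dev/null
        # Step 9: wait before the final check.
        sleep "$wait_secs"
    fi
    # Return success only if the process is gone.
    ! kill -0 "$pid" 2>/dev/null
}
```

For example, stop_pid 12345 would follow the 30-second timing above, where 12345 stands for the PID found in step 5.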

Alternate kill method

  1. Run one of the following commands to determine the VMID of the problem virtual machine:

    vm-support -x
    cat /proc/vmware/vm/*/names

  2. Run the following command to determine the master world ID for the virtual machine, substituting the VMID from the previous step for ####:

    less -S /proc/vmware/vm/####/cpu/status

  3. Scroll right to the Group column and note the group number in the vm.#### entry beneath it.
  4. Run the following command to kill the virtual machine, substituting the group ID from the previous step for ####:

    /usr/lib/vmware/bin/vmkload_app -k 9 ####