Header image alt text

Kevin's Thoughts!

Maybe you agree, maybe you don't… find out!

Best Practices for KVM

Posted by Kevin on August 29, 2011
Posted in Tech Talk  | Tagged With: , , , | No Comments yet, please leave one

IBM is, and has been, a great supporter for KVM.  I was quite pleased today to find the following link:

http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaatbestpractices_pdf.pdf

Nice, easy to read, publication on how to optimize the performance of KVM, complete with a few test results to justify their recommendations.

Thank you IBM!

An update:

Being a bit of a computer efficiency enthusiast (read “speed freak”) I decided to migrate one of my VMs from a .img file base directly to an lvm – bypassing the Linux filesystem overhead as IBM suggests.  The process was relatively straight forward, given that I had no issues with the VM being down when I migrated it.  I decided to do this for my new Mail VM, since I was going to have to grow the image size anyhow to handle long term storage of e-mail.

1) Create a new logical disk volume:  lvcreate -L512G -nmail_vm vg

2) Boot my “recovery” minimal boot image passing it the old image file and the new logical volume

kvm  -drive file=mail.img,if=virtio -drive file=/dev/vg/mail_vm,if=virtio,format=raw -cdrom minimal.iso -boot d -curses -no-reboot -m 512 -smp 2

The VM then boots with 2 drives, /dev/vda (old mail.img) and /dev/vdb (new disk).  Within the minimal shell, you then:

A)  Format the new drive with fdisk /dev/vdb (I create 4 partitions, a 64mb /boot in partition   1, a 1GB swap area in partion 2, at 512mb /root in 3, and an LVM area with the rest of the disk in partition 4)

B)  Create a new physical volume group on it via pvcreate /dev/vdb4, a logical volume group on that with vgcreate vg00 /dev/vdb4  and divvy it up as desired with lvcreate commands.  Do note that the volume group name you use needs to be different than what is used on the .img drive.  Confirm everything looks good with vgs and lvs.

C)  Mount the old and new partitions onto new root trees like /or and /nr and use tar to move the files cd or; tar cf – . | (cd /nr; tar xvf -).

D)  Install grub on the new drive to make it bootable.  This was actually just a bit tricky, the key being to create symbolic links for /dev/hda, /dev/hda1, etc. to their /dev/vda counterparts after which the grub root (hd0,0) and setup (hd0) command worked as normal.

E)  Shutdown and start using your new /dev/vg/mail_vm image:

kvm -net nic,model=virtio -net tap,ifname=tap2,script=no  -drive file=/dev/vg/mail_vm,if=virtio,boot=on -curses -no-reboot -m 2g -smp 2 -runas kvm -name mail,process=mail-kvm

 

Share

KVM Paravirtualization

Posted by Kevin on August 28, 2011
Posted in Tech Talk  | Tagged With: , , , | No Comments yet, please leave one

KVM Paravirtualization

In previous post I talked building my KVM machine. This was semi straight forward, turn on visualization support in the kernel , build a image (see my KVM Virtualization post) and boot.  Pretty easy to get a user-based networked server up and running.  In its simplest form, the command:

kvm -hda base-vm.img -curses -no-reboot

Starts a virtual machine whose image is stored in the .img file, with a text based console, and will return to the command prompt when the virtual machine is shutdown or rebooted.  The virtual machine will see the image file as its /dev/sda disk.  An eth0 device will be presented (presuming the image is built with a supported driver, like rtl8139) along with dhcp and name services being provided by the kvm hypervisor (which is what the kvm command really is).  If you are logged into this machine, there would be little to give you any hint that you were running in a VM rather than on dedicated hardware.

That’s all good, but its somewhat inefficient.  Linux and KVM support a number of paravirtualization features which reduce overhead within the guest support – if the guest enables such support.  My guest have been configured to use as much of this support as possible, giving up most chances of running native, but benefiting from a reduced kernel size and more efficient execution.

There are several kernel configuration switches that need to be set to fully utilize this support:

Under “Paravirtualized guest support” select:
KVM paravirtualized clock (CONFIG_KVM_CLOCK=Y)
KVM Guest Support (CONFIG_KVM_GUEST=Y)
Enable Paravirtualization code (CONFIG_PARAVIRT=Y)
Under “Device Drivers”, “Block Devices”, select Virtio block driver (CONFIG_VIRTIO_BLK=Y)
Under “Device Drivers”, “Network Device Support”, select Virtio network driver (CONFIG_VIRTIO_NET=Y)

So far, nothing you have done will stop your kernel from booting on real hardware, although it will have some extra overhead with the Enable Paravirtualization code feature on.  However, the next step commits you to running as a guest only:

You can remove all SCSI disk, Serial ATA, and Parallel ATA support as well as all network device support (I leave Network Console logging support enabled by habit, although I’ve yet to use it).  Presuming you use LVM, you will need to leave Device Mapper Support enabled.

Since we have eliminated all ATA and SCSI devices, obviously /dev/sda will no longer exist.  It has been replaced by /dev/vda (virtual disk #1).  You will need to change your /etc/fstab file to reference /dev/vda instead of /dev/sda, and change grub’s menu.lst and device.map file via your favorite editor.  If you forget this step, just boot the minimal.iso – it supports virtio devices – and fix these entries.

Now change your startup command to look like:

kvm -drive file=base-vm.img,if=virtio,boot=on -net nic,model=virtio -curses -no-reboot

Notice when you boot that there are no references to “eth0″, but the device will be there, now 100% virtual without any emulation overhead.  Likewise for storage, everything will be presented without the overhead of emulating SCSI or ATA.  Your kernel will also take advantage of many paravirtualization improvements now that you gave it the opportunity to.

Happy efficient computing!

 

Share

KVM Bridging

Posted by Kevin on August 28, 2011
Posted in Tech Talk  | Tagged With: , , , , , | No Comments yet, please leave one

KVM Bridging

Spent more than a little time banging my head against the wall and freaking out that Google couldn’t help.  What was the problem?  I couldn’t get virtual network bridging to work between my KVM based guest and the rest of the world.

If I started my a KVM with -net user (or let it default), all was fine.  By default, KVM will create a private network and provide dhcp and name services to the virtual instance.  You can either use a dhcpd client or just hardcode ip addresses in the defined 10.0.2.x space.  However, this “user” mode is designed as its named – it allows the VM to get out to the Internet, but doesn’t allow anything in.  So it works great for building your image, and is even fine for running things like BOINC, but is rather limited.  For instance, you can not SSH into the machine, so it better either run things automagically, or you will need to start it in a window.

For reasons unknown, if I started a KVM using a tun/tap (tap in this case) device it behaved oddly.  Once in awhile, I’d see an eth0 device appear during boot (seen via reviewing the boot log using “dmesg”).  As I tried with various “model=” options, I would see the corresponding driver load and sometimes a reference to eth0, but seldom anything that “ifconfig” could see.

Per most of the documentation I found online, the following startup command should have worked:

kvm -net nic,macaddr=00:00:00:00:01:00,model=virtio -net tap,ifname=tap0,script=no -drive file=test.img,if=virtio,boot=on -curses -no-reboot

And in fact, on those odd occasions when some variation of the above actually generated an eth0 device, it would appear with the macaddr specified.  Still, in those cases, I could not ping or web (via links) outside to the world.  (fyi:  ping using ICMP, webbing using TCP, so it was good to check both given KVMs blocking of ICMP when using -net user).   It was also freaky inconsistent.  At one point I had two KVMs with eth0′s showing up on one but not the other.  I did a side-by-side kernel build comparison and the kernels were identical.  I did several reboots of each KVM, several kernal builds including a “make clean” pass, and it would work on one but not the other.  Even when it did work, at the host level I often saw errors about the bridge receiving packets with its own address.  I thought those were referring to TCP/IP addresses (and went down a path trying to figure that out), but in hindsight it was referring to MAC addresses!

Finally I decided to ignore the web help, and just start at the beginning using the man pages.  Strange how that sometimes helps <smile>.

The breakthrough occured when I used the ever so slightly simplier command:

kvm -net nic,model=virtio -net tap,ifname=tap0,script=no -drive file=test.img,if=virtio,boot=on -curses -no-reboot

Note, the only difference is the lack of specifying the mac address.

Here are the relevant bits from /etc/conf.d/net:

tuntap_tap0=”tap”
config_tap0=”null”
tunctl_tap0=”-u root”
mac_tap0=”00:00:00:00:01:00″
bridge_br0=”eth1 tap0 tap1 tap2 tap3 tap4 tap5 tap6 tap7″
config_br0=”10.184.155.29/24″
routes_br0=”default gw 10.184.155.1″

A couple of notes:  Using a mac address of all zeros generates a boot time error about not being able to reassign the hardware address the mac address represents.  Any non-all-zero address works fine.  Also, the bridge will take on the mac address of the first tap device defined.  Also, this is a test environment, so I’ve given myself a lot of tap devices to play with.  So, at the host level, we see:

ifconfig br0
br0       Link encap:Ethernet  HWaddr 00:00:00:00:01:00
          inet addr:10.184.155.29  Bcast:10.184.155.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:75628 errors:0 dropped:4930 overruns:0 frame:0
          TX packets:22841 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:54238310 (51.7 MiB)  TX bytes:8638206 (8.2 MiB)

ifconfig tap0
tap0      Link encap:Ethernet  HWaddr 00:00:00:00:01:00
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:28733 errors:0 dropped:0 overruns:0 frame:0
          TX packets:53601 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:35590127 (33.9 MiB)  TX bytes:5638527 (5.3 MiB)

And within the KVM we see:

ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 52:54:00:12:34:56
          inet addr:10.184.155.30  Bcast:10.184.155.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:44833 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28551 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3881750 (3.7 MiB)  TX bytes:35412015 (33.7 MiB)

Much to my delight – this worked!  KVM apparently uses the tap device, but assigns it a unique MAC address (the HWaddr shown in the ifconfig output).  Now all generated IP packets have different MAC source addresses from the bridge and everything works as expected!  My guest KVM can ping, web, and do anything else it wishes with the outside world (at least until I start configuring iptables, but that is another post).  One glitch remained, the outside world didn’t know how to get the virtual machine.  A quick route add -host 10.184.155.30 gw 10.184.155.29 allowed me to ssh into the vm.  Still couldn’t get to it from a windows machine though.  Emerging and starting mrouted seem to have fixed that.  Not 100% sure that was needed, but its working, and at this point I’m going with “if its not broke, don’t fix it” as a philosophy.

Share
KVM Virtualization

With VMWare recently announcing a new pricing structure based on total virtual memory size instead of server size, I decided that perhaps another look at KVM was appropriate.  KVM is open-source and built into the Linux kernel (presuming you enable it).

Like most projects this one has spun off several diversions.  But first, perhaps a bit of background:

I love the Gentoo distribution.  I like its simplicity, cleanness, and the fact that’s its one step off the beaten path so less likely to be a hack target.  All my Gentoo based systems are command line driven, no X-windows or the like.  This keeps them very clean with a fully functional base system typically having about 250 packages installed.  My test system, which I’m trying to keep squeaky clean, is at 173 packages with everything needed to run a guest KVM machine, VI, mdadm, lvm, libvirt, links, grub, etc.

Oh, I really needed a separate system for this so have breadboarded together an AMD Phenom II X6 (6-core) based system with 8GB of memory and (6) 500GB disk.  Suffice it to say that I REALLY like Raid-6ing my disk and 500GB, although not the smallest disk available these days, is close and cost just a few dollars more than the 320GB low-cost SATA drives I could find.

I simple did a minimal install following the instructions at the Gentoo site below.  Yes, I’ve been doing Gentoo for, oh goodness, perhaps a decade now, and yes, I still follow the instructions.  Its just easier and you tend not to forget things, like setting the root password!

http://www.gentoo.org/doc/en/gentoo-x86+raid+lvm2-quickinstall.xml

An hour or so later all was functional, and I did a couple of full “emerge -e world” just to make sure all everything was clean and compiled specifically for my processor.  Oh, in case you were not aware, Gentoo is a source based distribution.  You can start with pre-compiled things if you like (and most do – its called a stage 3 install) and then recompile like I did, or slug your way through bootstrapping up a stage 1 install and be clean from the start.  I recommend the stage 3 install personally, its just not worth the extra effort to do a stage 1, but you can if you want to.

Learned one new trick doing this:  Rather than using a -march=i686 (for a very generic build that will run on virtually any Intel like system) or -march=athlon64 (for a system like mine), just use -march=native and GCC will figure it out, building things to use every bell and whistle that your processor offers.  Just don’t try and port the system to a different CPU type.  Yes, this will bite you if you decide later to change your motherboard to the “other side”, but you system will run optimally until then.

The first side project was to explore LVM (logical volume manager – a way to manage your disk space rather virtually) a bit.  I understood it in concept, the HP systems we bought decades ago used it, but I hadn’t personally played with it much.  The install link highlighted above gave me the basic commands I needed, but chickened out and only used LVM for the non-root partitions.  Turns out this is easy, you don’t even need to use an initramfs if you don’t want to (just add lvm to you boot runlevel).

Alas, if I was going to use LVM, I wanted to use it for everything (shy the /boot partition, which simply must be a real partition or at most a Raid 1 one – I’m using a 6 disk Raid 1.  Yes, overkill, but what else should I use that space for on each drive?).  Using it for the root partition mean I need to load the virtual device files for the volume groups prior to trying to mount root.  Cute trick?  Not really, things like that are why Linux supports initramfs systems.  They are RAM based file systems that you run an initial “init” process from (thus its name!).  Kernel support is trivial, just enable “initial RAM filesystem and RAM disk (initramfs/initrd) support” in your kernal configuration file.  The filesystem itself is just a gzipped cpio archive of a very simple root filesystem.  Mine contains statically linked copies of busybox, nano, ld-linux-x86-64.so and libc.so (for nano), a mdadm.conf file, statically linked copies of lvm, mdadm, and symbolic links for vgscan and vgchange to that lvm binary.  Oh, and a “init” file which is just the script to execute.  The init script is almost trivial, it mounts proc and devtmpfs, issues a mdadm assemble command to build the raid 6 devices and start them, displays /proc/mdstat for 5 seconds just as a “Look!  At least at this point everything looks good” comfort, does the vgscan and vgchange to enable all the lvm devices, mounts the lvm root partition read-only, unmounts proc and dev, then invokes busybox’s “switch_root” function to clean everything up, chroot to the real root system, and turn things over to init there.

Took a bit to figure that out, but as always, Google was my friend and my lvm based root system was functional!

Next was to build a KVM machines and get it to boot.  Again, Google and Gentoo documentation came to my aid.  I followed this link to get started:

http://en.gentoo-wiki.com/wiki/KVM

Now one minor complication arose.  It seems the world loves to run graphics interfaces.  I like command line!  I can use Putty to access my systems from anywhere, and I don’t have to installed X-Windows, Gnome or KDE, and the hundreds of packages necessary to get them all up and running.  Took awhile, but like most things, its easy once you know how.  The key was simply to use the “-curses” flag when starting up the virtual machine.  I keep a “README” file in my KVM directory for reference.  I’ve cut and pasted that for your quick reference:

Base install created via:

qemu-img create -f qcow2 -o preallocation=metadata base-vm.img 20G

To boot from an ISO image:

kvm -hda base-vm.img -cdrom minimal.iso -boot d -curses -no-reboot

This tells KVM to use the .img file as a disk drive, use the .iso as the cdrom, boot
from the cdrom (“a” = Floppy, “c” = harddrive, “d” = cdrom, “n” = network)
and to use curses/ncurses for display.  That last is the key to being able to run KVM
in a text window.  The “-no-reboot” at the end causes the VM to exit back to the command
prompt when a reboot is attempted.  Handy when you want to change startup parameters.

Note this starts the system with only 128MB of memory.  Enough to boot.

Other good switches:

-m 512m  /* or whatever size you want */
-smp n /* sets the number of CPUs to n */
-daemonize
-echr chr  /* set terminal escape character instead of ctrl-a */
-runas user  /* nice for sorting  processes at hypervisor level */
-net user  /* auto sets up a firewalled virtual network */
-net tap,ifname=tap0,script=no,downscript=no
-net nic,macaddr=00:00:00:00:00:01 (edit: see my next post on KVM Bridging, this wasn’t needed)
-name string1[,process=string2] string1 sets the window title and string2 the process name

Using this with the small gentoo minimal install ISO image (144mb available from http://www.gentoo.org in the download section) brought me up a 24×80 text based window with the normal install boot running!  From that point I just did a normal install, formatting the virtual 20gb file created with qemu-img as if it was a 20gb disk drive, creating an lvm for /usr, /var, and /home, and doing the rest of the install in much the same way as the original machine install.  Note I did not LVM root in the virtual machine, I wanted to keep them as clean and simple as possible.  I gave /boot (/dev/sda1) 50mb of space, created a 1gb swap partition (/dev/sda2), a 256mb root (/dev/sda3), and allocated the rest to /dev/sda4 which I let LVM manage.  LVM in my base image has a 4GB /usr, a 8GB /var, and a 1GB /home.  /usr ended up being a bit tight, but that was mostly solved by nfs mounting /usr/portage/distfiles up from the host.  Makes sense to only have one copy of those files lying around anyhow – I’ll probably export them to my other Linux boxes one of these days.

One note that wasn’t real obvious.  KVM defaults to -net user.  This creates a firewalled user environment that can access the outside world using TCP and UDP, but can’t be accessed from it.  Think of it like a PC connected behind a typical router/firewall box.  Technically, it creates an IP address range of 10.0.2.0/24 and provides a DHCP service to assign an IP address and route to the outside world.   From the minimal image boot, just do a “net-setup eth0″, select “wired”, and then “dhcp” and you will be alive network wise.  Note it does NOT support ICMP, so pings won’t work, but its easy to verify it working – just do a “links www.google.com” and make sure it responds (links is a text mode browser – a weird concept, but it works and is included in the minimal build).

Ah, finally!  I had a functional KVM based virtual machine.  Now, like all my computers, I like to have BOINC running (http://boinc.berkeley.edu).  All my spare CPU cycles go to a couple of projects under BOINC – mostly SETI@home and World Community Grid, but also Virtual Prairie whenever that project has work to be done.  So what could be better than a KVM running Boinc?  Sounded perfect to me.  All I did was:  cp base_vm.img bonic.img, issue the KVM command:

kvm -hda boinc.img -boot d -curses -no-reboot -m 6g -smp 6

and I had a clone of that original base install up and running.  I changed the IP address (which I had hardcoded in the base image) so it wouldn’t conflict, changed the root password, emerged boinc, attached to my projects, and watched everything run!  Took maybe 20 minutes, perhaps less, to create a fully operational KVM customized to the app I wanted to run.

Playing around a bit this morning, I now invoke it with:

nice -n 19 kvm -hda boinc.img -curses -no-reboot -m 6g -smp 6 -runas boinc_kvm -name boinc,process=kvm

This runs the KVM at the lowest priority, so I can do anything else I want without impact, gives it 75% of the memory if it wants to use it, runs on all the idle CPUs, runs it as user “boinc_kvm” so that I can easily track it usage, and called the long “qemu-kvm” command simply “kvm”.  Top shows it as:

PID USER            PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19721 boinc_kv  39  19 6304m 1.1g 2596 S  599 14.0  10:26.20 kvm

Oh, one thing I noticed while building the KVM image and is shown above:  Even though we started the KVM with 6GB of memory (shown under the VIRT label), its only using 1.1G as unique memory for the process.  Thus, like with all VM environments, I should be able to “over subscribe” the memory and get away with it.

That pretty much brings this entry to a close.  I do have several more things I want to work on (and have been as those nasty side projects):

LibVirt looks like a cool interface for managing my virtual machines.  Took me a frustrating forever to figure out how to get it to behave.  One step was obvious, after emerging the code itself:  Start libvirtd in /etc/init.d.  The second was not:  an undocumented USE flag needed to be set when building libvirtd:  USE=qemu.  I added to it my /etc/make.conf USE variable and once libvirtd was rebuilt, it FINALLY started to work.  Prior to that I kept getting “unable to connect to hypervisor” errors.  There is also a /etc/libvirt/libvirtd.conf file that needs very minor and well internally commented tweaking.

The “-net user” works fine for my boinc KVM, but would not suffice for say an Apache KVM since the outside world would need access to it.  For that, a virtually bridged Tun/Tap environment will be needed.  I’ve played with that a bit (remember those nasty side projects?  Tun/Tap was one of them) but don’t quite have it, or do, and just need to test with something other than Ping?  Not sure.

I’d also like to play around with resizing the kvm image file.  Suspect that will be the easy one.

One final note for fellow Putty users (http://www.putty.org):  I find it handy to run putty in a slightly stretched window (add an inch of screen space top and to the sides).  Linux adapts to this and just gives you more screen space, but more importantly curses does NOT.  So if I duplicate my larger than default screen to create a window to run KVM in, and start as described above, the KVM machine console will appear with a blank border around it.  Sounds trivial, but really helps when your bouncing between windows to give you a visual reminder of what level your running at.

Cheers!

 

 

Share