With VMware recently announcing a new pricing structure based on total virtual memory size instead of server size, I decided that perhaps another look at KVM was appropriate. KVM is open source and built into the Linux kernel (presuming you enable it).
Like most projects, this one has spun off several diversions. But first, perhaps a bit of background:
I love the Gentoo distribution. I like its simplicity, its cleanness, and the fact that it’s one step off the beaten path, so less likely to be a hack target. All my Gentoo-based systems are command line driven, no X-Windows or the like. This keeps them very clean, with a fully functional base system typically having about 250 packages installed. My test system, which I’m trying to keep squeaky clean, is at 173 packages with everything needed to run a guest KVM machine: vi, mdadm, lvm, libvirt, links, grub, etc.
Oh, I really needed a separate system for this, so I breadboarded together an AMD Phenom II X6 (6-core) based system with 8GB of memory and six 500GB disks. Suffice it to say that I REALLY like Raid-6ing my disks, and 500GB, although not the smallest disk available these days, is close, and cost just a few dollars more than the 320GB low-cost SATA drives I could find.
I simply did a minimal install following the instructions at the Gentoo site below. Yes, I’ve been doing Gentoo for, oh goodness, perhaps a decade now, and yes, I still follow the instructions. It’s just easier, and you tend not to forget things, like setting the root password!
http://www.gentoo.org/doc/en/gentoo-x86+raid+lvm2-quickinstall.xml
An hour or so later all was functional, and I did a couple of full “emerge -e world” runs just to make sure everything was clean and compiled specifically for my processor. Oh, in case you were not aware, Gentoo is a source-based distribution. You can start with pre-compiled things if you like (and most do – it’s called a stage 3 install) and then recompile like I did, or slug your way through bootstrapping a stage 1 install and be clean from the start. I recommend the stage 3 install personally; it’s just not worth the extra effort to do a stage 1, but you can if you want to.
Learned one new trick doing this: rather than using -march=i686 (for a very generic build that will run on virtually any Intel-like system) or -march=athlon64 (for a system like mine), just use -march=native and GCC will figure it out, building things to use every bell and whistle your processor offers. Just don’t try to port the system to a different CPU type. Yes, this will bite you if you decide later to change your motherboard to the “other side”, but your system will run optimally until then.
The first side project was to explore LVM (the Logical Volume Manager – a way to manage your disk space virtually) a bit. I understood it in concept – the HP systems we bought decades ago used it – but I hadn’t personally played with it much. The install link highlighted above gave me the basic commands I needed, but it chickened out and only used LVM for the non-root partitions. Turns out this is easy; you don’t even need to use an initramfs if you don’t want to (just add lvm to your boot runlevel).
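For anyone who hasn’t touched LVM before, the non-root case really does come down to a handful of commands. The device, volume group, and volume names below are just illustrative examples, not what the Gentoo guide uses:

```shell
# Turn a RAID device (or any partition) into an LVM physical volume
pvcreate /dev/md3

# Create a volume group on top of it
vgcreate vg0 /dev/md3

# Carve out logical volumes (sizes are illustrative)
lvcreate -L 8G -n usr vg0
lvcreate -L 4G -n home vg0

# Format and mount them like any other block device
mkfs.ext4 /dev/vg0/usr

# Gentoo: add lvm to the boot runlevel so the volumes come up at boot
rc-update add lvm boot
```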
Alas, if I was going to use LVM, I wanted to use it for everything (shy the /boot partition, which simply must be a real partition, or at most a Raid 1 one – I’m using a 6-disk Raid 1. Yes, overkill, but what else should I use that space for on each drive?). Using it for the root partition means I need to load the virtual device files for the volume groups prior to trying to mount root. Cute trick? Not really; things like that are why Linux supports initramfs systems. They are RAM-based file systems that you run an initial “init” process from (thus the name!). Kernel support is trivial – just enable “initial RAM filesystem and RAM disk (initramfs/initrd) support” in your kernel configuration file. The filesystem itself is just a gzipped cpio archive of a very simple root filesystem.

Mine contains statically linked copies of busybox, nano, mdadm, and lvm, plus ld-linux-x86-64.so and libc.so (for nano), an mdadm.conf file, and symbolic links for vgscan and vgchange to that lvm binary. Oh, and an “init” file, which is just the script to execute. The init script is almost trivial: it mounts proc and devtmpfs, issues an mdadm assemble command to build the Raid 6 devices and start them, displays /proc/mdstat for 5 seconds just as a “Look! At least at this point everything looks good” comfort, does the vgscan and vgchange to enable all the lvm devices, mounts the lvm root partition read-only, unmounts proc and dev, then invokes busybox’s “switch_root” function to clean everything up, chroot to the real root system, and turn things over to init there.
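My actual script isn’t reproduced here, but a minimal init along the lines just described would look roughly like this (the volume group and mount point names are examples, not necessarily mine):

```shell
#!/bin/busybox sh
# Minimal initramfs /init: assemble RAID, activate LVM, hand off to the real init

# Mount the pseudo-filesystems we need
mount -t proc none /proc
mount -t devtmpfs none /dev

# Assemble and start the Raid 6 arrays described in /etc/mdadm.conf
mdadm --assemble --scan

# Quick "everything looks good" comfort display
cat /proc/mdstat
sleep 5

# Find and activate all the LVM volume groups
vgscan
vgchange -ay

# Mount the real root read-only (device name is illustrative)
mount -o ro /dev/vg0/root /mnt/root

# Clean up and turn things over to the real init
umount /proc
umount /dev
exec switch_root /mnt/root /sbin/init
```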
Took a bit to figure all that out, but as always, Google was my friend, and my LVM-based root system was functional!
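Packaging the initramfs itself, by the way, is a one-liner, since the kernel just wants a gzipped cpio archive (the output path is only an example – point your bootloader at wherever you put it):

```shell
# Run from the top of the initramfs staging directory
find . -print0 | cpio --null -ov --format=newc | gzip -9 > /boot/initramfs.cpio.gz
```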
Next was to build a KVM machine and get it to boot. Again, Google and the Gentoo documentation came to my aid. I followed this link to get started:
http://en.gentoo-wiki.com/wiki/KVM
Now one minor complication arose. It seems the world loves to run graphical interfaces. I like the command line! I can use Putty to access my systems from anywhere, and I don’t have to install X-Windows, Gnome or KDE, and the hundreds of packages necessary to get them all up and running. Took a while, but like most things, it’s easy once you know how. The key was simply to use the “-curses” flag when starting up the virtual machine. I keep a “README” file in my KVM directory for reference. I’ve cut and pasted from it for your quick reference:
Base install created via:
qemu-img create -f qcow2 -o preallocation=metadata base-vm.img 20G
To boot from an ISO image:
kvm -hda base-vm.img -cdrom minimal.iso -boot d -curses -no-reboot
This tells KVM to use the .img file as a disk drive, use the .iso as the cdrom, boot
from the cdrom (“a” = Floppy, “c” = harddrive, “d” = cdrom, “n” = network)
and to use curses/ncurses for display. That last is the key to being able to run KVM
in a text window. The “-no-reboot” at the end causes the VM to exit back to the command
prompt when a reboot is attempted. Handy when you want to change startup parameters.
Note this starts the system with only 128MB of memory. Enough to boot.
Other good switches:
-m 512m /* or whatever size you want */
-smp n /* sets the number of CPUs to n */
-daemonize
-echr chr /* set terminal escape character instead of ctrl-a */
-runas user /* nice for sorting processes at hypervisor level */
-net user /* auto sets up a firewalled virtual network */
-net tap,ifname=tap0,script=no,downscript=no
-net nic,macaddr=00:00:00:00:00:01 (edit: see my next post on KVM Bridging, this wasn’t needed)
-name string1[,process=string2] /* string1 sets the window title and string2 the process name */
Using this with the small Gentoo minimal install ISO image (144MB, available from http://www.gentoo.org in the download section) brought up a 24×80 text-based window with the normal install boot running! From that point I just did a normal install, formatting the virtual 20GB file created with qemu-img as if it were a 20GB disk drive, creating LVM volumes for /usr, /var, and /home, and doing the rest of the install in much the same way as the original machine install. Note I did not put root on LVM in the virtual machine; I wanted to keep the guests as clean and simple as possible. I gave /boot (/dev/sda1) 50MB of space, created a 1GB swap partition (/dev/sda2), a 256MB root (/dev/sda3), and allocated the rest to /dev/sda4, which I let LVM manage. LVM in my base image has a 4GB /usr, an 8GB /var, and a 1GB /home. /usr ended up being a bit tight, but that was mostly solved by NFS-mounting /usr/portage/distfiles up from the host. Makes sense to only have one copy of those files lying around anyhow – I’ll probably export them to my other Linux boxes one of these days.
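Inside the guest, the LVM layout just described comes down to a few commands (the volume group name here is just my choice; adjust to taste):

```shell
# /dev/sda4 holds everything LVM manages in the guest
pvcreate /dev/sda4
vgcreate vg0 /dev/sda4

# The layout from the base image
lvcreate -L 4G -n usr  vg0
lvcreate -L 8G -n var  vg0
lvcreate -L 1G -n home vg0
```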
One note that wasn’t really obvious: KVM defaults to -net user. This creates a firewalled user environment that can access the outside world using TCP and UDP, but can’t be accessed from it. Think of it like a PC connected behind a typical router/firewall box. Technically, it creates an IP address range of 10.0.2.0/24 and provides a DHCP service to assign an IP address and route to the outside world. From the minimal image boot, just do a “net-setup eth0”, select “wired”, and then “dhcp”, and you will be alive network-wise. Note it does NOT support ICMP, so pings won’t work, but it’s easy to verify it’s working – just do a “links www.google.com” and make sure it responds (links is a text-mode browser – a weird concept, but it works and is included in the minimal build).
Ah, finally! I had a functional KVM-based virtual machine. Now, like all my computers, I like to have BOINC running (http://boinc.berkeley.edu). All my spare CPU cycles go to a couple of projects under BOINC – mostly SETI@home and World Community Grid, but also Virtual Prairie whenever that project has work to be done. So what could be better than a KVM running BOINC? Sounded perfect to me. All I did was cp base-vm.img boinc.img, then issue the KVM command:
kvm -hda boinc.img -boot d -curses -no-reboot -m 6g -smp 6
and I had a clone of that original base install up and running. I changed the IP address (which I had hardcoded in the base image) so it wouldn’t conflict, changed the root password, emerged boinc, attached to my projects, and watched everything run! Took maybe 20 minutes, perhaps less, to create a fully operational KVM customized to the app I wanted to run.
Playing around a bit this morning, I now invoke it with:
nice -n 19 kvm -hda boinc.img -curses -no-reboot -m 6g -smp 6 -runas boinc_kvm -name boinc,process=kvm
This runs the KVM at the lowest priority, so I can do anything else I want without impact; gives it 75% of the memory if it wants to use it; lets it run on all six (mostly idle) CPUs; runs it as user “boinc_kvm” so that I can easily track its usage; and names the long “qemu-kvm” command simply “kvm”. Top shows it as:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19721 boinc_kv 39 19 6304m 1.1g 2596 S 599 14.0 10:26.20 kvm
Oh, one thing I noticed while building the KVM image, and it shows above: even though we started the KVM with 6GB of memory (shown under the VIRT label), it’s only using 1.1GB as unique memory for the process. Thus, like with all VM environments, I should be able to “over-subscribe” the memory and get away with it.
That pretty much brings this entry to a close. I do have several more things I want to work on (and have been, as those nasty side projects):
LibVirt looks like a cool interface for managing my virtual machines. Took me a frustrating forever to figure out how to get it to behave. One step was obvious after emerging the code itself: start libvirtd in /etc/init.d. The second was not: an undocumented USE flag needed to be set when building libvirt: USE=qemu. I added it to my /etc/make.conf USE variable, and once libvirt was rebuilt, it FINALLY started to work. Prior to that I kept getting “unable to connect to hypervisor” errors. There is also an /etc/libvirt/libvirtd.conf file that needs very minor (and well internally commented) tweaking.
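For the record, the Gentoo-side sequence that finally worked for me boils down to this (after adding qemu to the USE variable in /etc/make.conf):

```shell
# Rebuild libvirt so it picks up the qemu USE flag
emerge app-emulation/libvirt

# Start the daemon now, and at every boot
/etc/init.d/libvirtd start
rc-update add libvirtd default

# Sanity check: this should list running guests instead of
# complaining that it can't connect to the hypervisor
virsh -c qemu:///system list
```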
The “-net user” setup works fine for my boinc KVM, but it would not suffice for, say, an Apache KVM, since the outside world would need access to it. For that, a virtually bridged Tun/Tap environment will be needed. I’ve played with that a bit (remember those nasty side projects? Tun/Tap was one of them) but don’t quite have it working – or maybe I do, and just need to test with something other than ping? Not sure.
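For what it’s worth, the bare-bones tap-plus-bridge setup I’ve been experimenting with looks something like this. Interface names are placeholders, and as noted above I haven’t fully verified it yet:

```shell
# Create a tap device owned by the user that will run kvm
ip tuntap add dev tap0 mode tap user boinc_kvm
ip link set tap0 up

# Bridge it with the physical NIC (br0/eth0 are examples)
brctl addbr br0
brctl addif br0 eth0
brctl addif br0 tap0
ip link set br0 up

# Then start the guest pointing at the tap:
# kvm -hda base-vm.img -curses -net nic -net tap,ifname=tap0,script=no,downscript=no
```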
I’d also like to play around with resizing the KVM image file. I suspect that will be the easy one.
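If it turns out to be as easy as I suspect, growing the image should be little more than:

```shell
# Grow the qcow2 container by 10GB; the guest sees a bigger disk on next boot
qemu-img resize base-vm.img +10G

# Then, inside the guest, the partition/LVM/filesystem layers have to be
# grown to match, e.g. (names illustrative):
#   pvresize /dev/sda4
#   lvextend -L +5G /dev/vg0/var
#   resize2fs /dev/vg0/var
```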
One final note for fellow Putty users (http://www.putty.org): I find it handy to run Putty in a slightly stretched window (add an inch of screen space at the top and to the sides). Linux adapts to this and just gives you more screen space, but more importantly curses does NOT. So if I duplicate my larger-than-default screen to create a window to run KVM in, and start it as described above, the KVM machine console will appear with a blank border around it. Sounds trivial, but it really helps when you’re bouncing between windows to give you a visual reminder of what level you’re running at.
Cheers!