I’ve just finished my work for today. For the first time in I can’t remember how long, it was a truly productive and inspiring day. I had time for research, time for administrative work and time for fiddling with my hardware. But let’s start from the beginning.
Since I completed my Novell certification and got my first client for setting up Novell services on their servers, I’ve been on a fast track to learn more about their products. By now I’m pretty familiar with most of the Open Enterprise Server 2 services. Being a Linux specialist for a long time gave me a head start, but I have to say that Novell services are NOT like other Linux services.
It’s quite easy for me to learn new things in IT, especially when it comes to networks and Linux servers, because the technical details of computer networks, data links, telecommunication technologies, protocols, packets and all this black magic have always given me a thrill. I love multi-layered, complex and interdependent systems; I feel like a fish in the vastness of the ocean when it comes to them. Networked computer and electronic systems are my passion. They are like complicated Lego blocks, partly a mystery and partly a riddle that I have to understand and solve.
And that’s it. I have learned a lot in the last month, and there’s one topic I’d like to bring into the light: Linux virtual machine hypervisors. As much as I’d like to say that all, or at least some, of them do their job well, unfortunately I must confess there are problems with each of them. My adventure with hypervisors started with Linux-Vserver a few years ago. I can’t say much about it, because when I tried to use it on my Mandriva-based systems there were many kernel problems, so after a few days of trying I gave up. Perhaps Vserver is fine these days, but I’m not going to give it another go, especially when there are better virtualisation solutions. After Vserver I went looking for good emulation/virtualisation software and there were a few, some better, some worse. One worth mentioning was QEmu, because it at least worked, but I finally settled on Sun’s VirtualBox. It was a really polished package back then compared to other hypervisors, and it ran smoothly on Mandriva on pretty obsolete hardware. So for some time I used it.
VirtualBox is great, but it’s really more of a desktop than a server solution. It’s good for running Windows and some Linux distros, but I couldn’t run Novell SLED/SLES in it. I still use it on my own hardware sometimes, but rarely. When it comes to server solutions, I have to say I made one of the biggest mistakes of my career when I selected VMWare Server for my primary internet server. This software is so buggy I wouldn’t recommend ANYONE use it in production, especially on servers with busy I/O. I’m running a few VMs on my server hypervised by this piece of shit and I have to honestly say that I failed in choosing it.
The server is an Intel Xeon quad-core platform with an LSI SAS RAID controller and a good amount of memory and processing power. Unfortunately the RAID controller is heavily used: there are many small, fast I/O operations, and this is what causes me a pain in the ass. There’s a combination of three factors that produces disastrous results in the VMs. The first is the awesomely shitty written VMWare Server. The second is a bug involving S.M.A.R.T. and, more generally, the guest Linux kernel when hypervised by VMWare Server: a bug still unsolved, despite supposedly having been resolved. The third is the problematic LSI SAS RAID controller, which combined with the two previous factors is fatal for the guest OSes. Why? When some of the VMs are at the limit of their I/O throughput and the host kernel receives a spurious IRQ from the controller, VMWare Server raises a busy I/O error inside the virtual machine. The default behaviour of the Linux kernel on such an I/O error is to lock down the partition and remount it read-only. You can only imagine what hell that causes.
Of course the problem can theoretically be avoided by setting the guest Linux kernel to ignore the busy I/O error, but that may cause data corruption (this fail-safe is there for a reason). If you’re as adventurous as I am (and like doing emergency repair runs to the data center at night) you can change this behaviour. But it’s still of no use. Why? Because of VMWare Server. When the hypervised VM receives a busy I/O error, the whole VMDK file (containing the guest partitions) is locked by the hypervisor and set to read-only. Only a reboot of the VM helps then, and honestly, imagine rebooting a critical virtualised server in the middle of the day. Sounds sweet, right? As a bonus, there were a few times when the partitions went boo-boo and no automatic restart was possible, because you had to manually run fsck and then reboot again. Add to this at least half an hour of partition scanning by fsck in the middle of the day and you could say Inferno breaks loose at your doorstep.
But if you still think you can handle it like a masochistic BOFH, imagine this. If you like to at least pretend that you’re security conscious, you run your server management services through encrypted and secure channels like SSH, tunneling the ports for those services. VMWare Server has an SSL-encrypted console which allows you to manage your hypervisor and VMs, written as a slow web app combined with console plugins for your browser. Who the fuck invented this? But that’s not the end. To use the console you have to expose all three management ports, including one from the root-only (privileged) range. So if you’re doing a reverse tunnel from a remote location, you actually HAVE TO expose root login over SSH to your management station, because otherwise the reverse tunnel for the root-only port can’t be established. Just imagine what a piece of crap VMWare Server is.
If you ever considered using VMWare Server, especially on an Intel S5000PAL with an LSI SAS controller: DON’T!
Being fed up with this hypervisor, I started doing some research. I did test runs of VMWare Workstation and I have to admit that, despite its similarities to VMWare Server, it seems okay. Nevertheless it’s a desktop solution, not a server one, and on top of that it’s not free.
So I went to check out VMWare ESXi 4. This is a bare-metal (Type 1) hypervisor, unlike the others mentioned previously, which are OS-hosted hypervisors. I have to say that I was positively surprised: it runs smoothly, and installation and configuration are simple. There’s only one thing: your hardware must be on the compatibility list, or else you’re not going anywhere. The second con is that to manage your VMs you have to use something called vSphere Client, which runs only on Windows. For me, avoiding Windows as much as possible, it’s quite unacceptable that I can’t use my Linux desktop or laptop to manage a hypervisor running on a de facto Linux, which in turn manages Linux guest VMs. Fortunately there’s VMWare Workstation, in which I can install Windows and then use vSphere Client from there, but it’s an unnecessary and unproductive burden. If I can’t find a better solution I’ll probably migrate all my VM servers to ESXi, but it’s a tedious task. Why? Because ESXi keeps VM data on a special partition format called VMFS, which cannot be mounted by anything other than ESXi. Hopefully I don’t have to spell out what that means, especially when you don’t have another spare machine exactly like this one.
So. I didn’t expect this post to be primarily about VM hypervisors, but there’s one mainstream Linux hypervisor I haven’t mentioned yet. Yep, it’s Xen (yes, I know there is also KVM, but honestly, who uses it?). I’m currently in the process of installing a few servers with Novell services for my client. They decided to go with Xen. I had no previous experience with this system, so I had to get familiar with it quickly. I must say that on SLES11 it’s quite stable and easy to use. But… (there always has to be one, right?) who the hell writes VM management software in Python? I know, we all love this language, it’s beautiful and I worship it every day. But it’s too slow for things like managing VMs! Okay, maybe I’m exaggerating a little, but honestly, it’s slow. And Virt-Manager is very buggy. For example, I installed the guest OS and then removed the DVD from the drive. And guess what? The machine would start (after a few tries), but it wouldn’t get to the console, throwing errors about a vbd not being found, blah blah blah. And guess what, restarting the system doesn’t help. Editing the XML and VM configuration files doesn’t either. It won’t let you edit the hardware settings for the machine, throwing this stupid error over and over again. After two days I finally managed to get to the console by hand-editing files in /var/lib/xen. No mention of such things in the documentation. None.
But that’s not the end. The management console is broken in other places too. For example, after installing the second server and rebooting the whole system, virt-manager simply hangs when you click to run the second VM. I understand that it’s not the core software, but hey, it’s supposed to be a production-ready hypervisor (tools included), right? I say it’s NOT. Nevertheless I have to admit that once the VMs are up and running, Xen is quite stable and amazingly fast in paravirtualized mode. I think that closes the topic.
I thought about also writing something here about the other interesting things I did today, but given the current time and the volume of this post I’ll only give a quick overview.
So. I fought with this VMWare Server on my hardware again today, because I had to prepare some virtual environments for the Pylons application server for a web application I’ve developed. Imagine that all 10GHz of computing power was in use for unknown reasons (probably no DNS connectivity as usual, plus a read-only partition on the logging server), but I didn’t have time to make thorough checks. VMWare Server causes some kind of race condition which eats all of the CPU resources. Another reason to finally kill it. Anyway, while my pip install was running, pulling in all the necessary dependencies for the app server, I did a little research on the topic of filesystem synchronization.
Recently a friend and I installed iFolder on our development and test Open Enterprise Server, in preparation for an iFolder installation on our client’s servers. I must admit that it’s quite useful and certainly works well. We’ve also tested the Novell iFolder client on SLES and Windows. Very nice file synchronization software indeed. I’m using Mandriva on my laptop and unfortunately the iFolder client is not supported on any distro other than SLES/SLED. But I’ve managed to install and run the client software successfully under Mandriva. If I find some time I’ll write a little how-to on the topic. It needs some fiddling, but it can be done.
For a long time now I’ve been planning to build myself a useful solution for synchronizing files between my machines. I’ve looked into various open source solutions, but there was always something that didn’t suit me. Because of this I have started a project, the first of the building blocks for such a synchronization system: an abstract client/server network library codenamed PyComm. Unfortunately I didn’t have time to finish it, but I’m slowly getting back to it after I deal with more important things. PyComm is a hybrid solution, rather simple in design, but modular, extensible, robust and secure. I have taken a hybrid CPU-/IO-bound approach where the primary data processing loop is CPU-bound and the connection data loop is I/O-bound (a rough sketch of the idea follows below). Honestly, I haven’t tested it yet and no profiling has been done, so I don’t have any real data to compare against standard approaches to building a server architecture. In theory, threading the I/O operations and running the primary processing loop separately should allow for fine-grained tuning of delays and data queues. In theory it should also prevent, or at least limit, resource usage under heavy load or a DDoS attack, by carefully adjusting the CPU-to-I/O ratio. But it still has to be proven whether this is an effective approach.
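To make that a bit more concrete, here is a minimal sketch of the general idea, not actual PyComm code; all names here are hypothetical. A dedicated thread runs the I/O-bound connection loop and feeds a bounded queue, while the main thread runs the CPU-bound processing loop and drains it.

```python
import queue
import socket
import threading

# Hypothetical sketch of the hybrid approach, not real PyComm code:
# one thread handles the I/O-bound connection loop, the main thread
# runs the CPU-bound processing loop and drains a shared queue.

inbox = queue.Queue(maxsize=1024)  # bounded queue limits memory under load

def connection_loop(host, port):
    """I/O-bound loop: accept connections and push raw data into the queue."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    while True:
        conn, addr = srv.accept()
        data = conn.recv(4096)
        try:
            inbox.put((addr, data), timeout=1)
        except queue.Full:
            pass  # back-pressure: shed load instead of growing without bound
        conn.close()

def processing_loop():
    """CPU-bound loop: pull raw data from the queue and process it."""
    while True:
        addr, data = inbox.get()
        # ... parse frames, update state, produce responses ...
        print("got %d bytes from %s" % (len(data), addr))

if __name__ == "__main__":
    t = threading.Thread(target=connection_loop, args=("127.0.0.1", 9000))
    t.daemon = True
    t.start()
    processing_loop()
```

The interesting knobs are the queue size and the put timeout: under a flood of connections the I/O thread starts dropping instead of exhausting memory, which is exactly the kind of CPU-to-I/O ratio tuning I want to experiment with.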
On the other hand, PyComm defines a custom binary protocol which can use TCP or UDP (depending on the situation) as its transport. It encourages the use of data encapsulation somewhat similar to SSH channels. The data stream is multiplexed and sent as custom frames, each with its own header, payload and terminator (footer) sequence. This allows dynamic adjustment of the content transfer unit for each frame and (at least in theory) better guaranteed delivery on high packet-loss links such as WiFi, where TCP does not provide it (it should, but due to the nature of the radio data-link layer it is not guaranteed; correct me if I’m wrong). There are more things I want to test in PyComm, but they are beyond the scope of this post.
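To give a feel for the framing, here is a toy example of packing and unpacking such frames with the struct module. The layout (magic, channel id, payload length, fixed terminator) is hypothetical and not the actual PyComm wire format.

```python
import struct

# Hypothetical frame layout, not the actual PyComm wire format:
# header  = magic (2 bytes) + channel id (2 bytes) + payload length (4 bytes)
# payload = raw bytes
# footer  = fixed terminator sequence
MAGIC = 0x5043  # "PC"
HEADER = struct.Struct("!HHI")
TERMINATOR = b"\xde\xad\xbe\xef"

def pack_frame(channel, payload):
    """Build one frame: header + payload + terminator."""
    return HEADER.pack(MAGIC, channel, len(payload)) + payload + TERMINATOR

def unpack_frame(buf):
    """Parse one frame from the start of buf; return (channel, payload, rest)."""
    magic, channel, length = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    start = HEADER.size
    payload = buf[start:start + length]
    footer = buf[start + length:start + length + len(TERMINATOR)]
    if footer != TERMINATOR:
        raise ValueError("bad terminator")
    rest = buf[start + length + len(TERMINATOR):]
    return channel, payload, rest

# Example: multiplex two channels over one stream.
stream = pack_frame(1, b"hello") + pack_frame(2, b"world")
while stream:
    chan, data, stream = unpack_frame(stream)
    print(chan, data)
```

Because every frame carries its own length and terminator, the receiver can resynchronize after a damaged frame instead of stalling the whole stream, which is the property I want to test on lossy links.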
So, PyComm is a building block for the file synchronization solution I’m planning to get my hands on as soon as time allows. I still haven’t decided on the architectural approach and design for it; I’m currently researching the topic. For Linux/Unix/Mac OS X one of the underlying workhorses would be FUSE; Windows is a different story. And because of all the things I mentioned earlier, I started looking for information about Windows filesystem drivers. I’m not very familiar with the low-level guts of Redmond’s hell. IFS seems like hell itself and it’s very costly, so naturally it goes in the trash from the start. But I have stumbled upon a nifty little project that follows in the footsteps of FUSE. It’s called Dokan and it aims to be FUSE for Windows. I have tested it superficially and I must say it looks promising, except for the fact that I’d have to write myself a Python wrapper around the library, because it’s written in C. Hopefully PyComm will run on Windows without major architectural changes.
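The FUSE side, at least, is easy to prototype from Python. Here is a minimal read-only passthrough filesystem sketch using the third-party fusepy bindings (that choice is my assumption; the real thing would expose the synchronized store rather than a local directory):

```python
# Minimal read-only passthrough filesystem, assuming the third-party
# 'fusepy' bindings (imported as 'fuse'). Illustrative sketch only.
import os
import sys

from fuse import FUSE, Operations  # fusepy

class Passthrough(Operations):
    """Mirror a local directory read-only through FUSE."""

    def __init__(self, root):
        self.root = root

    def _full(self, path):
        # Map the FUSE path onto the backing directory.
        return os.path.join(self.root, path.lstrip('/'))

    def getattr(self, path, fh=None):
        st = os.lstat(self._full(path))
        return {key: getattr(st, key) for key in (
            'st_mode', 'st_nlink', 'st_size', 'st_uid', 'st_gid',
            'st_atime', 'st_mtime', 'st_ctime')}

    def readdir(self, path, fh):
        return ['.', '..'] + os.listdir(self._full(path))

    def read(self, path, size, offset, fh):
        with open(self._full(path), 'rb') as f:
            f.seek(offset)
            return f.read(size)

if __name__ == '__main__':
    # usage: python passthrough.py <source-dir> <mount-point>
    FUSE(Passthrough(sys.argv[1]), sys.argv[2], foreground=True, ro=True)
```

A synchronization layer would hook into exactly these operations, which is why a Dokan equivalent on Windows (plus a Python wrapper around it) is so appealing.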
The goal of this synchronization solution is to provide a fine-grained security and replication scheme across multiple machines on multiple platforms and architectures (Linux/Windows for a start) without the need for domain controllers on the network. I’m not ruling out some LDAP connectivity in the future, but for now the solution must achieve three goals: be robust, be secure, be easy to use. After all, it’s just a proof of concept and a great testbed for PyComm. Why reinvent the wheel? We have so many synchronization solutions out there; SSH, NFS and rsync come to mind. NFS has its problems, everybody knows that: it’s not really intended for synchronization, it’s not very multiplatform and it’s not very secure. SSH is too slow most of the time and not meant for synchronization either. Rsync is not secure (of course it can be when tunneled through SSH, which is slow). iFolder, on the other hand, is not free, doesn’t allow fine-grained security/replication schemes, is not supported on many Linux distributions, needs eDirectory to run and is written in Mono, which I personally reject on the grounds of “too much overhead” (Java, anyone?).
Anyway. When I was looking for information about Windows filesystem drivers I came across another interesting thing: MANETs, Mobile Ad-hoc Networks. It has always amazed me why the hell mobile devices like smartphones, laptops, netbooks, tablets and all that crap loaded with communication protocols like WiFi or Bluetooth don’t act as IP carriers/routers among themselves. Use of the TCP/IP stack should be obligatory when interconnecting those devices; instead they use some stupid, inflexible and unnecessary protocols. Decentralized, self-organizing mobile ad-hoc mesh networks are the natural way to go. I envision the marginalization of traditional network connections within the next twenty years and a shift to a global decentralized mesh network. It’s obvious, it’s economical, and it broadens the coverage of IP connectivity among people, across the globe and perhaps into space. The 802.11s standard is currently being drafted, so there’s still much room for improvement. And the seeding phase of the global mesh network is going to be a very interesting one indeed.
The advent of a global decentralized mesh network would be a major blow to greedy governments and corporations wanting to control information, which is liquid. It must flow (like the spice :) ). An ad-hoc mesh network is uncontrollable, it doesn’t lend itself to censorship (or at least poses a major technological challenge for such a task), it follows the pattern of a minimum-energy-path function and it is economically viable. It also furthers IP connectivity and redundancy, so most of the time you’d stay connected wherever you are.
On the other hand, a global dynamic mesh network is a serious invasion of privacy (to some degree, because too much information causes its own annihilation), and there may be individual restrictions in force (if not ones imposed by appliance makers) which won’t allow the mesh network to function properly, or at least to its fullest possible extent. There are also serious security concerns, because a global mesh network may (or may not, depending on the architecture) ease man-in-the-middle, identity theft and other attacks, allowing malicious hackers to gain access to mobile devices and sensitive private data.
Nevertheless, MANETs are the future. And when I finally receive my HTC HD 2 (it should arrive within a few days, maybe tomorrow), I’m going to set up a small MANET for myself to test it and get hands-on knowledge of its routing systems and other such things. I’ll try to post something about this little bit of research after I’m done.
I was supposed to end this post many paragraphs ago, but… networks are so exciting. :)
Later all.