Tuesday, February 26, 2008

VMworld Europe 2008: Day 1

Here are some quick thoughts, remarks, and things I picked up today, in random order. The moment I'm home with a decent internet connection, I might update some of this info.

General things:

  • From this morning's keynote: 3% of all energy generated goes into our data centers. This is why green data centers are important.
  • As you can read elsewhere, Novell has acquired PlateSpin. Let's hope prices will drop.
  • Site Recovery Manager is nice, but does not offer anything that cannot be done by hand or by script.

Performance related:

  • File system alignment is becoming more and more important. Two of the speakers today claimed it is essential for file system performance. Just make sure you use the VI Client to create the file systems; it does the alignment for you. (A quick way to check alignment inside a guest is shown after this list.)
  • Some Linux kernels use a 1 kHz timer interrupt, which causes CPU overhead in the guest. The kernel boot option 'divider' reduces this rate (see the example below).
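
A quick way to check partition alignment inside a Linux guest, as a side note of my own (not from the talks), assuming the virtual disk is /dev/sda: 'fdisk -lu' prints partition boundaries in sectors. The legacy DOS default start of 63 is misaligned; a start at 128 (a 64 KB boundary) indicates an aligned partition.

    # Print the partition table with sector units; check the Start column
    fdisk -lu /dev/sda
    # Start = 63  -> misaligned (legacy DOS offset)
    # Start = 128 -> aligned on a 64 KB boundary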
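
As for the timer issue, a sketch of the boot option, assuming a RHEL-style kernel recent enough to support 'divider' (the kernel version and paths below are illustrative): appending divider=10 drops the timer rate from 1000 Hz to 100 Hz.

    # /boot/grub/grub.conf (kernel version is illustrative)
    kernel /vmlinuz-2.6.18-53.el5 ro root=/dev/VolGroup00/LogVol00 divider=10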

SAN/NAS related:

  • A lot of people were interested in the talk on the differences between FC SAN, iSCSI (software and hardware) and NFS. NFS, however, was not really covered. All in all, there were no big surprises: FC is usually better than the alternatives, especially for large block sizes (less SCSI overhead); software iSCSI uses more CPU cycles than hardware iSCSI; and so on. A whitepaper with this info has been published, but I don't have the link ready.
  • iSCSI has been optimized for 8K blocks, as this block size is encountered a lot. The result is clearly reflected in the stats.
  • An experimental tool is available to analyze guest disk I/O statistics. It basically creates histograms of throughput, latency, average read/write distance, etc. The command is 'vscsiStats' (see the sketch after this list). I could not test it out yet, as I don't have an ESX server in my hotel room. This alone makes it worth being here...
  • In order to troubleshoot SAN performance issues, allocate a small LUN (e.g. 100 MB), so that everything can be cached by the array. This way, you avoid the effects of physical disks, spindle seeks, etc.
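
A minimal vscsiStats session sketch, still to be verified on a real host (run from the ESX service console; the world group ID below is illustrative):

    # List running VMs with their world group IDs
    vscsiStats -l
    # Start collecting for one VM (1234 is an illustrative world group ID)
    vscsiStats -s -w 1234
    # Print a histogram; other types include ioLength and seekDistance
    vscsiStats -p latency
    # Stop collecting
    vscsiStats -x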

Network related:

  • In order to use the enhanced vmxnet driver in 3.5, you first need to remove the existing vNIC and add a new one. Then you can select the new enhanced interface, with support for all the new features.
  • When setting up network failover policies, it is important to take into account that, by default, the spanning tree protocol takes 30 seconds to open the uplink port on a physical switch. During this time, the virtual switch sees the link (to the physical switch) as up. Those 30 seconds are twice the default timeout for VMware HA, so rebooting a switch may cause a lot of havoc in this case. (A common mitigation is sketched after this list.)
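
A common mitigation, my addition rather than something from the session: enable PortFast on the physical switch ports facing the ESX uplinks, so they start forwarding immediately. A Cisco IOS sketch, assuming plain access ports (the interface name is illustrative):

    interface GigabitEthernet0/1
     description ESX uplink
     spanning-tree portfast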

As you may have noticed, I am particularly interested in everything related to performance. I also collected a lot of references to interesting KB articles, but I need to check them out before posting any info.

Friday, February 22, 2008

VMworld Europe 2008 Timetable - Abstracts

Somebody noticed (in the comments here) that the abstracts of the talks are not in the Excel file I created. I don't have time to add this info now, but I created a quick-and-dirty HTML page (with some sed, grep and bash scripting) that can be found here: Session_Abstracts.html
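
For the curious, a hypothetical sketch of the kind of pipeline involved (not the actual script, and the input format of sessions.txt is made up):

    # sessions.txt: an illustrative flat export with 'Session:' and
    # 'Abstract:' lines; wrap each pair in minimal HTML
    {
      echo "<html><body>"
      sed -n -e 's|^Session: \(.*\)|<h2>\1</h2>|p' \
             -e 's|^Abstract: \(.*\)|<p>\1</p>|p' sessions.txt
      echo "</body></html>"
    } > Session_Abstracts.html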

Hope this helps...

Wednesday, February 20, 2008

VMworld Europe 2008 Timetable

For those attending VMworld Europe 2008 next week, it is hard to select the relevant talks and presentations from the list provided by VMware. It's a shame no more user-friendly overview is provided. Here's a screenshot of what it looks like:

[screenshot: VMware's flat session list]

Rene has done a good job converting the flat list to a table in Excel.

This did not quite serve my own purpose, though: getting my schedule right for next week. So I created my own Excel planning file. It consists of a planning sheet, the resulting schedule, and a sheet with an overview of the number of clones I would need of myself. I did a first scan of the topics, and the resulting table looks like this:

[screenshot: the resulting planning table]

MMMmmm, it seems I will have to cut hard in my selection...

Anyway, if you're interested in the Excel (2007) file: it's rough and undocumented, but you can download it here: ProgramVMworldEurope2008.xlsx

Some notes for those interested: the column 'Attend?' should be filled in with 'Y' or left empty; 'Y' means you want to attend. The '#' column stands for the number of sessions on the same topic over the three days. In the last sheet, you get an overview of these topics with their dates. To update the tables in the planning sheet and the issues overview, just press CTRL-ALT-F5.

Monday, February 18, 2008

Software iSCSI in VI 3: Multipathing and Redundancy

Last week, I did my first VI (3.5) installation using the MD3000i iSCSI SAN. Don't expect many features or a Navisphere-like interface, but expect a light-weight, cost-effective iSCSI solution that is up and running in a matter of minutes.  Moreover, it is a supported storage device for VI 3.5.

We set up the storage network, just like I usually do with a Fibre Channel array. The topology is presented below:

[image: storage network topology]

I always thought that iSCSI is very similar to FC in its network configuration, and in principle it is, as long as you have two HBAs.

With software iSCSI (as opposed to hardware iSCSI), you can have only one iSCSI initiator (think of it as a virtual HBA). Redundancy is obtained by connecting multiple physical NICs to the storage virtual switch. So far so good: replace HBA 1 and HBA 2 with physical NIC 1 and physical NIC 2 from Server 1, knowing that both pNICs are connected to the storage vSwitch.

[image: software iSCSI topology with two pNICs on the storage vSwitch]

Scanning the SAN reveals ... 2 paths (instead of the naively expected 4). Failover testing revealed that no failover occurred when disconnecting, for instance, the link between pNIC 1 and the physical switch. The SAN simply disappeared!
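
For reference, a sketch of how one can rescan and list the paths from the service console, assuming the software initiator shows up as vmhba32 (the adapter name may differ per build):

    # Enable the software iSCSI initiator (one-time setup)
    esxcfg-swiscsi -e
    # Rescan the software iSCSI HBA for new LUNs
    esxcfg-rescan vmhba32
    # List devices and their paths; check the path count per LUN
    esxcfg-mpath -l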

We quickly realized that the whole problem is caused by the fact that there is only one iSCSI initiator (with a single MAC and IP address) and no real load balancing (the originating-virtual-port-ID teaming policy is used). Only when we removed the primary uplink of the server did it switch over to the second pNIC, which connects to a different physical switch and also to a different NIC on the SAN. In other words, one only sees the third and fourth path in case of a link or NIC failure!

In order for the server to see 4 paths to the SAN, and to have complete redundancy for every physical component, one needs an interlink between the two physical switches. This effectively solved our issue.

Note: one might be tempted to think that setting the teaming policy to IP hash would avoid having the second NIC on standby. This is true, but in that case one would need a NIC bond across the two physical switches, which also requires an interlink. The effect, in other words, is the same.

Friday, February 15, 2008

Virtual Machine BIOS too fast

As you can read in this post, and as I have encountered many a time: the BIOS boot sequence of a VM is VERY fast. The speed even improved in version 3.5! Trying to get into the BIOS at boot is harder than playing Quake to the last level...

In the post mentioned above, the author suggests modifying the VMX file and adding an option. It turns out that this option is even available from the VI Client, under VM Options. This is much easier than changing the config file.
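
If you do prefer the config-file route, the option in question is presumably bios.bootDelay (I have not double-checked the post), with the delay in milliseconds; a sketch:

    # Add to the VM's .vmx file while it is powered off:
    # pause at the BIOS screen for 5 seconds on every boot
    bios.bootDelay = "5000"
    # alternatively, enter the BIOS setup screen on the next boot only
    bios.forceSetupOnce = "TRUE"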

Thursday, February 07, 2008

VMware memory management: Memory Tax

I'm looking into the specifics of how the ESX hypervisor handles memory, and how resource allocation is performed.

One of the things that kept me busy was the so-called 'memory tax'. This concept is explained in esx3_memory.pdf, but was not clear to me from that document. This is a quote from the document:

If a virtual machine is not actively using its currently allocated memory, ESX Server charges a memory tax — more for idle memory than for memory that is in use. That is, the idle memory counts more towards the share allocation than memory in use. The default tax rate is 75 percent, that is, an idle page of memory costs as much as four active pages. This rate can be changed by modifying a parameter setting.

Looking further on the web, I found that in 2002, at the OSDI conference, Carl A. Waldspurger was awarded the best paper award for the following: Memory Resource Management in VMware ESX Server. The slides of the presentation can be found here. Maybe it is because I'm used to reading papers (or at least I used to be), but I found the explanation much clearer in this document.
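
For reference, my own summary of how the tax enters the math in that paper: reclamation is driven by a shares-per-page ratio in which idle pages are charged extra:

    rho = S / ( P * ( f + k * (1 - f) ) ),   with k = 1 / (1 - tax)

    S = shares, P = allocated pages, f = fraction of pages active.
    The default tax of 0.75 gives k = 4. Worked example: a VM with
    S = 1000 and P = 100 has rho = 10 when fully active (f = 1), but
    rho = 1000 / (100 * 4) = 2.5 when fully idle (f = 0). The VM with
    the lowest ratio loses pages first, so idle memory gets reclaimed
    preferentially.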

I encourage everyone interested in memory management to read the paper.
