Migrating from VMware to OVM

Just some notes on the migration we planned, started, tested and rolled back this weekend =(. Dynamic disks in Windows unfortunately didn't come across nicely with Clonezilla.

We have a big VMware cluster with 500 or so VMs, sitting on shared storage with DR and HA. Storage performance is generally poor, and backups via the current system (snapshots and clones to NetBackup) don't work so well.

Our plan:

  • Clone VMs with Clonezilla to shared CIFS storage

  • Restore VMs to Oracle VM

  • Set up backups to Bacula directly from the OVM hosts

  • Decommission VMs from VMware

The Oracle VM servers have been specced with 2.5TB of storage, 256GB of RAM and 48 cores each (they cost around $16K each). The intent is to run all VMs locally to avoid the performance issues that arise from storage contention on a big SAN, and to make backups easy to do and fast to restore.

OCFS2, which OVM uses to store its virtual machines, is a great filesystem that supports instant copy-on-write hard links of files, similar to BTRFS. These are called reflinks (https://oss.oracle.com/osswiki/OCFS2/Reflink-Illustrated.html) and make it trivial to back up all the VM disks at once on the server to a staging folder. To back up the host, the Bacula client can then pull directly from the staging folder and keep say 3-4 full copies of that data over a week on our CEPHFS cluster, allowing each server to have 1 month's backups with minimal performance impact thanks to the efficiency of OCFS2's reflinks.
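A rough sketch of the staging step, in Python. It only builds the commands and runs nothing. The paths are made up for illustration, `plan_reflink_snapshot` is a hypothetical helper name, and it assumes a `reflink SRC DEST` utility is available on the host; check what your OVM install actually ships before relying on it.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def plan_reflink_snapshot(
    disks: List[PurePosixPath], staging: PurePosixPath
) -> List[Tuple[str, str, str]]:
    """Return one command per VM disk that would reflink it into staging.

    Nothing is executed here; the caller decides how to run the commands.
    """
    commands = []
    for disk in disks:
        # Keep the parent directory name so disks that share a filename
        # (e.g. system.img) do not collide in the staging folder.
        target = staging / disk.parent.name / disk.name
        # Assumption: `reflink SRC DEST` is the reflink utility on the host.
        commands.append(("reflink", str(disk), str(target)))
    return commands


if __name__ == "__main__":
    # Hypothetical layout, for illustration only.
    disks = [
        PurePosixPath("/vmstore/vm01/system.img"),
        PurePosixPath("/vmstore/vm02/system.img"),
    ]
    for command in plan_reflink_snapshot(disks, PurePosixPath("/vmstore/staging")):
        print(" ".join(command))
```

Because a reflink is copy-on-write, the staging copies should cost almost nothing up front, and the Bacula client can then read them at its own pace without touching the live disks.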

We ran into lots of hiccups on the way:

  • The CEPHFS kernel driver has some bugs that can cause kernel oopses. While CEPH itself (rados/rbd) is quite stable, CEPHFS is much more stable under FUSE than as a kernel module: the FUSE client has currently lasted for multiple TB of data, while the kernel driver was giving up after a few hundred GB with a ceph_d_prune oops (I believe related to directory pruning).

  • Our FusionIO cards (for databases in the future) are annoying to mount at boot, but tweaking ubuntu's upstart script for mounting all drives resolved that

  • Windows Server before 2008 requires disk drivers to be reset to defaults before cloning to other VM platforms. We forgot to do this on a lot of VMs, but if you have a copy of SystemRescueCd (www.sysresccd.org), which has a great registry tool on it called chntpw, you can run this script I knocked up to sort that out (http://ge.dec.wa.gov.au/_/opt/mergeide.sh). It assumes your boot drive is mounted at /mnt/window and should work for Windows 2000/2003/XP.

  • Clonezilla doesn't properly clone the metadata for dynamic partitions in beginner mode, which led to our rollback of the weekend's work (http://sourceforge.net/p/clonezilla/feature-requests/5/). I'm hoping to migrate all the data from the dynamic volumes to network volumes before our next attempt, to avoid crazy issues like this (luckily all the boot drives are basic volumes).

  • Active Directory will apparently auto-update DNS entries based on the DHCP leases of servers with static IPs when you move them, and subsequently delete the DNS entry when the DHCP lease is released on shutdown. This was a little confusing, but we re-added all our DNS entries and configured them to only be updatable by an administrator rather than the machine accounts themselves.

So we learnt a lot (thanks team members!) and had a great lunch at Ayhan's Kebab's on George St in Kensington, Western Australia, but unfortunately the majority of our migrations will have to be rescheduled to another day. We had no issues with the Linux VMs we migrated =), and the shared management server performs much better on the new servers, so performance-wise it looks good, but our cloning process needs a bit of polish.

I've still got some bug fixing of a Django app for one of the nicer groups at work (that unfortunately doesn't have much funding) to finish up before Monday, so I best be off!

Life update

I'm engaged & my partner & I own a house now (well we've owned it for a year lol)! (in Kensington, Western Australia)

Outside of work lately I play boardgames (Pandemic, Zombicide, Chaos in the Old World (Warhammer), Space Alert) and work on the house (painting/wiring/plastering, it was built in 1942).

I'm also trying to knock off my final year of uni, so close yet so far (every time there's a shift at work my responsibilities increase, and it gets more difficult to stay focused on uni )=.

At work I enjoy:

  • Django development

  • Implementing linux virtualisation (Oracle VM), filesystems (BTRFS, OCFS2) & clustered storage (CEPHFS)

  • Providing stacks so the rest of our department can easily develop and make our team redundant! (I wish lol)

  • Toying with ideas on how to phase out Access/Excel with something like web2py (http://www.web2py.com/) that's easy for beginners to pick up

And I get frustrated by:

  • Large amount of internal communications that never seem to go anywhere (I've decided to try to blog regularly to change that!)

  • Lack of technical literacy leading to poor decisions/prioritisations (e.g. simple projects that could be a wordpress site blow up into large applications, difficult problems the department faces are ignored as we work on management tools etc)

  • How dissociated the world of IT is from the real world, I feel like a lot of the time we are obstructing people rather than helping them which is frustrating

I'm hopeful that:

  • Open source & community developed software increases people's access to general purpose computing devices, and in turn technical literacy becomes like English literacy: you can specialise, but everyone has enough of a base understanding to express what they want a computing device to do

  • The strive for efficiency that I push as a literate developer is justified, and making people's jobs redundant is good for the world at large because we should be able to focus on unsolved problems rather than solved ones

  • I don't go stir crazy with the pressures I'm under =/

Also FYI if you haven't figured it out yet my Google + is just an endless stream of links I find interesting, I post very little actual content there.


works update =D

I guess now is time for an update (warning technical stuff ahead)

I've been working on lots of python code for ages now, mainly work and labyrinth related. We do lots of server management stuff at work (department of environment & conservation, fire management services). I'm lazy, I like bash, and I'm frustrated with existing config management tools that require centralised deployment (chef, puppet, etc) and provisioning tools that are install-only (imaging with clonezilla, debian preseeding, redhat cobbler, suse studio). These are all very cool but fall over when it comes to pulling in local changes easily, copying an existing server's setup (though clonezilla is pretty awesome at that), and most painfully backing up non-config info (databases, imagery etc). So I wrote a tool to do it myself in like 2 weeks at the end of last year, then deployed 40 servers with it, and after doing refinements every couple of weekends it's kinda usable (I've still got a bit more work to do on getting some documentation sorted before I release it, which will probably happen July sometime).

hgbackup: stick it on a server with networking, use it to manage launching/messing with anything you can call from bash or anything python can call on a system, and tell it to back up directories locally using mercurial or flag them to be backed up remotely. Any system with it installed can back up any other system over ssh (uses rsync + local versioning depending on the remote's config).

I wrote it from the perspective that I want to vim .history and copy/paste commands into a script ("sudo apt-get install supertux" => H.cmd("sudo apt-get install supertux", method="subprocess"); the subprocess method allows user interaction, while the default returns the output of the command for processing in python (uses commands.getoutput)), but have it easily redeployable elsewhere. The backup commands are just directories or method:data, i.e. for postgres you go "pgdump:dbname".
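To make that a bit more concrete, here's a minimal sketch of the two ideas in Python 3. This is not the actual hgbackup code: `parse_backup_spec`, the "dir" method name and the "output" default are names I've made up for illustration, and `subprocess.getoutput` stands in for the old Python 2 `commands.getoutput`.

```python
import subprocess
from typing import Optional, Tuple


def parse_backup_spec(spec: str) -> Tuple[str, str]:
    """Split a backup entry into (method, data).

    A plain directory such as "/etc" maps to the hypothetical method "dir";
    "pgdump:dbname" maps to ("pgdump", "dbname").
    """
    if spec.startswith("/") or ":" not in spec:
        return ("dir", spec)
    method, data = spec.split(":", 1)
    return (method, data)


def cmd(command: str, method: str = "output") -> Optional[str]:
    """Run a shell command copied from a history file.

    method="subprocess" leaves the command attached to the terminal so the
    user can interact with it; the default captures the output as text.
    """
    if method == "subprocess":
        subprocess.call(command, shell=True)
        return None
    # subprocess.getoutput is the Python 3 equivalent of commands.getoutput.
    return subprocess.getoutput(command)


if __name__ == "__main__":
    print(parse_backup_spec("pgdump:dbname"))
    print(cmd("echo hello"))
```

Splitting on the first colon only means the data part can itself contain colons, and anything starting with a slash is always treated as a directory.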

So that was kinda cool and it's been working quite well (it maintains backups for two sites, 20+ vms each, backing up to each other on 5 minute intervals, and it manages pretty well), though it needs more documentation.

That brings me to the cool stuff we're doing for Labyrinth Data Services, which is a private business I run with Patrick Coleman and Scott Percival. We wrote a fancy frontend for KVM in Django that lets our clients manage their VPS power controls and account information (billing, passwords etc) as well as providing a VNC tunnel to their VPS so they can administer it remotely. At the moment we're doing lots of migrations of customers from our old hardware to newer hardware, and part of the plan is to automate provisioning of netbooted servers + config deployment so we don't have to copy a template and set up each VPS manually like we do now, and customers can just pick that they want a new *linux distro of choice* server. The app will boot a VM off the network imaging server running clonezilla, which launches hgbackup on first boot, retrieves a config generated by Django and deploys custom packages/settings for that user as well as standard packages to make a shiny new VPS (and automatically bill the customer of course =P).

We also spent a bit of time on magical network shaping, and abolished quotas because they're annoying. OpenBSD has an incredible firewall stack called pf (Packet Filter) that lets us do stuff like shape on WAIX/non WAIX traffic based on routes we receive over BGP (we have our own /22 of ipv4 addresses and a whopping /48 of ipv6 addresses, that's the existing internet ^2.5, i.e. 1 trillion (10^12) squared! cmon everyone get on ipv6 already!) and do elegant linkshare allocation (we use HFSC) to guarantee minimum rates, burst rates, and sharing of unused bandwidth. Most ISPs operate at 40:1; we run at 20:1 contention by default, which we find means our users can generally get their full burst rate while only paying for 1/20th of it (bandwidth in Perth is expensive as =().
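For anyone who wants to check the numbers, here's the arithmetic as a small Python sketch. The 20 Mbit/s burst rate is a made-up example figure, and `committed_rate` is just a hypothetical helper name for "burst rate divided by the contention ratio".

```python
IPV4_BITS = 32
IPV6_BITS = 128


def addresses_in_prefix(prefix_len: int, total_bits: int) -> int:
    """Number of addresses covered by a prefix of the given length."""
    return 2 ** (total_bits - prefix_len)


def committed_rate(burst_mbit: float, contention: int) -> float:
    """Share of the burst rate actually paid for at N:1 contention."""
    return burst_mbit / contention


ipv4_block = addresses_in_prefix(22, IPV4_BITS)  # a /22 of IPv4
ipv6_block = addresses_in_prefix(48, IPV6_BITS)  # a /48 of IPv6

# The whole IPv4 internet is 2**32 addresses, and (2**32) ** 2.5 == 2**80,
# which is exactly the size of a /48. Done in integers: 32 * 5 // 2 == 80.
assert ipv6_block == 2 ** (IPV4_BITS * 5 // 2)

if __name__ == "__main__":
    print(ipv4_block)                # 1024
    print(ipv6_block)                # 2**80, a bit over 10**24
    print(committed_rate(20.0, 20))  # 1.0 Mbit/s for a 20 Mbit/s burst
```

So "1 trillion squared" is about right: 2^80 is roughly 1.2 x 10^24.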

So those are the personal projects I've been hammering away at for the last 6 months or so, while at work for the last 2 years or so I've been working on a massive project for resource management and data visualisation. An aside: Adrian, who I work with, has spent the last year implementing lots of OpenBSD routers built on the excellent yawarra embedded boxes that we also use at labyrinth. Anyway, we used to have this commercial app called datagate made by datalink systems in America, but it cost a lot, and per user licenses were $1000, so as we have like 2000 people in the department that can use resource information, the plan was to write our own resource tracking tool to display resources across the state. The majority of the application (firesource) was developed in Django, with the clientside javascript written in dojo. After a pretty good run of about a year we are now redeveloping the app to be a generic vector and raster visualisation tool (it already aggregates data from bureau of met, nafi, firewatch (landgate) etc), and after a bit of bouncing around I settled on jQuery for the clientside js, google closure for compiling and javascript templating, and Django as always for the backend. To serve imagery we use mapserver for raster data, and geoserver for vector data pointed at a postgis database. Django is backed with a postgis/postgres database which holds about 2 million tracking points for the last 2 years of operation, and we query that entire dataset for the latest points for each device (we have around 300) in 4 seconds almost every minute, with the capability to process a maximum of 10000 incoming spatial records a minute.
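The "latest point for each device" query is a classic greatest-per-group problem. The real thing runs inside postgres; the snippet below is just the same logic in plain Python so the idea is easy to see. `TrackingPoint` and its fields are hypothetical names, not the firesource schema.

```python
from typing import Dict, Iterable, NamedTuple


class TrackingPoint(NamedTuple):
    device_id: str
    timestamp: float  # seconds since the epoch
    lon: float
    lat: float


def latest_per_device(points: Iterable[TrackingPoint]) -> Dict[str, TrackingPoint]:
    """Keep only the newest point seen for each device, in one pass."""
    latest: Dict[str, TrackingPoint] = {}
    for point in points:
        current = latest.get(point.device_id)
        if current is None or point.timestamp > current.timestamp:
            latest[point.device_id] = point
    return latest


if __name__ == "__main__":
    # Made-up sample data for two devices.
    sample = [
        TrackingPoint("truck-1", 100.0, 115.86, -31.95),
        TrackingPoint("truck-2", 105.0, 116.00, -32.00),
        TrackingPoint("truck-1", 160.0, 115.90, -31.97),
    ]
    for device, point in latest_per_device(sample).items():
        print(device, point.timestamp)
```

A single pass like this is linear in the number of points; in the database the equivalent work is what an index on device and time is there to speed up.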

I've spent a lot of time working on the versioning/auditing for the new application to ensure that the system is able to do indexed queries across all versions in the dataset (like fullhistory or django-reversion but better =P) and can display any snapshot in the history of the application (so users can see what vehicles were at fire X that had a rough boundary by time Y etc). This work is leading into map production: the plan is that the application will produce a printable map at 300dpi at up to ISO A0, so operational staff can create maps quickly in the field on any machine with a web browser, including live vehicle position information. The audit stuff is quite generic and I feel I should split it into a separate reusable Django app at some point, but at this point there's nooo tiiime (there's a whole bunch of stuff going on with enterprise architecture design at work as well, and somehow I got dragged onto the committee for that too =O)
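The "any snapshot in history" idea boils down to an as-of lookup: keep every version with the time it became current, then find the newest version at or before the time you ask about. Here's a toy in-memory sketch in Python. `VersionedStore` is a hypothetical class for illustration and has nothing to do with how the real audit tables are laid out.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Optional


class VersionedStore:
    """Keeps every version of each key, sorted by the time it was recorded."""

    def __init__(self) -> None:
        self._times: Dict[str, List[float]] = {}
        self._values: Dict[str, List[Any]] = {}

    def record(self, key: str, when: float, value: Any) -> None:
        """Store a new version of `key` that became current at `when`."""
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        index = bisect_right(times, when)  # keeps both lists sorted by time
        times.insert(index, when)
        values.insert(index, value)

    def as_of(self, key: str, when: float) -> Optional[Any]:
        """Return the version of `key` current at `when`, or None if none yet."""
        times = self._times.get(key, [])
        index = bisect_right(times, when)
        if index == 0:
            return None
        return self._values[key][index - 1]

    def snapshot(self, when: float) -> Dict[str, Any]:
        """Return every key that had a (non-None) version at `when`."""
        result: Dict[str, Any] = {}
        for key in self._times:
            value = self.as_of(key, when)
            if value is not None:
                result[key] = value
        return result


if __name__ == "__main__":
    store = VersionedStore()
    store.record("truck-1", 10.0, "depot")
    store.record("truck-1", 20.0, "fire X")
    print(store.snapshot(15.0))
```

The sorted list plus binary search plays the role that an index on (object, time) would play in a database, which is what keeps history queries from scanning every version.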

Well that was interesting, but yeah any questions on bits and pieces please comment, otherwise this is just a braindump for google to index so I can find it all when I get AMNESIA (futurama awesome).

For those whose eyes glazed over - ASCII NINJA TURTLE =)

           ,;;;;;;, `\. `\         .,c$$$$$$$$$$$$$ec,.
      ,;;!!!!!!!!!!!>; `. ,;!>> .e$$$$$$$$"".  "?$$$$$$$e.
 <:<!!!!!!!!'` ..,,,.`` ,!!!' ;,(?""""""";!!''<; `?$$$$$$PF ,;,
  `'!!!!;;;;;;;;<!'''`  !!! ;,`'``''!!!;!!!!`..`!;  ,,,  .<!''`).
     ```'''''``         `!  `!!!!><;;;!!!!! J$$b,`!>;!!:!!`,d?b`!>
                          `'-;,(<!!!!!!!!!> $F   )...:!.  d"  3 !>
                              ```````''<!!!- "=-='     .  `--=",!>
                         .ze$$$$$$$$$er  .,cd$$$$$$$$$$$$$$$$bc.'
                     z$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$c .
                    $$$$$$$$$$$$$$ dbc `""?$$$$$$$$$$$$$$$$$$$$$$?$$$$$$$c
                    ?$$$$$$$$$$$$$$$$$$c.      """"????????"""" c$$$$$$$$P
         .,,.        "$$$$$$$$$$$$$$$$$$$$c.   ._              J$$$$$$$$$
 .,,cc$$$$$$$$$bec,.  `?$$$$$$$$$$$$$$$$$$$$$c.```%%%%,%%%,   c$$$$$$$$P"
$$$$$$$$$$$$$$$$$$$$$$c  ""?$$$$$$$$$$$$$$$$$$$$$bc,,.`` .,,c$$$$$$$P"",cb
$$$$$$$$$$$$$$$$$$$$$$$b bc,.""??$$$$$$$$$$$$$$FF""?????"",J$$$$$P" ,zd$$$
$$$$$$$$$$$$$$$$$$$$$$$$ ?$???%   `""??$$$$$$$$$$$$bcucd$$$P"""  ==$$$$$$$
$$$$$$$$$$$$$$$$$$$$$$$P" ,;;;<!!!!!>;;,. `""""??????""  ,;;;;;;;;;, `"?$$
$$$$$$$$$$$$$$$$$$$P"",;!!!!!!!!!!!!!!!!!!!!!!!;;;;;;!!!!!!!!!!!!!!!!!;  "
$$$$$$$$$$$$$$$" ;!!!!!'`.z$$$$$$$$$$$$$ec,. ```'''''''``` .,,ccecec,`'!!!
$$$$$$$$$$$$$" ;!!!!' .c$$$$$$$$$$$$$$$$$$$$$$$c  :: .c$$$$$$$$$$$$$$$. <!
$$$$$$$$$$$" ;!!!!' .d$$$$$$$$$$$$$$$$$$$$$$$$$$b ' z$$$$$$$$$$$$$$$$$$c <
$$$$$$$$$F  <!!!'.c$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$b  $$$$$$$$$$$$$$$$$$$$r
$$$$$$$P" <!!!' c$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$, "$$$$$$$$$$$$$$$$$$$$
$$$$$P" ;!!!' z$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$  $$$$$$$$$$$$$$$$$$$$

next time I should chuck up some screenshots

Up 3D

It was awesome. I'm kind of sad that there's nowhere else like that to explore. Dinusha says parts of the Amazon, because she is genius. I am now talking excitedly about the exploration of other planets/moons like Europa and want to do crazy stuff to do with space exploration, and I'm currently talking far too fast and excitedly for my wonderfully fabulous gf to transcribe, so I'll end with a pic that'll be up shortly.

save for me to read later


DNA seen through the eyes of a coder

Audio stuff

Mainly for Jed & Mike (pls fwd Nat!!!) ;P

Really nice high quality amplifiers:

DIY Loudspeaker Kits:
By the same guys awesome headphones:

Kingrex T20U review (very in depth!):
The class T amps are my favourite ever, this one is similarly built to the one I made D with her speakers

And budget crazy (probably illegal) imports with free shipping:

Hypnotoad says buy a Kingrex!

i miss my vim mug


I left it at my old workplace (thales) =(

savfire in africa!

Mum's boss told her he's got a grant for her to go to Africa for her PhD on Fire Behaviour Research to meet up with these guys:

Is really cool because the enhancement of biodiversity and conservation aspects are very close parallels to what we strive for at work XD

things i have to do

braindump - stuff to do before next monday:

Finish timesheets & overtime forms!! (though probably has to wait till after sunday)
Finish ERD for databases project
Fix 2 way messaging over satellite with datalink systems i50B's
Write network requirements spec for Satellite providers (argh need to do this as of this arvo)
Finish 2 day ESRI course (Wednesday, Thursday, can possibly mux with project work XD)

yeah thats all

Looking forward to:

Craig's birthday, which Dinusha's coming to =D
Fun with postgresql and preseeds and python and avahi for massive UserFAI and Resource management projects
Databases Exam???
Playing multiwinia vs dinusha for ages

Cool things I did:

Implement a Software management approach to relationship building with dinusha muahahahaha =P (Blueprints, Feature Requests, Bug Fixes)



