The Perfect Time to Try Eucalyptus

Eucalyptus 4.0 is coming soon, and you can try it in beta, on a single system, right now.

To take your own AWS-compatible cloud-in-a-box for a spin, here’s what you do:

  1. Install CentOS 6.5 minimal on a box that supports virtualization. Give it a fixed IP address.
  2. Set aside a range of contiguous IP addresses to play with. 20 should be plenty.
  3. As root, run the following command:

bash <(curl -s https://raw.githubusercontent.com/eucalyptus/eucalyptus-cookbook/master/faststart/cloud-in-a-box.sh)

That’s it.
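
If you’re not sure whether the box in step 1 actually supports virtualization, a quick preflight check can save you a failed install. This is a minimal sketch, assuming an x86 Linux box where the vmx (Intel) or svm (AMD) CPU flag in /proc/cpuinfo indicates hardware virtualization. The function name is mine and is not part of the Eucalyptus installer.

```shell
#!/bin/sh
# Hypothetical preflight helper; NOT part of the Eucalyptus faststart script.
# Returns 0 if the cpuinfo file lists the vmx (Intel VT-x) or svm (AMD-V) flag.
supports_virt() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Look only at the "flags" lines, and match vmx/svm as whole words.
    grep -E '^flags' "$cpuinfo" 2>/dev/null | grep -Eqw 'vmx|svm'
}

if supports_virt; then
    echo "CPU virtualization extensions found"
else
    echo "No vmx/svm flag found; check the BIOS settings or the hardware"
fi
```

A missing flag can also mean that virtualization is merely switched off in the BIOS, so treat a negative result as a hint rather than a verdict.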

It’s still in beta, so if it breaks, you get to keep both pieces — but it’s pretty stable for me.

The installation script is based on Chef Solo. The goal is to provide a very simple installation experience that results in either a running cloud, or a very clear explanation of why you do not have a running cloud. Once CentOS minimal is installed, a typical install takes about 15 minutes. (NOTE: do *not* install the Desktop version; NetworkManager and PackageKit get in the way.)

If you have any problems getting your cloud-in-a-box up and running, don’t hesitate to reach out to me on Twitter (@gregdek) or on freenode (gregdek, #eucalyptus). At this stage, bad news is the best news, so if you have bugs, let’s see ’em.

I especially invite my friends who work for OpenStack vendors to see how the other half lives.  😉


The Simplest Way to Learn About Eucalyptus Code

We get lots of people who want to use Eucalyptus as a way to learn about how cloud computing works at a code level. Which is great: Freedom to Learn is one of the fundamental Free Software guarantees.

So what’s my advice to users who want to learn about Eucalyptus? It’s pretty simple.

1. Get a tiny Euca cloud running. The perfect tool for this is eucadev.  If you’ve got a laptop that supports Vagrant, you’ve got a Euca cloud. It’s a small cloud, to be sure, but it’s got all of the key features required for cloud orchestration, and it’s a tool that our own developers use. 

2. Find a problem to solve! The best way to learn about a codebase is to dig into it with a clear goal in mind. If you don’t yet have a clear goal, we’ve got a great list of open bugs that are tagged as “fruit” (of the low-hanging variety) to get you started.

3. Create your own local branch of the Eucalyptus source, and start hacking! An explanation of how to do this can be found in the README for eucadev.

There’s really no substitute for getting your hands into code. Dig in. If you get stuck, swing by #eucalyptus-devel on freenode and ask for help. (After you’ve read through the docs and Googled a bit, of course.)


Next Euca hackfest: AppScale

Our next hackfest is this Friday, August 10th, from 11am to 2pm Pacific Time, in #eucalyptus-devel on freenode.  (If you show up at #eucalyptus or #eucalyptus-meeting, we will kindly direct you the right way.)  We will be working on integrating with AppScale.

You don’t know about AppScale, you say?  Well, you should.  AppScale is an open source platform for  Google App Engine apps.  The idea is that many applications designed to run on Google App Engine should “just work” with AppScale and your own cloud infrastructure.

There’s an Ubuntu-based AppScale image already built and ready to go for Eucalyptus; in the next couple of days, we’re going to get that running on our Eucalyptus Partner Cloud, and then we will see if we can get some of this App Engine code running on top of it.  By the end of the hackfest, we hope to have a few of these sample apps up and running, and filed away as Eucalyptus recipes.

If you want to join in the fun, just drop a line to the Eucalyptus community list and we’ll be happy to set you up with all the tools you’ll need.  See you on IRC.


Coming Soon: Eucalyptus 3.next

The first release of Eucalyptus 3 is out, and it’s time to get ready for Eucalyptus 3.1.

There will be a feature listing for Eucalyptus 3.1 on our website somewhere at some point, but that feature list won’t really tell the story.  Here’s the story: Euca 3.1 is the point at which we return to developing code the open source way.

With Eucalyptus 2, we introduced our Enterprise Edition (EE).  That code was a fork of the open source edition, and included proprietary features that were not available in the open source release.  What was the nature of these proprietary features?  Basically, they were hooks to allow Eucalyptus to talk to enterprise-y things, like Big Proprietary SANs and Big Proprietary Virtual Machine Managers. These are the kinds of hooks that large enterprises, who use a combination of open source and proprietary technologies, badly need — in fact, they are hard requirements for these kinds of environments.

The ideal way to add these hooks would have been to add them as well-isolated modules — but in the fast-moving startup world, the ideal way is not always possible.  The decision was made to manage the two separate codebases side by side, with the good intention that when a feature would go into one version, it would be added into the other at the same time.

We all know the old saying about good intentions, don’t we?  🙂

Looking back, it’s probably not accurate to say that the decision was a “mistake”, per se, but it’s definitely true that the open source version suffered.  In retrospect, it was inevitable; whenever a hard choice needed to be made about the allocation of scarce resources, the choice was always, always, always to solve the customer’s problem.  So the subscription version was patched and tested, while the open source version atrophied, with the lack of commit activity leading many observers to conclude that Eucalyptus Was Dying.  Which was completely wrong, of course; the customer base grew the whole time, and the company grew with it.  The new version of the product marched along, with a shared understanding among Eucalyptoids that when the time came, the codebase would be rebuilt the right way.  The proprietary hooks would be pulled out into modules.  Open source would never again be treated as a second-class citizen.

It’s been a hard and frustrating wait for a company that views itself as open source to the bone.

So, starting with Eucalyptus 3.1, the two trees will be joined into a single tree, with a handful of modules available to subscription customers only.

The road has been long.  As the engineers pushed tirelessly towards the GA of Eucalyptus 3.0, Andy took a machete to mainline and created a parallel branch in which the proprietary bits were unceremoniously hacked out.  It wasn’t necessarily pretty, but it worked.  He also made sure that the bits would sync to this new branch on a nightly basis.  Graziano then set it up so that the new branch would be pulled into Launchpad, where we called it “devel”. Now the time has come to turn our attention to fixing that code in earnest.  We recently renamed the branch 3.1, and set it up as the mainline for the next release.  Go check it out.

What comes next?  Glad you asked!

1. We’re cleaning up the migration from MySQL to Postgres.  We decided to move to Postgres because we believe that this will make it easier to redistribute to the open source community.  The Postgres bits have been there for a while; we just turned them on when we branched, and everything seems to work so far, but there’s always a bit of housekeeping to do with such changes.

2. The entire codebase will be reexamined for proprietary bits that may have been left in the 3.1 tree.  This will require some refactoring to ensure a separation that is truly clean.  We will then make those modules available to our subscription customers only.

3. QA, QA, QA.  We will put the 3.1 release through the same rigorous QA cycle that we’ve previously reserved for the subscription version.  We’ll also be working on opening up more and more of this testing code, so that people can build their own QA tools for testing their Eucalyptus cloud (and EC2 as well).  See the eutester project for more info.

4. We will release the documentation under a Creative Commons license.  The documentation for Eucalyptus 3 is available right now, but we’ve still got some work to do on trademarks and licensing questions.  Once we do, we’ll make the DocBook source files available, and we may even kick off our first translation projects.  See the documentation project for more info.

5. We will make a lot of engineering decisions about how to open up the development process.  Issue tracking, revision control, patch management, feature process, bug triaging, release management, distro packaging — we’ll make key decisions about all of these issues in the coming weeks, and we’ll be sharing them with the Eucalyptus community every step of the way.

So when will the 3.1 release happen?  Tough to say.  Once we start putting the code through QA, we will have a good idea of how functional the code is; from what we know now, it’s certainly good enough to try out in non-production environments, with a reasonable expectation that things will mostly Just Work.  But the difference between that state, and the desired state of Open Source Goodness, will take a little while to bridge.

It’s an exciting time.  I’m proud of what we’re accomplishing in a short timeframe, and the future looks very bright.


Why the Fedora ISV SIG never caught fire

Here’s a list of popular open source products that cannot currently be found in Fedora repos:

  • Zimbra
  • JasperSoft
  • SugarCRM
  • Alfresco
  • Magento
  • Eucalyptus
  • JBoss 🙂

Once upon a time, it was part of my job to help these kinds of companies to work more closely with Fedora. We created the ISV SIG for this purpose. Karsten and I would go to trade shows and meet with various open source vendors, and we’d talk with them at length about the great benefit of leveraging the Fedora install base, and the power of “yum install YourCoolProduct”, and the general usefulness of building an ISV packaging community, and they’d nod and smile, and then we’d have a follow-up meeting or two to discuss the ins and outs of being in a distro. And then… well, nothing much would happen.

Now, as it turns out, I’m in a position to appreciate, and articulate, these issues from the ISV’s perspective.

What do the applications listed above have in common? A couple of key points.

Point One: they are all sponsored by companies, who use the open source projects as a base from which to build proprietary products.

Point Two: they all tend to be the primary application running on their machine — in other words, they are appliance stacks — and they need to limit variance in those stacks to help guarantee a good experience for their users.

It’s easy to claim, and many do, that these projects aren’t in Fedora (or Ubuntu, for that matter) because of Point One. In truth, Point Two is *way* more important.

There’s a great page on the Fedora Wiki that does a good job of discussing the potential gains and losses of putting your ISV application into Fedora. I’m going to go through those gains and losses, and share my opinions of them, now that I’m on the other side of the fenceline.

[GAIN] Reduced maintenance burden for all dependencies that are already packaged in Fedora: no need to ship security updates for those components.

This is a good potential gain, but note that it does not require the ISV to be *in* the distro to get this gain. It’s entirely possible to package *on top of* the distro, track the distro closely, and get all of these maintenance gains, without incurring the high cost of pushing packages into the distro and maintaining them. I suspect that this is precisely what many companies choose to do.

[GAIN] Code auditability: the Fedora packaging processes ensure that all code is described by metadata (i.e., spec files). The packaging tools allow this data to be queried in informative ways. ISVs don’t necessarily track this data otherwise.

Also true, but again, note that it’s possible to build RPMs and get the same advantages without putting those RPMs into the distro. There are two separate costs here: there’s the cost of building an RPM, which is comparatively low if you’ve got the source and an experienced packager at your disposal — but then there’s the cost of pushing the RPM into the distro and following the distro’s rigorous rules around versioning and namespacing and supportability, which is a *much* higher cost for the ISV. The gain from that additional cost must therefore be demonstrably compelling.

[GAIN] Availability of package-specific expertise: ISVs can consult other packagers about the upstreams of their dependencies. Each Fedora package maintainer acts as a known point of contact for their package’s upstream project.

This is very much a potential gain, if it’s true. But what happens when most of the packages aren’t yet in Fedora? This is especially problematic in the Java world, where there are tons and tons and tons of jar files that are not “packaged” as such in Fedora, but are still perfectly useful to the Java community in jar form. If the distro packaging expertise for a particular jar doesn’t exist yet, then the company that pushes the packages into the distro must take on the initial cost of becoming that expert. It’s definitely true that this expertise can be shared over time, and also true that such shared expertise is a long-term win, but the upfront cost is high, especially for a small company that has lots of competing priorities.

[GAIN] The trust of Fedora users: ISV products packaged in the Fedora way will be more warmly-received by Fedora users than standalone GNU/Linux binaries.

Citation needed. 🙂 I mean, yes, I believe this too, but it’s a gain that’s difficult to quantify. The real benefit we’re trying to claim here is that “yum install foo” is a simpler and awesomer experience — and it is. But the difference between “yum install foo” and “wget foo-installer | sh”, which adds the ISV’s yum repo and gpg key and then kicks off “yum install foo”, is not really that great.
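
To make that comparison concrete, here is a rough sketch of what such an installer tends to boil down to. Everything specific in it is hypothetical: the package name foo, the example.com repo and key URLs, and the function name. It only prints the repo definition; the commented lines show where a real installer would write the file and hand off to yum.

```shell
#!/bin/sh
# Hypothetical ISV installer sketch; "foo" and the example.com URLs are placeholders.
# Prints a yum repo definition for the given repo id, base URL, and GPG key URL.
write_repo_file() {
    name="$1"
    baseurl="$2"
    gpgkey="$3"
    cat <<EOF
[$name]
name=$name
baseurl=$baseurl
enabled=1
gpgcheck=1
gpgkey=$gpgkey
EOF
}

# On a real system, run as root, the installer would then do roughly:
#   write_repo_file foo https://example.com/repo/ https://example.com/RPM-GPG-KEY-foo \
#       > /etc/yum.repos.d/foo.repo
#   yum install -y foo
write_repo_file foo https://example.com/repo/ https://example.com/RPM-GPG-KEY-foo
```

Once the repo file is in place, updates arrive through plain old yum, which is most of what being “in the distro” would have bought the user anyway.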

[GAIN] Stability on Fedora: standalone binaries break frequently because Fedora is such a fast-moving target. Built-from-source packages have proven much more stable, since incompatibilities are caught during mass rebuilds.

This is a bit of a tautology. It’s essentially arguing that your ISV packages will build better with Fedora because you’re working to make them build better with Fedora. Which is true, but again, can be true by building *on top of* Fedora and not *in* Fedora.  And it also only addresses build time failures, which, for an application, are failures that you’re likely to find immediately anyway if you’re doing proper build/test integration internally.

[GAIN] Bug triaging: Fedora users report bugs to Red Hat Bugzilla first; the package maintainer decides if it’s a packaging bug or an upstream bug. If it’s an upstream bug the packager will ideally create a minimal test case and send it to the upstream maintainers.

This is a strong *potential* gain, if the package maintainer is a trusted and responsible member of the community. But what if the package maintainer is an employee of the company, as is usually the case? It’s not a gain at all.  And what if the package maintainer also maintains 20 other packages, and isn’t particularly responsive?  Then it’s a net loss.

[LOSS] Binary dependency predictability: dependency updates may mean that the deployed set of components is not the same set of binaries the ISV tested during their release process.

Bingo!  No more calls, please — we have a winner.

Here’s the thing: an ISV does not have the luxury of dealing with variance. We’re dealing with tons of bugs, every day, because we’re young companies, pushing as hard and as fast as we can to make our software experience better. When we’re trying to kill a crazy bug for users/customers, the first order of business is to reduce the uncertainties, and the easiest way to do that is to be *very* specific about configurations. This is especially true as the software increases in complexity.

We can assume high competence and good faith on the part of community maintainers, and still be relatively certain that those good actors will make changes, for good reasons, that will damage the ISV’s application stack in unpredictable and important ways. Software is mean-spirited like that.

This could, in theory, be mitigated by keeping multiple versions of things, and having better mechanisms for tracking those versions. This is something that Red Hat Network customers wanted for years, and finally got — the ability to install a very specific package manifest that is not “all latest packages”, but “these specific package versions”.  But Linux distros don’t work that way, for good or ill.
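
For what it’s worth, yum itself can be asked for exact versions by appending the version and release to the package name. Here is a minimal sketch, assuming a hypothetical manifest file of “name version-release” lines. The package names and versions are invented, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical sketch: turn a manifest of "name version-release" lines into
# one yum command that requests exact versions instead of "all latest".
pinned_install_cmd() {
    manifest="$1"
    pkgs=""
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while read -r name ver || [ -n "$name" ]; do
        # Skip blank lines and comments.
        case "$name" in ''|'#'*) continue ;; esac
        pkgs="$pkgs $name-$ver"
    done < "$manifest"
    echo "yum install -y$pkgs"
}

# Example manifest; these packages and versions are made up.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
# name  version-release
foo     1.2-3.el6
libbar  0.9-1.el6
EOF
pinned_install_cmd "$manifest"   # prints: yum install -y foo-1.2-3.el6 libbar-0.9-1.el6
rm -f "$manifest"
```

This only works as long as the repos still carry those exact versions, which is precisely the part the ISV doesn’t control.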

In theory, everyone should always be running the latest version of things. In practice, that can be very difficult — and it can be *especially* difficult for the ISV when multiple distros have different notions of what the latest version is, and *exceptionally* difficult when those package manifests can change without warning, and outside of your control.

Maintaining a functioning product in multiple cutting-edge distros, with different release cycles and different dependencies, requires a serious, serious commitment to continuous integration and testing. I believe that Eucalyptus has a better process for this than most — and still it will be a tremendous challenge for us to keep up with two different fast-moving distros in Fedora and Ubuntu.

[LOSS] Unity with Windows release process: someone on the ISV’s team will need to be a Fedora contributor or they will need to recruit an external packager.

You can replace “Unity with Windows release process” with “Unity with Ubuntu release process” and the problem is the same. There are huge differences, of course, between a Windows release process and a Linux release process — but even staying in the Linux world, there’s a considerable difference between the Ubuntu release process and the Fedora release process, and expertise in the one in no way guarantees success in the other.

[LOSS] Ability to customize dependencies arbitrarily: there are rare cases where Fedora ships different versions of the same component for compatibility but in general this is strongly discouraged; custom patches should be sent upstream or eliminated by patching the product’s code to not require them.

Absolutely.

[LOSS] Download counting/tracking: if an ISV provides a tar-based distribution from their website, they can track counts and/or emails. This may be important for their marketing department.

Ayup. 🙂

* * * * *

It looks pretty grim in the end, doesn’t it? Well, it’s not as dark as all that. There are legitimate ways for the committed ISV to bridge the gaps over time:

1. Commit to building RPMs (and dpkgs), from source, the right way, for the ISV product, and making those source packages available to whomever wants them. There are legitimate reasons for an open source company to do this, and it’s a necessary precondition to being in the distros anyway.

2. Release their Linux versions as add-on yum/apt repos.  Of course, this also means being able to supersede/obsolete distro packages with foo packages, but this is easily done by maintaining separate namespaces.

3. Continue to work with other ISV vendors on packaging best practices at every opportunity, even if those packages don’t immediately end up in the distro.

4. Explore development builds that depend on the latest packages, available from wherever. One of the great advantages of Fedora, and other fast-moving distros, is that they do a great job of managing the future. We don’t want to live in the future, but we certainly want to have our eye on it, and that’s a great reason to continue to *try* to be in Fedora. But we also need to make it clear to potential users that the future and the present don’t always see eye-to-eye, and that can be difficult messaging to convey.

The truth of the matter is that not every user understands the intricacies of the open source development model, and most ISVs in a competitive market get one shot to connect with their potential customers. One. Which means that the ISVs are going to do everything they possibly can to make sure that they’ve got control over how that experience goes, at the lowest possible development cost.

Fedora can afford to live right on the bleeding edge because they’ve got CentOS/RHEL to fall back on. Not everyone has that luxury.

(p.s. looking forward to talking more about this at FUDCon.  Also: the drinking.)


Coworking in the Bull City

Eucalyptus is now a proud sponsor of Bull City Coworking. Various folks have tried to get a coworking operation up and running in Durham in the last little while; props to Robert Petrusz and the gang for actually getting it done.

Both Andy Grimm and I are Eucalyptus employees who live and work in Durham.  Now we have a space to hang out in, which is handy, because while working from home has its advantages, if you do it *every single day of your life* it can get old in a hurry.  So here we are, in East Downtown Durham, a stone’s throw from the best Cuban food in the Triangle.  The space is spare, but growing.  I CAN HAZ WHITEBOARDS, which is awesome.

So, Durhamites.  If you want to get out of your stuffy home office and come hang out for the day sometime, and especially if you want to talk about open source cloud awesomeness, ping me and I’ll set you up with a day pass.  Y’all come, hear?


Mitch wrote the book on AWS tools.

No, seriously, he did.

Actually, Mitch Garnaat has written a bunch of stuff. He wrote boto, the excellent Python library that talks to AWS-based backends, and on top of that he wrote euca2ools, which is the free software client of choice worldwide for sending management commands to AWS, and to Eucalyptus, and to OpenStack Nova (well, the parts that have been written so far, anyway).

Now he’s written an actual book about how to manage your AWS instances with Python and Boto.

I would argue that Mitch Garnaat is the authoritative voice on AWS command line management tools, and this book cements the place that he has already claimed for himself.

Well done, Mitch. Well done.

*slow clap*
