Archive for the ‘KARL’ Category

KARL and zero-downtime updates

February 18, 2010

For the KARL project, the development team is primarily involved in operations.  Along with Six Feet Up as the hosting provider, we are responsible for many activities in “SaaS”.  Bugs are reported to us, we do the software updates, we help monitor the site, we do staging and testing of customer instances.

I really enjoy this aspect.  It’s different from past involvement in open source projects, where involvement in the software is somewhat de-coupled from living with your mistakes. [wink]  Stated differently, we have a direct interest in stability, performance, quality, and even in things like making KARL easy to monitor by Zenoss.

We periodically update the software.  Which means restarting the app server.  Which usually means, downtime.  Over time, we’ve whittled that down.

First, KARL restarts fast.  Like, two seconds or so.  Thus, the impact is minimal.  Next, we use mod_wsgi, which lets us do “graceful” restarts in Apache: serve all your current requests, then restart your processes.  These combine to provide very fast updates.

There’s one aspect that’s harder though.  Sometimes our updates require “evolve” scripts to update data.  For example, adding an index, or fixing a value that requires waking up lots of objects.

We used to do this live in a ZEO client, but when the evolve script takes more than a few minutes, we get prone to conflict errors with the running site.  Which means, shutting down the main site.  Which, sucks.  (I’ve become quite obsessive about performance and uptime.)

We have some ideas we think can mitigate this:

  • SSD.  Six Feet Up has the solid-state disks installed.  Because of RAID and cabinet issues, the SSDs are going in the spare box in the rack.  We’ll then move the site over.  The hope is that evolve scripts get faster.  If the evolve scripts are bottlenecked elsewhere (e.g. ZEO single-threading), then that’s a different issue.
  • Read-only mode.  Perhaps we could leave the site in read-only mode during the update, with a little banner informing the user.  Preferably we could put the site in read-only mode without a restart.
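To make the read-only idea a bit more concrete, here is a minimal sketch of how it might look as WSGI middleware.  This is not something KARL has; the class name, the flag-file path, and the `karl.read_only` environ key are all made up for illustration.  It only blocks writes at the HTTP layer, and it assumes GET requests never write to the database, which would have to be verified for a real application.

```python
import os

# HTTP methods assumed (for this sketch) never to modify data.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


class ReadOnlyMiddleware:
    """Reject write requests while a flag file exists (hypothetical design)."""

    def __init__(self, app, flag_path):
        self.app = app
        self.flag_path = flag_path  # hypothetical, e.g. "var/READONLY"

    def __call__(self, environ, start_response):
        # The flag is checked on every request, so an operator can toggle
        # read-only mode by creating or deleting the file, with no restart.
        read_only = os.path.exists(self.flag_path)
        if read_only and environ["REQUEST_METHOD"] not in SAFE_METHODS:
            body = b"The site is read-only during an update.  Please retry shortly.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                ("Retry-After", "120"),
            ])
            return [body]
        # Pass the state along so templates could render a banner.
        environ["karl.read_only"] = read_only
        return self.app(environ, start_response)
```

An operator would then run something like `touch var/READONLY` before the evolve script and remove the file afterwards.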

Any other ideas on minimizing downtime on such applications without major changes in architecture?

KARL news: calendar, formish, customization, admin, daemons, SSD

February 13, 2010

This has been the week of major KARL updates.  KARL is the open source collaboration and knowledge system published by the Open Society Institute.  It is an end-user product atop the BFG web framework.

Some recent happenings in KARL-land:

  • We are wrapping up a major improvement to the calendar tool. We introduced a concept of sub-calendar “layers” that can aggregate events from other communities.  More visibly, we completely re-did the UI with a new Weekly view and lots of Ajax sprinkling.
  • We also did most of the work to re-implement the entire form system using io.formish.  (repoze.bfg.formish to be more precise.)  Big reduction in code and test code.  Still have a lot to think about on form controller patterns.  Those going to PyCon can hear Chris’s recounting of form controller torture in a panel he’s on about form frameworks.
  • Along with our friends at Six Feet Up as hosting partners, the KARL team is now operating KARL sites for five organizations.  KARL has a unique approach to customization, the inverse of the traditional Zope approach.  In KARL, a customization package is the starting point, and that pulls in the main software.  Or, doesn’t.  We just rolled out changes to thin out the size of each customization package.
  • Chris Rossi has been working on web interfaces to a number of admin activities, including integration with Six Feet Up’s Zenoss monitoring.
  • KARL has a number of periodic admin jobs, for things such as processing incoming/outgoing mail, pulling in feed content, etc.  To date we had been running these as cron jobs.  However, we had cases where hundreds of crons got piled up.  In the latest updates, we converted these to Supervisor-managed jobs.  Risk involved, so we’ll see how it goes.
  • And finally, we are going to work with Six Feet Up to install solid-state disks.  Our initial test shows that an SSD alone, with no code refactoring, will completely eliminate our performance concerns on LiveSearch.  As well as benefit other catalog-constrained screens, possibly.  We’ll report back once we live with them for a while.
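On the cron-to-Supervisor item above: crons pile up when a new run starts while the previous one is still going.  A rough sketch of the alternative, assuming a process manager such as Supervisor keeps one long-lived process alive, is a loop that only starts the next run after the last one returns.  This is not KARL's actual code; `run_forever` and its parameters are hypothetical names.

```python
import time


def run_forever(job, interval, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, sleeping `interval` seconds between runs.

    The next run starts only after the previous one has returned, so runs
    can never overlap or pile up.  `max_runs` and `sleep` exist only to
    make the loop easy to test; a real daemon would leave them alone.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:
            # One failed run should not kill the long-lived process.
            print("job failed: %r" % (exc,))
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval)
    return runs
```

The trade-off is the one hinted at above: a hung job now blocks every later run instead of stacking up behind it, so the process manager has to be trusted to notice and restart it.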

All in all, KARL (like BFG) is humming along nicely.  Just to emphasize: KARL isn’t a framework.  It is an out-of-the-box product with a strong opinion.  By making such a deliberate choice to not boil multiple oceans, KARL gets to be very compact, very fast, and very stable.

That’s the key takeaway for people working on larger projects that have dynamic performance needs.  Unless your needs fit into Product X’s bulls-eye, you’re probably better off not beating Product X into submission.  Instead, we need an approach where assembling your own custom application, leaving out the parts you don’t need, is more feasible.

Not only is this a win for custom apps, IMO it’s actually a win for Product X.  Instead of taking a reputation beatdown when it doesn’t excel at Every Possible Thing, it can just say: “We’re good at X. If you want X, you want us.”  Then, focus scarce resources and reputation on being the best possible X.

Performance and memory usage for KARL

January 25, 2010

I’ve enjoyed seeing some writeups on requests/second and memory usage for upcoming versions of Plone.  It’s great to see things trending in that direction.  Hopefully with some tough choices and deprecation, more gains can be made (just my personal opinion.)

I thought I’d give a primitive try at the same numbers for KARL, the collaboration application atop BFG that we’ve been working on and deploying to customers.

Using ‘ab -n 100 -c 2’ (100 requests, 2 at a time) on my first-gen 2 GHz MacBook with 2 GB of RAM, I leveled off at just over 134 requests per second.  Memory usage was 31 MB.
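For anyone who wants to reproduce that kind of number without ab, here is a small sketch of the same measurement in Python: a fixed number of requests, a fixed concurrency, and total requests divided by wall-clock time.  `measure_rps` is a made-up helper and the URL in the comment is a placeholder, not KARL's.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_rps(fetch, total=100, concurrency=2):
    """Call fetch() `total` times from `concurrency` threads.

    Returns requests per second over the whole run, roughly what
    `ab -n total -c concurrency` reports.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every call and re-raises any exception.
        list(pool.map(lambda _: fetch(), range(total)))
    elapsed = time.perf_counter() - start
    return total / elapsed


# Example use against a placeholder URL (an assumption, not a real endpoint):
#
#   from urllib.request import urlopen
#   print(measure_rps(lambda: urlopen("http://localhost:6543/").read()))
```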

Obviously it’s not an apples-apples comparison.  The feature set is smaller.  Although we do have cataloging, text search, workflow, security, and the like, there’s a ton of stuff we don’t do.  We’re an end-user application with specific features, versus a framework.

On the other hand, all requests in KARL are authenticated and fully-dynamic.  So the requests-per-second number above?  That’s our slow number: authenticated, personalized, security-aware, fully dynamic.

For more fun, we recently built an ugly, cheap Core i5 box in the Agendaless office for $600, with 4 GB of RAM.  In production we deploy under modwsgi, so we fired it up to have 3 processes (for 3 of the 4 cores).  We also have a script that lets us bulk load 300 sample communities, each containing a bunch of content.

That’s a bit more realistic of a test, since we start paying the price of having content in the catalog.

In that “with content” test, we got 349 requests/second.

Sometime soon we’re going to think a bit harder about a more realistic test.  Pounding the same URL over and over as the same user just doesn’t mean squat.  Well, it’s valuable insomuch as it is a veto: if your numbers are pathetically low on the fastest-possible “test”, it’s only going to get worse.  We are slowly building up some Funkload scripts that cover a scenario which includes different users, different activities, and some writes as well as reads.

We need this as we are evaluating various KARL ideas in 2010.  First and foremost, we bought a solid-state disk for the test box.  We had a query (prefix match on text search, where only one letter was entered) which blew up our system previously.  Think, 60+ seconds.  That time fell to 2 seconds with the SSD.

Next, we’d like to see some before/after on RelStorage using some real-world scenarios.  Finally, I’d like to see some before/after on repoze.pgtextindex, where we swap out just one of our catalog index types (the text one) with transactional text indexing in PostgreSQL.

Early impressions on modwsgi in production

July 21, 2009

At the end of May we made the KARL cutover from KARL2 to KARL3.  We have now had almost 8 weeks in production, so we can form impressions about some of the decisions.

For example, we’re using modwsgi as a WSGI server.  In a way, this was a surprising decision.  Chris and I were both a bit skeptical about whether it was a risk worth taking, as we hadn’t used it “in anger” beyond the site.  As somewhat a throwaway test, I set it up on the test site we used for OSI to do the user acceptance testing for the 15 or so milestone deliverables on KARL, but with the idea that it wasn’t a permanent decision.

By the time we set up the deployment server, we had months of living with modwsgi under our belt, somewhat by accident.  Far more important, we added Shane Hathaway to the project.  Like Chris Rossi, Shane has been a boon for KARL in many ways.  In this case, Shane had experience with modwsgi and said he would make sure we could stand behind it.

So we put it into production.  We’re running on an 8-core box set up as a Xen server, with the OSI instance getting 3 CPUs.  We started with modwsgi set up to run 2 BFG application processes (and thus ZEO clients), each with 2 threads.  modwsgi gave us some simplification: we didn’t need Apache/mod_proxy + BFG/Paster, with the latter managed by supervisor, possibly with a load balancer in between.  Instead, we let modwsgi in daemon mode handle that.

We also found that we could upgrade the software on the server, send a -GRACEFUL restart to Apache, and have a zero-downtime update to the server.  Which was nice.

Later, though, we found a really useful benefit to having modwsgi in the equation.  Our LiveSearch implementation (and another part of the application) turned out to be slower (up to a second per request) than other parts.  So we put in an Apache alias that matched on those URL types and sent them to a separately-configured BFG instance we call the “ghetto”.  This instance is primarily for catalog requests, so we set the ZEO cache to be *very* large, but on each request we also flush from the cache any object that isn’t in the catalog.  This has been a boon.

I have an interest in looking later at some more options we gain from modwsgi.  For example, file delivery using wsgi.file_wrapper (modwsgi fixed bugs in its implementation in later releases).  Also, modwsgi has some facilities for setting limits on the life of a request.
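For reference, wsgi.file_wrapper is an optional key that a WSGI server may put in the environ so the application can hand back an open file and let the server stream it efficiently.  A minimal sketch of using it, with a plain generator as the fallback when the server doesn't offer one, might look like this.  `serve_file` and its parameters are made-up names; this is not how KARL delivers files today.

```python
import os


def _read_blocks(fileobj, block_size):
    """Fallback iterator: yield the file in blocks, then close it."""
    try:
        while True:
            block = fileobj.read(block_size)
            if not block:
                break
            yield block
    finally:
        fileobj.close()


def serve_file(environ, start_response, path,
               content_type="application/octet-stream", block_size=8192):
    """Return a WSGI response body that streams the file at `path`."""
    fileobj = open(path, "rb")
    size = os.fstat(fileobj.fileno()).st_size
    start_response("200 OK", [
        ("Content-Type", content_type),
        ("Content-Length", str(size)),
    ])
    # Servers that support it expose a callable taking (file, block_size).
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        return wrapper(fileobj, block_size)
    return _read_blocks(fileobj, block_size)
```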

Still, regarding just the basics, we haven’t had much of a hitch at all.  So far, so good.

Update on KARL: Migration, enhancements

July 14, 2009

In late May I made a brief blog post announcing KARL prior to the Plone Symposium talk I gave.  Since then, quite a bit has gone on with KARL:

  • The migration went super-smooth.  We did 5 trial runs of gradually larger audiences, fixing bugs in between.  By the time of launch, Chris Rossi and Lars Nolan had the process down pat.  No surprises.
  • We did a KARL 3.1 development cycle in support of hosting KARL for 2 more organizations.  In this process we did a good amount of refactoring, including parameterizing some customization points that previously required overriding via ZCML.
  • Shane added an in-the-core People Directory.
  • Fixed bugs that got overlooked.  This also went very smoothly: most bugs got fixed within a day or two, and none piled up for more than a week, even the small ones.
  • During the next two weeks, we are migrating the other two organizations from their existing KARL2 over to their new KARL3.  All hosted in the same environment with OSI, which makes operations far better.
  • We’re also in the 3.2 development cycle.  We updated TinyMCE and landed profile picture resizing (via PIL).  We’re doing spellchecking, improvements to the email alerting, and a good number of other enhancements during the next few weeks.
  • There are also plans for a big KARL meeting at OSI later this month, to include some other organizations.

All in all, we’ve really hit our stride.  Not just on development, but increasingly on operations.  We’re getting into a mode that is very measured, methodical, quality, responsive, etc.  As Jeffrey Shell always said: no alarms, no surprises.

Back from vacation, KARL install docs fixed

July 14, 2009

Got back from 9 days with my wife’s family in France.  Wonderful vacation, especially for the kids (who went before us and stayed after us.)  We both got a head cold just before returning, so the two flights and aftermath were south of pleasant.

While out, longtime-friend Seb Bacon pointed out that the installation docs for KARL had bitrotted, a point which Chris had to cover for me on the latest repozecast.  Now fixed, and sorry!

Fun and profit with middleware

June 3, 2009

Yesterday we were working on the KARL project, doing some post-deployment housekeeping.  Specifically, we had a checkout of the templates that had a local customization (injecting Google Analytics) that we didn’t want to check in.  At least not to the software repository itself.  We wanted a freshly built demo KARL to have no analytics, and certainly not OSI’s account.

The most logical thing would have been to throw ZPT at it and get a little snippet of HTML to jam in just before closing the body tag.  But that would mean going to a number of places and injecting calls, plus we’d have to grab the configuration data from somewhere for the right snippet.

Tres and Chris Rossi argued for middleware: something that would watch the outgoing HTML and inject Google Analytics in the appropriate circumstances.  An hour later, Tres had written repoze.urchin, which takes its data as parameters in the Paste configuration file and then hacks the HTML on the way out.
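To show the general shape of that approach, here is a minimal sketch of response-rewriting WSGI middleware.  It is not the repoze.urchin source; the class name, the `make_filter` factory, and the `snippet` parameter are invented for illustration, and the snippet itself is whatever HTML the deployer configures.  It buffers the whole response, which is fine for ordinary pages but not for large streamed output.

```python
class InjectSnippetMiddleware:
    """Insert an HTML snippet just before the closing body tag."""

    def __init__(self, app, snippet):
        self.app = app
        self.snippet = snippet.encode("utf-8")

    def __call__(self, environ, start_response):
        captured = {}
        written = []

        def capture(status, headers, exc_info=None):
            # Hold the status and headers back until the body is rewritten.
            captured.update(status=status, headers=headers, exc_info=exc_info)
            return written.append  # legacy write() callable

        result = self.app(environ, capture)
        try:
            body = b"".join(written) + b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        headers = captured["headers"]
        content_type = dict((k.lower(), v) for k, v in headers).get("content-type", "")
        if content_type.startswith("text/html"):
            # lower() keeps byte offsets intact, so the index is valid for body.
            marker = body.lower().rfind(b"</body>")
            if marker != -1:
                body = body[:marker] + self.snippet + body[marker:]
            headers = [(k, v) for k, v in headers if k.lower() != "content-length"]
            headers.append(("Content-Length", str(len(body))))

        start_response(captured["status"], headers, captured["exc_info"])
        return [body]


def make_filter(app, global_conf, snippet=""):
    """Paste Deploy style filter-app factory; `snippet` comes from the config."""
    return InjectSnippetMiddleware(app, snippet)
```

Because the snippet arrives through configuration, a demo deployment can simply leave the filter out of its pipeline and get no analytics at all.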

When to use and not use middleware is an art that I’m still learning about.  The two biggest rules appear to be: don’t solve a problem with middleware if the application won’t run without it, or if the middleware requires access to information inside the application.

The marketing value of developer docs

June 1, 2009

Last week at the Plone Symposium East I gave a talk on the KARL project that I’ve been working on.  The basic meme: Plone and its large ecosystem provide a ton of value when your needs match up with its bulls-eye.  What should you do when your needs don’t fit so well into Plone-the-product’s box?

My talking point was, we need to discourage expanding Plone’s bulls-eye to cover generic platform development of any possible application.  Instead, encourage the meme that the technologies (and effort expended learning them) can be used to make a targeted product.

The KARL project adopted that thinking in its switch to BFG (to good effect, as we then focused on building the best KARL we could.)  In describing BFG’s goals, I lifted one directly from Chris:

Documentation: The lack of formal documentation of a feature or API is a bug.

I then went on to explain that Chris released the documentation for BFG before releasing the software, and has made an enormous, constant effort at keeping the wide-ranging docs (API, narrative, example applications) up-to-date as he has refactored.

In making the point, I posited that “Friendly, ample docs make a positive first impression” is part of the reason for swift uptake of Django and other Python web frameworks.  Chris pointed me to a survey that makes that point in spades.

Further confirmation came during the BFG tutorial Tres and I did last week.  Eric Rose clicked the link to the BFG docs in the comprehensive BFG Wiki tutorial and had a very visible positive reaction on his face.

Some info about KARL, the project I’ve been working on

May 21, 2009

For the last few years I’ve been working with some great folks at the Open Society Institute on a project called KARL.  It’s now open source and has a website with some preliminary information, which means I can chat about it in advance of my presentation next week at the Plone Symposium.

In a nutshell, KARL is a collaboration system for projects and organizations.  We are just wrapping up KARL3 (a rewrite to convert from Zope/Plone to a Zope-like BFG application) and we’re doing the migration work.  There’s quite a bit to chat about, so look for some more blog posts as we finish up the process.