
Faster relevance ranking didn’t make it into PostgreSQL 9.4

October 29, 2014

Alas, the patch for the one big feature we really needed apparently got rejected.

PostgreSQL has a nice little full-text search story, especially when you combine it with other parts of our story (security-aware filtering of results, transactional integrity, etc.) Searches are very, very fast.

However, the next step — ranking the results — isn’t so fast. It requires a table scan (likely to TOAST files, meaning read a file and gunzip its contents) on every row that matched.

In our case, we’re doing prefix searches, and lots and lots of rows match. Lots. And the performance is, well, horrible. Oleg and friends had a super-fast speedup for this ready for PostgreSQL 9.4, but it apparently got rejected.
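To make the prefix-search case concrete, here is a minimal sketch of how such a query might be built. The table `docs`, the columns `docid` and `text_vector`, and the helper name `prefix_tsquery` are made up for illustration; only `to_tsquery`, the `:*` prefix syntax, `@@` and `ts_rank` are PostgreSQL's own. The `%(tsquery)s` placeholder assumes a driver with pyformat parameters, such as psycopg2.

```python
import re


def prefix_tsquery(user_input):
    """Turn free text into a tsquery string where every term is a prefix match."""
    # Keep only word characters so tsquery operators typed by the user
    # cannot change the structure of the query.
    terms = re.findall(r"\w+", user_input.lower())
    return " & ".join(term + ":*" for term in terms)


# Hypothetical schema: table `docs` with a tsvector column `text_vector`.
# The WHERE clause can be answered from an index, but ts_rank needs the
# tsvector of every matching row, which is where the time goes.
RANKED_SEARCH_SQL = """
SELECT docid, ts_rank(text_vector, query) AS rank
FROM docs, to_tsquery('english', %(tsquery)s) AS query
WHERE text_vector @@ query
ORDER BY rank DESC
LIMIT 20
"""

if __name__ == "__main__":
    print(prefix_tsquery("rel stor"))
```

A short prefix such as `a:*` matches a huge number of rows, and every one of them has to be ranked before the `LIMIT` applies.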

So we’re stuck. It’s too big a transition to switch to ElasticSearch or something. The customer probably should bail on prefix searching (autocomplete) but they won’t. We have an idea for doing this the right way (convert prefixes to candidate full words, as Google does, using PG’s built-in lexeme tools) but that is also too much for the budget. Finally, we don’t have the option to throw SSDs at it.
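One possible reading of the "convert prefixes to candidate full words" idea, sketched in plain Python. This is an assumption about the approach, not the design we had in mind in detail: it presumes a sorted list of known lexemes is available (in PostgreSQL, something like `ts_stat` could supply one), and the names `expand_prefix` and `candidates_tsquery` are hypothetical.

```python
from bisect import bisect_left


def expand_prefix(sorted_lexemes, prefix, limit=5):
    """Return up to `limit` known lexemes that start with `prefix`."""
    # In a sorted list, all words sharing a prefix sit next to each other,
    # so one binary search finds where they begin.
    start = bisect_left(sorted_lexemes, prefix)
    candidates = []
    for lexeme in sorted_lexemes[start:]:
        if not lexeme.startswith(prefix) or len(candidates) >= limit:
            break
        candidates.append(lexeme)
    return candidates


def candidates_tsquery(candidates):
    """Build a tsquery string that matches any of the full words."""
    return " | ".join(candidates)


# Made-up lexeme list, for illustration only.
LEXEMES = sorted(["catalog", "cache", "cat", "pickle", "pack", "postgresql"])

if __name__ == "__main__":
    print(candidates_tsquery(expand_prefix(LEXEMES, "ca")))
```

The point of the design is that the search then runs against a handful of exact words rather than an open-ended prefix, so far fewer rows match and need ranking.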

Recent work with RelStorage

October 5, 2014

The KARL project has been focused in the last year on some performance and scalability issues. It’s a reasonably big database, ZODB-atop-RelStorage-atop-PostgreSQL. It’s also heavily security-centric with decent writes, so CDNs and other page caching weren’t going to help.

I personally re-learned the ZODB lesson that the objects needed for your view better be in memory. Our hoster is 32-bit-only, so that made us learn a little more and think about the tradeoffs. Adding more threads means higher memory usage per process. We get some concurrency as PG requests release the GIL, but that’s hard to bank on. Instead you have a bunch of single-connection processes, but then you get little cache affinity. One bad view leads to 20k objects getting requested. The user hits reload a few times, and your site is hung until PG is finished.

We use memcache as well, but that is also limited to 2 GB. We likely need to spread the cache across a few processes.

We decided to do some timing of various scenarios, with the hardware and dataset that we had. A completely cold PG server (nothing in OS read buffers), a warm PG, stuff in memcache, stuff in the RelStorage local cache, and stuff in the ZODB connection cache. Here is what Shane found:

  • 179 seconds when PG has to fetch a lot of the data from disk.
  • 21 seconds when PG has all the data in its own cache.
  • 6 seconds when memcache has all the pickles.
  • 2 seconds when the local client cache has all the pickles.
  • 1.2 seconds when the ZODB cache is filled.
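To put those numbers side by side, a small sketch that computes how much faster each tier is than the cold case. The timings are the ones listed above; the labels and the `speedups` helper are just for illustration.

```python
# Timings in seconds, taken from the list above.
TIMINGS = [
    ("PG fetching from disk", 179.0),
    ("PG data in its own cache", 21.0),
    ("memcache has the pickles", 6.0),
    ("local client cache has the pickles", 2.0),
    ("ZODB cache filled", 1.2),
]


def speedups(timings):
    """Return (label, factor) pairs relative to the first, slowest entry."""
    baseline = timings[0][1]
    return [(label, round(baseline / seconds, 1)) for label, seconds in timings]


if __name__ == "__main__":
    for label, factor in speedups(TIMINGS):
        print("%-36s %6.1fx" % (label, factor))
```

Each step closer to the application process is worth roughly a factor of two to four, and the full distance from cold disk to a filled ZODB cache is about 150x.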

We had previously thought that unpickling was a bottleneck. It wasn’t. We then did some research on Python standard library compression at the lowest level. Turns out that we can get decent compression at very high speeds. So we thought about RelStorage’s oft-overlooked “local cache”, an in-memory, process-wide pickle cache. By default it is set to a low number.
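A minimal sketch of the kind of experiment described here, assuming `zlib` at its fastest level as the standard-library compressor (the post doesn't say which module was measured). The sample object is made up and far more repetitive than real data, so the ratio it prints says nothing about our database.

```python
import pickle
import zlib


def compress_pickle(obj, level=1):
    """Pickle an object and return (raw_bytes, compressed_bytes)."""
    raw = pickle.dumps(obj, protocol=2)
    # Level 1 favors speed over compression ratio.
    return raw, zlib.compress(raw, level)


def restore(compressed):
    """Reverse compress_pickle: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(compressed))


# Made-up object, for illustration only.
sample = {
    "title": "KARL community",
    "tags": ["intranet"] * 50,
    "body": "lorem ipsum " * 200,
}

if __name__ == "__main__":
    raw, packed = compress_pickle(sample)
    print(len(raw), "bytes raw,", len(packed), "bytes compressed")
```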

The local cache had an enticing aspect: the code was tinkerable, as it was under Shane’s control. What if we could play with some ideas? For example, only cache objects under a certain size. Some big PDF might take up the space of thousands of (far more important) catalog objects. RelStorage gained a knob for setting a size threshold. Compression, though, is very interesting. With it we can get many more objects in the cache, and not pay too much of a price.
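The two ideas can be sketched together in a few lines. This is not RelStorage's code, only an illustration of the concept: the class name, the parameter names and the default threshold are hypothetical, and there is no eviction or overall memory limit here.

```python
import zlib


class LocalPickleCache:
    """Toy in-memory pickle cache with a size threshold and compression."""

    def __init__(self, max_object_bytes=16384, compression_level=1):
        self.max_object_bytes = max_object_bytes
        self.compression_level = compression_level
        self._data = {}

    def set(self, oid, pickled):
        """Store a pickle unless it is over the size threshold."""
        # One big PDF should not push out thousands of small catalog objects.
        if len(pickled) > self.max_object_bytes:
            return False
        self._data[oid] = zlib.compress(pickled, self.compression_level)
        return True

    def get(self, oid):
        """Return the original pickle bytes, or None on a cache miss."""
        packed = self._data.get(oid)
        return None if packed is None else zlib.decompress(packed)


if __name__ == "__main__":
    cache = LocalPickleCache()
    print(cache.set(1, b"x" * 100), cache.set(2, b"y" * 20000))
```

The tradeoff is a little CPU on every hit in exchange for fitting more objects into the same memory budget.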

Both of these (size limit and compression level) are now in RelStorage. It will take a while to decipher the right combination of ZODB connections vs. client cache vs. local cache, and which numbers to up/down before hitting 2 GB range. But we’ve already had a big impact on performance.

And a plug for some other work that the KARL project funded Shane for: packing improvements that went into b2 a couple of days ago, plus perfmetrics decorators that let you spew all kinds of ZODB-oriented stats to graphite/DataDog.

Plone runs Brazil

September 28, 2013

Plone, baby, you still got the power to amaze.

I saw this tweet by André Nogueira (@agnogueira):

And it’s true. The President of Brazil announced the web portal, and at the bottom, you see that it is powered by Plone. I’m at National Airport, on the way to Atlanta, on the way to Brasilia for the conference. And that’s a nice thing to see.

Me on the web, 20 years ago today

September 2, 2013

September 2, 1993 is my first recorded post about the web, asking a question answered by Marc Andreessen.

As backstory, I was a Navy officer waiting around to start flight school. I was part of the “Top Gun” generation: in 1986, lots of silly young men go watch Top Gun, then four years later the Navy has an incredible surplus of pilots. Great foresight there, Navy. For over a year I got paid to go shark fishing in Pensacola, which really was as awesome as it sounds. Thanks, taxpayers!

Ultimately my eyesight went over the allowed limit before my start date arrived, so I had to find something else to do in early 1992. As it turns out, Pensacola was home to the “Navy Internet Manager”. I transferred over to that group with a task of getting some valuable services onto that crazy Internet thing. Email, telnet, DNS, etc. Then gopher. Then the big argument. gopher+ versus this new WWW thing.

The guy in the cubicle next to me ran the navy.mil DNS domain. I vividly remember, in early 1993, rolling my chair around the partition wall and asking for http://www.navy.mil. “What’s that?” he asked. Hilarity ensues. I’ll talk about my experience with that in another post next month.

New blog posts: Chris, Tres, me

August 12, 2013

Last week, Tres and I wrapped up a little website for Agendaless, based on Pyramid and Substance D. Not very ambitious site, but still a lot of fun working in Substance D.

We put up some blog posts:

  • Chris wrote a really thought-provoking article about our experiences when open source and consulting rub up against the sad state of patent shakedowns. Someone like Chris writes and gives away a multi-million-dollar code base. When it is time to do some consulting, he’s asked to indemnify the small, custom against patents as part of the contract.  The yearly insurance cost is over half his yearly income.
  • Agendaless is turning 7 in a couple of weeks. Tres did a little retrospective.
  • 11 years ago I moved to France, with a focus soon thereafter on large-scale project management.

PyCharm 3.0 EAP2 supports Pyramid

August 2, 2013

Confession: I’m an IDE user. Years of Emacs with occasional flirtations with IDEs (Komodo, PyDev) always led me back to Emacs. PyCharm (with the requisite 4 GB RAM upgrade for the Java tax) is the one that stuck.

PyCharm has never had any specific Pyramid support, until now. The upcoming 3.0 version will have some support for Pyramid, and the latest early access preview makes this available. Here is a screencast I just made demonstrating the support:

Screencast showing Pyramid support in PyCharm 3.0 EAP

In this demo I show:

  • Creating Project using “Project type” of Pyramid
  • Making a virtual environment under Python 3.3.2 (using pyvenv) for the new project
  • Telling PyCharm to install Pyramid into the project
  • Choosing one of the Pyramid-provided “scaffolds” to generate a working sample
  • Using the PyCharm-generated “Run Configuration” to easily execute the generated setup.py
  • Using the PyCharm-generated “Run Configuration” to start the Pyramid project and view the home page in a browser
  • I also show a reminder to bring in stuff for the office omelets (my 3 chickens are laying eggs faster than we can eat them)

Two caveats I covered:

  • There is a bug, now fixed, in the run configuration about working directory. The next EAP should include the fix.
  • If you want the scaffold to generate a sample, it is important to click “No” when PyCharm warns you about an existing directory. (This only happens if, like I do, you use virtual environments stored in the project directory.)

I don’t know how much other support JetBrains plans for Pyramid in PyCharm. I doubt they’ll get Chameleon support, for example. But for those trying to get started quickly, this really eliminates a lot of monkey business related to virtual environments, getting easy_install/pip into the virtualenv, getting the sources and running a scaffold, etc.
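For comparison, the virtual environment step on its own can be scripted with the standard library's `venv` module, the same machinery behind the pyvenv command. This is a minimal sketch, assuming a newer Python than the 3.3.2 used in the screencast (the `with_pip` argument arrived in Python 3.4); the helper name and the `env` directory name are hypothetical.

```python
import os
import tempfile
import venv


def make_project_env(project_dir, env_name="env"):
    """Create a virtual environment inside the project directory."""
    env_dir = os.path.join(project_dir, env_name)
    # with_pip=False keeps this fast and offline; installing Pyramid
    # would be a separate step.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project_dir:
        env_dir = make_project_env(project_dir)
        print(os.path.exists(os.path.join(env_dir, "pyvenv.cfg")))
```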

Video and source for my PyCon Pyramid tutorial with a twist of Python 3

April 8, 2013

(Updated: forgot some links.)

PyCon was a whole kettle of fish this year, happy bouncy fish that drive in RVs and stay up really late.

I gave a Getting Started with Pyramid tutorial at PyCon. The fun twist: I based it on Python 3, which Pyramid has supported for about a year and a half. I have a feeling that this might have been the first PyCon web tutorial that targeted Python 3. (I also supported Python 2, which about half the attendees used.)

Video here and source docs/code here.

To tell the truth, everything went surprisingly smoothly on the Python 3 side. My first real headache wasn’t Python’s fault. Homebrew’s pyvenv command won’t let you install distribute into the virtual environment. (Unfortunately, despite being told it works fine with a Python built from source, they simply closed the ticket, though it was re-opened as another ticket.)

My only other issue was process-related. I used Sphinx for my tutorial material. I decided to use Python 3 for the Sphinx part as well. The released version wasn’t respecting my build pickles (even after I deleted them), but an install from Sphinx master solved the problem.

All in all, quite cool. I used PyCharm to the fullest extent possible for all the coding and Sphinx docs, and PyCharm hung along with Python 3 quite well.

So the open question…is 2013 going to be a year that Python 3 starts to emerge?

Python 3 for my PyCon Pyramid tutorial

February 18, 2013

Going to PyCon? Looking for a good tutorial to sign up for? Interested in Python 3?

I’m giving a Pyramid for Humans tutorial at PyCon, Wednesday morning. It’s going to be a lot of fun. As extra spice, I’m going to give a try at targeting the tutorial at Python 3, which Pyramid has supported for almost a year and a half.

If you’re looking for a rich web framework and you think 2013 is a good year to start kicking the Python 3 tires, come hang out with us.

I’m doing a Pyramid-Python3 talk at DC Python Feb 5

January 18, 2013

We’re getting out-and-about more in 2013, talking up Pyramid and whatnot. First up is a Pyramid presentation at the DC Python meetup on Tuesday, February 5th in DC, 7PM at Browsermedia/NClud, 19th street. Only 8 RSVP spots left!

I’ll give a little warmup, then a lightning tour of the “Pyramid for Humans” tutorial I’m giving at PyCon in March.

And…I plan to give the tutorial using Python 3, just to emphasize that Python web frameworks are making Python 3 progress in 2013, and Pyramid has had production support for well over a year.

Emphasize Python 3 in Pyramid docs?

January 13, 2013

Recently I saw that Django 1.5 will have experimental (non-production) support for Python 3, and just today I saw that Twisted now supports some Python 3. Someone recently asserted that 2013 might be the year that Python 3 tips.

Pyramid has fully supported Python 3 for well over a year. I’m hoping to work on a “Getting Started” section in the Pyramid docs. What do people think about ways to highlight Python 3 a little more visibly, in the Pyramid docs and elsewhere?

My supposition: people who are interested in real web stuff with Python 3 should give Pyramid a try, and thus, we should reach out to them.

