Sunday, March 29, 2009

Is This Pythonic?

Moshe Zadka and I did an Open Space titled "Is This Pythonic?" where we took someone else's code and reworked it to be cleaner. The code we worked on was cursors.py from the PyShards project.


[Originally Steve Holden and Raymond Hettinger were going to host it (they've done it before) but Steve bowed out and Raymond decided to go downtown with his girl]


Here is the selectOne function in it's original form.


def selectOne(self, sql, args=None):
results = []
shard = self._shard;
while shard != None and len(results) == 0:
db = shard.establishConnection()
cursor = db.cursor()
cursor.execute(sql, args)
res = cursor.fetchone()
if res != None:
results.extend(res)
cursor.close ()
db.close ()
shard = shard.next
return results

The code mixes a bunch of conceptual actions in one big blob. It is walking a linked list* of shards. It acquires a resource (making it harder to test) but doesn't safely release it in a try/finally. It builds up a list of results, and finally returns it. That's a lot of things for one function to be doing at once.


Below was the first cut. Each action is broken into a separate function. Because there are many functions almost like this one we can even reuse those parts.


* The linked list should just be a list, but that's a bigger refactoring.


def valid_shards(shard):
''' walk the shards linked list, yielding the items '''
while shard:
yield shard
shard = shard.next

def apply_all(shards, func):
''' for each shard connect to the database, create a cursor, and pass it to func '''
for shard in shards:
db = shard.establishConnection()
try:
cursor = db.cursor()
yield func(cursor)
finally:
db.close()

def selectOne(self, sql, args):
''' execute sql on each shard, returning the first row (if any) on each shard'''
def fetchone(cursor):
return curser.fetchone(sql, args)

results = apply_all(valid_shards(self._shard), fetchone)
return filter(None, results)

So each function has a little job and does it in a straghtforward way. Because the module has many methods that are almost like selectOne() we should be able to reuse those parts. So we gave it a try on selectMany()

def selectMany(self, sql, args=None, size=None):
results = []
stillToFetch = size
shard = self._shard;
while shard != None and stillToFetch > 0:
db = shard.establishConnection()
cursor = db.cursor()
cursor.execute(sql, args)
res = cursor.fetchmany(stillToFetch)
if res != None:
results.extend(res)
stillToFetch = stillToFetch - len(res)
cursor.close ()
db.close ()
shard = shard.next
return results

SelectMany has an extra wrinkle that SelectOne doesn't in that it will stop early if it gets enough result rows. The apply_all function doesn't have a hook for stopping early so we have to kludge one into the function we pass in. Here is the first draft that has a big code smell. Raising StopIteration will do the right thing but it won't if the implementation changes.

def selectMany(self, sql, args=None, size=None):
limit = [size]
def fetchmany(cursor):
res = cursor.fetchmany(sql, args)
limit[0] = limit[0] - len(res)
if size is not None and limit[0] <= 0:
raise StopIteration

for results in apply_all(valid_shards(self), fetchmany):
for result in results:
yield result

This code would be much cleaner in python2.6, and much much cleaner in 2.7 (the dev trunk). So let's pretend that 'nonlocal' and 'yield from' are available.

def selectMany(self, sql, args=None, size=None):
def fetchmany(cursor):
nonlocal size
res = cursor.fetchmany(sql, args)
size -= len(res)
if size and size <= 0: # our bug just became more obvious!
raise StopIteration

for results in filter(None, apply_all(valid_shards(self), fetchmany)): # bug fixed!
yield from results

[I fixed the missing filter bug in that one too]
Let's fix that size bug and raise a specific exception so our code is safe even if the implementation of apply_all changes.

def selectMany(self, sql, args=None, size=None):
class LimitReached(Exception): pass
def fetchmany(cursor):
nonlocal size
if size is not None and size <= 0: # bug fixed!
raise LimitReached
res = cursor.fetchmany(sql, args)
size -= len(res)

try:
for results in filter(None, apply_all(valid_shards(self), fetchmany)):
yield from results
except LimitReached:
pass

Yuck. That might more correct but now the code smell is stinking up the room. What we need to do is stuff more smarts into apply_all().
To Be Continued...

No comments: