» Publishers, Monetize your RSS feeds with FeedShow: More infos (Show/Hide Ads)
A lot has been written lately about Google Maps Navigation. Google is basically giving away an incredible mapping application with good mapping data for free. Why would they do such a thing? Most of the guesses I've seen basically say "they like to give stuff away for free to push more advertisements". That's close, but everyone seems to have missed a huge detail, perhaps the most important detail of all.
Google is an advertisement company, particularly skilled at targeted advertisements. Almost all of their revenue comes from being able to show you ads that you want to see when you want to see them. What does this have to do with maps and navigation? Well, this is going to seem really obvious once you read it, but no one seems to have mentioned it yet, so here it goes:
Google will know everywhere you drive, and when.
Do you drive past particular stores to and from work every day? They'll know.
Do you often search for "in-and-out" while driving?. They'll know.
Do you email your friends (using gmail) about how much you love Brand X jeans and are driving past a Brand X outlet mall? They'll know.
Did you recently search (on Google) for Computer Part Y and there's a sale for it at a store on your way to work? They'll know.
Are you driving around and stopping at foreclosed houses? They'll know.
Do you drive the same route to work at the same time of day as 12 other people? Then try the new Google Carpool!
I made that last one up. But you get the idea. The more Google knows about you, the better the advertisements they can serve you. And knowing everywhere you drive and when and what you are looking for is a lot of very powerful data. I'm sure the big brains in Mountain View are salivating over the data mining possiblities that they'll probably have very soon. And once you start using it, you'll probably start getting very well targetted advertisements because they'll know so much about you.
Whether it's a good thing or not that they know so much about you is a topic of another day. For now, let us all at least realize one overlooked purpose of Google Maps Navigation: more data about you for better advertisements.
- Mutable data structures are a bane to concurrent (multi-threaded) code.
- Writing self.foo = foo, self.bar = bar, etc, is a huge waste of time.
- When you have lots of data structures in memory, tuple-based data uses 1/4 the memory of class-based data.
So, go to the github repo or use it by following the really easy steps:
- Download Record.py from http://github.com/pthatcher/pyrec/blob/master/Record.py
- put from Record import Record at the top of your code.
- make a class by saying something like class Person(Record("name", "age"))
- Never write __init__ again (unless you want mutability).
Update: A commenter (thanks Dan!) pointed out that this is a lot like namedtuple, added in python 2.6. He asked why use this instead of namedtuple. Well, I have to admit that I probably would have never created pyrec if namedtuple existed 3 years ago. I try to avoid NIH syndrome. But it didn't exist, so I wrote pyrec. But I've been using pyrec for three years, so I have some experience on some little things that make a big difference (to me). Here are a few advantages pyrec has over namedtuple:
- It has a nicer interface. I prefer new(val1, val2) to _make([val1, val2]), alter to _update, and class Person(Record("name", "age")) to Person = namedtuple("Person", "name, age")
- I added the setField methods. That's what I use 90% of the time. Only about 10% of the time do I use alter. setField is a lot more convenient.
- With pyrec, you can safely override __iter__ and __getitem__. For example, in Record.py, you'll see the implementation of a LinkedList. I tried doing that with namedtuple, but the overidden __getitem__ clobers the name lookup and __iter__ the tuple unpacking. You can use tuple.__iter__(rec) to get around the latter, but pyrec's .values is a lot nicer.
- pyrec has .namedValues for ordered (field, value) pairs, unlike _asdict() which throws out the order. For many things I use pyrec for, this matters.
- You can improve it! Have looked at the code for namedtuple? Ugly. This is pretty clean, so you can improve it very easily if you need additional functionality which will work with all of your records.
This could be a big deal. Why? It means that you can do anything in an SQL query. While drawing the mandlebrot set is a nice demo, the first "killer app" will be tree traversal. Lots of relational tables end up looking like trees and are a royal pain to deal with in normal SQL. After that, only our imagination (and performance) is the limit. You can theoretically create any data view that runs in the DB all in one shot, without any of the troubles of stored procedures.
I see a similarity between turing-complete SQL in the DB and JavaScript on the browser. For years, we used JavaScript to do silly stuff, but then a few bright minds created amazing tools like gmail and google maps and our view of JavaScript was forever changed. Now JavaScript is everywhere and there's a race to make the best development framework and engines fastest engines.
Might the same thing happen with turning-complete SQL? is the race on to be the first programming language to compile down to turing-complete SQL?.
Honestly, if I were a betting man, I'd say it won't come to anything significant. But I'll also guess that the lure of compiling down to SQL will eventually capture someone, somewhere. It's only a matter of time. In fact, if you're a programming language geek, the sirens are probably calling to you right now :). When it does happen, I expect the first language to be a functional one, especially one with a small core, like a variant of lisp. Once we get clojure in clojure, maybe it will be a candidate. Or maybe C# will end up with LINQ-to-turning-complete-SQL (SQLINQ?).
What do you think?
import time
## simplified event stuff
class Event:
def __init__(self):
self.handlers = []
def handle(self, handler):
self.handlers.append(handler)
return self #so += will work
def fire(self, val = None):
for handler in self.handlers:
handler(val)
def echo(val):
print val
return val
def simple_click_event_example():
click = Event()
click.handle(echo)
click.fire("left") #prints "left"
def click_event_manipulation_example():
def left_double_click_from_click(click, threshold):
dlclick = Event()
last_lclick_time = [0] #closure hack
def click_handler(click_value):
if click_value == "left":
lclick_time = time.time()
if (lclick_time - last_lclick_time[0]) < threshold:
dlclick.fire("double left")
last_lclick_time[0] = lclick_time
click.handle(click_handler)
return dlclick
click = Event()
dlclick = left_double_click_from_click(click, .01)
dlclick.handle(echo)
click.fire("left")
time.sleep(.02)
click.fire("left")
click.fire("right")
click.fire("left") #prints "double left"
class EventFireRecord:
def __init__(self, value, time):
self.value = value
self.time = time
def click_event_maniuplation_refactored_example():
def doubleize_event(evt, threshold, combine):
double_evt = Event()
last_fire = EventFireRecord(None, 0)
def evt_handler(value):
fire_time = time.time()
if (fire_time - last_fire.time) < threshold:
double_evt.fire(combine(last_fire.value, value))
last_fire.__init__(value, fire_time)
evt.handle(evt_handler)
return double_evt
def filter_event(evt, predicate):
filtered_evt = Event()
def evt_handler(value):
if predicate(value):
filtered_evt.fire(value)
evt.handle(evt_handler)
return filtered_evt
click = Event()
lclick = filter_event(click, lambda value : value == "left")
dlclick = doubleize_event(lclick, .01, lambda click1, click2 : "double left")
dlclick.handle(echo)
click.fire("left")
time.sleep(.02)
click.fire("left")
click.fire("right")
click.fire("left") #prints "double click"
def click_event_handle_with_result_example():
def handle_with_result(evt, handler_with_result):
evt_out = Event()
def handler(value):
result = handler_with_result(value)
if result is not None:
evt_out.fire(result)
evt.handle(handler)
return evt_out
def doubleize_r(evt, threshold):
last_fire = EventFireRecord(None, 0)
def handler(value):
fire_time = time.time()
try:
if (fire_time - last_fire.time) < threshold:
return (last_fire.value, value)
finally:
last_fire.__init__(value, fire_time)
return handle_with_result(evt, handler)
def filter_r(evt, predicate):
def handler(value):
if predicate(value):
return value
return handle_with_result(evt, handler)
clicks = Event()
dlclicks = doubleize_r(filter_r(click, lambda value : value == "left"), .01)
dlclicks.handle(echo)
clicks.fire("left")
time.sleep(.02)
clicks.fire("left")
clicks.fire("right")
clicks.fire("left") #prints ("left", "left")
def click_event_choosing_by_returning_event_example():
def handle_with_result(evt, handler_with_result):
evt_out = Event()
def handler(value):
result = handler_with_result(value)
if result is None:
pass #ignore
elif isinstance(result, Event):
result.handle(evt_out.fire)
elif isinstance(result, list):
for value_out in result:
evt_out.fire(value_out)
else:
evt_out.fire(result)
evt.handle(handler)
return evt_out
def filter_r(evt, predicate):
def handler(value):
if predicate(value):
return value
return handle_with_result(evt, handler)
def value_filter_r(evt, value):
return filter_r(evt, lambda val : val == value)
def click_choose_r(keys, clicks):
def key_handler(char):
#TODO: unsubscribe from event after either "l" or "r"
if char == "l":
return value_filter_r(clicks, "left")
elif char == "r":
return value_filter_r(clicks, "right")
elif char == "f":
return ["fake", "fake"]
return handle_with_result(keys, key_handler)
keys = Event()
clicks = Event()
choosen_clicks = click_choose_r(keys, clicks)
def click_event_looks_like_streams_example():
class Event:
def __init__(self):
self.handlers = []
def handle(self, handler):
self.handlers.append(handler)
return self #so += will work
def fire(self, val = None):
for handler in self.handlers:
handler(val)
def bind(evt, handler_with_result):
evt_out = Event()
def handler(value):
result = handler_with_result(value)
if result is not None:
Event.unit(result).handle(evt_out.fire)
evt.handle(handler)
return evt_out
@classmethod
def unit(cls, val):
if isinstance(val, cls):
return val
elif isinstance(val, list):
return MockEvent(*val)
else:
return MockEvent(val)
__rshift__ = bind
class MockEvent:
def __init__(self, *vals):
self.vals = vals
def handle(self, handler):
for val in self.vals:
handler(val)
def doublize_r(threshold, combine):
last_fire = EventFireRecord(None, 0)
def handler(value):
fire_time = time.time()
try:
if (fire_time - last_fire.time) < threshold:
return combine(last_fire.value, value)
finally:
last_fire.__init__(value, fire_time)
return handler
def filter_r(predicate):
def handler(value):
if predicate(value):
return value
return handler
def value_filter_r(value):
return filter_r(lambda val : val == value)
def click_choose_r(**clicks_by_char):
def key_handler(char):
return clicks_by_char.get(char)
return key_handler
clicks = Event()
keys = Event()
dlclicks = clicks >> value_filter_r("left") >> doublize_r(.01, lambda l1, l2: "double left")
keys >> click_choose_r(d = dlclicks, f = ["fake", "fake"]) >> echo
clicks.fire("left")
clicks.fire("left")
keys.fire("f") #prints "fake" and then "fake" again
keys.fire("d")
clicks.fire("right")
clicks.fire("right")
time.sleep(.02)
clicks.fire("left")
clicks.fire("left") #print ("double left")
## basic consumer (receiver) using generators
receive = object()
def receiver_example():
def receiver(gen_rcvr):
def gen_and_start_rcvr(*args, **kargs):
rcvr = gen_rcvr(*args, **kargs)
rcvr.send(None)
return rcvr
return gen_and_start_rcvr
@receiver
def sum_r(title):
total = 0
while True:
total += yield receive
print "%s: %d" % (title, total)
@receiver
def count_r(title):
count = 0
while True:
yield receive
count += 1
print "%s: %d" % (title, count)
num_key = Event()
sum_nums = sum_r("total nums")
num_key.handle(sum_nums.send)
num_key.fire(1) #prints "total nums: 1"
num_key.fire(2) #prints "total nums: 3"
num_key.fire(3) #prints "total nums: 6"
## make retiterators that can also output values via an event fire
def remitter_example():
class Remitter:
def __init__(self, receiver_from_event_out):
self.receiverFromEventOut = receiver_from_event_out
def __rrshift__(self, event_in):
event_out = Event()
rcvr = self.receiverFromEventOut(event_out)
event_in.handle(rcvr.send)
return event_out
def remitter(gen_rcvr):
def gen_remitter(*args, **kargs):
def receiver_from_event_out(event_out):
rcvr = gen_rcvr(event_out, *args, **kargs)
rcvr.send(None)
return rcvr
return Remitter(receiver_from_event_out)
return gen_remitter
@remitter
def double_detect_r(double_click_event, threshold):
last_click_time = 0
while True:
yield receive
current_click_time = time.time()
if (current_click_time - last_click_time) < threshold:
double_click_event.fire()
last_click_time = current_click_time
@remitter
def print_r(_, message):
while True:
val = yield receive
print message
mouse_click = Event()
mouse_click >> print_r("left")
mouse_click >> double_detect_r(.01) >> print_r("double left")
mouse_click.fire() #prints "left"
time.sleep(.02)
mouse_click.fire() #prints "left"
mouse_click.fire() #prints "left" and "double left"
## make retiterators out of generators that can send and receive
def remitter_example_yield_out():
from collections import defaultdict
class Remitter:
def __init__(self, ritr):
self.ritr = ritr
self.eventOut = Event()
def send(self, val_in):
ritr = self.ritr
event_out = self.eventOut
while True:
val_out = ritr.send(val_in)
if val_out is receive:
break
else:
event_out.fire(val_out)
def handle(self, handler):
self.eventOut.handle(handler)
def handlein(self, *events):
for event in events:
event.handle(self.send)
def __rrshift__(self, event_in):
try:
self.handlein(*event_in)
except:
self.handlein(event_in)
return self
def remitter(gen_rcvr):
def gen_remitter(*args, **kargs):
ritr = gen_rcvr(*args, **kargs)
ritr.send(None)
return Remitter(ritr)
return gen_remitter
@remitter
def double_detect_r(threshold):
last_click_time = 0
while True:
yield receive
current_click_time = time.time()
if (current_click_time - last_click_time) < threshold:
yield
last_click_time = current_click_time
@remitter
def map_r(f, *args, **kargs):
while True:
val = yield receive
yield f(val, *args, **kargs)
@remitter
def print_r(format):
while True:
val = yield receive
print message % val
def label_r(label):
return map_r(lambda val : (label, val))
@remitter
def label_count_r():
count_by_label = defaultdict(int)
while True:
(label, val) = yield receive
count_by_label[label] += 1
yield count_by_label.copy()
def fix_click_counts(count_by_label, single_label, double_label):
count_by_label[single_label] -= (count_by_label[double_label] * 2) #every double left "cancels" a single click
return count_by_label.copy()
def print_label_counts(count_by_label, *labels):
print ", ".join("%d %s" % (count, label) for (label, count) in count_by_label.iteritems())
mouse_clicks = Event()
([mouse_clicks >> label_r("single"),
mouse_clicks >> double_detect_r(.01) >> label_r("double")]
>> label_count_r() >> map_r(fix_click_counts, "single", "double") >> map_r(print_label_counts))
#prints
#0 double, 1 single
#0 double, 2 single
#0 double, 3 single
#1 double, 1 single
mouse_clicks.fire()
time.sleep(.02)
mouse_clicks.fire()
mouse_clicks.fire()
def remitter_without_yield_in_hack_example():
class Receive:
def __init__(self, val = None):
self.d = val
class Remitter:
def __init__(self, receive, ritr):
self.receive = receive
self.ritr = ritr
self.eventOut = Event()
def send(self, val_in):
self.receive.d = val_in
ritr = self.ritr
event_out = self.eventOut
while True:
val_out = ritr.next()
if isinstance(val_out, Receive):
break
else:
event_out.fire(val_out)
def handle(self, handler):
self.eventOut.handle(handler)
def handlein(self, *events):
for event in events:
event.handle(self.send)
def __rrshift__(self, event_in):
try:
self.handlein(*event_in)
except:
self.handlein(event_in)
return self
def remitter(gen_rcvr):
def gen_remitter(*args, **kargs):
receive = Receive()
ritr = gen_rcvr(receive, *args, **kargs)
ritr.send(None)
return Remitter(receive, ritr)
return gen_remitter
@remitter
def double_detect_r(receive, threshold):
last_click_time = 0
while True:
yield receive
current_click_time = time.time()
gap = current_click_time - last_click_time
if gap < threshold:
yield gap
last_click_time = current_click_time
@remitter
def average_r(receive):
total = 0.0
count = 0
while True:
yield receive
total += receive.d
count += 1
yield total/count
@remitter
def print_r(receive, format):
while True:
yield receive
print format % (receive.d)
mouse_clicks = Event()
mouse_clicks >> double_detect_r(.05) >> average_r() >> print_r("double left; average gap is %s seconds")
mouse_clicks.fire()
time.sleep(.1)
mouse_clicks.fire()
time.sleep(.01)
mouse_clicks.fire() #prints #double left; average gap is 0.01... seconds
time.sleep(.02)
mouse_clicks.fire() #double left; average gap is 0.015... seconds
if __name__ == "__main__":
#simple_click_event_example()
#click_event_manipulation_example()
#click_event_maniuplation_refactored_example()
#click_event_handle_with_result_example()
#click_event_choosing_by_returning_event_example()
#click_event_looks_like_streams_example()
#remitter_example()
#remitter_example_yield_out()
remitter_without_yield_in_hack_example()
Let's explore some other ways to do Reactive Programming (Rx).
What is Rx?
Just to remind you what we're trying to accomplish, Rx builds event handlers. The LINQ version of Rx works by making an event look like a query (or stream).It makes sense if you think about it. An event is a stream, of event occurences. A list or enumerator or iterator is also a stream, of values. So if you squint hard enough, you see that events and enumerators are sort of the same thing. In fact, lists, streams, enumerators, events, channels, pipes, sockets, file handles, actors sending you messages are all pretty much the same thing: they are all producers of values.
Consumers and Receivers
Now what do you do with producers of values? You consume them, of course! Usually with something that looks like this (in python):sum = 0 for val in vals: sum += val print sumWhat we've created here is a consumer of vals. We can write it this way, as ordinary code, because vals is very flexible: it's anything that's iterable/enumerable. But what if instead of forcing the producer to be flexible, we forced the consumer to be flexible? What if we could write the consumer like this:
total = 0 while True: total += receive print totalHmm... it sort of looks like the opposite of an iterator/generator/enumerator block. A mathematician might say something about "duals" at this point, but I'm not mathematician, so let's just go ahead and try and implement it. In fact, we'll use python generators to do just that. We'll call this a "receiver" and we'll spell "receive" as "yield receive":
class Event:
def __init__(self):
self.handlers = []
def handle(self, handler):
self.handlers.append(handler)
def fire(self, val = None):
for handler in self.handlers:
handler(val)
receive = object()
def receiver(gen_rcvr):
def gen_and_start_rcvr(*args, **kargs):
rcvr = gen_rcvr(*args, **kargs)
rcvr.send(None)
return rcvr
return gen_and_start_rcvr
@receiver
def sum_r(title):
total = 0
while True:
total += yield receive
print "%s: %d" % (title, total)
@receiver
def count_r(title):
count = 0
while True:
yield receive
count += 1
print "%s: %d" % (title, count)
num_key = Event()
sum_nums = sum_r("total nums")
num_key.handle(sum_nums.send)
num_key.fire(1) #prints "total nums: 1"
num_key.fire(2) #prints "total nums: 3"
num_key.fire(3) #prints "total nums: 6"
It actually works! And because our consumer is very flexible, any producer, like an event, can use it. In fact, it's just a fancy event callback, just like everyrthing else in Rx land.
Remitters
But if we take this one step further and make a receiver wrap an event, we can make a receiver that's also a producer. We'll call it a "remitter", which is sort of like a receiver and an emitter.
class Remitter:
def __init__(self, receiver_from_event_out):
self.receiverFromEventOut = receiver_from_event_out
def __rrshift__(self, event_in):
event_out = Event()
rcvr = self.receiverFromEventOut(event_out)
event_in.handle(rcvr.send)
return event_out
def remitter(gen_rcvr):
def gen_remitter(*args, **kargs):
def receiver_from_event_out(event_out):
rcvr = gen_rcvr(event_out, *args, **kargs)
rcvr.send(None)
return rcvr
return Remitter(receiver_from_event_out)
return gen_remitter
@remitter
def double_detect_r(double_click_event, threshold):
last_click_time = 0
while True:
(yield)
current_click_time = time.time()
if (current_click_time - last_click_time) < threshold:
double_click_event.fire()
last_click_time = current_click_time
@remitter
def print_r(_, message):
while True:
val = (yield)
print message
mouse_click = Event()
mouse_click >> print_r("left")
mouse_click >> double_detect_r(.01) >> print_r("double left")
mouse_click.fire() #prints "left"
time.sleep(.02)
mouse_click.fire() #prints "left"
mouse_click.fire() #prints "left" and "double left"
Great. But it is kind of annoying passing in the event like that. What if we had the remitter yield values out and yield values in?
Remitters that yield out and in
We could do that using little state machines built from python generators. "yield receive" will mean receive and "yield" of anything else will mean "emit".
from collections import defaultdict
class Remitter:
def __init__(self, ritr):
self.ritr = ritr
self.eventOut = Event()
def send(self, val_in):
ritr = self.ritr
event_out = self.eventOut
while True:
val_out = ritr.send(val_in)
if val_out is receive:
break
else:
event_out.fire(val_out)
def handle(self, handler):
self.eventOut.handle(handler)
def handlein(self, *events):
for event in events:
event.handle(self.send)
def __rrshift__(self, event_in):
try:
self.handlein(*event_in)
except:
self.handlein(event_in)
return self
def remitter(gen_rcvr):
def gen_remitter(*args, **kargs):
ritr = gen_rcvr(*args, **kargs)
ritr.send(None)
return Remitter(ritr)
return gen_remitter
@remitter
def double_detect_r(threshold):
last_click_time = 0
while True:
yield receive
current_click_time = time.time()
if (current_click_time - last_click_time) < threshold:
yield
last_click_time = current_click_time
@remitter
def map_r(f, *args, **kargs):
while True:
val = yield receive
yield f(val, *args, **kargs)
@remitter
def print_r(format):
while True:
val = yield receive
print message % val
def label_r(label):
return map_r(lambda val : (label, val))
@remitter
def label_count_r():
count_by_label = defaultdict(int)
while True:
(label, val) = yield receive
count_by_label[label] += 1
yield count_by_label.copy()
def fix_click_counts(count_by_label, single_label, double_label):
count_by_label[single_label] -= (count_by_label[double_label] * 2) #every double click "cancels" a single click
return count_by_label.copy()
def print_label_counts(count_by_label, *labels):
print ", ".join("%d %s" % (count, label) for (label, count) in count_by_label.iteritems())
mouse_clicks = Event()
([mouse_clicks >> label_r("single"),
mouse_clicks >> double_detect_r(.01) >> label_r("double")]
>> label_count_r() >> map_r(fix_click_counts, "single", "double") >> map_r(print_label_counts))
#prints
#0 double, 1 single
#0 double, 2 single
#0 double, 3 single
#1 double, 1 single
mouse_clicks.fire()
time.sleep(.02)
mouse_clicks.fire()
mouse_clicks.fire()
Sweet. That looks pretty nice. But, it relies on the fact that Python allows you to yield values in to a generator. What if we have a programming language that only allows yielding values out (like any enumerator)?
Remitters that yield in by yielding out
We'll introduce a simple hack to work around that. We'll yield out a mutable "receive" value that "receives" in the value for us.
class Receive:
def __init__(self, val = None):
self.d = val
class Remitter:
def __init__(self, receive, ritr):
self.receive = receive
self.ritr = ritr
self.eventOut = Event()
def send(self, val_in):
self.receive.d = val_in
ritr = self.ritr
event_out = self.eventOut
while True:
val_out = ritr.next()
if isinstance(val_out, Receive):
break
else:
event_out.fire(val_out)
def handle(self, handler):
self.eventOut.handle(handler)
def handlein(self, *events):
for event in events:
event.handle(self.send)
def __rrshift__(self, event_in):
try:
self.handlein(*event_in)
except:
self.handlein(event_in)
return self
def remitter(gen_rcvr):
def gen_remitter(*args, **kargs):
receive = Receive()
ritr = gen_rcvr(receive, *args, **kargs)
ritr.send(None)
return Remitter(receive, ritr)
return gen_remitter
@remitter
def double_detect_r(receive, threshold):
last_click_time = 0
while True:
yield receive
current_click_time = time.time()
gap = current_click_time - last_click_time
if gap < threshold:
yield gap
last_click_time = current_click_time
@remitter
def average_r(receive):
total = 0.0
count = 0
while True:
yield receive
total += receive.d
count += 1
yield total/count
@remitter
def print_r(receive, format):
while True:
yield receive
print format % (receive.d)
mouse_clicks = Event()
mouse_clicks >> double_detect_r(.05) >> average_r() >> print_r("double click; average gap is %s seconds")
mouse_clicks.fire()
time.sleep(.1)
mouse_clicks.fire()
time.sleep(.01)
mouse_clicks.fire() #prints #double click; average gap is 0.01... seconds
time.sleep(.02)
mouse_clicks.fire() #double click; average gap is 0.015... seconds
It works! And it should work in any language with iterator blocks. You could even use this C# instead of using LINQ Rx, but then you'll have to type "yield return receive" :(.
Conclusion
Rx is all about making flexible consumers of values, which basically amounts to making event callbacks. LINQ Rx does so with monads, but that's not the only way. Here, we have shown how we can turn a generator or iterator block inside out and make it consume values rather than produce values. Using these is an alternative to LINQ Rx that might be more appropriate for your programming language. There are lots of other things to work out, like unhandling an event, error handling, catching the end of a stream, etc. But this is pretty good, simple foundation to show that the essense of reactive programming is making it easy to make flexible value consumers (basically event handlers). In both the case of Rx, and the code above, we've done so by making a little DSL in the host language.Next time...
There are still other ways of making flexible consumers. If we had continuations or used CPS we could just use the current continuation as our consumer. So, scheme and Ruby ought to have easy solutions to this problem. We can do a similar thing with macros in any Lisp language that doesn't have continuations, like Clojure. In fact, I'd like to explore how to do Rx in clojure next time. And at some point, we need to figure out how concurrency fits into all of this.P.S.
While I was researching all of this stuff, I was surprised to find that my friend, Wes Dyer, is right at the heart of it. You can see a video of him here. That was a surprise because I've never talked with him about this. In fact, I've only heard from him once in the last year because he's "gone dark" . I'm sure his work on Rx has something to do with that :). I just want to make it clear that all of my thoughts are mine alone, and not based on any communication with him. Don't blame him for my crazy stuff :).If you like things like "monads", "duals", and category theory, go watch this video, especially until the end. It's pretty funny.
But if those things make your eyes glaze over and you're left wondering what Rx really is, I want to give you a simple explanation of what Rx is all about. In fact, I'll show how you could have invented it yourself. We'll do so with simple event-based code written in Python.
Step 1: write simple event handlers
Imagine we have a mouse click event that fires either "left" or "right", and we want to make a new event that fires "double left" when there's a double left click. We might write something like this (including a simple Event class).
import time
class Event:
def __init__(self):
self.handlers = []
def handle(self, handler):
self.handlers.append(handler)
def fire(self, val = None):
for handler in self.handlers:
handler(val)
def echo(val):
print val
return val
def left_double_click_from_click(click, threshold):
dlclick = Event()
last_lclick_time = [0] #closure hack
def click_handler(click_value):
if click_value == "left":
lclick_time = time.time()
if (lclick_time - last_lclick_time[0]) < threshold:
dlclick.fire("double left")
last_lclick_time[0] = lclick_time
click.handle(click_handler)
return dlclick
click = Event()
dlclick = left_double_click_from_click(click, .01)
dlclick.handle(echo)
click.fire("left")
time.sleep(.02)
click.fire("left")
click.fire("right")
click.fire("left") #prints "double click"
Step 2: refactor event handlers
It works and it's pretty simple. But, we could refactor quite a bit. If we do so, we might write something like this (notice I like the suffix "_r" for "reactive"):
class EventFireRecord:
def __init__(self, value, time):
self.value = value
self.time = time
def doubleize_r(evt, threshold, combine):
double_evt = Event()
last_fire = EventFireRecord(None, 0)
def evt_handler(value):
fire_time = time.time()
if (fire_time - last_fire.time) < threshold:
double_evt.fire(combine(last_fire.value, value))
last_fire.__init__(value, fire_time)
evt.handle(evt_handler)
return double_evt
def filter_r(evt, predicate):
filtered_evt = Event()
def evt_handler(value):
if predicate(value):
filtered_evt.fire(value)
evt.handle(evt_handler)
return filtered_evt
click = Event()
dlclick = doubleize_r(filter_r(click, lambda value : value == "left"), .01, lambda l1, l2: "double left")
dlclick.handle(echo)
click.fire("left")
time.sleep(.02)
click.fire("left")
click.fire("right")
click.fire("left") #prints "double left"
That looks better and is more generic. The logic of "double click" is now contained all on one line. But, we could do even better. Notice that we repeat ourselves a little with filter_r and doublize_r. The pattern looks like this:
evt_out = Event() def handler(value): ... evt_out.fire(value) ... evt_in.handle(handler) return evt_outWhat if we refactor to pull out that common pattern by making a special handler that returns a value and a special "handle_with_result" that looks like this pattern?
def handler(value):
...
return value
evt_out = handle_with_result(evt_in, handler)
Step 3: make a higher-level "handle" function
If we do that, our code might look like this:
def handle_with_result(evt, handler_with_result):
evt_out = Event()
def handler(value):
result = handler_with_result(value)
if result is not None:
evt_out.fire(result)
evt.handle(handler)
return evt_out
def doubleize_event(evt, threshold, combine):
last_fire = EventFireRecord(None, 0)
def handler(value):
fire_time = time.time()
try:
if (fire_time - last_fire.time) < threshold:
return combine(last_fire.value, value)
finally:
last_fire.__init__(value, fire_time)
return handle_with_result(evt, handler)
def filter_event(evt, predicate):
def handler(value):
if predicate(value):
return value
return handle_with_result(evt, handler)
click = Event()
dlclick = doubleize_event(filter_event(click, lambda value : value == "left"), .01, lambda l1, l2 : "double left")
dlclick.handle(echo)
click.fire("left")
time.sleep(.02)
click.fire("left")
click.fire("right")
click.fire("left") #prints "double left"
It works, and our code looks better than ever. handle_with_result is very useful.
But, we are now missing something: what if we want to return multiple values? Or do something more interesting, like listen to an keyboard event and return left-clicks if the user clicks "l" and right clicks if they type "r" and two "fake" clicks if they type "f". We'd like to write something like this:
def choose_clicks(keys, clicks):
def key_handler(char):
if char == "l":
return filter_event("left", clicks)
elif char == "r":
return filter_event("right", clicks)
elif char == "f":
return ["fake", "fake"]
retrn handle_with_result(keys, key_handler)
If we change handle_with_result to handle this, we might get something like this:
def handle_with_result(evt, handler_with_result):
evt_out = Event()
def handler(value):
result = handler_with_result(value)
if result is None:
pass #ignore
elif isinstance(result, Event):
result.handle(evt_out.fire)
elif isinstance(result, list):
for value_out in result:
evt_out.fire(value_out)
else:
evt_out.fire(result)
evt.handle(handler)
return evt_out
def filter_r(evt, predicate):
def handler(value):
if predicate(value):
return value
return handle_with_result(evt, handler)
def value_filter_r(evt, value):
return filter_r(evt, lambda val : val == value)
def choose_clicks(keys, clicks):
def key_handler(char):
#TODO: unsubscribe from event after either "l" or "r"
if char == "l":
return value_filter_r(clicks, "left")
elif char == "r":
return value_filter_r(clicks, "right")
elif char == "f":
return ["fake", "fake"]
return handle_with_result(keys, key_handler)
keys = Event()
clicks = Event()
choosen_clicks = choose_clicks(keys, clicks)
choosen_clicks.handle(echo)
clicks.fire("left")
keys.fire("a")
clicks.fire("right")
keys.fire("l")
clicks.fire("left") # print "left"
clicks.fire("right")
clicks.fire("left") # print "left"
keys.fire("f") #prints "fake" and then "fake" again
Great. Now if we just add a little bit of syntax sugar to this, we can make events look like streams:
Step 4: add some syntax sugar
class Event:
def __init__(self):
self.handlers = []
def handle(self, handler):
self.handlers.append(handler)
return self #so += will work
def fire(self, val = None):
for handler in self.handlers:
handler(val)
def bind(evt, handler_with_result):
evt_out = Event()
def handler(value):
result = handler_with_result(value)
if result is not None:
Event.unit(result).handle(evt_out.fire)
evt.handle(handler)
return evt_out
@classmethod
def unit(cls, val):
if isinstance(val, cls):
return val
elif isinstance(val, list):
return MockEvent(*val)
else:
return MockEvent(val)
__rshift__ = bind
class MockEvent:
def __init__(self, *vals):
self.vals = vals
def handle(self, handler):
for val in self.vals:
handler(val)
def doublize_r(threshold, combine):
last_fire = EventFireRecord(None, 0)
def handler(value):
fire_time = time.time()
try:
if (fire_time - last_fire.time) < threshold:
return combine(last_fire.value, value)
finally:
last_fire.__init__(value, fire_time)
return handler
def filter_r(predicate):
def handler(value):
if predicate(value):
return value
return handler
def value_filter_r(value):
return filter_r(lambda val : val == value)
def click_choose_r(**clicks_by_char):
def key_handler(char):
return clicks_by_char.get(char)
return key_handler
clicks = Event()
keys = Event()
dlclicks = clicks >> value_filter_r("left") >> doublize_r(.01, lambda l1, l2: "double click")
keys >> click_choose_r(d = dlclicks, f = ["fake", "fake"]) >> echo
clicks.fire("left")
clicks.fire("left")
keys.fire("f") #prints "fake" and then "fake" again
keys.fire("d")
clicks.fire("right")
clicks.fire("right")
time.sleep(.02)
clicks.fire("left")
clicks.fire("left") #print ("left", "left")
So what have we made?
Wonderful. We've made events look like streams by making a slick way of creating event handlers. In fact, if you look closely at what I did in that last step, you'll notice that I renamed "handle_with_result" to "bind" and moved some code into a method called "unit". That's all it takes to turn Event into a monad, which is exactly what Rx does. Congratulations, we just reinvented monads and Rx, just by refactoring our event handler code and in the process we've discovered what Rx really is: a fancy way of writing event handlers, specifically event handlers that fire events that trigger other event handlers that fire events, and so on in big chain that looks like a query. So when your eyes glaze over about "duals" and "monads" and "reactive programming", just say to yourself: I'm making a fancy event handler. Because in the end, that's all you're really doing.In fact, if you want to do so in Python, now you have a basic implementation to start with! Of course, this is just a toy implementation. It lack error handling, unsubscribing, end-of-stream, and concurrency. But it ain't bad for just 50 lines of code. And it lets you see the essence of Rx fairly easily.
Oh, and what's the big deal with monads, you ask? Nothing much. It's just that if you provide "bind" and "unit" (called "select many" and "select" in LINQ, I think), LINQ gives you some nice syntax sugar that makes your event handler look like a query. It's really pretty slick, especially now that they've added "extension methods".
And next time...
In future posts, I'll explore different ways of making slick event handlers, but without monads. And hopefully we'll get make this handle concurrency, which is what asynchronous programming is all about. In fact, I expect we'll start to see a serious blurring of lines between Rx, message passing, and data flow programming.For now, when you start working with Rx, just remember: I'm making a big, fancy event handler. An "Observable" is just like and event and an "Observer" is just like an event handler.
P.S.
What's with the lame names? They started off with cool names like "Reactive" and "Rx" and then give us Observable, Observer, Subscribe, OnNext, OnDone, and OnError. Yuck. Think what an opportunity they missed! We could have had names like Emitter, Reactor, Chain, Emit, Extinguish, and Explode. Judge for yourself:observable.Subscribe(observer) observer.OnNext(val) observer.OnDone() observer.OnError(e)or
emitter.chain(reactor)
reactor.emit("foo")
reactor.extinguish()
reactor.explode(e)
Purpose
A slightly uncommon bit-twiddling operation is "popcount", which counts the number high bits. While uncommon, it's crucial for implementing Array Mapped Tries, which is a way to implement immutable maps and vectors. I'm trying to implement immutable maps and vectors, so I needed to implement popcount.I found A TON of ways, but had no idea which was fastest. So, I implemented them all and tested them. The short answer is to do this:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in xrange(len(POPCOUNT_TABLE16)):
POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]
def popcount32_table16(v):
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
And here's the long answer:
Results
I ran popcount on 30,000 random ints between 0 and 2
| Name | Time | Max Bits | URL |
| c_bagwell | 0.0030 | 32 | http://lamp.epfl.ch/papers/idealhashtrees.pdf |
| c_bsdmacro | 0.0030 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
| c_parallel | 0.0031 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel |
| c_ops64 | 0.0036 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64 |
| table16 | 0.0098 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
| c_table8 | 0.0110 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive |
| table+kern | 0.0132 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
| table8 | 0.0163 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
| bsdmacro | 0.0190 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
| parallel | 0.0199 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
| ops64 | 0.0207 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64 |
| bagwell | 0.0242 | 32 | http://lamp.epfl.ch/papers/idealhashtrees.pdf |
| freebsd | 0.0257 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
| timpeters3 | 0.0277 | 32 | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
| timpeters2 | 0.0580 | 32 | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
| timpeters | 0.0724 | ? | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
| kernighan | 0.0745 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan |
| elklund | 0.0889 | - | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
| naive | 0.1519 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive |
| seistrup | 0.2447 | ? | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
And here are the results for my MacBook with a 2.0Ghz Core 2 Duo:
| Name | Time | Max Bits | URL |
| table16 | 0.0279 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
| table+kern | 0.0372 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
| table8 | 0.0417 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable |
| bsdmacro | 0.0481 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
| bagwell | 0.0596 | 32 | http://lamp.epfl.ch/papers/idealhashtrees.pdf |
| timpeters3 | 0.0744 | 32 | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
| parallel | 0.0807 | 32 | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
| ops64 | 0.1290 | 32 | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64 |
| timpeters2 | 0.1439 | 32 | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
| kernighan | 0.1527 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan |
| timpeters | 0.1668 | ? | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
| freebsd | 0.1772 | - | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
| elklund | 0.1913 | - | http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html |
| naive | 0.2889 | - | http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive |
| seistrup | 0.6106 | ? | http://mail.python.org/pipermail/python-list/1999-July/007696.html |
Observations
- Implementations written in C (using Pyrex) are 5-6 times faster than Python.
- My 3.2 Ghz Linux Desktop is 200% faster than my 2.0 Ghz MacBook even though it only has a 60% clockspeed advantage. I'm not sure what to make of that. Python doesn't use multiple cores well, so I'm sure it's not that.
- If you want to run popcount on 32-bit values in pure Python, table16 is the fastest by far, and only 3 times slower than implementations in C.
- If you need to run popcount on arbitrarily large integers, kernighan is the best, but doing a hybrid of table16 and kernighan is probably better.
Conclusion
If you don't mind using about 64KB of memory, here's is the fastest popcount in pure python:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in xrange(len(POPCOUNT_TABLE16)):
POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]
def popcount32_table16(v):
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
If you do mind the memory usage, here's a slighly slower version:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE8 = [0] * 2**8
for index in xrange(len(POPCOUNT_TABLE8)):
POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1]
def popcount32_table8(v):
return (POPCOUNT_TABLE8[ v & 0xff] +
POPCOUNT_TABLE8[(v >> 8) & 0xff] +
POPCOUNT_TABLE8[(v >> 16) & 0xff] +
POPCOUNT_TABLE8[ v >> 24 ])
And if you need to handle values of more than 32 bits, one of these two are the best:
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
def popcount_kernighan(v):
c = 0
while v:
v &= v - 1
c += 1
return c
POPCOUNT32_LIMIT = 2**32-1
POPCOUNT_TABLE8 = [0] * 2**8
for index in xrange(len(POPCOUNT_TABLE8)):
POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1]
def popcount_hybrid(v):
if v <= POPCOUNT32_LIMIT:
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
else:
c = 0
while v:
v &= v - 1
c += 1
return c
If it needs to be faster than that, write in C!
Test For Yourself
If you want to test these yourself, here's some code you can run.
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
def popcount_naive(v):
c = 0
while v:
c += (v & 1)
v >>= 1
return c
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetKernighan
def popcount_kernighan(v):
c = 0
while v:
v &= v - 1
c += 1
return c
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE8 = [0] * 2**8
for index in xrange(len(POPCOUNT_TABLE8)):
POPCOUNT_TABLE8[index] = (index & 1) + POPCOUNT_TABLE8[index >> 1]
def popcount32_table8(v):
return (POPCOUNT_TABLE8[ v & 0xff] +
POPCOUNT_TABLE8[(v >> 8) & 0xff] +
POPCOUNT_TABLE8[(v >> 16) & 0xff] +
POPCOUNT_TABLE8[ v >> 24 ])
POPCOUNT_TABLE16 = [0] * 2**16
for index in xrange(len(POPCOUNT_TABLE16)):
POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]
def popcount32_table16(v):
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
POPCOUNT32_LIMIT = 2**32-1
def popcount_table16_kernighan(v):
if v <= POPCOUNT32_LIMIT:
return (POPCOUNT_TABLE16[ v & 0xffff] +
POPCOUNT_TABLE16[(v >> 16) & 0xffff])
else:
c = 0
while v:
v &= v - 1
c += 1
return c
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSet64
def popcount32_ops64(v):
return ((((v & 0xfff) * 0x1001001001001 & 0x84210842108421) % 0x1f) +
((((v & 0xfff000) >> 12) * 0x1001001001001 & 0x84210842108421) % 0x1f) +
(((v >> 24) * 0x1001001001001 & 0x84210842108421) % 0x1f))
#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
#also http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount32_parallel(v):
v = v - ((v >> 1) & 0x55555555)
v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
#v = (v & 0x0F0F0F0F) + ((v >> 4) & 0x0F0F0F0F)
#v = v + ((v >> 4) & 0x0F0F0F0F)
v = (v + (v >> 4)) & 0x0F0F0F0F
v = (v * 0x1010101) >> 24
return v % 256 #I added %256. I'm not sure why it's needed. It's probably because of signed ints in Python
def popcount32_bagwell(v):
v = v - ((v >> 1) & 0x55555555)
v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
v = (v & 0x0F0F0F0F) + ((v >> 4) & 0x0F0F0F0F)
v = v + (v >> 8)
v = (v + (v >> 16)) & 0x3F
return v
#http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount_elklund(v):
c = 0
while v:
v ^= v & -v
c += 1
return c
#http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount32_freebsd(v):
v = (v & 0x55555555) + ((v & 0xaaaaaaaa) >> 1);
v = (v & 0x33333333) + ((v & 0xcccccccc) >> 2);
v = (v & 0x0f0f0f0f) + ((v & 0xf0f0f0f0) >> 4);
v = (v & 0x00ff00ff) + ((v & 0xff00ff00) >> 8);
v = (v & 0x0000ffff) + ((v & 0xffff0000) >> 16);
return v
#http://resnet.uoregon.edu/~gurney_j/jmpc/bitwise.html
def popcount32_bsdmacro(v):
#define BITCOUNT(x) (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)
#define BX_(x) ((x) - (((x)>>1)&) - (((x)>>2)&0x33333333) - (((x)>>3)&0x11111111))
#
#def bx(v): (v - ((v >> 1) & 0x77777777) - ((v >> 2) & 0x33333333) - ((v >> 3) & 0x11111111))
#def bc(v): ((bx(v) + (bx(v) >> 4)) & 0x0F0F0F0F) % 255
v = (v - ((v >> 1) & 0x77777777) - ((v >> 2) & 0x33333333) - ((v >> 3) & 0x11111111))
v = ((v + (v >> 4)) & 0x0F0F0F0F)
return v % 255
#http://mail.python.org/pipermail/python-list/1999-July/007696.html
import marshal, array, struct, string
POPCOUNT8_TRANS_TABLE = "".join(map(chr, POPCOUNT_TABLE8))
#changed by me to match new dumps() and use sum()
def popcount_timpeters(v):
counts = array.array("B", string.translate(marshal.dumps(v), POPCOUNT8_TRANS_TABLE))
# overwrite type code
counts[0] = 0
return sum(counts)
#changed by me to add loop unrolling and not setting digitcounts[0]
def popcount32_timpeters2(v):
counts = array.array("B", string.translate(marshal.dumps(v), POPCOUNT8_TRANS_TABLE))
return counts[1] + counts[2] + counts[3] + counts[4]
#improved by me: no need to translate type char
def popcount32_timpeters3(v):
dumped = marshal.dumps(v)
return POPCOUNT_TABLE8[ord(dumped[1])] + POPCOUNT_TABLE8[ord(dumped[2])] + POPCOUNT_TABLE8[ord(dumped[3])] + POPCOUNT_TABLE8[ord(dumped[4])]
#http://mail.python.org/pipermail/python-list/1999-July/007696.html
_run2mask = {1: 0x5555555555555555L,
2: 0x3333333333333333L,
4: 0x0F0F0F0F0F0F0F0FL,
8: 0x00FF00FF00FF00FFL}
def buildmask2(run, n):
run2 = run + run
k = (n + run2 - 1) / run2
n = k * run2
try:
answer = _run2mask[run]
k2 = 64 / run2
except KeyError:
answer = (1L << run) - 1
k2 = 1
while k > k2:
k2 = k2 + k2
if k >= k2:
answer = answer | (answer << (run * k2))
else:
answer = answer | (answer << (run2 * (k - k2/2)))
k2 = k
if k2 > k:
answer = answer >> (run2 * (k2 - k))
return answer, n
def nbits(v):
return 32 #???
def popcount_seistrup(v):
lomask, n = buildmask2(1, nbits(v))
v = v - ((v >> 1) & lomask)
target = 2
while n > target:
lomask, n = buildmask2(target, n)
v = (v & lomask) + ((v >> target) & lomask)
target = target + target
for i in range(1, target/2):
if n <= target:
break
n = n >> 1
n = (n + target - 1) / target * target
v = (v & ((1L <> n)
return int(v)
if __name__ == "__main__":
import time, random
def time_func(func):
before = time.time()
result = func()
after = time.time()
span = after - before
return result, span
popcounts = [popcount_naive,
popcount32_table8,
popcount32_table16,
popcount_table16_kernighan,
popcount_kernighan,
popcount32_ops64,
popcount32_parallel,
popcount32_bagwell,
popcount_elklund,
popcount32_freebsd,
popcount32_bsdmacro,
popcount_seistrup,
popcount_timpeters,
popcount32_timpeters2,
popcount32_timpeters3,
]
test_count = 30000
max_int = 2**31 - 2
ints = range(0, 257) + [random.randint(0, max_int) for _ in xrange(test_count)] + range(max_int - 100, max_int+1)
expected_counts = map(popcount_naive, ints)
for popcount in popcounts:
counts, timespan = time_func(lambda : map(popcount, ints))
print "%5.5f %s" % ((timespan if (counts == expected_counts) else -1), popcount.__name__) #-1 means failure
A valued lesson that I have learned the hard way is that mutable data can get nasty, and that it's really nice to have immutable data structures. I've been writing a lot of python recently, and after a while I realized that I was writing a lof of this:
class SomeDataStructure:
def __init__(self, arg1, arg2):
self.prop1 = arg1
self.prop2 = arg2
Not only was this repetitive, but I found that little mutability bugs were cropping up on me. It's just too easy to trip over them when the default is mutability, as it is in python. In fact, immutable data structures aren't even a built-in option in python.
Finally the pain of using mutable data structures grew too large. I looked for a way and couldn't find anything, so I created my own. It supports getters, setters, inheritance, default values, altering several values at once, and I think the syntax is nice. I've been using it for everything the last few months and it's been great. It's helped with code repetition, concurrency, and serialization.
I hope you can learn from my valued lesson and not repeat my mistakes. Here's how you use it.
class Person(Record("name", "age")):
pass
class OldPerson(Person):
@classmethod
def prepare(cls, name, age = None):
return (name, age)
peter = Person("Peter", 26)
wes = Person("Wes", 28)
grandpa = OldPerson("Bubba")
wes2 = wes.setAge(29)
wes3 = wes.alter(name = "Grandpa", age = 57)
print peter, grandpa, wes.name, wes2.age
Here is the code. This is actually a simplified version of the original that I rewrote on my own time. I use a more comlete/complex version in the code I'm writing for my employer.
def Record(*props):
class cls(RecordBase):
pass
cls.setProps(props)
return cls
class RecordBase(tuple):
PROPS = ()
def __new__(cls, *values):
if cls.prepare != RecordBase.prepare:
values = cls.prepare(*values)
return cls.fromValues(values)
@classmethod
def fromValues(cls, values):
return tuple.__new__(cls, values)
def __repr__(self):
return self.__class__.__name__ + tuple.__repr__(self)
## overridable
@classmethod
def prepare(cls, *args):
return args
## setting up getters and setters
@classmethod
def setProps(cls, props):
for index, prop in enumerate(props):
cls.setProp(index, prop)
cls.PROPS = props
@classmethod
def setProp(cls, index, prop):
getter_name = prop
setter_name = "set" + prop[0].upper() + prop[1:]
setattr(cls, getter_name, cls.makeGetter(index, prop))
setattr(cls, setter_name, cls.makeSetter(index, prop))
@classmethod
def makeGetter(cls, index, prop):
return property(fget = lambda self : self[index])
@classmethod
def makeSetter(cls, index, prop):
def setter(self, value):
values = (value if current_index == index
else current_value
for current_index, current_value
in enumerate(self))
return self.fromValues(values)
return setter
For comparison, I've seen similar ideas at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261 and http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303439 but I don't like their implementation or design as much, especially since they lack proper setters.
I've also read that future versions of Python will have NamedTuples, which is something I wish it had already.
Update: I've been informed that I didn't clearly explain what monads are and what the code examples are supposed to do. Sorry. I guess I assumed too much preexposure to monads. If you're no familiar with them, there are already so many tutorials on monads that I'll simply direct you to two of my favorites: spacesuits and wikipedia. I've also added more explanations of what the code snippets are supposed to do. I hope that helps.
Update 2: By popular demand, I'm including some code that you can actually run :). See the bottom of the article.
Recently, I used monads in production code for a soon-to-be-publically-released application. Although many think they are strange, estoric, and perhaps useless, monads were the best way to solve the problem. My code is not in Haskell; it's in Python. I'm not doing anything wierd like IO in a purely functional way; I'm just parsing a file.
The crazy, unexpected conclusion I came to is that you can and should use monads in your code in almost any programming language. There are two parts to this: "can" and "should". I think I'll save "should" for another article. Right now, I'm excited to show you how you can.
As a preview for "should", please consider that you may be using monads already without knowing it. LINQ in C# is a monad (pdf), so if you've ever used it, you've used a monad. The C# guys used monads for queries for the same reason I'm using for them for parsing: they're the right tool for the job. But unlike them, I can't change the syntax of the programming language.
The biggest challange with using monads in a "normal" programming language is that monads involve lots of closures. This is exactly the same problem you run into with CPS, which isn't surprising since a monad's "bind" operator is CPS and since continuations can be implemented with monads. By the way, if your programming laungage doesn't have closures (meaning you are stuck programming in C, C++, or Java), then monads are probably out of the question. Assuming you have closures and use them with monads directly, you end up with code like the following. It's python using the Maybe monad to handle divide by zero errors. I'm using ">>" (__rshift__) overloaded to mean "bind".
def mdiv(a, b):
if b == 0:
return Nothing
else:
return Something(a / b)
def with_maybe():
return \
mdiv(2.0, 2.0) >> (lambda val1 :
mdiv(3.0, 0.0) >> (lambda val2 :
mdiv(val1, val2) >> (lambda val3 :
Something(val3))))
That's not very pretty. We need a way to clean that up. How we can do so depends on the programming language. Haskell has "do" syntax built-in, which makes monadic code look like an impertive language or even a list comprehension. Ruby and Scheme have call/cc which makes it trivial to wrap a bind call with a continuation to make any monadic code look "normal". C# has LINQ, which is practically Haskell's do notation with funny names.
But I'm using python. What does python have? Luckily for me, in python 2.5 they added bidirectional generators, and I found a way to use them to make something like "do" notation. Now we can write the above code like this (@do and mreturn are defined later):
@do(Maybe)
def with_maybe(first_divisor):
val1 = yield mdiv(2.0, 2.0)
val2 = yield mdiv(3.0, 0.0)
val3 = yield mdiv(val1, val2)
mreturn(val3)
I even copied the names "do" and "return" from Haskell, although I had to spell "return" as "mreturn". All you really have to remember is that "yield" means "bind" and that the end result is a monad. There are limitations to this technique, but it's working very well for me so far. I've implemented the Maybe monad, the Error monad, the StateChanger monad, and the Continuation monad (which will require another article to explain). I particularly like the continuation monad because it allows me to write callcc, which lets me do threadless actors (message passing) in python:
from collections import deque
class Mailbox:
def __init__(self):
self.messages = deque()
self.handlers = deque()
def send(self, message):
if self.handlers:
handler = self.handlers.popleft()
handler(message)()
else:
self.messages.append(message)
def receive(self):
return callcc(self.react)
@do(ContinuationMonad)
def react(self, handler):
if self.messages:
message = self.messages.popleft()
yield handler(message)
else:
self.handlers.append(handler)
done(ContinuationMonad.zero())
@do(ContinuationMonad)
def insert(mb, values):
for val in values:
mb.send(val)
@do(ContinuationMonad)
def multiply(mbin, mbout, factor):
while True:
val = (yield mbin.receive())
mbout.send(val * factor)
@do(ContinuationMonad)
def print_all(mb):
while True:
print (yield mb.receive())
original = Mailbox()
multiplied = Mailbox()
print_all(multiplied)()
multiply(original, multiplied, 2)()
insert(original, [1, 2, 3])()
A few months ago, I wrote a similar implementation of threadless actors in python. It used generators in a similar way, but it was 10 times as much code. I was shocked at how short this implementation ended up being. You might think that it's because the continuation monad implementation is big. Nope. It's just as short (Monad defined later, and Record defined here):
class ContinuationMonad(Record("run"), Monad):
def __call__(self, cont = lambda a : a):
return self.run(cont)
def bind(self, bindee):
return ContinuationMonad(lambda cont : self.run(lambda val : bindee(val).run(cont)))
@classmethod
def unit(cls, val):
return cls(lambda cont : cont(val))
@classmethod
def zero(cls):
return cls(lambda cont : None)
def callcc(usecc):
return ContinuationMonad(lambda cont : usecc(lambda val : ContinuationMonad(lambda _ : cont(val))).run(cont))
So, you can use monads with elegant syntax in any language that has closures and any of the following:
- do syntax (Haskell, C#)
- call/cc (Scheme, Ruby)
- bidirectional generators (Python 2.5, a future JavaScript?)
- coroutines (Lua, Io)
The only think I haven't shown you is the implementation of Monad, @do, mreturn, and done. It has a few nasty details related to using generators and decorators in python, but here's the gist of it:
class Monad:
def bind(self, func):
raise NotImplementedError
def __rshift__(self, bindee):
return self.bind(bindee)
def __add__(self, bindee_without_arg):
return self.bind(lambda _ : bindee_without_arg())
@decorator_with_args
def do(func, func_args, func_kargs, Monad):
itr = func(*func_args, **func_kargs)
def send(val):
try:
monad = itr.send(val)
return monad.bind(send)
except MonadReturn, ret:
return Monad.unit(ret.value)
except Done, done:
return done.monad
return send(None)
def mreturn(val):
raise MonadReturn(val)
def done(val):
raise Done(val)
That's it. If you've made it all the way to the end of this long article, I hope you've found inspiration for using monads in your own applications, especially if you are coding in python. If so, here's some code that you can run:
import types
###### Base Monad and @do syntax#########
class Monad:
def bind(self, func):
raise NotImplementedError
def __rshift__(self, bindee):
return self.bind(bindee)
def __add__(self, bindee_without_arg):
return self.bind(lambda _ : bindee_without_arg())
def make_decorator(func, *dec_args):
def decorator(undecorated):
def decorated(*args, **kargs):
return func(undecorated, args, kargs, *dec_args)
decorated.__name__ = undecorated.__name__
return decorated
decorator.__name__ = func.__name__
return decorator
def make_decorator_with_args(func):
def decorator_with_args(*dec_args):
return make_decorator(func, *dec_args)
return decorator_with_args
decorator = make_decorator
decorator_with_args = make_decorator_with_args
@decorator_with_args
def do(func, func_args, func_kargs, Monad):
@handle_monadic_throws(Monad)
def run_maybe_iterator():
itr = func(*func_args, **func_kargs)
if isinstance(itr, types.GeneratorType):
@handle_monadic_throws(Monad)
def send(val):
try:
# here's the real magic
monad = itr.send(val)
return monad.bind(send)
except StopIteration:
return Monad.unit(None)
return send(None)
else:
#not really a generator
if itr is None:
return Monad.unit(None)
else:
return itr
return run_maybe_iterator()
@decorator_with_args
def handle_monadic_throws(func, func_args, func_kargs, Monad):
try:
return func(*func_args, **func_kargs)
except MonadReturn, ret:
return Monad.unit(ret.value)
except Done, done:
assert isinstance(done.monad, Monad)
return done.monad
class MonadReturn(Exception):
def __init__(self, value):
self.value = value
Exception.__init__(self, value)
class Done(Exception):
def __init__(self, monad):
self.monad = monad
Exception.__init__(self, monad)
def mreturn(val):
raise MonadReturn(val)
def done(val):
raise Done(val)
def fid(val):
return val
##### Failable Monad ######
class Failable(Monad):
def __init__(self, value, success):
self.value = value
self.success = success
def __repr__(self):
if self.success:
return "Success(%r)" % (self.value,)
else:
return "Failure(%r)" % (self.value,)
def bind(self, bindee):
if self.success:
return bindee(self.value)
else:
return self
@classmethod
def unit(cls, val):
return cls(val, True)
class Success(Failable):
def __init__(self, value):
Failable.__init__(self, value, True)
class Failure(Failable):
def __init__(self, value):
Failable.__init__(self, value, False)
def failable_monad_examle():
def fdiv(a, b):
if b == 0:
return Failure("cannot divide by zero")
else:
return Success(a / b)
@do(Failable)
def with_failable(first_divisor):
val1 = yield fdiv(2.0, first_divisor)
val2 = yield fdiv(3.0, 1.0)
val3 = yield fdiv(val1, val2)
mreturn(val3)
print with_failable(0.0)
print with_failable(1.0)
###### StateChanger Monad #########
class StateChanger(Monad):
def __init__(self, run):
self.run = run
def bind(self, bindee):
run0 = self.run
def run1(state0):
(result, state1) = run0(state0)
return bindee(result).run(state1)
return StateChanger(run1)
@classmethod
def unit(cls, val):
return cls(lambda state : (val, state))
def get_state(view = fid):
return change_state(fid, view)
def change_state(changer, view = fid):
def make_new_state(old_state):
new_state = changer(old_state)
viewed_state = view(old_state)
return (viewed_state, new_state)
return StateChanger(make_new_state)
def state_changer_monad_example():
@do(StateChanger)
def dict_state_copy(key1, key2):
val = yield dict_state_get(key1)
yield dict_state_set(key2, val)
mreturn(val)
@do(StateChanger)
def dict_state_get(key, default = None):
dct = yield get_state()
val = dct.get(key, default)
mreturn(val)
@do(StateChanger)
def dict_state_set(key, val):
def dict_set(dct, key, val):
dct[key] = val
return dct
new_state = yield change_state(lambda dct: dict_set(dct, key, val))
mreturn(val)
@do(StateChanger)
def with_dict_state():
val2 = yield dict_state_set("a", 2)
yield dict_state_copy("a", "b")
state = yield get_state()
mreturn(val2)
print with_dict_state().run({}) # (2, {"a" : 2, "b" : 2})
###### Continuation Monad #########
class ContinuationMonad(Monad):
def __init__(self, run):
self.run = run
def __call__(self, cont = fid):
return self.run(cont)
def bind(self, bindee):
return ContinuationMonad(lambda cont : self.run(lambda val : bindee(val).run(cont)))
@classmethod
def unit(cls, val):
return cls(lambda cont : cont(val))
@classmethod
def zero(cls):
return cls(lambda cont : None)
def callcc(usecc):
return ContinuationMonad(lambda cont : usecc(lambda val : ContinuationMonad(lambda _ : cont(val))).run(cont))
def continuation_monad_example():
from collections import deque
class Mailbox:
def __init__(self):
self.messages = deque()
self.handlers = deque()
def send(self, message):
if self.handlers:
handler = self.handlers.popleft()
handler(message)()
else:
self.messages.append(message)
def receive(self):
return callcc(self.react)
@do(ContinuationMonad)
def react(self, handler):
if self.messages:
message = self.messages.popleft()
yield handler(message)
else:
self.handlers.append(handler)
done(ContinuationMonad.zero())
@do(ContinuationMonad)
def insert(mb, values):
for val in values:
mb.send(val)
@do(ContinuationMonad)
def multiply(mbin, mbout, factor):
while True:
val = (yield mbin.receive())
mbout.send(val * factor)
@do(ContinuationMonad)
def print_all(mb):
while True:
print (yield mb.receive())
original = Mailbox()
multiplied = Mailbox()
print_all(multiplied)()
multiply(original, multiplied, 2)()
insert(original, [1, 2, 3])()
if __name__ == "__main__":
failable_monad_examle()
state_changer_monad_example()
continuation_monad_example()
My last article was about how you can do monads in python with nice syntax. Now I'd like to present nice monad syntax in Ruby. I'm not explaining what monads are or how you can use them. For now, I'm expecting you to be familiar with monads. If you aren't, go read a nice article on them.
Imagine you have a Monad base class like this:
class Monad
def bind(func)
raise "not implemented"
end
def self.unit(val)
raise "not implemented"
end
# bind to block
def bindb(&func;)
bind(func)
end
end
As an aside, Please excuse there being a "bind" and a "bindb". It's silly that Ruby differentiates between a block and Proc like that. I also think it's silly that I have to keep typing "lambda {|val| func(val)}" to turn a method into a Proc and "proc.call(val)" to call a Proc. In python, methods, functions, and lambdas are all the same thing, and it's a lot easier to code with. In this sense, python is way better. But Ruby has really slick syntax for passing in the block, and python's lambda restrictions are annoying. Why can't we have the best of both? I guess then I'd need Scala or Perl6. End aside.
Let's implement the Either monad, which I call it Failable:
class Failable < Monad
def initialize(value, success)
@value = value
@success = success
end
def bind(bindee)
if @success
bindee.call(@value)
else
self
end
end
def self.unit(val)
self.new(val, true)
end
def to_s
if @success
"Success(#{@value})"
else
"Failure(#{@value})"
end
end
end
def success(val)
Failable.new(val, true)
end
def failure(val)
Failable.new(val, false)
end
Now we can write some code that safely handles divide by zero without using exceptions (actually, exceptions essentiatlly are the Either monad, but never mind that for now):
def fdiv(a, b)
if b == 0
failure("cannot divide by zero")
else
success(a / b)
end
end
def fdiv_with_binding(first_divisor)
fdiv(2.0, first_divisor).bindb do |val1|
fdiv(3.0, 1.0) .bindb do |val2|
fdiv(val1, val2) .bindb do |val3|
success(val3)
end
end
end
end
puts fdiv_with_binding(1.0)
puts fdiv_with_binding(0.0)
Which prints:
Success(0.666666666666667) Failure(cannot divide by zero)
But it's not very pretty. Luckily, ruby has callcc, which makes it very easy to define a function which I'll call "rbind" which can do this:
def fdiv_with_rbinding(first_divisor)
with_monad Failable do
val1 = rbind fdiv(2.0, first_divisor)
val2 = rbind fdiv(3.0, 1.0)
val3 = rbind fdiv(val1, val2)
val3
end
end
def with_monad(monad_class, & block)
begin
val = block.call()
if val.class == monad_class
val
else
monad_class.unit(val)
end
rescue MonadEscape => escape
escape.monad
end
end
def rbind(monad)
begin
checked_callcc {|cc| monad.bind(cc)}
rescue ContinuationUnused => unused
raise MonadEscape.new(unused.result)
end
end
class MonadEscape < RuntimeError
attr :monad
def initialize(monad)
@monad = monad
end
end
def checked_callcc(&with;_cc)
callcc do |cont|
val = with_cc.call(lambda {|val| cont.call(val)})
raise ContinuationUnused.new(val) if cont
val
end
end
class ContinuationUnused < RuntimeError
attr :result
def initialize(result)
@result = result
end
end
Ruby's block syntax makes it very nice to say "with_monad Monad do", which I like. But I don't really like seeing "= rbind". I'd really like it if we could override "<<" and read that as "=<<", but I think we're stuck with unary operators ("+", "-", "~", all of which look bad here) or words. At least we can choose our name, unlike in python where we have to use "yield". Does anyone have any idea for a better name?
Maybe "xx" would work:
def fdiv_with_rbinding(first_divisor)
with_monad Failable do
val1 =xx fdiv(2.0, first_divisor)
val2 =xx fdiv(3.0, 1.0)
val3 =xx fdiv(val1, val2)
val3
end
end
So there you have it. You can have nice monad syntax in Ruby using callcc. Unfortunately, I've read that not all implementations of Ruby have continuatoins, which means JRuby and IronRuby may be left out in the cold. Keep in mind that Ruby's "yield" syntax is NOT a true generator, which means that we can't use the same trick that we used in python. It's callcc or nothing.
Here's the complete code in case you want to cut and paste it:
############# Monad Base ##############
class Monad
def bind(func)
raise "not implemented"
end
def self.unit(val)
raise "not implemented"
end
# bind to block
def bindb(&func;)
bind(func)
end
end
def with_monad(monad_class, █)
begin
val = block.call()
if val.class == monad_class
val
else
monad_class.unit(val)
end
rescue MonadEscape => escape
escape.monad
end
end
# "reverse" bind, or bind to the return value, or bind to the continuation
def rbind(monad)
begin
mycallcc {|cc| monad.bind(cc)}
rescue ContinuationUnused => unused
raise MonadEscape.new(unused.result)
end
end
class MonadEscape < RuntimeError
attr :monad
def initialize(monad)
@monad = monad
end
end
def mycallcc(&with;_cc)
used = false
val = callcc do |cc|
fake_cc = lambda do |val|
used = true
cc.call(val)
end
with_cc.call(fake_cc)
end
raise ContinuationUnused.new(val) unless used
val
end
class ContinuationUnused < RuntimeError
attr :result
def initialize(result)
@result = result
end
end
############# Failable Monad ##################
class Failable < Monad
def initialize(value, success)
@value = value
@success = success
end
def bind(bindee)
if @success
bindee.call(@value)
else
self
end
end
def self.unit(val)
self.new(val, true)
end
def to_s
if @success
"Success(#{@value})"
else
"Failure(#{@value})"
end
end
end
def success(val)
Failable.new(val, true)
end
def failure(val)
Failable.new(val, false)
end
######## Failable Monad Example ##########
def fdiv(a, b)
if b == 0
failure("cannot divide by zero")
else
success(a / b)
end
end
def with_failable_binding(first_divisor)
fdiv(2.0, first_divisor).bindb { |val1|
fdiv(3.0, 1.0) .bindb { |val2|
fdiv(val1, val2) .bindb { |val3|
success(val3)
}
}
}
end
def xx(monad)
rbind(monad)
end
def with_failable_rbinding(first_divisor)
with_monad Failable do
val1 =xx fdiv(2.0, first_divisor)
val2 =xx fdiv(3.0, 1.0)
val3 =xx fdiv(val1, val2)
val3
end
end
puts with_failable_binding(1.0)
puts with_failable_binding(0.0)
puts with_failable_rbinding(1.0)
puts with_failable_rbinding(0.0)
Recently, I wrote about ways to add something like Haskell's do syntax to programming languages like python and ruby. The python trick used bidirectional generators and the ruby trick used callcc. I have since been notified that there is one serious problem with both of them: the bindee (the function given to bind) can only be called once. For many monads, this limitation is acceptable, but for some, it means you can't use that nice syntax.
Take the List monad, for example, in ruby:
class ListMonad < Monad
attr_accessor :array
def initialize(array)
@array = array
end
def bind(bindee)
ListMonad.new(concat(array.map { |val| maybeUnit(bindee.call(val)).array }))
end
def self.unit(val)
self.new([val])
end
def to_s
joined = array.join(', ')
"ListMonad([#{joined}])"
end
end
## I'm hoping this if faster than using fold (inject)
def concat(arrays)
all = []
for array in arrays
all += array
end
all
end
class Array
def bind(bindee)
ListMonad.new(self).bind(bindee)
end
def bindb(█)
ListMonad.new(self).bindb(█)
end
end
Ruby gives us the ability to extend Array to get a pretty sweet looking use of the monad:
list1 = with_monad ListMonad do x =xx [1,2,3] y =xx [10,20,30] x + y end
It looks great, but list1 should be ListMonad([11, 21, 31, 12, 22, 32, 13, 23, 33]), be we get ListMonad([11]) instead.
Traditional binding works fine:
list2 = [1, 2, 3].bindb do |x|
[10, 20, 30].bindb do |y|
x + y
end
end
list2 is ListMonad([11, 21, 31, 12, 22, 32, 13, 23, 33]). So, we know our monad is fine, but our with_monad doesn't work.
We have the same problem in python:
class ListMonad(Monad):
def __init__(self, vals):
self.vals = vals
def bind(self, bindee):
return self.__class__(concat((self.maybeUnit(bindee(val)) for val in self.vals)))
@classmethod
def unit(cls, val):
return cls([val])
def __repr__(self):
return "ListMonad(%s)" % self.vals
def __iter__(self):
return iter(self.vals)
def concat(lists):
return [val for lst in lists
for val in lst]
@do(ListMonad)
def with_list_monad():
x = yield [1, 2, 3]
y = yield [10, 20, 30]
mreturn(x + y)
list1 = with_list_monad()
def with_list_monad_binding():
return \
ListMonad([ 1, 2, 3]) >> (lambda x:
ListMonad([10, 20, 30]) >> (lambda y:
x + y))
list2 = with_list_monad_binding()
Again, it looks good, but list1 is ListMonad([11, None, None, None, None]) while list2 is ListMonad([11, 21, 31, 12, 22, 32, 13, 23, 33]). The result of list1 sure is bizarre.
Luckily, I have found a solution for ruby, and a bad hack that could be construed as a solution for python if you ignore some issues.
In Ruby, the trick is to monkey with the rbind continuation so that at the end of with_monad it returns control to the point where the continuation is called. In other words, we need to store a second continuation at the point right after the first continuation is called. Confused yet? Continuations will melt your brain, and this took a little while for met get right:
def with_monad_ext(monad_class, █)
finishers = []
rbind_ext = lambda do |monad|
begin
checked_callcc do |outer_cont|
monad.bindb do |val|
callcc do |inner_cont|
finishers.push(inner_cont)
outer_cont.call(val)
end
end
end
rescue ContinuationUnused => unused
raise MonadEscape.new(unused.result)
end
end
val = begin
monad_class.maybeUnit(block.call(rbind_ext))
rescue MonadEscape => escape
escape.monad
end
finisher = finishers.pop()
if finisher
finisher.call(val)
else
val
end
end
list3 = with_monad_ext ListMonad do |rbind|
x = rbind.call [1,2,3]
y = rbind.call [10,20,30]
x + y
end
Success! list3 is correctly ListMonad([11, 21, 31, 12, 22, 32, 13, 23, 33]). The only real problem is that the normal rbind won't work. We have to make a special one and give it to the block as an argument. This wouldn't be that bad, except that Ruby has this silly limitation so we have to say rbind.call rather than just rbind, which makes it less fun to type. I named it with_monad_ext because I don't think it will be needed as often as with_monad, and with_monad is more convenient.
Now, back to python. The problem is harder. It's rooted in the fact that for a given iterator, itr.send() is not reentrant. But, if we could copy the iterator, that would give us a possible solution. I did a little googling and found that python isn't going to have copy.copy(itr) anytime soon because no one cares about them. Some people have made some progress, but it segfaults on my computer (64-bit Linux, in case you're wondering).
So, I came up with this terrible little hack which makes it possible to "copy" iterators:
class CopyableIterator:
def __init__(self, generator, log = ()):
self.generator = generator
self.log = list(log) #hmmm... if the logs were immutable, we wouldn't have to do this
self.iterator = None
def getIterator(self):
if self.iterator is None:
self.iterator = self.generator()
for value in self.log:
self.iterator.send(value)
return self.iterator
def send(self, value):
iterator = self.getIterator()
self.log.append(value)
return iterator.send(value)
def next(self):
return self.send(None)
def copy(self):
return self.__class__(self.generator, self.log)
That let's us define @do_ext:
@decorator_with_args
def do_ext(func, func_args, func_kargs, Monad):
@handle_monadic_throws(Monad)
def run_maybe_iterator():
itr = func(*func_args, **func_kargs)
if isinstance(itr, types.GeneratorType):
@handle_monadic_throws(Monad)
def send(itr, val):
try:
# here's the real magic
monad = Monad.maybeNew(itr.send(val))
return monad.bind(lambda val : send(itr.copy(), val))
except StopIteration:
return Monad.unit(None)
return send(CopyableIterator(lambda : func(*func_args, **func_kargs)), None)
else:
#not really a generator
if itr is None:
return Monad.unit(None)
else:
return itr
return run_maybe_iterator()
@do_ext(ListMonad)
def with_list_monad_ext():
x = yield [1, 2, 3]
y = yield [10, 20, 30]
mreturn(x + y)
list3 = with_list_monad_ext()
And list3 is ListMonad([11, 21, 31, 12, 22, 32, 13, 23]). Success again! There is a downside, though. First, all of the calculations are done for all of the combinations. This is especially bad if you do this:
@do_ext(ListMonad)
def with_list_monad_ext():
print "foo"
x = yield [1, 2, 3]
y = yield [10, 20, 30]
mreturn(x + y)
Because now you'll print "foo" 9 times instead of 1.
So there you have it: you can do the List monad with do syntax in python and ruby, but you'll have to decide whether the trade-offs are worth it.
Here's a valued lessons that I learned the long and hard way: how to easily print flash cards. Don't waste any time doing it the hard way like I did. Follow my easy route and save yourself some time.
The other day, my wife had a very simple computer need: print and cut pictures into little uniform cards, like flash cards with pictures. It sounds like a simple problem, but there's a simple way and an easy way, and I'd like to share with you the easy way.
The simple way is what she did: use some software like Scribus (in her case) or Microsoft Word (if she were almost anyone else), copy the pictures in, manually align them, and print. That works great ... until you try and do 700 pictures (about 80 pages of 9 pictures per page). Scribus ate all of the computer's memory, the hard drive thrashed like mad, everything slowed down, and my poor wife spent hours putting it all together. I doubt Word would have fared much better.
But it still wasn't over. When she tried to print, the printer printed out one page with "error" on it. So, we exported to a PDF, and tried printing that: "error" again. So, we tried to print one page at a time: after about 5 minutes, it printed. All I could figure was that something bad was going on between the computer and the printer and involving sending these images uncompressed.
At this point, I decided that it was time to call it quits on manual work and use automated work: write a program to do this for me. I'm not about to manually tell 78 pages to individually print. So, with some quick bash command line work, I told it to print each page by itself. That worked for a few pages, but some were still too big and we got the dreaded "error".
Finally, I decdied to really role up my sleeves and write a program like we should have in the first place: in go pictures, out comes a PDF, and your print the PDF. Additionally, I wanted like to control the DPI of the images used to control the data flow to the printer. I did so, made the PDF, printed it and (after a while, still), it printed. Hurray! Even better, now if we wanted to change pictures or change sizes, we'll be able to do it with very little work.
So here's the program. Put some pictures in a directory, run the program, and print the pdf of "flash cards" or "image thumbnails" or "reading lessons" or whatever you want to call them. You could even make this into an Automator task in Mac OS X. It uses ImageMagick, though, so make sure you have that installed first.
from subprocess import call
MONTAGE = ["montage", "-bordercolor", "white"]
CONVERT = ["convert", "-bordercolor", "white"]
def montage_files_of_paths(paths, name,
columns = 3, rows = 3,
width = 10, height = 8, xborder = .33, yborder = .33, density = 200,
frame = 5, spacing = 10,
intermediate_ext = "png", final_ext = "pdf"):
width, height, xborder, yborder = (int(val * density) for val in (width, height, xborder, yborder))
thumbnail_width = int((width - (xborder * 2)) / columns)
thumbnail_height = int((height - (yborder * 2)) / rows)
call(MONTAGE + ["-tile", "%rx%r" % (rows, columns),
"-geometry", "%rx%r+%r+%r" % (thumbnail_width, thumbnail_height, spacing, spacing),
"-frame", str(frame),
"-density", str(density)]
+ [path + "[0]" for path in paths] #[0] to ignore animated GIFS
+ [name + "." + intermediate_ext])
call(CONVERT + ["-border", "%rx%r" % (xborder, yborder),
"-density", str(density),
"-annotate", "0x0+%r+%r" % (xborder, yborder), name,
name + "*." + intermediate_ext,
name + "." + final_ext])
import sys
if __name__ == "__main__":
if len(sys.argv) < 2:
print "usage: %s image_directory, image_directory, image_directory ..." % (sys.argv[0],)
else:
for dirname in sys.argv[1:]:
montage_files_of_paths([dirname+"/*"], dirname)
Update: It turns out that there's a similar technique being used by pyparsing. I hadn't seen it before and when I first saw it I thought had reinvted the wheel and wasted my time. But upon further inspection, Pysec does a few things a much better than pyparsing, which happen to be the exact things that I need. There's no coincedence, of course, that Pysec matches my needs well. I'll be covering this in more detail in a future article.
Update 2: I got @do syntax to work! Again, stay tuned for another article on how.
I've talked about monads in the past, but I never really covered what purpose they serve. I covered the "how" in Python and Ruby, but I doubt I'll ever full cover the "why" because it's simply too big of a subject. But today, I'd like to share with you one example of how monads are useful. In fact, it's the example that motivated me to do all of the monad stuff in the first place: parsing.
Parsing is a topic that's been around pretty much forever in computer science and most people thing it's pretty much "solved". My experience is that we've still got a long way to go. Specifically, I'm writing an application with lots of distributed concurrency, which requires lots of data serialization and deserialization (aka parsing). There are very few good serialization libraries out there, and I've been through three or four versions of various techniques. Finally, I think I have found a parsing technique that works well: monadic combinatoric parsing. And it's in Python.
What the heck does that mean? "monadic" means we're using monads. "combinatoric" means we can take monad parsers and combine them to make new monad parsers, which is extremly powerful. I call it Pysec. The design is a copy of Parsec brought to Python. Notice how I said "design"; I didn't bother looking at any of their code; The design described on their web page was good enough guidance for me. But, I'm sure that their implementation is WAY better than mine. If you want to see real monadic parsing, look at Parsec. If you're interested in monadic parsing for Python, keep reading.
Here's an example of Pysec for parsing a subset of JSON:
from Pysec import Parser, choice, quoted_chars, group_chars, option_chars, digits, between, pair, spaces, match, quoted_collection
# json_choices is a hack to get around mutual recursion
# a json is value is one of text, number, mapping, and collection
# text is any characters between quotes
# a number is like the regular expression -?[0-9]+(\.[0-9]+)?
# "parser >> Parser.lift(func)" means "pass the parsed value into func and return a new Parser"
# quoted_collection(start, space, inner, joiner, end)
# means "a list of inner separated by joiner surrounded by start and end"
# we have to put a lot of "spaces" in since JSON allows lot of optional whitespace
json_choices = []
json = choice(json_choices)
text = quoted_chars("'", "'")
number = group_chars([option_chars(["-"]), digits, option_chars([".", digits])]) >> Parser.lift(float)
joiner = between(spaces, match(","), spaces)
mapping_pair = pair(text, spaces & match(":") & spaces & json)
collection = quoted_collection("[", spaces, json, joiner, "]") >> Parser.lift(list)
mapping = quoted_collection("{", spaces, mapping_pair, joiner, "}") >> Parser.lift(dict)
json_choices.extend([text, number, mapping, collection])
print json.parseString("{'a' : -1.0, 'b' : 2.0, 'z' : {'c' : [1.0, [2.0, [3.0]]]}}")
Like most monadic or functional code, it's pretty dense, so don't feel bad if you go cross-eyed looking at it the first time. Realize that most of the code is building a Parser monad called "json", which parses the test string at the end. I tried to comment each individual part to explain what's going on.
You may be thinking "why would I want to write code like this?". One response is to look at the code: it's the bulk of JSON parsing in 15 lines that look like a grammar definition! Another response I can give is a challange: go write a parser to parse that string and then compare your code to this code. Which is shorter? Which is easier to read? Which is more elegant? While in college, I had a team project to write a compiler for a subset of Pascal. We were smart enough to use Python, but dumb enough to use Yacc and Flex. I'm sure the parser portion was pretty fast, but it was incredibly painful to get it right. Once we did, we dared not touch it for fear of breaking it. I really wish I had Parse/Pysec back then (ok, Parsec was around back then, but I hadn't even heard of Haskell or monads).
But monadic combinatoric parsing isn't just about making your code look like a grammar definition. It makes it possible to combine parsers in incredibly flexible ways. For example, let's say that on a differnt project, you wrote a simplified CSV parser for numbers like this one:
def line(cell):
return sep_end_by(cell, match(","))
def csv(cell):
return sep_end_by(line(cell), match("\n"))
print csv(number).parseString("1,2,3\n4,5,6")
And now you realize you'd really like to put whole JSON values in your simplified CSV. In other words, you want to combine the CVS and JSON parsers. I think that you'll find that doing so really isn't as easy as it sounds. Imagine trying to combine two Yacc grammars. It hurts just thinking about that. Luckily, monadic combinatoric parsers make this incredibly easy:
print csv(json).parseString("{'a' : 'A'},[1, 2, 3],'zzz'\n-1.0,2.0,-3.0")
While this is a slighly contrived example, you must understand that with this technique you can combine any two parsers in a similar fasion. I don't know about you, but that really feels right to me. I haven't seen any parsing techniques as elegant as that.
Everything it's perfect of course, especially since making this work in Python is a bit of a hack. Here are some downsides to this approach:
- It's hard to debug when you do something wrong.
- It's annoying to import so many things from Pysec.
- You have to hack around mutual recursion of values in a strict language like python.
- There's probably a performance hit.
Most of these can be either fixed or worked around, though, so I think long-term monadic parsing is good bet. I'd like to see what you can do with it or to make it better.
I know how much you all like a working example, so here's some code that you can just cut, paste, and run (in Python 2.5; older versions of Python may require some tweaking). Please ignore the top half. It's mostly stuff I've covered in other articles, like Immutable Records and monad base classes. The meat starts where it says "Parser Monad".
I'd like to talk about it in more detail, but I'll save that for another article. For now, please play around with it and see what you think.
##### Base Libraries included here for convenience ###########
def Record(*props):
class cls(RecordBase):
pass
cls.setProps(props)
return cls
class RecordBase(tuple):
PROPS = ()
def __new__(cls, *values):
if cls.prepare != RecordBase.prepare:
values = cls.prepare(*values)
return cls.fromValues(values)
@classmethod
def fromValues(cls, values):
return tuple.__new__(cls, values)
def __repr__(self):
return self.__class__.__name__ + tuple.__repr__(self)
## overridable
@classmethod
def prepare(cls, *args):
return args
## setting up getters and setters
@classmethod
def setProps(cls, props):
for index, prop in enumerate(props):
cls.setProp(index, prop)
cls.PROPS = props
@classmethod
def setProp(cls, index, prop):
getter_name = prop
setter_name = "set" + prop[0].upper() + prop[1:]
setattr(cls, getter_name, cls.makeGetter(index, prop))
setattr(cls, setter_name, cls.makeSetter(index, prop))
@classmethod
def makeGetter(cls, index, prop):
return property(fget = lambda self : self[index])
@classmethod
def makeSetter(cls, index, prop):
def setter(self, value):
values = (value if current_index == index
else current_value
for current_index, current_value
in enumerate(self))
return self.fromValues(values)
return setter
class ByteStream(Record("bytes", "index")):
@classmethod
def prepare(cls, bytes, index = 0):
return (bytes, index)
def get(self, count):
start = self.index
end = start + count
bytes = self.bytes[start : end]
return bytes, (self.setIndex(end) if bytes else self)
def make_decorator(func, *dec_args):
def decorator(undecorated):
def decorated(*args, **kargs):
return func(undecorated, args, kargs, *dec_args)
decorated.__name__ = undecorated.__name__
return decorated
decorator.__name__ = func.__name__
return decorator
decorator = make_decorator
class Monad:
## Must be overridden
def bind(self, func):
raise NotImplementedError
@classmethod
def unit(cls, val):
raise NotImplementedError
@classmethod
def lift(cls, func):
return (lambda val : cls.unit(func(val)))
## useful defaults that should probably NOT be overridden
def __rshift__(self, bindee):
return self.bind(bindee)
def __and__(self, monad):
return self.shove(monad)
## could be overridden if useful or if more efficient
def shove(self, monad):
return self.bind(lambda _ : monad)
class StateChanger(Record("changer", "bindees"), Monad):
@classmethod
def prepare(cls, changer, bindees = ()):
return (changer, bindees)
# binding can be slow since it happens at bind time rather than at run time
def bind(self, bindee):
return self.setBindees(self.bindees + (bindee,))
def __call__(self, state):
return self.run(state)
def run(self, state0):
value, state = self.changer(state0) if callable(self.changer) else self.changer
state = state0 if state is None else state
for bindee in self.bindees:
value, state = bindee(value).run(state)
return (value, state)
@classmethod
def unit(cls, value):
return cls((value, None))
######## Parser Monad ###########
class ParserState(Record("stream", "position")):
@classmethod
def prepare(cls, stream, position = 0):
return (stream, position)
def read(self, count):
collection, stream = self.stream.get(count)
return collection, self.fromValues((stream, self.position + count))
class Parser(StateChanger):
def parseString(self, bytes):
return self.parseStream(ByteStream(bytes))
def parseStream(self, stream):
state = ParserState(stream)
value, state = self.run(state)
return value
class ParseFailed(Exception):
def __init__(self, message, state):
self.message = message
self.state = state
Exception.__init__(self, message)
@decorator
def parser(func, func_args, func_kargs):
def changer(state):
return func(state, *func_args, **func_kargs)
changer.__name__ = func.__name__
return Parser(changer)
##### combinatoric functions #########
@parser
def tokens(state0, count, process):
tokens, state1 = state0.read(count)
passed, value = process(tokens)
if passed:
return (value, state1)
else:
raise ParseFailed(value, state0)
def read(count):
return tokens(count, lambda values : (True, values))
@parser
def skip(state0, parser):
value, state1 = parser(state0)
return (None, state1)
@parser
def option(state, default_value, parser):
try:
return parser(state)
except ParseFailed, failure:
if failure.state == state:
return (default_value, state)
else:
raise
@parser
def choice(state, parsers):
for parser in parsers:
try:
return parser(state)
except ParseFailed, failure:
if failure.state != state:
raise failure
raise ParseFailed("no choices were found", state)
@parser
def match(state0, expected):
actual, state1 = read(len(expected))(state0)
if actual == expected:
return actual, state1
else:
raise ParseFailed("expected %r" % (expected,), state0)
def between(before, inner, after):
return before & inner >> (lambda value : after & Parser.unit(value))
def quoted(before, inner, after):
return between(match(before), inner, match(after))
def quoted_collection(start, space, inner, joiner, end):
return quoted(start, space & sep_end_by(inner, joiner), end)
@parser
def many(state, parser, min_count = 0):
values = []
try:
while True:
value, state = parser(state)
values.append(value)
except ParseFailed:
if len(values) < min_count:
raise
return values, state
@parser
def group(state, parsers):
values = []
for parser in parsers:
value, state = parser(state)
values.append(value)
return values, state
def pair(parser1, parser2):
# return group((parser1, parser2))
return parser1 >> (lambda value1 : parser2 >> (lambda value2 : Parser.unit((value1, value2))))
@parser
def skip_many(state, parser):
try:
while True:
value, state = parser(state)
except ParseFailed:
return (None, state)
def skip_before(before, parser):
return skip(before) & parser
@parser
def skip_after(state0, parser, after):
value, state1 = parser(state0)
_, state2 = after(state1)
return value, state2
@parser
def option_many(state0, first, repeated, min_count = 0):
try:
first_value, state1 = first(state0)
except ParseFailed:
if min_count > 0:
raise
else:
return [], state0
else:
values, state2 = many(repeated, min_count-1)(state1)
values.insert(0, first_value)
return values, state2
# parser separated and ended by sep
def end_by(parser, sep_parser, min_count = 0):
return many(skip_after(parser, sep_parser), min_count)
# parser separated by sep
def sep_by(parser, sep_parser, min_count = 0):
return option_many(parser, skip_before(sep_parser, parser), min_count)
# parser separated and optionally ended by sep
def sep_end_by(parser, sep_parser, min_count = 0):
return skip_after(sep_by(parser, sep_parser, min_count), option(None, sep_parser))
##### char-specific parsing ###########
def satisfy(name, passes):
return tokens(1, lambda char : (True, char) if passes(char) else (False, "not " + name))
def one_of(chars):
char_set = frozenset(chars)
return satisfy("one of %r" % chars, lambda char : char in char_set)
def none_of(chars):
char_set = frozenset(chars)
return satisfy("not one of %r" % chars, lambda char : char and char not in char_set)
def maybe_match_parser(parser):
return match(parser) if isinstance(parser, str) else parser
def maybe_match_parsers(parsers):
return tuple(maybe_match_parser(parser) for parser in parsers)
def many_chars(parser, min_count = 0):
return join_chars(many(parser, min_count))
def option_chars(parsers):
return option("", group_chars(parsers))
def group_chars(parsers):
return join_chars(group(maybe_match_parsers(parsers)))
#return join_chars(group(parsers))
def join_chars(parser):
return parser >> Parser.lift("".join)
def while_one_of(chars, min_count = 0):
return many_chars(one_of(chars), min_count)
def until_one_of(chars, min_count = 0):
return many_chars(none_of(chars), min_count)
def char_range(begin, end):
return "".join(chr(num) for num in xrange(ord(begin), ord(end)))
def quoted_chars(start, end):
assert len(end) == 1, "end string must be exactly 1 character"
return quoted(start, many_chars(none_of(end)), end)
digit = one_of(char_range("0", "9"))
digits = many_chars(digit, min_count = 1)
space = one_of(" \v\f\t\r\n")
spaces = skip_many(space)
############# simplified JSON ########################
#from Pysec import Parser, choice, quoted_chars, group_chars, option_chars, digits, between, pair, spaces, match, quoted_collection, sep_end_by
#HACK: json_choices is used to get around mutual recursion
#a json is value is one of text, number, mapping, and collection, which we define later
json_choices = []
json = choice(json_choices)
#text is any characters between quotes
text = quoted_chars("'", "'")
#sort of like the regular expression -?[0-9]+(\.[0-9]+)?
#in case you're unfamiliar with monads, "parser >> Parser.lift(func)" means "pass the parsed value into func but give me a new Parser back"
number = group_chars([option_chars(["-"]), digits, option_chars([".", digits])]) >> Parser.lift(float)
#quoted_collection(start, space, inner, joiner, end) means "a list of inner separated by joiner surrounded by start and end"
#also, we have to put a lot of spaces in there since JSON allows lot of optional whitespace
joiner = between(spaces, match(","), spaces)
mapping_pair = pair(text, spaces & match(":") & spaces & json)
collection = quoted_collection("[", spaces, json, joiner, "]") >> Parser.lift(list)
mapping = quoted_collection("{", spaces, mapping_pair, joiner, "}") >> Parser.lift(dict)
#HACK: finish the work around mutual recursion
json_choices.extend([text, number, mapping, collection])
############# simplified CSV ########################
def line(cell):
return sep_end_by(cell, match(","))
def csv(cell):
return sep_end_by(line(cell), match("\n"))
############# testing ####################
print json.parseString("{'a' : -1.0, 'b' : 2.0, 'z' : {'c' : [1.0, [2.0, [3.0]]]}}")
print csv(number).parseString("1,2,3\n4,5,6")
print csv(json).parseString("{'a' : 'A'},[1, 2, 3],'zzz'\n-1.0,2.0,-3.0")
The situation is that I'm writing a peer-to-peer file synchronizer and I need to serialize binary data in an efficient way. I've choosen to use what I call "sized binary", which is basically netstrings without the comma or bencode's byte string. The byte-string "ABC", for example, would be encoded as "3:ABC", or "Hello World!" would be "12:Hello World!". This is very efficient because "ABC" or "Hello World!" can be any bytes \x00-\xFF, without any sort of encoding or decoding like uuencode.
Stop and think for a minute how you would define a grammer to parse "3:ABC". It's so simple, right? Well, try and write a grammar for it. Try using BNF, EBNF, yacc/bison, pyparsing, SimpleParse, or any parsing libarary you know of, in any programming language. If you manage, please write me an email and let me know how you did it, because I'm not so sure it's even possible. Maybe it is in the Perl6 grammars; they're throwing the kitchen sink into that thing :). The best I've seen is that the creator of pyparsing hacked a grammar object to change how it parses the "ABC" part after a different object parsed the "3" part. This basically uses the grammar object as a global variable. It works in a single-core world, but it won't work in the many-core world we're headed for.
But now you're getting bored with me because it's silly ringing my hands over such a simple format. I bet you could parse this format in just 4 lines of code:
def parse_sized_binary(stream):
size_bytes, stream = stream.readUntil(":")
size = int(size_bytes)
bytes, stream = stream.read(size)
return bytes, stream
You're right, it's really simple, at least until you have to embed this little "sized binary language" inside of a bigger language. Perhaps, like me, you need to embed binary data inside of a bigger data structure, and so you need to define a serialization for things like ints, floats, lists, and hash tables. For that stuff, you can define a grammer for a language like JSON or YAML and hand it over to a grammar-based parser library. But now you need a way to tell the library, "OK, when you see my binary data, hand control over to my four lines of code, and then I'll hand control back". You might even say "You know, why don't you just let me hand you any function of the form stream -> (value, stream) and let me control the parsing for a while".
Congratulations, you just invented monadic parsing. The function you wrote, parse_sized_binary, of the form stream -> (bytes, stream), is a Parser Monad. Monadic parsing is just combining lots of these little parsers together. And so, it's incredibly easy to insert arbitrary parsing code, like parse_sized_binary, into any parser.
As a complete example, using Pysec, here's a full parser for a language just like bencode but with support for floats and unicode. Notice that our parse_size_binary is renamed to bencode_sized and is given a more functional definition.
from Pysec import Stub, until, lift, read, many_until, branch
bencode_any = Stub()
bencode_sized = until(":") >> lift(int) >> read
bencode_unsized = until("e")
bencode_list = many_until(bencode_any, "e")
bencode_any.set(branch({"s" : bencode_sized,
"u" : bencode_sized >> lift(lambda bytes : bytes.decode("utf8")),
"i" : bencode_unsized >> lift(int),
"f" : bencode_unsized >> lift(float),
"l" : bencode_list,
"d" : bencode_list >> lift(dict)}))
I think this is sort of like Greenspun's Tenth Rule. Any implementation of netstrings/sized-binary-data probably has an ad-hoc implementation of monadic parsing in it. Even my own non-monadic version of this parser (which I use for performance reasons) has an ad-hoc implementation of monadic parsing. I'm convinced that if you write a parser for a similar format, you'll implement/invent monadic parsing as well, even if you don't know it.But, sometimes you need something more light-weight for handling "events". For example, imagine you have some code listening for file changes in a particular directory. What you'd like to do is make an "event" that is "fired" whenever a file change is detected. When fired, there may be a "handler" or "listener" which is notified of the event. That "handler" is ultimately just some code which is executed when the event occurs.
A while ago, I wanted an event system like this for Python. I didn't see anything builtin or any library available, so I decided to write my own. I'd like to share what I created with you.
But first, I want to follow an example through other programming languages to give you an idea of what I was trying to accomplish and how it compares with what's out there. After that, I'll give you my implemenation in Python. Our example will be listening for changes on the file system. We want to keep the "watcher" code decoupled from the rest of the code, so we use events.
The best implementation of events that I've used is in C#, so we'll start there. In C# 1.0, our file watcher would looks something like this:
public delegate void FileChangeHandler(string source_path);
class FileWatcher
{
public event FileChangeHandler FileChanged;
public void WatchForChanges()
{
...
if(FileChanged != null)
{
FileChanged(source_path);
}
...
}
}
class FileChangeLogger
{
public void OnFileChanged(string source_path)
{
Console.WriteLine(String.Format("{0} changed.", source_path));
}
}
watcher = FileWatcher();
logger = FileChangeLogger();
watcher.FileChanged += new FileChangeHandler(logger.OnFileChanged);
watcher.WatchForChanges();
It's pretty nice. The best part is at the end, where you can write "watcher.FileChange += ...". But, you need to type the completely useless "new FileChangeHandler", and you also need to wrap it all in a separate FileChangeLogger class. Luckily, in C# 2.0, they added Anonymous Delegates, which makes this much nicer:
watcher.FileChanged += delegate(string source_path)
{
Console.WriteLine(String.Format("{0} changed.", source_path));
}
And in C# 3.0, they've made it even nicer!
watcher.FileChanged += (source_path => Console.WriteLine(String.Format("{0} changed.", source_path)));
C# 3.0 has an event system that's downright slick, with type-inferencing and everything. I don't think it gets much better than that.
Actually, there is one thing. If there's no handler registered with the event, calling "FileChanged()" will blow up because it sees FileChanged == null until a handler is registered. This means that you have to write "if(Event != null){Event(...):}" every single time you fire the event. Every single time. If you forget, your code will seem to work fine until there's no handler, in which case it will blow up and you'll smack your forhead because you forgot that you have to repeat that line of code every single time you fire an event. I really mean every single time. It's by far the worst wart on an otherwise elgant system. I have no idea why the designers of C# thought this was a good idea. When would you ever want to fire and event and have it blow up in your face?
Anyway, let's try a different programming language, perhaps Java. It has the worst implementation of events I've ever seen. Here's our FileWatcher example:
...
...
Ok, nevermind, I don't have the heart. I can imagine the code full of IListeners, and implementations, and keeping an array of them, and iterating over them, etc, and I just can't do it. I got seriously upset at one useless line in C#. In Java, it's at least 10 times worse. If you really have the stomach for it, go look at http://java.sun.com/docs/books/tutorial/uiswing/events/index.html. To me, it appears that in Java, events are one giant hack around the lack of first-class functions. If Java had first-class functions, none of that nonsense would be necessary.
Now that we've seen a good implementation of events in C# and avoided a bad one in Java, let's make one for Python. I'd rather it be like the C# event system, so let's see what our example would look like:
from Event import Event
class FileWatcher:
def __init__(self):
self.fileChanged = Event()
def watchFiles(self):
...
self.fileChanged(source_path)
...
def log_file_change(source_path):
print "%r changed." % (source_path,)
watcher = FileWatcher()
watcher.fileChanged += log_file_change
I think that looks pretty good. So what does the implementation of Event look like?
class Event(IEvent):
def __init__(self):
self.handlers = set()
def handle(self, handler):
self.handler.add(handler)
return self
def unhandle(self, handler):
try:
self.handlers.remove(handler)
except:
raise ValueError("Handler is not handling this event, so cannot unhandle it.")
return self
def fire(self, *args, **kargs):
for handler in self.handlers:
handler(*args, **kargs)
def getHandlerCount(self):
return len(self.handlers)
__iadd__ = handle
__isub__ = unhandle
__call__ = fire
__len__ = getHandlerCount
Wow. That was pretty short. Actually, this is one of the reasons I love Python. If the language doesn't have a feature, we can probably add it. We just added one of C#'s best features to Python in 26 lines of code. More importantly, we now have a nice, light-weight, easy-to-use event system for Python.
For all of you how like full examples that you can cut and paste, here is one that you can run. Enjoy!
class Event:
def __init__(self):
self.handlers = set()
def handle(self, handler):
self.handlers.add(handler)
return self
def unhandle(self, handler):
try:
self.handlers.remove(handler)
except:
raise ValueError("Handler is not handling this event, so cannot unhandle it.")
return self
def fire(self, *args, **kargs):
for handler in self.handlers:
handler(*args, **kargs)
def getHandlerCount(self):
return len(self.handlers)
__iadd__ = handle
__isub__ = unhandle
__call__ = fire
__len__ = getHandlerCount
class MockFileWatcher:
def __init__(self):
self.fileChanged = Event()
def watchFiles(self):
source_path = "foo"
self.fileChanged(source_path)
def log_file_change(source_path):
print "%r changed." % (source_path,)
def log_file_change2(source_path):
print "%r changed!" % (source_path,)
watcher = MockFileWatcher()
watcher.fileChanged += log_file_change2
watcher.fileChanged += log_file_change
watcher.fileChanged -= log_file_change2
watcher.watchFiles()
The basic idea of my framework is to be and simple and easy to use as possible. There's no Erlang-style pattern matching or process linking. There isn't even any built-in error handling. But, there's a lot of support for the typical "react loop" style of message handling.
The architecture is very simple. There's one thread per actor. An actor is just an object you can send a message to. The actor receives the messages in the order they were sent. You always know the sender of the message; it's built right in. There's also "Bridge" that allows you to interact with actors from non-actor code.
The only tricky thing you'll see is a "handles" decorator. This is used to help you identify how you want an actor to handle a given message type. It helps you avoid re-creating a message loop and dispatcher yourself for every actor.
I've included a little example of how to use the framework. It obviously quite simple, but keep in mind that I used little more for the foundation of a rather complex file synchronization application. A little bit goes a long way.
So, here it is. Cut, paste, and run.
from threading import Thread, Event
from Queue import Queue, Empty
class IActor:
pass
# send(message, sender)
class HandlerRegistry(dict):
# should be used as a decorator
def __call__(self, typ):
def register(func):
self[typ] = func
return func
return register
OtherMessage = "Other Message"
class Stop(Exception):
def __repr__(self):
return "Stop()"
class Stopped:
def __repr__(self):
return "Stopped()"
class ActorNotStartedError(Exception):
def __init__(self):
Exception.__init__(self, "actor not started")
class Actor(IActor):
@classmethod
def spawn(cls, *args, **kargs):
self = cls(*args, **kargs)
self.mailbox = Mailbox()
start_thread(target = self.act, as_daemon = True)
return self
def send(self, message, sender):
if self.mailbox is None:
raise ActorNotStartedError()
else:
self.mailbox.send(message, sender)
def receive(self):
if self.mailbox is None:
raise ActorNotStartedError()
else:
return self.mailbox.receive()
# override if necessary
def act(self):
self.handleMessages()
handles = HandlerRegistry()
@classmethod
def makeHandles(*classes):
return HandlerRegistry((typ, handler) for cls in classes for (typ, handler) in cls.handles.iteritems())
def handleMessages(self):
try:
while True:
message, sender = self.receive()
self.handleMessageWithRegistry(message, sender)
except Stop:
pass
def handleMessageWithRegistry(self, message, sender):
registry = self.__class__.handles
handler = registry.get(message.__class__) or registry.get(OtherMessage)
if handler is not None:
handler(self, message, sender)
@handles(OtherMessage)
def onOther(self, message, sender):
pass
@handles(Stop)
def onStop(self, message, sender):
sender.send(Stopped(), self)
raise message
def start_thread(target, as_daemon, name = None):
thread = Thread(target = target)
if name:
thread.setName(name)
thread.setDaemon(as_daemon)
thread.start()
return thread
class Mailbox:
def __init__(self):
self.mailbox = Queue()
def send(self, message, sender):
self.mailbox.put((message, sender), block = False)
def receive(self, timeout = None):
return self.mailbox.get(block = True, timeout = timeout)
class Bridge(IActor):
def __init__(self):
self.mailbox = Mailbox()
def send(self, message, sender):
self.mailbox.send(message, sender)
def call(self, target, request, timeout, default = None):
self.sendRequest(target, request)
return self.receiveResponse(timeout, default)
# targeted_requests can be an iterator
def multiCall(self, targeted_requests, timeout, default = None):
count = 0
for target, request in targeted_requests:
self.sendRequest(target, request)
count += 1
for _ in xrange(count):
yield self.receiveResponse(timeout, default)
def stop(self, actors, timeout):
stop = Stop()
return list(self.multiCall(((actor, stop) for actor in actors), timeout, default = None))
def sendRequest(self, target, request):
target.send(request, self)
def receiveResponse(self, timeout, default):
try:
message, sender = self.mailbox.receive(timeout = timeout)
return message
except Empty:
return default
if __name__ == "__main__":
import time
class GetInventory:
pass
class Task:
def __init__(self, input, destination):
self.input = input
self.destination = destination
class Worker(Actor):
handles = Actor.makeHandles()
def __init__(self, skill):
self.skill = skill
@handles(Task)
def onTask(self, task, sender):
output = self.skill(task.input)
task.destination.send(output, self)
class Warehouse(Actor):
handles = Actor.makeHandles()
def __init__(self):
self.inventory = []
@handles(GetInventory)
def onGetInventory(self, message, sender):
# copy the inventory to avoid anyone mutating it
sender.send(list(self.inventory), self)
@handles(OtherMessage)
def onTaskResult(self, result, sender):
self.inventory.append(result)
worker = Worker.spawn(lambda x : x * 2)
positives = Warehouse.spawn()
negatives = Warehouse.spawn()
bridge = Bridge()
for val in [1, 2, 3, -2, -4, -6]:
warehouse = positives if val >= 0 else negatives
worker.send(Task(val, warehouse), sender = None)
print bridge.call(positives, GetInventory(), 1.0) #should be [ 2, 4, 6]
print bridge.call(negatives, GetInventory(), 1.0) #should be [-4, -8, -12]
print bridge.stop([worker, positives, negatives], 1.0) #should be [Stopped(), Stopped(), Stopped()]
class Start:
def __init__(self, target):
self.target = target
class Ping:
def __repr__(self):
return "Ping()"
class Pong:
def __repr__(self):
return "Pong()"
class Pinger(Actor):
handles = Actor.makeHandles()
@handles(Start)
def onStart(self, start, sender):
start.target.send(Ping(), self)
@handles(Pong)
def onPong(self, pong, sender):
print "-",
sender.send(Ping(), self)
class Ponger(Actor):
handles = Actor.makeHandles()
@handles(Ping)
def onPing(self, ping, sender):
print "+",
sender.send(Pong(), self)
# should print lots of +-+-+-
pinger = Pinger.spawn()
ponger = Ponger.spawn()
pinger.send(Start(ponger), sender = None)
time.sleep(0.1)
bridge.stop([pinger, ponger], 1.0)
Python seems to use a lot of memory. So what exactly is the overhead of each type of value? Short answer:
| int | 24 |
| float | 24 |
| tuple | 63 |
| list | 101 |
| dict | 298 |
| old-style class | 345 |
| new-style class | 336 |
| subclassed tuple | 79 |
| Record | 79 |
| Record with old class mixin | 79 |
| Record with new class mixin | 79 |
I measured these by running a simple program that loaded up 1,000,000 values in a list and then did time.sleep(1000). I ran that for different value types and then ran "top" to see how much memory was being used. I took that value, substracted the memory usage for a list of all the same value (14 bytes each), subtracted the value of a child value (usually an int, 24 bytes each), and then divided by 1,000,000. I'll include the code I ran at the end if you want to cut and paste. So what lessons do we learn from this?
- Python objects are very expensive at over 300 bytes each.
- Tuples have 1/5 as much overhead.
- Records are almost as good as tuples, even when a mixin is added.
So, if you want to have lots of values in memory without using lots of memory, use Record.
If you want to run the test for yourself, here's the code. Just comment out the "make_val" that you want to test.
import time
from Record import Record
class TupleClass(tuple):
pass
class RecordClass(Record("val")):
pass
class OldClass:
def __init__(self, val):
self.val = val
def method(self):
pass
class NewClass(object):
def __init__(self, val):
self.val = val
def method(self):
pass
class RecordWithOldClass(Record("val"), OldClass):
pass
class RecordWithNewClass(Record("val"), NewClass):
pass
make_val = lambda i : 1 #nothing (base overhead)
#make_val = lambda i : i
#make_val = float
#make_val = lambda i : (i,)
#make_val = lambda i : [i]
#make_val = lambda i : {i:i}
#make_val = TupleClass
#make_val = RecordClass
#make_val = OldClass
#make_val = NewClass
#make_val = RecordWithOldClass
#make_val = RecordWithNewClass
count = 1000000
lst = [make_val(i) for i in xrange(count)]
time.sleep(100000)
Decorators in python are very powerful, but are often a pain to get right, especially when you want to pass arguments to the decorator. Typically, it involves defining a function in a function in a function, etc, up to 4 layers deep. Can we make it easier? What if we even made a decorator to make decorators?
Perhaps we could use it catch and log errors:
@decorator
def log_error(func, *args, **kargs):
try:
return func(*args, **kargs)
except Exception, error:
print "error occurred: %r" % error
@log_error
def blowup():
raise Exception("blew up")
blowup() # this gets caught and prints the error
Or maybe we'd like to synchronize a function to make it thread-safe:
@decorator
def synchronized(lock, func, *args, **kargs):
lock.acquire()
try:
return func(self, *args, **kargs)
finally:
lock.release()
missle_lock = thread.RLock()
@synchronized(missle_lock)
def launch_missiles():
pass
Or we could even use arguments to do do some object __init__ trickery (something I've had to do when working with wxpython):
@decorator
def inherits_format(method, *args, **kargs):
self, parent = args[:2]
self.format = parent.format
return method(*args, **kargs)
class Child:
@inherits_format
def __init__(self, parent):
pass
class Parent:
def __init__(self, format):
self.format = format
format = object()
child = Child(Parent(format))
assert child.format is format
If you've never worked python decorators, these are few examples of how powerful they are. But you'll probably never want to write your own because it can be a pain. If that's the case, then the decorator decorator is for you! Here's the code for it. It's a little tricky, but all you have to do is import it, slap @decorator before your decorator, make sure you call func(*args, **kargs), and you're all set. This is as easy as decorators can get.
Update: Dave left a comment and notified me that there is another, much more complete, implementation of the same idea at http://www.phyast.pitt.edu/~micheles/python/documentation.html. So, if you want a very complete module, go there. If you want something small and simple to cut and paste into your own code, use the following.
def decorator(decorator_func):
decorator_expected_arg_count = decorator_func.func_code.co_argcount - 1
if decorator_expected_arg_count:
def decorator_maker(*decorator_args):
assert len(decorator_args) == decorator_expected_arg_count, "%s expects %d args" % (decorator_func.__name__, decorator_expected_arg_count)
def _decorator(func):
assert callable(func), "Decorator not given a function. Did you forget to give %r arguments?" % (decorator_func.__name__)
def decorated_func(*args, **kargs):
full_args = decorator_args + (func,) + args
return decorator_func(*full_args, **kargs)
decorated_func.__name__ = func.__name__
return decorated_func
return _decorator
return decorator_maker
else:
def _decorator(func):
def decorated_func(*args, **kargs):
return decorator_func(func, *args, **kargs)
decorated_func.__name__ = func.__name__
return decorated_func
return _decorator
Why the sudden epiphany? A few weeks ago, I built my self a new computer and put in an Intel quad-core CPU. My computer is much faster now, and I'm very happy with it, but after a few weeks I have realized something: I never use more than 1/4 of my CPU. I have a little graph on my screen at all times showing me the CPU usage. It goes up to 25% all of the time, but I have never seen it go higher. Ever.
On one hand this is a good thing. The new computer is so fast that everything I do is instant, and only uses a tiny bit of power. If it isn't instant, it's usually disk or network bound, not CPU bound.
On the other hand, even my own software isn't using more than 25%, and it is CPU bound at times. I'd like to fix it, but it's written in python, and the short version of a long, boring story is that python has a thing called The GIL which makes it so python as currently implemented cannot use more than 25% of the CPU except under very rare circumstances.
It seems that programming languages takes a long time to be adopted, but I think concurrency is a big enough rule changer to shake up which programming languages are dominant. In my specific case, if python doesn't fix it's concurrency problems soon, I'm going to have to stop considering it because I'll never be able to get it to use more than 1 little piece of the CPU. Right now, that's 25%, but in a few years, it will be only 3%. Either python is going to have to change, or I'm going to have to change programming languages (or use something like Jython or IronPython, I suppose).
A few weeks ago, sitting behind my single core computer, I was in the "someday, we'll have to tackle concurrency" camp. Now, sitting in front of my quad core machine, never using more than 25% of its power, everything has changed. Concurrency is no longer a question of if or when, it's here right now. If you don't believe me, get yourself a quad-core machine and watch the CPU usage graph. I think you'll be surprised.
DTrace is an incredible tool. It basically lets you do profiling of a live application with no performance penatly. I'm writing a Python that needed some profiling, and I found the "normal" techniques like the profile/cProfile module very lacking. Luckily, Mac OSX comes with DTrace and it even works with Python. The only snag is that it's hard to find how to use the darn thing. I finally figured it out, so I figured it pass on the knowledge.
So, here's how you use dtrace on your python application in Mac OSX:
- Get DTraceToolkit.
- Edit Python/py_cputime.d by replacing "function-entry" with "entry" and "function-return" with "exit".
- Call "sudo dtrace -s Python/py_cputime.d"
- Let it sit there a while and hit ctrl-c.
- Enjoy the results
I can only assume you have to edit the file because of some difference between Solaris and OSX. You can try files other than py_cputime.d, but you might have to edit them too. Not all of them work, but most do.
The last thing to know is that you have to use the python that comes with OSX. A custom-built python doesn't seem to work.
Hope that helps!







