There are two kinds of intelligence: finding special cases, and finding generic answers. Today, I explain Xump, a new project to make a generic answer to messaging.
I've written a lot about messaging recently and one theme comes back again and again. There is, lurking somewhere below the mass of different ways of connecting applications, a universal answer.
It is a slow collective thought process that is taking years and many people. Most people will agree on certain attributes of an ideal messaging system: it is asynchronous, so pieces can all work at their own pace. It is abstracted, so pieces talk to logical addresses, not other pieces. It is symmetrical, so that any application can act as both service and client. And so on…
There is a lot of disagreement over transports. HTTP, AMQP, Comet, HyBi, RestMS. Since different transports reflect different network realities, it seems right that there be lots of choice, and competition.
And there is absolutely no agreement over resource models - the structures that are most important to application developers. Even a single protocol like AMQP has radically different models for versions 0.9, 0.10, and 1.0.
As the architect of two messaging products - OpenAMQ and Zyre - I've had to think about what lies below the transport. The latest draft of RestMS has a concept called "profiles" that lets designers add new semantics for feeds and pipes.
So I've been thinking about this: is there a universal model that lies beneath the different transport layers, and which can implement the many apparently different messaging models we're seeing? AMQP/0.9 has exchanges, queues, and bindings. AMQP/1.0 has queues and links. RSS and AtomPub have feeds. Comet has channels. The Java Message Service (JMS) has destinations. And so on.
We learned several years ago that AMQP could implement JMS destinations. It looks like AMQP/1.0 can implement AMQP/0.9 semantics. AMQP can implement RSS, if one adds an HTTP transport layer. RestMS can implement AMQP.
So it looks like the more recent, more generic models can in fact implement the older, specific models. Logically, this suggests that there is a fully generic model which can implement any form of application messaging, both broker-based and peer-to-peer (which basically means putting a broker in every node, and defining elastic relationships between them).
What would such a model look like? To be honest, I don't exactly know, but parts of it seem clear:
- Conceptually we have a set of applications that speak to a central messaging broker, though there may be no broker, and applications may be threads, and 'speak' can be anything from a network connection to a shared memory queue.
- Applications publish messages to named shared resources, which are a form of queue. The semantics for publishing are extensible because they depend on external choices.
- Queues are stored in some fashion. The semantics for storage are extensible because there are cost/performance tradeoffs which we want to make available to the application architect.
- Queues deliver their messages to applications. The semantics for delivery are extensible because, as for publishing, they depend on external choices. For example, do we deliver messages one by one (pedantic, safe, and slow), or as a stream (faster but more risky)?
- Queues may route messages into other queues according to application-specified subscription criteria that we will call selectors. Selectors say, "when a message matches these criteria, perform such-and-such operation".
- The criteria for selectors are generally, but not only, matches on the message address. There are many ways of matching: literal comparison, topic patterns, regular expressions, numeric ranges, Cartesian coordinates, XML paths, and so on.
- For O(log n) performance, selector matching is done collectively, as one operation on all selectors for a queue. This is possible (as we showed in 2006 for AMQP) for selectors that work on address patterns, where the set of different possible addresses is limited (to thousands).
- Selectors can also filter messages one by one, an operation with O(n²) cost. Here we do not need to precompute indices but can delegate the entire matching operation to an extension. This is how we do arbitrary content-based routing.
- The two main selector operations are move (where a message goes to only one application) and copy (where messages go to a set of applications).
- Ideally, we can mix selectors of any type on a queue and everything will just work. This is also the big change AMQP/1.0 makes over previous versions of AMQP.
- Messages are opaque binary contents with a textual envelope holding an address and other properties. Selectors work on the envelope, filters can work on the contents as well.
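The difference between collective matching and one-by-one filtering can be sketched in Python. This is only an illustration under stated assumptions, not Xump's design: the names `SelectorIndex`, `topic_to_regex`, and `filter_one_by_one` are hypothetical, `*` is assumed to match exactly one dot-separated word, and the index is a plain cache that remembers which selectors matched each address the first time that address is seen.

```python
import re


def topic_to_regex(pattern):
    """Compile a topic pattern such as 'rec.pets.*' to a regex.

    Assumption: '*' stands for exactly one dot-separated word.
    """
    words = [r"[^.]+" if w == "*" else re.escape(w) for w in pattern.split(".")]
    return re.compile(r"\.".join(words))


class SelectorIndex:
    """Hypothetical index over all address selectors on one queue."""

    def __init__(self):
        self._selectors = []   # (compiled pattern, target) pairs
        self._by_address = {}  # address -> targets, filled in lazily

    def add(self, pattern, target):
        self._selectors.append((topic_to_regex(pattern), target))
        self._by_address.clear()  # cached results are now stale

    def match(self, address):
        """Return every target whose selector matches this address."""
        targets = self._by_address.get(address)
        if targets is None:
            # First sight of this address: scan all selectors once.
            targets = [t for rx, t in self._selectors if rx.fullmatch(address)]
            self._by_address[address] = targets
        return targets


def filter_one_by_one(message, filters):
    """Content-based routing: run each extension filter on the message."""
    return [target for accepts, target in filters if accepts(message)]
```

The cache only pays off when the set of distinct addresses stays small, which is the condition given above. A content filter sees the whole message, so there is nothing to precompute and every filter runs on every message.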
Here is the Parrot pattern for a set of recipients:
Sender
  -> Queue
    -> Selector address like "rec.pets.*" COPY
      -> Recipient
    -> Selector address like "rec.pets.dogs" COPY
      -> Queue
        -> Selector MOVE
          -> Recipient
    -> Selector address like "rec.pets.cats" COPY
      -> Recipient
Here is the Wolfpack pattern for a set of recipients:
Sender
  -> Queue
    -> Selector address EQ "wolf" COPY
      -> Queue
        -> Selector MOVE
          -> Recipient
        -> Selector MOVE
          -> Recipient
        -> Selector MOVE
          -> Recipient
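Both patterns can be wired up from the same two primitives. The Python below is a rough sketch, assuming names that are not part of Xump (`Queue`, `Recipient`, `select`, `like`, `build_parrot`, `build_wolfpack`). It also assumes that queues route messages immediately instead of storing them, that MOVE selectors on a queue take turns round-robin, and that "like" means shell-style wildcards.

```python
from fnmatch import fnmatchcase

COPY, MOVE = "COPY", "MOVE"


class Recipient:
    """Stand-in for an application: it just collects what it is sent."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def accept(self, address, body):
        self.inbox.append((address, body))


class Queue:
    """Routes each message through its selectors as soon as it arrives."""

    def __init__(self):
        self._selectors = []  # (operation, predicate, target) triples
        self._turn = 0        # round-robin cursor shared by MOVE selectors

    def select(self, op, target, matches=lambda address: True):
        self._selectors.append((op, matches, target))

    def accept(self, address, body):
        hits = [(op, t) for op, matches, t in self._selectors if matches(address)]
        # COPY: every matching selector gets the message.
        for op, target in hits:
            if op == COPY:
                target.accept(address, body)
        # MOVE: exactly one matching selector gets it, chosen in turn.
        movers = [t for op, t in hits if op == MOVE]
        if movers:
            movers[self._turn % len(movers)].accept(address, body)
            self._turn += 1


def like(pattern):
    # Assumption: shell-style wildcards, so '*' also matches across dots.
    return lambda address: fnmatchcase(address, pattern)


def build_parrot():
    entry, dogs_only = Queue(), Queue()
    everyone, dog_owner, cat_owner = (Recipient(n) for n in ("all", "dogs", "cats"))
    entry.select(COPY, everyone, like("rec.pets.*"))
    entry.select(COPY, dogs_only, like("rec.pets.dogs"))
    dogs_only.select(MOVE, dog_owner)
    entry.select(COPY, cat_owner, like("rec.pets.cats"))
    return entry, everyone, dog_owner, cat_owner


def build_wolfpack(size=3):
    entry, shared = Queue(), Queue()
    entry.select(COPY, shared, lambda address: address == "wolf")
    pack = [Recipient(f"wolf-{i}") for i in range(size)]
    for wolf in pack:
        shared.select(MOVE, wolf)
    return entry, pack
```

Calling `entry.accept("wolf", b"job")` repeatedly on a wolfpack hands each message to one recipient in turn. In the parrot wiring, every matching recipient receives its own copy.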
That's it, though I'll be surprised if this is a final breakdown.
Let's think about an implementation. Do we want to expose the above model to the user of the messaging system, i.e. the application programmer? I think the answer is "no", because it is (a) too complex and (b) too raw. Application programmers need simplicity, and the above model of queues, selectors, and filters is too abstract, and thus too complex.
This is the same concern I have with the AMQP/1.0 design. It is a better model for generic messaging than AMQP/0.9, but it seems unnecessary to expose that model to application developers. What AMQP/1.0 lacks is a set of higher-level models that are easier to use, like RestMS profiles, or like the patterns I described on the www.restms.org wiki.
So this universal model is something to be embedded in a messaging product; it is not a deliverable messaging model itself. Just as physics engines and graphics engines let game designers draw realistic explosions, so a messaging engine should solve the core problems of a messaging product.
Which brings me to Xump, which is my new project to build a messaging engine based on the above very rough model design.
If Xump is buildable, it will make it simple to add new semantics for storage and matching. It will also make it simple to build a new messaging broker like OpenAMQ/2, or Zyre, which will consist of:
- A transport layer
- A configuration layer
- An administration layer
- The embedded messaging engine
- Extensions that implement storage and matching
- Implementations for the product's specific messaging models
Zyre profiles, for example, would be little applications that use the messaging engine in specific ways. AMQP/0.9.1's exchanges and queues, similarly, are straightforward to implement on top of Xump, and I expect AMQP/1.0 will also be doable. And RSS, AtomPub, and so on.
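One way to picture the split between the embedded engine and its storage and matching extensions is a pair of small plug-in interfaces. The sketch below is purely hypothetical: `Storage`, `Matcher`, `Engine`, and the registration methods are invented for illustration and say nothing about how Xump or Base/2 actually structure this.

```python
from abc import ABC, abstractmethod
from collections import deque


class Storage(ABC):
    """Extension point: how a queue holds its messages."""

    @abstractmethod
    def push(self, message): ...

    @abstractmethod
    def pop(self): ...


class Matcher(ABC):
    """Extension point: how a selector tests a message envelope."""

    @abstractmethod
    def matches(self, criteria, envelope): ...


class MemoryStorage(Storage):
    """Fast, volatile storage: one end of the cost/performance tradeoff."""

    def __init__(self):
        self._items = deque()

    def push(self, message):
        self._items.append(message)

    def pop(self):
        return self._items.popleft() if self._items else None


class ExactMatcher(Matcher):
    """Literal comparison on the envelope's address."""

    def matches(self, criteria, envelope):
        return envelope.get("address") == criteria


class Engine:
    """Hypothetical engine core that only knows extensions by name."""

    def __init__(self):
        self._storages = {}
        self._matchers = {}

    def register_storage(self, name, factory):
        self._storages[name] = factory

    def register_matcher(self, name, matcher):
        self._matchers[name] = matcher

    def new_queue(self, storage):
        return self._storages[storage]()

    def match(self, kind, criteria, envelope):
        return self._matchers[kind].matches(criteria, envelope)
```

A product built on such an engine would register the extensions it ships with, for example `engine.register_storage("memory", MemoryStorage)`, and implement its own messaging model in terms of `new_queue` and `match`.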
I'm building Xump using the iMatix Base/2 tools, which are the multi-threaded framework we use for OpenAMQ, Zyre and the X5 web server. Base/2 is nice but has a steep learning curve. Sorry about that. At this stage, the goal is to prove the model design, so we can document it and make it easy to re-implement. There is no reason a Python engine should be computationally slower than one written in C.
Follow this on GitHub at http://github.com/pieterh/xump/tree/master, and on xump.org, where I'll document the model as it starts to work.
I've been thinking about this kind of unification strategy myself today, particularly the idea of describing the graph of queues as chains of filters with copy/move transfer operations. However, when it comes down to it, application developers mainly use a few common communication patterns. Rather than building a general model to support every case from one primitive, it may be interesting to explore how a composite model might be formed from the primitive kernel of each variation. Such a composition of specialized communication patterns might be more constrained and easier to work with.
I like the RestMS decomposition for the most part but I don't like the fact that feeds, joins and pipes are not completely orthogonal. It's a bit unexpected that certain join types only make sense when combined with certain feed types. Why do we need feed types anyways? Why not envision feeds only as message entry points and associate them loosely with pipes by way of an intermediary join graph of orthogonal filter / distribution rules?
I like that you mentioned peer to peer. I'm in a situation right now where I am writing an application that initially uses a little local message broker for internal communication. Then it forks off other processes with their own message brokers and joins them to the message bus. On the side, it might also connect to well-known message brokers elsewhere on the network and join them to the bus also. To improve fault tolerance, message brokers could send presence advertisements when they join and leave the bus (or when their clients join and leave or when other resources are added or removed). I have not decided how to handle scoping yet since different participants should perhaps see different views of the feeds based on certain visibility rules.
I am a little disheartened by the fact that none of the extant general-purpose messaging protocols that I can find are optimized for an ad-hoc peer-to-peer message bus. Most are designed for client/server use around a central cluster of shared messaging resources. I believe peer-to-peer messaging and presence will be very important for pervasive messaging in the long term.
For an optimized peer-to-peer message bus, check out 0MQ, which does what you want, I think.
Peer-to-peer is very important, but it is also abstract in ways that make it hard for many people to understand. Many developers seem to find it hard to think of their apps as peers, and far easier to think in terms of client-server. This may change over time. The broker acts as a concrete object that helps them conceptualize and build sensible architectures.
RestMS's semantics are still evolving. I think the orthogonality will emerge over time, since the semantics are pluggable, and can thus be improved. It seems to me, today, that feed types are about persistence, that join types are about routing & filtering, and that pipe types are about persistence and delivery.
Xump is meant to act as an engine to develop and refine these notions. It's important for me that the resources actually exposed to the developer are high level, and not simply a network of queues. Thus, feeds and pipes, which may be internally similar but externally serve different goals.
So, as you say, we build constrained and high-level patterns from more generic low level primitives.