Sunday, January 24, 2010

Objective-XML 5.3

New in this release:
  • Cocotron targets for Windows support.
  • XMLRPC support.
  • No longer uses 'private' API that was causing AppStore rejections for some iPhone apps using Objective-XML.
  • Support for numeric entitites.

Labels: , ,

Wednesday, December 9, 2009

Simple Objective-XML example

Many times now, I've been asked about more Objective-XML examples. Here's a very simple one. It is adapted from Marcus Zarra's very helpful libxml and xmlreader tutorial. That tutorial shows how to parse a very simple XML format using libxml2.

The XML file parsed is the following:

<?xml version="1.0" encoding="UTF-8"?>



    <name>John Doe</name>




    <name>Mary Doe</name>




    <name>John Smith</name>




It is parsed using at application startup using the following code:

- (void)applicationDidFinishLaunching:(NSNotification*)notification


NSString *path = [[NSBundle mainBundle] pathForResource:@"xmlExample" ofType:@"xml"];

NSData *xmlData = [NSData dataWithContentsOfFile:path];

xmlTextReaderPtr reader = xmlReaderForMemory([xmlData bytes],

[xmlData length], 

[path UTF8String], nil, 


if (!reader) {

NSLog(@"Failed to load xmlreader");



NSString *currentTagName = nil;

NSDictionary *currentPerson = nil;

NSString *currentTagValue = nil;

NSMutableArray *people = [NSMutableArray array];

char* temp;

while (true) {

if (!xmlTextReaderRead(reader)) break;

switch (xmlTextReaderNodeType(reader)) {


//We are starting an element

temp =  (char*)xmlTextReaderConstName(reader);

currentTagName = [NSString stringWithCString:temp


if ([currentTagName isEqualToString:@"person"]) {

currentPerson = [NSMutableDictionary dictionary];

[people addObject:currentPerson];




//The current tag has a text value, stick it into the current person

temp = (char*)xmlTextReaderConstValue(reader);

currentTagValue = [NSString stringWithCString:temp


if (!currentPerson) return;

[currentPerson setValue:currentTagValue forKey:currentTagName];

currentTagValue = nil;

currentTagName = nil;

default: continue;



NSLog(@"%@:%s Final data: %@", [self class], _cmd, people);

[self setRecords:people];


To parse it using MAX you need to add MPWXmlKit and MPWFoundation to your project, and then replace the code above with the following:

- (void)applicationDidFinishLaunching:(NSNotification*)notification


NSString *path = [[NSBundle mainBundle] pathForResource:@"xmlExample" ofType:@"xml"];

NSArray *people=[[MPWMAXParser parser] parsedDataFromURL:[NSURL fileURLWithPath:path]];

[self setRecords:people];


Labels: , ,

Tuesday, November 10, 2009

Blocked-C II

Damien Pollet thinks my comparison between Objective-C blocks and HOM is not completely fair:
… from my (Smalltalk) experience, the block passed to #collect: is often not a single message send, but rather a small adhoc expression, for which it does not really make sense to define a named method. Or you might need both the element and its key/index… how does HOM deal with that?
These are certainly valid observations, and were some of the reasons that I didn't really think that much of HOM for the first couple of years after coming up with it back in 1997 or so. Since then, I've become less and less convinced that the problems raised are a big concern, for a number of reasons.

Inline vs. Named

One reason is that I actually looked at usage of blocks in the Squeak image, and found that the majority of blocks with at least one argument (so not ifTrue:, whileTrue: and other control structures) actually did contain just a single message send, and so could be immediately expressed as HOMs. Second, I noticed that there were a lot of fairly large (3+ LOC) blocks that should have been separate methods but weren't. That's when I discovered that the presence of blocks actually encourages bad code, and the 'limitation' of HOMs actually was encouraging better(-factored) code.

Of course, I wasn't particularly convinced by that line of reasoning, because it smelled too much like "that's not a bug, that's a feature". Until that is, I saw others with less vested interest reporting the same observation:

But are these really limitations? After using higher order messages for a while I've come to think that they are not. The first limitation encourages you move logic that belongs to an object into that object's implementation instead of in the implementation of methods of other objects. The second limitation encourages you to represent application concepts as objects rather than procedural code. Both limitations have the surprising effect of guiding the code away from a procedural style towards better object-oriented design.
My experience has been that Nat is right, having a mechanism that pushes you towards factoring and naming is better for your code that one that pushes you towards inlining and anonymizing.

Objective-C I

In fact, the Cocoa example that Apple gives for blocks illustrates this idea very well. They implement a "Finder like" sorting mechanism using blocks:

static NSStringCompareOptions comparisonOptions = NSCaseInsensitiveSearch | NSNumericSearch |
        NSWidthInsensitiveSearch | NSForcedOrderingSearch;
NSLocale *currentLocale = [NSLocale currentLocale];
NSComparator finderSort = ^(id string1, id string2) {
    NSRange string1Range = NSMakeRange(0, [string1 length]);
    return [string1 compare:string2 options:comparisonOptions range:string1Range locale:currentLocale];
NSLog(@"finderSort: %@", [stringsArray sortedArrayUsingComparator:finderSort]);

The block syntax is so verbose that there is no hope of actually defining the block inline, the supposed raison d'etre for blocks. So we actually need to take the block out-of-line and name it. So it looks suspiciously like an equivalent implementation using functions:

static NSStringCompareOptions comparisonOptions = NSCaseInsensitiveSearch | NSNumericSearch |
        NSWidthInsensitiveSearch | NSForcedOrderingSearch;
NSLocale *currentLocale = [NSLocale currentLocale];
static NSComparisonResult finderSort(id string1, id string2) {
    NSRange string1Range = NSMakeRange(0, [string1 length]);
    return [string1 compare:string2 options:comparisonOptions range:string1Range locale:currentLocale];
NSLog(@"finderSort: %@", [stringsArray sortedArrayUsingFunction:finderSort context:nil hint:nil]);

Of course, something as useful as a Finder-like comparison sort really deserves to be exposed and made available for reuse, rather than hidden inside one specific sort. Objective-C categories are just the mechanism for this sort of thing:

@implementation NSString(finderCompare)
-(NSSComparisonResult)finderCompare:(NSString*)string2) {
    NSRange myRange = NSMakeRange(0, [self length]);
    return [self compare:string2 options: NSCaseInsensitiveSearch | NSNumericSearch |
        NSWidthInsensitiveSearch | NSForcedOrderingSearch range:string1Range locale:[NSLocale currentLocale]];
NSLog(@"finderSort: %@", [stringsArray sortedArrayUsingSelector:@selector(finderCompare:)]);

Note that some of these criticisms are specific to Apple's implementation of blocks, they do not apply in the same way to Smalltalk blocks, which are a lot less noisy.

Objective-C II

Objective-C has at least one other pertinent difference from Smalltalk, which is that it already contains control structures in the basic language, without blocks. (Of course, those control structures can also take blocks as arguments, but these are the different types of blocks that are delimited by curly braces and cannot be passed around as first class objects).

This means that in Objective-C, we already have the ability to do all the iterating we need, mechanisms such as blocks and HOM are mostly conveniences, not required building blocks. If we need indices, use a for loop. If we require keys, use a key-enumerator and iterate over that.

In fact, I remember when my then colleagues started working with a enum-filters, a HOM-precursor that's strikingly similar to the Google Toolbox's GTMSEnumerator+Filter.m. They really took to the elegance, but then also wanted to use it for various special cases. They laughed when they realized that those special-cases were actually already handled better by existing C control structures such as for-loops.

FP, HANDs and Aggregate Operations

While my dislike of blocks is easy to discount by the usual inventor's pride (your child must be ugly for mine to be pretty), that interpretation actually reverses the causation: I came up with HOM because I was never very fond of blocks. In fact, when I first encountered Smalltalk during my university years I was enthralled until I saw the iteration methods.

That's not to say that do:, collect: and friends were not light-years ahead of Algol-type control structures, they most definitely were and still are. Having some sort of higher-order mechanism is vastly superior than not having a higher-order mechanism. I do wish that "higher order mechanism" and "blocks" weren't used as synonyms quite as much, because they are not, in fact, synonymous.

When I first encountered Smalltalk blocks, I had just previously been exposed to Backus's FP, and that was just so much prettier! In FP functions are composed using functionals without ever talking about actual data, and certainly without talking about individual elements. I have always been on the lookout for higher levels of expression, and this was such a higher level. Now taking things down to "here's another element, what do you want to do with that" was definitely a step back, and quite frankly a bit of a let-down.

The fundamental difference I see is that in Smalltalk there is still an iteration, even if it is encapsulated: we iterate over some collection and then execute some code for each element. In FP, and in HOM, there is instead an aggregate operation: we take an existing operation and lift it up as applying to an entire collection.

This difference might seem contrived, but the research done with the HANDS system demonstrates that it is very real:

After creating HANDS, I conducted another user study to examine the effectiveness of three features of HANDS: queries, aggregate operations, and data visibility. HANDS was compared with a limited version that lacked these features. In the limited version, programmers were able to achieve the desired results but had to use more traditional programming techniques. Children using the full-featured HANDS system performed significantly better than their peers who used the limited version.
I also find this difference to be very real.

The difference between iterating with blocks and lifting operations to be aggregate operations also shows up in the fact that the lifting can be done on any combination of the involved parameters, whereas you tend to only iterate over one collection at a time, because the collection and the iteration are in focus.


Finally, the comparison to functional languages shows a couple of interesting asymmetries: in a functional language, higher order functions can be applied both to named functions and to anonymous functions. In essence, the higher order mechanism just takes functions and doesn't care wether they are named or not. Also the higher order mechanism uses the same mechanisms (functions) as the base system,

With block-based higher order mechanisms, on the other hand, we must make the argument an anonymous function (that's what a block is), and we cannot use a named function, bringing us back to the conundrum mentioned at the start that this mechanisms encourages bad code. Not only that, it also turns out that the base mechanism (messages and methods) is different from the higher order mechanism, which requires anonymous functions, rather than methods.

HOM currently solves only the latter part of this asymmetry, making the higher order mechanism the same as the base mechanism, that mechanism being messaging in both cases. However, it currently cannot solve the other asymmetry: where blocks support unnamed, inline code and not named code, HOM supports named but not unnamed code. While I think that this is the better choice in the larger number of cases, it would be nice to actually suport both.

One solution to this problem might be to simply support both blocks and Higher Order Messaging, but it seems to me that the more elegant solution would be to support inline definition of more-or-less anonymous methods that could then be integrated into the Higher Order Messaging framework.

Labels: , , , ,

Thursday, November 5, 2009

Why not Objective-C?

Patrick Logan can't understand why projects use C++ rather than Ojective-C. Neither can I.

For the 95% (or more) of code that isn't performance sensitive, it gives you expressiveness very close to Smalltalk, and for the 5% or less that need high performance, it gets you the performance and predictability of C.

Labels: ,

Sunday, September 20, 2009

Cocoa(touch) memory management is as easy as 1-2-3

There is a common misconception that Cocoa memory management is hard. It's not.

  1. Use auto-generated accessors religiously
  2. Release your instance variables in dealloc
  3. Always use convenience methods to create objects
Wow, that wasn't too hard!

Labels: , ,

Sunday, February 8, 2009


Just pushed out a minor bugfix release to Objective-XML-5.0:
  • Re-enabled character-set conversion code that had gotten disabled
  • Fixed a compile-error for some targets
  • Other minor improvements
Download here:

Labels: , ,

Sunday, January 25, 2009

Objective-XML 5.0

I've just pushed out a new release of Objective-XML, with some pretty significant new features.

Incremental parsing

This feature, which was already discussed a little in an earlier post, is now available in an official release. In short, Objective-XML will now stream data from network data sources (specified by URL) and produce results incrementally, rather than reading all of the data first and then parsing it. This can make a huge difference in responsiveness and perceived performance for slow networks. CPU and memory consumption will be slightly higher because of extra buffering and buffer stitching required, so this should only be used when necessary.

Static iPhone library

Although Objective-XML has always been compatible with the iPhone, previous releases required copying the pre-requisite files into your project. This burden has now been eased by the inclusion of a static library target. You still need to copy the headers, either MPWMAXParser.h or MPWXmlParser.h (or both).

Unique keys

Previous releases of Objective-XML had an -objectForTag:(int)tag method for quickly retrieving attribute or element values.

enum songtags {
  item_tag=10, title_tag, category_tag	
  [parser setHandler:self forElements:[NSArray arrayWithObjects:@"item",@"title",@"category",nil]
          inNamespace:nil prefix:@"" map:nil tagBase:item_tag];
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
   [song setTitle:[children objectForTag:title_tag]];

Objective-XML adds an -objectForUniqueKey:aKey method that removes the need for these additional integer tags.
  [parser setHandler:self forElements:[NSArray arrayWithObjects:@"item",@"title",@"category",nil]
          inNamespace:nil prefix:@"" map:nil];
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
   [song setTitle:[children objectForUniqueKey:@"title"]];

In addition to providing faster access, the integer tags also served to disambiguate tag names that might occur in multiple namespaces. To handle these conflicts, there now is a -objectForUniqueKey:aKey namespace:aNamespace method. The namespace objects required for this disambiguation process are now returned by the -setHandler:... and -declareAttributes:... methods, which were previously void.

Default methods

One of the attractive features of DOM parsers is that they do something useful "out of the box": point a DOM parser at some XML and you get back a generic in-memory representation of that XML that you can then start taking apart. However, once you go down that road, you are stuck with the substantial CPU and memory overheads of that generic representation.

Streaming parser like SAX or MAX can be a lot more efficient, but it takes a lot more time and effort until achieving a first useful result. Default methods overcome this hurdle by also delivering an immediately useful generic representation without any extra work. Unlike a DOM, however, this generic representation can be incrementally replaced by more specialized and efficient processing later on.

Labels: , ,

Tuesday, January 20, 2009

Cocoa HTML parsing with Objective-XML

Although Objective-XML's MPWSAXParser mostly provides NSXMLParser compatibility it also provides a number of useful additional features. Among these features is the ability to parse HTML files via the settings of two flags: enforceTagNesting and ignoreCase. By default, these are on and off, respectively, which gives you strict XML behavior. However, by setting enforceTagNesting to NO and ignoreCase to YES, you get a SAX parser that will happily and speedily process HTML.

Labels: , ,

Saturday, January 17, 2009

Semantic Noise

Martin Fowler and Gilad Bracha write about Syntactic Noise, making similar points and using similar typographical techniques as I did in my HOM paper.
By Syntactic Noise, what people mean is extraneous characters that aren't part of what we really need to say, but are there to satisfy the language definition. Noise characters are bad because they obscure the meaning of our program, forcing us to puzzle out what it's doing.
Couldn't have said it better myself, so I'll just quote Martin Fowler. Syntactic noise is one of the reasons I think neither the for(each) statement nor the blocks added to Objective-C are particularly good replacements for Higher Order Messaging.
newArray = [existingArray map:^(id obj){ return [obj  stringByAppendingString:@"suffix"]; }];
newArray = [[existingArray map] stringByAppendingString:@"suffix"]];

To me, that extra syntax is quite noisy, though the noise isn't, in fact, just syntactic. We also have to introduce, name and even correctly type a completely redundant stand-in (obj) that we don't really care about. Introducing extra entities is semantic noise. Apart from having to puzzle out what that extra entity is (and that it is, in fact, redundant) every time we read the code, it also brings us back to "element at a time" programming and thinking.

Labels: , ,

Thursday, January 15, 2009

Simple HOM

While it is good to see that Higher Order Messaging is still inspiring new work, I feel a bit guilty that part of that inspiration are sentiments such as the following:

"Still I have yet to find a simple implementation that I like and that does not use private methods. The last thing I want is a relying on classes which can break at any time."
Mea culpa.

While I did explain a bit why the current HOM implementation is a bit gnarly, code probably speaks more loudly than repeated mea-culpas.

So, without further ado, a really simple HOM implementation. An NSArray category provides the interface and does the actual processing:

@interface NSArray(hom)



@implementation NSArray(hom)

-(NSArray* )collect:(NSInvocation*)anInvocation
  NSMutableArray *resultArray=[NSMutableArray array];
  for (id obj in self ) {
    id resultObject;
    [anInvocation invokeWithTarget:obj];
    [anInvocation getReturnValue:&resultObject];
    [resultArray addObject:resultObject];
  return resultArray;

-collect {
  return [HOM homWithTarget:self selector:@selector(collect:)];

The fact that NSInvocation deals with pointers to values rather than values makes this a bit longer than it needs to be, but the gist is simple enough: iterate over the array, invoke the invocation, return the result.

That leaves the actual trampoline, which is really just an implementation detail for conveniently creating NSInvocation objects.

@interface HOM : NSProxy {
  id xxTarget;
  SEL xxSelector;


@implementation HOM

  [xxTarget performSelector:xxSelector withObject:anInvocation];

  return [[xxTarget objectAtIndex:0] methodSignatureForSelector:aSelector];

-xxinitWithTarget:aTarget selector:(SEL)newSelector
  return self;

+homWithTarget:aTarget selector:(SEL)newSelector
  return [[[self alloc] xxinitWithTarget:aTarget selector:newSelector] autorelease];

This code compiles without warnings, does not use any private API, and runs on both Leopard and the iPhone. The Xcode project can be downloaded here.

Labels: ,

Sunday, January 11, 2009

iPhone XML performance

Shortly after becoming an iPhone developer, I found a clever little piece of example code called XML Performance (login required). Having done some high performance XML processing code that works on the iPhone, I was naturally intrigued.

The example pits Cocoa's NSXMLParser against a custom parser based on libxml2, the benchmark is downloading a top 300 list of songs from iTunes.

More responsiveness using libxml2 instead of NSXMLParser

Based on my previous experience, I was expecting libxml2 to be noticeably faster, but with the advantage in processing speed being less and less important with lower and lower I/O data rates (WiFi to 3G to Edge), as I/O would start to completely overwhelm processing. Was I ever wrong!

While my expectations were technically correct for overall performance, I had completely failed to take responsiveness into account. Depending on the network selected, the NSXMLParser sample would appear to hang for 3 to 50 seconds before starting to show results. Needless to say, that is an awful user experience. The libxml example, on the other hand, would start displaying some results almost immediately. While it also was a bit faster in the total time taken, this effect seemed pretty insignificant compared to the fact that results were arriving continually pretty much during the entire time.

The difference, of course, is incremental processing. Whereas NSXMLParser's -initWithContentsOfURL: method apparently downloads the entire document first and then begins processing, the libxml2-based code in the sample downloads the XML in small chunks and processes those chunks immediately.

Alas, going with libxml2 has clear and significant disadvantages, with the code that uses libxml2 being around twice the size of the NSXMLParser-based code, at around 150 lines (non-comment, non-whitespace). If you have worked with NSXMLParser before, you will know that that is already pretty painful, so just imagine that particular brand of joy doubled, with the 150 lines of code giving you the simplest of parsers, with just 5 tags processed. Fortunately, there is a simpler way.

A simpler way: Objective-XML's SAX

Assuming you have already written a Cocoa-(Touch-)based parser using NSXMLParser, all you need to do is include Objective-XML in your projects and replace the reference to NSXMLParser with a reference to MPWSAXParser, everything else will work just as before. Well, the same except for being significantly faster (even faster than libxml2) and now also more responsive on slow connections due to incremental processing.

I have to admit that not having incremental processing was a "feature" Objective-XML shared with NSXMLParser until very recently, due to my not taking into account the fact that latency lags bandwidth. This silly oversight has now been fixed, with both MPWMAXParser and MPWSAXParser sporting URL-based parsing methods that do incremental processing.

So that's all there is to it, Objective-XML provides a drop-in replacement for NSXMLParser that has all the performance and responsiveness-benefits of a libxml2-based solution without the coding horror.

Even simpler: Messaging API for XML (MAX)

However, even a Cocoa version of the SAX API represents a pretty low-bar in terms of ease of coding. With MAX, Objective-XML provides an API that can do the same job much more simply. MAX naturally integrates XML processing with Objective-C messaging using the following two main features:
  • Clients get sent element-specific messages for processing
  • The parser handles nesting, controlled by the client
The following code for building Song objects out of iTunes <item> elements illustrates these two features:
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
  Song *song=[[Song alloc] init];
  [song setArtist:[children objectForTag:artist_tag]];
  [song setAlbum:[children objectForTag:album_tag]];
  [song setTitle:[children objectForTag:title_tag]];
  [song setCategory:[children objectForTag:category_tag]];
  [song setReleaseDate:[parseFormatter dateFromString:[children objectForTag:releasedate_tag]]];
  [self parsedSong:song];
  [song release];
  return nil;
MAX sends the -itemElement:attributes:parser: message to its client whenever it has encountered a complete <item> element, so there is no need for the client to perform string processing on tag names or manage partial state as in a SAX parser. The method constructs a song object using data from the <item> element's child elements which it then passes directly to the rest of the app via the parsedSong: message. It does not return an value, so MAX will not build a tree at this level.

Artist, album, title and category are the values of nested child elements of the <item> element. The (common) code shared by all these child-elements gets the character content of the respective elements and is shown below:

-defaultElement:children attributes:atrs parser:parser
	return [[children combinedText] retain];
Unlike the <item> processing code, which did not return a value, this method does return a value. MAX uses this return value to build a DOM-like structure which is then consumed by the next higher-level, in this case the -itemElement:attributes:parser: method shown above. Unlike a traditional DOM, the MAX tree structure is built out of domain-specific objects returned incrementally by the client.

These two pieces of sample code demonstrate how MAX can act like both a DOM parser or a SAX parser, controlled simply by wether the processing methods return objects (DOM) or not (SAX). They also demonstrated both element-specific and generic processing.

In the iTunes Song parsing example, I was able to build a MAX parser using about half the code required for the NSXMLParser-based example, a ratio that I have also encountered in larger projects. What about performance? It is slightly better than MPWSAXParser, so also somewhat better than libxml2 and significantly better than NSXMLParser.

Summary and Conclusion

The slightly misnamed XML Performance sample code for the iPhone demonstrates how important managing latency is for perceived end user performance, while showing only very little in terms of actual XML processing performance.

While ably demonstrating the performance problems of NSXMLParser, the sample code's solution of using libxml2 is really not a solution, due to the significant increase in code complexity. Objective-XML provides both a drop-in replacement for NSXMLParser with all the performance and latency benefits of the libxml2 solution, as well as a new API that is not just faster, but also much more straightforward than either NSXMLParser or libxml2.

Labels: , ,

Saturday, December 20, 2008

Unit test the class

Travis Griggs comes to the conclusion that unit test objects should map 1:1 to classes under test.

I agree.

In fact, I would go a bit further: tests should be an integral part of a class. While this helps avoid negative outcomes such as parallel class hierarchies or having code and tests diverge, it more importantly simplifies the test/code relationship and drives home the point that code is incomplete without its tests.

While I was working with JUnit on a reasonably large Java system, both finding a good place for a particular test and finding the tests for a specific class became quite burdensome after a while.

For this reason MPWTest simply asks classes to test themselves. Furthermore, only frameworks are tested, so the test tool simply loads each framework to test, enumerates the classes within that particular framework and then runs the tests it finds. TestCases and TestSuites are implicitly created from this structure, removing most of the administrative burdens of unit testing, and also any explicit dependence of the tests on the testing framework.

Having no dependencies on the testing framework makes it easier to ship tests in production code without having to also ship the testing framework. While this may sound odd at first, it avoids potential issues with code compiled for testing being different than code destined to be shipped, and further reinforces the idea that tests are an integral part of each class, rather than an optional add-on.

Labels: ,

Tuesday, September 25, 2007

Objective-C future(s)

Via LtU, I got alerted to the fact that theEtoile project now has an implementation of futures. Cool.

However, their implementation has specific objects reacting asynchronously to messages, making it more similar to the actor model,which as they mention is also very much Alan Kay's original conceptual model for Smalltalk:

Bob Barton, the main designer of the B5000 and a professor at Utah had said in one of his talks a few days earlier: "The basic principle of recursive design is to make the parts have the same power as the whole." For the first time I thought of the whole as the entire computer and wondered why anyone would want to divide it up into weaker things called data structures and procedures. Why not divide it up into little computers, as time sharing was starting to? But not in dozens. Why not thousands of them, each simulating a useful structure? [Emphasis mine]
Actors are inherently asynchronous, each actor runs in a separate process/thread and messages arealso asynchronous, with the sender not waiting for the message to be delivered or ever gettinga return value. Of course the actor model also makes all objects active, so the Etoile model, whichonly makes objects of specific classes active, is somewhere inbetween.

Futures, on the other hand, as introduced in MULTLSIP (pdf), tryto integrate asynchronous execution into a traditional call/return control- and data-flow. So messages(or functions in MULTILSIP) appear to have normal synchronous semantics and immediately yielda return value, but when annotated with the future keyword execution of that return valueis done in a background thread and the immediate return value is just a proxy for the value that is still being computed.

In the HOM paper (pdf) presented at OOPSLA 2005, I also describe a Future implementationbased on Higher Order Messaging that comes very close to the way it was done in MULTILSIP. A -futureHOM is all that is needed to indicate that you would like a result computed in a background thread:

  result = [anObject lengthyOperation:parameter];           //  synchronous
  result = [[anObject future] lengthyOperation:parameter];  //  asynchronous with future
I am probably biased, but this seems about as easy-to-use as possible,with all the nasty machinery (worker-queues, lockless FIFOs, etc.)hidden behind a single -future message.

Labels: ,

Saturday, September 1, 2007

More on MPWObjectCache

Now that I've motivated why an MPWObjectCache might be useful, let's go into some more detail as to how it actually works. To follow along, or if you'd rather just read the source code than my ramblings, MPWObjectCache is part of MPWFoundation, which can be downloaded here:

As I mentioned before, the algorithm for MPWObjectCache is quite simple: there is a circular buffer of object slots. We try to get pre-allocated objects from this circular buffer if possible. If we find an object in the cache and it is available for reuse, we just return it and have just saved the cost of allocation. Two things can prevent this happy state of affairs: (1) we don't have an object yet or (2) we cannot reuse the object because it is still in use. In both cases we will need to allocate a new object, but in the second case we also remove the old object from the cache position.
    id obj;
    if ( objIndex >= cacheSize ) {
    if ( obj==nil ||  [obj retainCount] > 1 ) {
        if ( obj!=nil ) {
            [obj release];
        obj = [[objClass alloc] init];
    return [[obj retain] autorelease];

This is what a naive implementation looks like. A couple of notes on the code:

  • objects must be reinitialized by the client (and reinitializable in the first place)
  • only one attempt is made to find an object
  • the retain/autorelease will prevent the cache from working unless a fairly tight autorelease pool regime is maintained
  • there are quite a few message sends
  • it's not what is used in production
The effectiveness of the cache obviously depends on your allocation patterns and the size of the object-cache. Larger caches take longer to be filled up before they start wrapping around with the potential for reuse, but smaller sizes can mean that the object will still be in use when we do wrap around. The actual implementation is very similar to the one presented above, except that it does a little more probing and uses IMP-caching for all the messages sent on the critical path. These optimizations ensure that object-caches are no slower than normal allocations even in worst-case situations such as every allocated object being retained. In addition the cache can also be set to not do the retain/autorelease, which is safe when you are pushing objects and have control over the cache:
 // cache is an ivar
 id obj=GETOBJECT(cache);
 // target does not have access to cache
 [target doSomethingWithObject:obj];
 // obj now either has an extra retain or can be reused
This pleasant property is a side effect of the decision to turn the object-cache into an object that can be instantiated and placed in an instance variable, rather than the typical object pools that are implemented as class methods. The class method that maintains such a pool usually has no information about the lifetime of objects, so to be safe such an implementation always has to protect the objects it returns, negating much of the advantage of caching. Similar caveats apply to multi-threading and locking. Those caveats notwithstanding, MPWObjectCache also provides the CACHING_ALLOC macro for creating class-side allocation methods backed by an object cache, which is used in the HOM implementation to reduce the cost of allocating trampolines:
 CACHING_ALLOC( quickTrampoline, 5, YES )
This creates a +quickTramplone method backed by an object cache with 5 entries. The YES flag allows objects to be returned from the cache without the retain/autorelease despite the fact that it isn't one of the safe "push" patterns described above. However, this use is also safe because the trampoline is used only temporarily to catch and forward the message, all of which is code controlled by the implementation. It is no longer needed once any client code implementing the actual HOM is run. So, this is how and why object-caches can make your (temporary) object allocations much, much faster.

Labels: ,

Monday, August 27, 2007

High performance Objective-C I: a Postscript interpreter

A key component of the metaobject product suite is EGOS, which includes as a central ingredient a custom Postscript Level 3 compatible interpreter. The project was started in part as a hedge against the chance of Apple dropping DisplayPostscript, in part because our Postscript virtualization technique was hitting limits, and in part because it would make getting Objective-C objects out of the interpreter much easier.

At its core, Postscript is a stack-oriented, dynamically typed and highly polymorphic interpreted programming language. So implementing Postscript with Objective-C objects is actually not just convenient when you want to get Objective-C objects out, it is also a good match for the semantics of the language.

So all is good, right? Well, we also need to make sure that performance is competitive, otherwise there really isn't much of a point. How do we find out if performance is competitive? Fortunately, we have the gold standard handily available: Adobe's interpreter was not just used in NeXT's DisplayPostscript, but is also available as the PS Normalizer on Mac OS X . So let's test performance with a little Postscript program:

  0 1 1000000 { 4 mul pop } bind for
  usertime exch sub dup ==
  20 20 moveto /Times-Roman 24 selectfont
  100 string cvs show ( ms) show
The program times a loop that multiplies some numbers one million times. It exercises a good deal of the basic execution machinery in the Postscript language: stack manipulation, procedure invocation, array access (a procedure is just an array with the executable bit set), looping and arithmetic. The loop is timed with the usertime command, which returns CPU time used in milliseconds.

This test clocks in at 513 ms (513 ns per iteration) in Preview, which isn't too shabby.

1. The problem

As proof of concept, let's code up some Objective-C equivalent of what the Postscript interpreter has to do in this loop. That should give us a good lower bound for the time taken (lower bound because there will be additional interpretation overhead, and Postscript semantics are slightly more complicated). We need a stack, some number objects and a bit of arithmetic. Easy:
 id startcounter=[NSNumber numberWithInt:0];
 id endcounter=[NSNumber numberWithInt:1000000];
 id counter=startcounter;
 id four=[NSNumber numberWithInt:4];
 while ( [counter intValue] < [endcounter intValue] ) {
  int intResult;
  id result;
  [stack addObject:counter];
  [stack addObject:four];
  intResult = [[stack lastObject] intValue] * [[stack objectAtIndex:[stack count]-2] intValue];
  result=[NSNumber numberWithInt:intResult];
  [stack removeLastObject];
  [stack removeLastObject];
  [stack addObject:result];
  [stack removeLastObject];
  counter=[NSNumber numberWithInt:[counter intValue]+1];
Sadly, this takes 4.8 µs per iteration, so our 'lower' bound is almost 10 times slower than our target, and that's without accounting for interpretation. Clearly not good enough. What if we get rid of all that silly stack manipulation code and use a plain C loop?
  id b=[NSNumber numberWithInt:4];
  for (i=0;i < 10000000;i++) {
  id a=[NSNumber numberWithInt:i];
  id c=[NSNumber numberWithInt:[a intValue] * [b intValue]];

2. Mutable State

Objective-C is an imperative object oriented language, meaning objects can change state. However, we have treated numbers as immutable value objects, requiring them to be recreated from scratch. Allocating objects tends to be around 25x more costly than an Objective-C message send, so what if we don't allocate new integer objects, but instead reuse an existing one and just change its value? It turns out we can't use NSNumber for this as it doesn't allow its value to be set, so we need a (trivial) wrapper class for a single integer.
   id b=[MPWInteger numberWithInt:4];
   id a=[MPWInteger numberWithInt:0];
   id c=[MPWInteger numberWithInt:0];
   for (i=0;i <10000000;i++) {
  [a setIntValue:i];
  [c setIntValue:[a intValue] * [b intValue]];

That's more like it: 50ns per iteration is 100x better than our first attempt and also 10x better than the target we're aiming for. So taking advantage of mutable state makes our basic plan possible, at least in principle. Of course, we now have to reintroduce the stack and add interpretation.

3. Save the planet

Alas, it turns out that the interpreter really does need fresh instances. While it will discard them quickly in most cases, it sometimes stores them away meaning we can't statically reuse objects the way we did above.

Instead, we need to figure out a way to recycle temporary objects so we can reuse them without spending a lot of time. The common way to do this is to keep a pool of objects from which requests for new MPWInteger instances are satisfied. However, due to the unpredictable nature of the interpreted code, we cannot use the explicit checkin/checkout policy such pools usually require.

Instead we make the pool a circular buffer and use the retain count to verify that an object can be reused. When we get to a position in the pool that has an object, we can reuse that object if the retain count is one, meaning that only the pool has a valid reference. If the retain count of the object is greater than one, someone other than the pool is holding on to the object and it cannot be reused (yet), so we need to get another instance.

This logic is encapsulated in the class MPWObjectCache, which can be used very similarly to a class (factory object) in creating new instances.

 MPWObjectCache* intCache=[[MPWObjectCache alloc] initWithCapacity:20 
        class:[MPWInteger class]];
 id b=[MPWInteger integer:5];
 for (i=0;i < 1000000;i++) {
  id a=GETOBJECT(intCache);
  id d=GETOBJECT(intCache);
  [a setIntValue:i];
  [d setIntValue:[a intValue] * [b intValue]];

This code runs in 100ns per iteration,  so we now have a solution that gives us new or safely recycled objects quickly enough to build on with the confidence the end result will perform acceptably.

4.  Results

Running the Postscript test program from the start of this post in PostView yields a result of 260ns per iteration, meaning that our Objective-C Postscript interpreter is almost twice as fast as Adobe's, at least on this particular workload.  While I wouldn't generalize this isolated result to say that EGOS is a faster interpreter, it clearly shows that it is at least competitive, which was the goal of the exercise.
The fact that it took a measly 20 KLOC illustrates the leverage Objective-C provides:  Ghostscript weighs in at around 250+ KLOC (without drivers).
5.  Conclusion
One of the things I've always liked about Objective-C is that it lets you have your cake and eat it, too:  great expressiveness to solve your problem effectively is always coupled with the ability to get down and dirty and get really great performance, without losing the structure of the original solution.
The most important factor to watch out for in terms of performance tends to be object allocation.  Controlling this factor with a transparent object-cache allowed us to get an overall performance improvement of around 10-20x in the case of a Postscript interpreter, taking performance from unacceptably slow up to and beyond the industry standard.
Of course, this isn't the only factor and Postscript interpretation not the only application.  Stay tuned!

Labels: , ,