Objective-XML 5.3
- Cocotron targets for Windows support.
- XMLRPC support.
- No longer uses 'private' API that was causing AppStore rejections for some iPhone apps using Objective-XML.
- Support for numeric entities.
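Numeric entities are character references such as &#233; (decimal) or &#xE9; (hexadecimal) that name a Unicode code point directly. Objective-XML's own implementation isn't shown in these notes. The following is only a minimal C sketch of what supporting them involves: parse the digits, check the code point, and write it out as UTF-8. The function names decode_numeric_entities and put_utf8 are hypothetical. Anything that isn't a well-formed numeric reference is passed through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write one Unicode code point as UTF-8, return byte count. */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical decoder: replace "&#NNN;" and "&#xHHH;" with their UTF-8 bytes.
   A reference never encodes to more bytes than it occupies, so dst needs at
   most strlen(src) + 1 bytes. Malformed references are copied unchanged. */
void decode_numeric_entities(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '&' && src[1] == '#') {
            int hex = (src[2] == 'x' || src[2] == 'X');
            const char *p = src + (hex ? 3 : 2);
            unsigned long cp = 0;
            size_t ndigits = 0;
            for (;;) {
                int d;
                if (*p >= '0' && *p <= '9') d = *p - '0';
                else if (hex && *p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
                else if (hex && *p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
                else break;
                cp = cp * (hex ? 16 : 10) + (unsigned long)d;
                if (cp > 0x10FFFF) break;          /* beyond Unicode: reject */
                ndigits++;
                p++;
            }
            int valid = ndigits > 0 && *p == ';' && cp != 0 && cp <= 0x10FFFF
                        && !(cp >= 0xD800 && cp <= 0xDFFF);   /* no surrogates */
            if (valid) {
                dst += put_utf8(cp, dst);
                src = p + 1;
                continue;
            }
        }
        *dst++ = *src++;                           /* ordinary byte: copy */
    }
    *dst = '\0';
}
```

For example, "caf&#xE9;" decodes to "café".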
Labels: Objective-C, performance, XML
The XML file parsed is the following:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<person>
<name>John Doe</name>
<age>14</age>
</person>
<person>
<name>Mary Doe</name>
<age>14</age>
</person>
<person>
<name>John Smith</name>
<age>15</age>
</person>
</root>
It is parsed at application startup using the following code:
#import <libxml/xmlreader.h>

- (void)applicationDidFinishLaunching:(NSNotification*)notification
{
NSString *path = [[NSBundle mainBundle] pathForResource:@"xmlExample" ofType:@"xml"];
NSData *xmlData = [NSData dataWithContentsOfFile:path];
xmlTextReaderPtr reader = xmlReaderForMemory([xmlData bytes],
(int)[xmlData length],
[path UTF8String], NULL,
(XML_PARSE_NOBLANKS | XML_PARSE_NOCDATA | XML_PARSE_NOERROR | XML_PARSE_NOWARNING));
if (!reader) {
NSLog(@"Failed to load xmlreader");
return;
}
NSString *currentTagName = nil;
NSMutableDictionary *currentPerson = nil;
NSString *currentTagValue = nil;
NSMutableArray *people = [NSMutableArray array];
char* temp;
// xmlTextReaderRead() returns 1 while nodes remain, 0 at the end and -1 on error
while (xmlTextReaderRead(reader) == 1) {
switch (xmlTextReaderNodeType(reader)) {
case XML_READER_TYPE_ELEMENT:
//We are starting an element
temp = (char*)xmlTextReaderConstName(reader);
currentTagName = [NSString stringWithCString:temp
encoding:NSUTF8StringEncoding];
if ([currentTagName isEqualToString:@"person"]) {
currentPerson = [NSMutableDictionary dictionary];
[people addObject:currentPerson];
}
continue;
case XML_READER_TYPE_TEXT:
//The current tag has a text value, stick it into the current person
temp = (char*)xmlTextReaderConstValue(reader);
currentTagValue = [NSString stringWithCString:temp
encoding:NSUTF8StringEncoding];
if (!currentPerson) {
xmlFreeTextReader(reader);   // don't leak the reader on early exit
return;
}
[currentPerson setValue:currentTagValue forKey:currentTagName];
currentTagValue = nil;
currentTagName = nil;
continue;
default: continue;
}
}
xmlFreeTextReader(reader);
NSLog(@"%@:%@ Final data: %@", [self class], NSStringFromSelector(_cmd), people);
[self setRecords:people];
}
To parse it using MAX, you need to add MPWXmlKit and MPWFoundation to your project and then replace the code above with the following:
- (void)applicationDidFinishLaunching:(NSNotification*)notification
{
NSString *path = [[NSBundle mainBundle] pathForResource:@"xmlExample" ofType:@"xml"];
NSArray *people=[[MPWMAXParser parser] parsedDataFromURL:[NSURL fileURLWithPath:path]];
[self setRecords:people];
}
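Most of the difference in effort is bookkeeping. The following C sketch reproduces the state machine of the libxml2 version without depending on libxml2. It walks a hypothetical, pre-tokenized event stream; the Event type stands in for what xmlTextReaderRead() and xmlTextReaderNodeType() deliver. Like the original, it has to remember the last start tag and the current person between events. The typed Person struct replaces the original's NSMutableDictionary, and all names here are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reader events: a start tag or a text node. */
typedef enum { EV_ELEMENT, EV_TEXT } EventType;
typedef struct { EventType type; const char *data; } Event;

typedef struct { char name[32]; int age; } Person;

/* Same state machine as the reader loop: remember the last start tag and the
   current person, then file each text node under that remembered tag.
   Unlike the original, text before the first <person> is ignored rather
   than aborting the whole parse. */
static size_t collect_people(const Event *ev, size_t n, Person *out, size_t cap)
{
    const char *current_tag = NULL;
    Person *current = NULL;
    size_t count = 0;

    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ev[i].type == EV_ELEMENT) {
            current_tag = ev[i].data;
            if (strcmp(current_tag, "person") == 0) {
                if (count == cap)
                    break;                     /* no room for another record */
                current = &out[count++];
                current->name[0] = '\0';
                current->age = 0;
            }
        } else if (current != NULL && current_tag != NULL) {
            if (strcmp(current_tag, "name") == 0)
                snprintf(current->name, sizeof current->name, "%s", ev[i].data);
            else if (strcmp(current_tag, "age") == 0)
                current->age = atoi(ev[i].data);
            current_tag = NULL;                /* text consumed, as in the original */
        }
    }
    return count;
}
```

The MAX version has none of this state: the parser tracks where it is in the document, and the client only sees results.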
Labels: Cocoa, Objective-C, XML
Fixing that should hopefully just be ASMOP. Fortunately, Weather Underground has some reasonably well-documented XML APIs, so let's see what they have to offer and whether we can get to the data we want.
First, let's fire up the interactive Smalltalk Shell (stsh) and load the Objective-XML framework.
marcel@spock[Projects]stsh
> context loadFramework:'MPWXmlKit'
Next, let's have a look at the raw XML returned by the Weather Underground APIs.
> urlstr:='http://api.wunderground.com/weatherstation/WXCurrentObXML.asp?ID=KCADALYC1'
> url:=NSURL URLWithString: urlstr
> NSString stringWithContentsOfURL:url.
>
Hmm... no result. (It turns out that Weather Underground checks the user agent and returns an error if it doesn't find one, and the various convenience methods do not send a User-Agent header.) Maybe curl can help?
> context addExternalCommand:'curl'.
> curl run:'http://api.wunderground.com/weatherstation/WXCurrentObXML.asp?ID=KCADALYC1'
<?xml version="1.0"?>
<current_observation>
<credit>Weather Underground Personal Weather Station</credit>
<credit_URL>http://wunderground.com/weatherstation/</credit_URL>
<image>
<url>http://icons.wunderground.com/graphics/bh-wui_logo.gif</url>
<title>Weather Underground</title>
<link>http://wunderground.com/weatherstation/</link>
</image>
<location>
<full>Mussel Rock, Daly City, CA</full>
<neighborhood>Mussel Rock</neighborhood>
<city>Daly City</city>
<state>CA</state>
<zip></zip>
<latitude>37.667347</latitude>
<longitude>-122.489342</longitude>
<elevation>514 ft</elevation>
</location>
<station_id>KCADALYC1</station_id>
<station_type>Fan-aspirated Davis Vantage Pro 2 Plus</station_type>
<observation_time>Last Updated on November 7, 1:55 PM PST</observation_time>
<observation_time_rfc822>Sat, 07 November 2009 21:55:21 GMT</observation_time_rfc822>
<weather></weather>
<temperature_string>56.9 F (13.8 C)</temperature_string>
<temp_f>56.9</temp_f>
<temp_c>13.8</temp_c>
<relative_humidity>83</relative_humidity>
<wind_string>From the NW at 15.0 MPH Gusting to 16.0 MPH</wind_string>
<wind_dir>NW</wind_dir>
<wind_degrees>313</wind_degrees>
<wind_mph>15.0</wind_mph>
<wind_gust_mph>16.0</wind_gust_mph>
<pressure_string>30.07" (1018.2 mb)</pressure_string>
<pressure_mb>1018.2</pressure_mb>
<pressure_in>30.07</pressure_in>
<dewpoint_string>51.8 F (11.0 C)</dewpoint_string>
<dewpoint_f>51.8</dewpoint_f>
<dewpoint_c>11.0</dewpoint_c>
<heat_index_string></heat_index_string>
<heat_index_f></heat_index_f>
<heat_index_c></heat_index_c>
<windchill_string></windchill_string>
<windchill_f></windchill_f>
<windchill_c></windchill_c>
<solar_radiation>483.00</solar_radiation>
<UV>2.5</UV>
<precip_1hr_string>0.00 in (0.0 mm)</precip_1hr_string>
<precip_1hr_in>0.00</precip_1hr_in>
<precip_1hr_metric>0.0</precip_1hr_metric>
<precip_today_string>0.01 in (0.0 mm)</precip_today_string>
<precip_today_in>0.01</precip_today_in>
<precip_today_metric>0.0</precip_today_metric>
<history_url>http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KCADALYC1</history_url>
<ob_url>http://www.wunderground.com/cgi-bin/findweather/getForecast?query=37.667347,-122.489342</ob_url>
</current_observation>
<!-- 0.029:0 -->
Much better. Now let's see if we can parse that XML data into a Cocoa Property List.
> parser := MPWMAXParser parser.
> parser parsedDataFromURL: 'http://api.wunderground.com/weatherstation/WXCurrentObXML.asp?ID=KCADALYC1'
{
UV = "2.5";
credit = "Weather Underground Personal Weather Station";
"credit_URL" = "http://wunderground.com/weatherstation/";
"dewpoint_c" = "11.1";
"dewpoint_f" = "51.9";
"dewpoint_string" = "51.9 F (11.1 C)";
"heat_index_c" = {
};
"heat_index_f" = {
};
"heat_index_string" = {
};
"history_url" = "http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KCADALYC1";
image = {
link = "http://wunderground.com/weatherstation/";
title = "Weather Underground";
url = "http://icons.wunderground.com/graphics/bh-wui_logo.gif";
};
location = {
city = "Daly City";
elevation = "514 ft";
full = "Mussel Rock, Daly City, CA";
latitude = "37.667347";
longitude = "-122.489342";
neighborhood = "Mussel Rock";
state = CA;
zip = {
};
};
"ob_url" = "http://www.wunderground.com/cgi-bin/findweather/getForecast?query=37.667347,-122.489342";
"observation_time" = "Last Updated on November 7, 1:55 PM PST";
"observation_time_rfc822" = "Sat, 07 November 2009 21:55:51 GMT";
"precip_1hr_in" = "0.00";
"precip_1hr_metric" = "0.0";
"precip_1hr_string" = "0.00 in (0.0 mm)";
"precip_today_in" = "0.01";
"precip_today_metric" = "0.0";
"precip_today_string" = "0.01 in (0.0 mm)";
"pressure_in" = "30.07";
"pressure_mb" = "1018.2";
"pressure_string" = "30.07\" (1018.2 mb)";
"relative_humidity" = 83;
"solar_radiation" = "482.00";
"station_id" = KCADALYC1;
"station_type" = "Fan-aspirated Davis Vantage Pro 2 Plus";
"temp_c" = "13.9";
"temp_f" = "57.0";
"temperature_string" = "57.0 F (13.9 C)";
weather = {
};
"wind_degrees" = 342;
"wind_dir" = NNW;
"wind_gust_mph" = "24.0";
"wind_mph" = "18.0";
"wind_string" = "From the NNW at 18.0 MPH Gusting to 24.0 MPH";
"windchill_c" = {
};
"windchill_f" = {
};
"windchill_string" = {
};
}
That looks good: we can see the wind information near the bottom of the output, with keys "wind_degrees" and "wind_mph". So let's grab the values for those keys using the collect Higher Order Message and -objectForKey:.
> (parser parsedDataFromURL:'http://api.wunderground.com/weatherstation/WXCurrentObXML.asp?ID=KCADALYC1' ) collect objectForKey: #( wind_mph wind_dir wind_string ) each.
21.0
NW
From the NW at 21.0 MPH Gusting to 24.0 MPH
Almost what we wanted, except that we grabbed the wind direction as a string instead of the exact numeric direction. Easy fix:
> (parser parsedDataFromURL: 'http://api.wunderground.com/weatherstation/WXCurrentObXML.asp?ID=KCADALYC1' ) collect objectForKey: #( wind_mph wind_degrees wind_string ) each.
12.0
318
From the NW at 12.0 MPH Gusting to 24.0 MPH
>
Perfect. We have the wind speed, the direction, and an informative text in case we want to display that.
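In these expressions, collect sends objectForKey: to the parsed dictionary once for every element of the array marked with each, and gathers the results in the same order. Here is a rough C analog. A plain key/value array stands in for the parsed dictionary, and object_for_key and collect_object_for_key are hypothetical names, not part of Objective-XML or Objective-Smalltalk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened observation: stand-in for the parsed dictionary. */
typedef struct { const char *key; const char *value; } Entry;

/* Analog of -objectForKey:; returns NULL when the key is absent. */
static const char *object_for_key(const Entry *dict, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(dict[i].key, key) == 0)
            return dict[i].value;
    return NULL;
}

/* Analog of "dict collect objectForKey: keys each": apply the same lookup
   once per key and collect the results in key order. */
static void collect_object_for_key(const Entry *dict, size_t n,
                                   const char *const *keys, size_t nkeys,
                                   const char **results)
{
    assert(results != NULL || nkeys == 0);
    for (size_t i = 0; i < nkeys; i++)
        results[i] = object_for_key(dict, n, keys[i]);
}
```

Swapping wind_dir for wind_degrees, as in the fix above, only changes the key list. The iteration itself stays the same, which is the point of the Higher Order Message.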
Labels: Objective-Smalltalk, XML
Labels: Objective-C, performance, XML
enum songtags {
item_tag=10, title_tag, category_tag
};
...
[parser setHandler:self forElements:[NSArray arrayWithObjects:@"item",@"title",@"category",nil]
inNamespace:nil prefix:@"" map:nil tagBase:item_tag];
...
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
{
...
[song setTitle:[children objectForTag:title_tag]];
...
...
[parser setHandler:self forElements:[NSArray arrayWithObjects:@"item",@"title",@"category",nil]
inNamespace:nil prefix:@"" map:nil];
...
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
{
...
[song setTitle:[children objectForUniqueKey:@"title"]];
...
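The two snippets differ in how the children are looked up. In the first, the tagBase: argument lines the element names up with the songtags enum, so a child is fetched by integer tag with -objectForTag:. The second looks children up by name with -objectForUniqueKey:. The C sketch below illustrates the difference. It assumes, as the enum's ordering suggests but the snippets do not state, that tags are handed out consecutively from the tag base in the order of the element array. The functions are hypothetical analogs, not MAX's API. The point is that the string comparison happens once, when the tag is assigned, and later lookups are plain integer compares.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum songtags { item_tag = 10, title_tag, category_tag };

/* Hypothetical tag assignment: consecutive tags from tag_base, in array order. */
static int tag_for_name(const char *const *names, size_t count, int tag_base,
                        const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return tag_base + (int)i;
    return -1;                           /* element not registered */
}

/* A child element as the handler would see it. */
typedef struct { int tag; const char *name; const char *value; } Child;

/* Analog of -objectForTag:: one integer compare per child. */
static const char *object_for_tag(const Child *children, size_t n, int tag)
{
    for (size_t i = 0; i < n; i++)
        if (children[i].tag == tag)
            return children[i].value;
    return NULL;
}

/* Analog of -objectForUniqueKey:: one string compare per child. */
static const char *object_for_unique_key(const Child *children, size_t n,
                                         const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(children[i].name, key) == 0)
            return children[i].value;
    return NULL;
}
```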
Streaming parsers like SAX or MAX can be a lot more efficient, but they take a lot more time and effort before they deliver a first useful result. Default methods overcome this hurdle by also delivering an immediately useful generic representation without any extra work. Unlike a DOM, however, this generic representation can be incrementally replaced by more specialized and efficient processing later on.
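The fallback idea can be sketched in a few lines of C. The handler table, dispatch() and the two handlers are hypothetical stand-ins for MAX's per-element methods and -defaultElement:attributes:parser:. With an empty table, every element gets the generic representation, which here is simply the element's text. Registering a handler for one element replaces the default for that element only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler: turn an element's text into some result in out. */
typedef void (*ElementHandler)(const char *text, char *out, size_t cap);

typedef struct { const char *element; ElementHandler handler; } HandlerEntry;

/* Generic representation: just the element's character content. */
static void default_element(const char *text, char *out, size_t cap)
{
    snprintf(out, cap, "%s", text);
}

/* A specialized handler that can later replace the default for one element. */
static void title_element(const char *text, char *out, size_t cap)
{
    snprintf(out, cap, "Title: %s", text);
}

/* Use the registered handler for this element if there is one,
   otherwise fall back to the default. */
static void dispatch(const HandlerEntry *table, size_t n, const char *element,
                     const char *text, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].element, element) == 0) {
            table[i].handler(text, out, cap);
            return;
        }
    }
    default_element(text, out, cap);
}
```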
Labels: Objective-C, performance, XML
The example pits Cocoa's NSXMLParser against a custom parser based on libxml2; the benchmark downloads a top-300 list of songs from iTunes.
While my expectations were technically correct for overall performance, I had completely failed to take responsiveness into account. Depending on the network selected, the NSXMLParser sample would appear to hang for 3 to 50 seconds before starting to show results. Needless to say, that is an awful user experience. The libxml example, on the other hand, would start displaying some results almost immediately. While it also was a bit faster in the total time taken, this effect seemed pretty insignificant compared to the fact that results were arriving continually pretty much during the entire time.
The difference, of course, is incremental processing. Whereas NSXMLParser's -initWithContentsOfURL: method apparently downloads the entire document first and then begins processing, the libxml2-based code in the sample downloads the XML in small chunks and processes those chunks immediately.
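The following toy C sketch makes incremental processing concrete. Data is fed in chunks as it would arrive from the network, and each complete <title> element is reported as soon as its closing tag arrives, rather than after the whole document has downloaded. ChunkScanner, scanner_feed() and the TitleList collector are hypothetical names. The scanner is a plain string search for one element name, not a real XML parser: it handles no attributes, entities or CDATA.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SCAN_CAP 256

typedef void (*TitleCallback)(const char *title, void *ctx);

/* Hypothetical scanner: holds only the unprocessed tail of the stream. */
typedef struct {
    char buf[SCAN_CAP];
    size_t len;
    TitleCallback cb;
    void *ctx;
} ChunkScanner;

static void scanner_init(ChunkScanner *s, TitleCallback cb, void *ctx)
{
    s->buf[0] = '\0';
    s->len = 0;
    s->cb = cb;
    s->ctx = ctx;
}

/* Append one chunk, report every complete <title>...</title> seen so far, and
   keep only what may still be needed: an unfinished element, or a short tail
   that could be the start of a "<title>" tag split across chunks. */
static void scanner_feed(ChunkScanner *s, const char *chunk, size_t n)
{
    static const char open_tag[] = "<title>", close_tag[] = "</title>";
    const size_t open_len = sizeof open_tag - 1, close_len = sizeof close_tag - 1;

    assert(s->len + n < SCAN_CAP);       /* sketch: unprocessed data stays small */
    memcpy(s->buf + s->len, chunk, n);
    s->len += n;
    s->buf[s->len] = '\0';

    char *p = s->buf;
    for (;;) {
        char *open = strstr(p, open_tag);
        if (open == NULL) {
            size_t rest = strlen(p);
            if (rest >= open_len)
                p += rest - (open_len - 1);   /* keep a possible partial tag */
            break;
        }
        char *close = strstr(open + open_len, close_tag);
        if (close == NULL) {
            p = open;                          /* element incomplete: wait */
            break;
        }
        *close = '\0';
        s->cb(open + open_len, s->ctx);        /* result delivered immediately */
        p = close + close_len;
    }
    s->len = strlen(p);
    memmove(s->buf, p, s->len + 1);
}

/* Example callback: copy each reported title into a small fixed list. */
typedef struct { char titles[8][32]; size_t count; } TitleList;

static void collect_title(const char *title, void *ctx)
{
    TitleList *list = ctx;
    if (list->count < 8) {
        snprintf(list->titles[list->count], sizeof list->titles[0], "%s", title);
        list->count++;
    }
}
```

Fed five bytes at a time, the first title is reported after only 30 bytes of input, long before the end of the document.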
Alas, going with libxml2 has a clear and significant disadvantage: the libxml2-based code is around twice the size of the NSXMLParser-based code, at around 150 lines (non-comment, non-whitespace). If you have worked with NSXMLParser before, you will know that it is already pretty painful, so just imagine that particular brand of joy doubled, with those 150 lines giving you only the simplest of parsers, processing just 5 tags. Fortunately, there is a simpler way.
I have to admit that not having incremental processing was a "feature" Objective-XML shared with NSXMLParser until very recently, due to my not taking into account the fact that latency lags bandwidth. This silly oversight has now been fixed, with both MPWMAXParser and MPWSAXParser sporting URL-based parsing methods that do incremental processing.
So that's all there is to it: Objective-XML provides a drop-in replacement for NSXMLParser that has all the performance and responsiveness benefits of a libxml2-based solution without the coding horror.
-itemElement:(MPWXMLAttributes*)children attributes:(MPWXMLAttributes*)attributes parser:(MPWMAXParser*)p
{
Song *song=[[Song alloc] init];
[song setArtist:[children objectForTag:artist_tag]];
[song setAlbum:[children objectForTag:album_tag]];
[song setTitle:[children objectForTag:title_tag]];
[song setCategory:[children objectForTag:category_tag]];
[song setReleaseDate:[parseFormatter dateFromString:[children objectForTag:releasedate_tag]]];
[self parsedSong:song];
[song release];
return nil;
}
MAX sends the -itemElement:attributes:parser: message to its client whenever it has encountered a complete <item> element, so there is no need for the client to perform string processing on tag names or manage partial state as in a SAX parser.
The method constructs a song object using data from the <item> element's child elements, which it then passes directly to the rest of the app via the -parsedSong: message. It returns nil instead of an object, so MAX will not build a tree at this level.
Artist, album, title and category are the values of nested child elements of the <item> element. The (common) code shared by all these child elements gets the character content of the respective elements and is shown below:
-defaultElement:children attributes:atrs parser:parser
{
return [[children combinedText] retain];
}
Unlike the <item> processing code, which did not return an object, this method does return a value. MAX uses this return value to build a DOM-like structure, which is then consumed by the next higher level, in this case the -itemElement:attributes:parser: method shown above. Unlike a traditional DOM, the MAX tree structure is built out of domain-specific objects returned incrementally by the client.
These two pieces of sample code demonstrate how MAX can act as either a DOM parser or a SAX parser, controlled simply by whether the processing methods return objects (DOM) or not (SAX). They also demonstrate both element-specific and generic processing.
In the iTunes Song parsing example, I was able to build a MAX parser using about half the code required for the NSXMLParser-based example, a ratio that I have also encountered in larger projects. What about performance? It is slightly better than MPWSAXParser, so also somewhat better than libxml2 and significantly better than NSXMLParser.
While the sample code ably demonstrates the performance problems of NSXMLParser, its solution of using libxml2 is really not a solution, due to the significant increase in code complexity. Objective-XML provides both a drop-in replacement for NSXMLParser with all the performance and latency benefits of the libxml2 solution, and a new API that is not just faster, but also much more straightforward than either NSXMLParser or libxml2.
Labels: Objective-C, performance, XML
Sean McGrath claims that Binary XML solves the wrong problem.
Yes and no: it doesn't help much with existing structures and parsing methods, but with the right methods, it can be extremely helpful!
Also: "...how weird is it that we have not moved on from the DOM and SAX in terms of "standard" APIs for XML processing?"
Labels: performance, XML