
XML: Parsers

XML Parser: CampSmalltalk-XmlParser
  Implementor: Cincom, Camp Smalltalk, ported to Squeak by Bijan Parsia
  More Information: Squeak code, Camp Smalltalk
  Comments: The extensions of the filenames do not make sense: remove the ".gz" and simply file in. (Fixed 10/7/200; BJP) Also note that this is not up to date with VW 5i.2. I have to talk to Roger (who back ports it to VW 3.0) and see what's new.

XML Parser: "cleanroom" XML Framework
  Implementor: Michael Rueger
  More Information: I was not able to upload it here due to a Swiki-problem... but fortunately there is a mailing list archive: XML Framework - ak
  Comments: I did not have a single problem to file it in and execute the examples - ak

XML Parser: BhrXmlParser
  Implementor: David R Harris, ported by Helge Horch
  More Information: BhrXmlParser
  Comments: "a partial XML parser with a SAX-like event-driven interface"
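
For readers unfamiliar with the term, here is a minimal sketch of what a "SAX-like event-driven interface" means. It uses Python's standard xml.sax module rather than any of the Squeak parsers listed above, so the class name EventRecorder and the function parse_events are made up for illustration and say nothing about BhrXmlParser's actual protocol. The parser pushes start/text/end events at a handler instead of building a tree.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler: records the events the parser pushes at it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def parse_events(document: bytes):
    """Parse a complete document and return the recorded event list."""
    handler = EventRecorder()
    xml.sax.parseString(document, handler)
    return handler.events
```

Calling parse_events(b'<doc><item id="1">hello</item></doc>') yields a flat list of tuples in document order; no DOM is ever constructed, which is why this style suits a partial parser.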

There was a Camp Smalltalk project to port the Cincom XML parser (which a lot of people like) to other versions of Smalltalk. I know they got it running on VisualAge, and I think there were people working on Squeak. The page describing the project is at http://wiki.cs.uiuc.edu/CampSmalltalk/XmlParser+XSL+and+DOM+Level+2 but it doesn't say much about the status. Contact the leaders. The Cincom XML parser was made open source. -Ralph Johnson

  1. There are two partial XML parsers for Squeak floating about, with partial DOMish support. There's also the InDelv XML parser and DOM support for VisualWorks, which folks have talked about porting. These may be of interest.
  2. There's DOMish support in Scamper (for HTML). Integration with Scamper would be very nice.
  3. Have you considered Groves rather than/in addition to DOM support? Groves seem more Smalltalky/Squeaky (to the degree I understand them). There was just a release of a Python Groves implementation that looks rather nice, for reference.



  1. I haven't found the Squeak parsers. I did look at the InDelv parser, and I have my reasons for not working with it.
  2. Integration with Scamper, or, better-put, evolution of Scamper, is a goal when I come to a better understanding of XSL Transformations.
  3. Groves is a little too hard core for me. I'm not sure if you've dealt with SGML at all, but the reason why XML was created is because SGML is a headache with a lot of nuances. I think that the XML work that is being done here could be grown into a much larger SGML/Groves implementation if the desire were there. More likely, XML will satisfy everyone's needs. An area of consideration is DSSSL. The guy who was doing HTML support for Emacs was considering writing an Emacs DSSSL processor and rewriting his support in DSSSL.




  1. Email me and I can send you the partial parsers :) (I'm going to peek at yours.)
  2. Good. :)
  3. Hard core? Really? I have this Groves advocate (Ken MacLeod, who wrote the Perl groves implementation) who keeps saying that groves are easy and DOM a major pain. From our discussions, it sounds like they'd be especially easy in Smalltalk. They also can (supposedly) pretty easily emulate DOM style. I'll confess that I have trouble understanding them :) I like the idea of grove plans, though. I'll investigate some more. (I looked at the InDelv parser and felt sick too, but I think that was partially because DOM just looks hideous to me ;).)
New! I was just thinking that talk of a validating parser, in general, is misleading. After all, XML is a meta-language. DTDs are like grammars (ok, they ARE grammars :)). Validation is syntax checking. Etc. So, I thought, why not make it more or less explicit and use a T-Gen style interface/system to have an XML parser generator? Since we just got a port of T-Gen :) (Of course, I'll bet that there are some nasty DTDs that aren't too easy to generate parsers for given normal automatic parsing techniques :( Oh well.)
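
To make the "DTDs are grammars, validation is syntax checking" point concrete, here is a rough sketch in Python (not Squeak, and not T-Gen). It assumes element-only content models such as (title, para+, note?) and ignores mixed content, EMPTY, ANY, attributes and entities. The function names content_model_to_regex and children_match are invented for this example. The content model is translated into a regular expression over the sequence of child element names, and "validating" an element is just matching its children against that expression.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only DTD content model into a regex.

    Each element name becomes a group matching that name plus a trailing
    space; ',' (sequence) becomes plain concatenation; '|', '?', '*', '+'
    and parentheses keep their meaning, since regex syntax shares them.
    """
    model = re.sub(r"\s+", "", model)
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:%s )" % re.escape(m.group(0)),
        model,
    )
    return re.compile(pattern.replace(",", ""))


def children_match(element: ET.Element, model: str) -> bool:
    """Check an element's direct children against a content model."""
    # Spell the children out as "name name name " so the regex can
    # consume them one space-terminated name at a time.
    names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(names) is not None
```

For example, children_match(ET.fromstring("<chapter><title/><para/></chapter>"), "(title, para+, note?)") accepts the element, while swapping the order of title and para makes it fail. A real parser generator would compile the whole DTD once instead of one model at a time, but the correspondence between content model and grammar rule is the same.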



I would recommend that you seriously check out the XML work being done by the Python community. That work would need to be "Squeakified," but they have already worked out a number of key issues and it well might save time to stand on their broad shoulders.


OK. I can't currently read Python so this will take some time. The actual XML stuff does not seem hard to do. It has been designed (with the exception of the DOM) to be easy. Anything in another language will still have to be recoded, and I think that Expat is a good basis for now because it takes encoding into consideration.
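
As a small illustration of the encoding point, the sketch below uses Python's standard binding to Expat (xml.parsers.expat). The helper name collect_text is made up here. The parser reads the encoding from the XML declaration and hands the handler already-decoded Unicode text, so the same document stored in two encodings produces the same character data.

```python
import xml.parsers.expat


def collect_text(document: bytes) -> str:
    """Return all character data in the document as one Unicode string."""
    parser = xml.parsers.expat.ParserCreate()
    chunks = []
    # Expat may deliver text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)  # True marks this as the final chunk
    return "".join(chunks)


latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'.encode("utf-8")
```

Both collect_text(latin1_doc) and collect_text(utf8_doc) come back as the same four-character string, even though the byte sequences differ. Any port would have to reproduce that decoding step itself.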


  • John Tobler

    Some older work by others


    See work by InDelv http://www.indelv.com/

    see email
    Date: Wed, 22 Sep 1999 17:32:46 -0700
    From: Duane Maxwell
    Subject: Re: Markup Language (SGML/XML) Parsing/Processing?

    He has an example of an XML parser

    Could someone copy this email here, please?

    [23-Sep-99/hh] It's in the mailing list archive:
    (http://macos.tuwien.ac.at:9009/135405170.asHtml.full)

    [23-Sep-99/jet] That link didn't work for me. Try message # 6305 in the eGroups archive (http://www.egroups.com/group/squeak/6305.html).