Multilingual Support - General String as Array

Date: Tue, 21 Sep 1999 20:15:29 -0400
From: "Andrew C. Greenberg"
Subject: Re: Unicode support

>I don't understand how an Array is useful as a general String. They are both Collections, and that is about it.

You wanted something that could maintain and manipulate a sequence of
randomly accessed, but generalized objects. An Array seems the
broadest non-abstract class in the hierarchy that does this.

What is a GeneralizedString other than an Array of objects? Perhaps it is that the collection is all of a generally homogeneous class, say, of instances of a subclass of GeneralizedCharacter? Perhaps we will require ALL characters of the array to be instances of one particular class? What is it about the GeneralizedCharacter that distinguishes it from, say, Object, clearly the most general version? Perhaps it is:


  1. All instances share exactly the same protocol, which
    includes a particularized minimum protocol?
  2. Perhaps there is an abstract mechanism for a property list
    (isUpper, isLower) or a conversion list (asUpper)?
  3. The class is flyweight, or perhaps that = is the same as ==?
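
To make the three candidate properties concrete, here is a minimal sketch in Python rather than Smalltalk. The names GeneralizedCharacter, AsciiCharacter, is_upper, is_lower and as_upper are illustrative assumptions, not an existing Squeak API. One instance is shared per (class, charset, code), so equality and identity coincide.

```python
class GeneralizedCharacter:
    """Hypothetical flyweight character: one shared instance per (class, charset, code)."""

    _pool = {}  # interning table shared by all subclasses

    def __new__(cls, charset, code):
        key = (cls, charset, code)
        inst = cls._pool.get(key)
        if inst is None:
            inst = super().__new__(cls)
            inst.charset = charset
            inst.code = code
            cls._pool[key] = inst
        return inst

    # Point 1: a minimum protocol that every character answers.
    def value(self):
        return self.code

    # Point 2: property queries and conversions; these defaults are
    # deliberately neutral and are overridden per encoding.
    def is_upper(self):
        return False

    def is_lower(self):
        return False

    def as_upper(self):
        return self

    # Point 3: no __eq__ is defined, so == falls back to identity,
    # which is safe because instances are interned.


class AsciiCharacter(GeneralizedCharacter):
    """Assumed example encoding: plain 7-bit ASCII."""

    def __new__(cls, code):
        return super().__new__(cls, "ascii", code)

    def is_upper(self):
        return 65 <= self.code <= 90

    def is_lower(self):
        return 97 <= self.code <= 122

    def as_upper(self):
        return AsciiCharacter(self.code - 32) if self.is_lower() else self
```

With this sketch, AsciiCharacter(97) always answers the same object, and a character with no case (say, code 33) answers itself to as_upper. An encoding with no notion of case would simply inherit the neutral defaults.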


>Or is this your point - what protocol differentiates Strings from Arrays?

Yes.

>Maybe the only responsibility of the general StringClass should be as a Convertor:

String>>
as: aCharacterSet
asUnicode
asAscii

etc.....

>The reason I say this is that a common protocol for all Strings is a tough task.

Why? If it is, then maybe we don't yet understand strings well
enough to explain why they should be generalized? You say that Array
isn't enough, but String is too much. Where, then, does this belong?

Or, is it not in fact the String that is the crux of the matter, but
the Character that must be generalized? Or is there some interaction
between a String and a probably lightweight (perhaps Flyweight?)
class that defines our notion of a String?

Given the beautifully parameterized structure of the collection class
filtering enumerations, it seems to me that a powerful
GeneralizedString class can be built, so that it can leverage the
protocol of the underlying character objects. When expressions
become so common as to make it useful to have special-cased methods,
but not enough to justify putting it in the GeneralizedString class,
we have just identified a reason to extend the hierarchy to include a
new subclass or subclasses.
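
A rough sketch of that layering, again in Python rather than Smalltalk: the general class knows only the enumeration protocol (select and collect, named after the Smalltalk collection messages), and everything else is reached through whatever the character objects answer. Char, GeneralizedString, AsciiString, from_text and to_text are all hypothetical names, and Char is an ASCII-only stand-in for a real generalized character.

```python
class Char:
    """Stand-in for a generalized character (hypothetical, ASCII-only)."""

    def __init__(self, code):
        self.code = code

    def is_upper(self):
        return 65 <= self.code <= 90

    def as_upper(self):
        # Only ASCII lowercase letters change; everything else answers itself.
        return Char(self.code - 32) if 97 <= self.code <= 122 else self


class GeneralizedString:
    """Array-like sequence whose elements are assumed to answer the character protocol."""

    def __init__(self, chars):
        self._chars = list(chars)

    def __len__(self):
        return len(self._chars)

    def __getitem__(self, index):
        return self._chars[index]

    # Enumerations parameterized by a block (here, a callable).
    def select(self, predicate):
        return type(self)(c for c in self._chars if predicate(c))

    def collect(self, transform):
        return type(self)(transform(c) for c in self._chars)

    # A common expression promoted to a method; it still knows nothing
    # about any particular encoding.
    def as_upper(self):
        return self.collect(lambda c: c.as_upper())


class AsciiString(GeneralizedString):
    """A subclass that holds the special-cased, encoding-specific methods."""

    @classmethod
    def from_text(cls, text):
        return cls(Char(ord(ch)) for ch in text)

    def to_text(self):
        return "".join(chr(c.code) for c in self._chars)
```

For example, AsciiString.from_text("Hello, World").select(lambda c: c.is_upper()) keeps only the capital letters, without GeneralizedString containing any case logic of its own.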

>As people point out, even asUppercase won't hold for many languages.
>You may end up with a restricted and fairly useless protocol.
>With the 'asUnicode' approach, the client determines which protocol
>will be used to communicate with Strings. So a String has many possible
>interfaces (protocols), and the client chooses an appropriate interface
>by which to manipulate the String.

Actually, I think this is precisely the point. Either you have a
general class or you don't (or Array really is it). Let's start with
the most fundamental objects, and try to find a way to capture unique
features in particular encodings and protocols for the
GeneralizedCharacter, and ways to reach those features through the
GeneralizedString protocol.

We cannot at once say that it is important to create a generalized
string and that it is impossible. Let's either do it or drop it.