Unicode and REALbasic -- a correction (Real Studio network user group Mailinglist archive)



Date: 23.08.02 18:30 (Fri, 23 Aug 2002 12:30:09 -0500)
From: Thomas Reed
I've gotten a few private e-mails lately blaming RB for Unicode issues,
and I'm afraid my posts about various UTF8 problems have prompted this.
I want to clarify things, for the record.

I don't want anyone to believe that RB "needs to do something about" the
"problem" of Unicode. Unicode is the future of text in Mac OS X. It's
just something we're all going to have to deal with and learn more about
from now on. Yes, it sucks to have to think about these things for those
of us who have always thought of text as nothing more than a series of
1-byte characters. And sure, RB 4.5 may have brought these issues to the
fore by giving us FolderItem names in UTF8. After all, it was hard to
have trouble with UTF8 in RB 4.0 unless you already were TRYING to deal
with UTF8 in the first place! But I'm afraid I've given the wrong
impression that RB is extremely buggy with regard to UTF8. (I admit I
thought so briefly, but worked through most of the "bugs" as simple
encoding issues.)
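To make the "1-byte characters" point concrete, here's a small sketch. It's in Python rather than RB, and it's only meant to illustrate the general idea; the file name is made up. A single accented character like "é" is one byte in MacRoman, the classic Mac encoding, but two bytes in UTF8, so any code that assumes one byte per character will miscount.

```python
# Hypothetical file name containing one non-ASCII character.
name = "Café Notes"

mac_roman_bytes = name.encode("mac_roman")  # classic Mac OS single-byte encoding
utf8_bytes = name.encode("utf-8")           # multi-byte for anything beyond ASCII

print(len(name))             # 10 characters
print(len(mac_roman_bytes))  # 10 bytes: "é" is the single byte 0x8E
print(len(utf8_bytes))       # 11 bytes: "é" takes two bytes in UTF-8
print(utf8_bytes[3:5])       # b'\xc3\xa9', the two bytes that make up "é"
```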

There DO seem to be some bugs involving UTF8. One that I can think of
was a bug in CreateAsFolder in RB 4.5 that should (according to RS) be
fixed in 4.5.1. Another appears to be a bug in OS X (which I'm hoping
will be fixed in 10.2, but I don't know if it will be or not). The bugs
I'd mentioned before with RegEx and ReplaceAll "corrupting" UTF8 text
aren't so much bugs as a documentation problem: the docs don't mention
that these functions don't recognize UTF8 text yet.
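Here's a rough illustration of why a text function that doesn't recognize UTF8 can appear to "corrupt" it. This uses Python's re module on raw bytes, not RB's RegEx, so treat it as an analogy; I'm assuming the RB functions fail in a similar byte-oriented way. When "." matches one byte instead of one character, asking for the first four "characters" cuts "é" in half.

```python
import re

text = "Café au lait"
data = text.encode("utf-8")

# Byte-oriented pattern: "." matches one *byte*, not one character.
first_four = re.match(rb"^.{4}", data).group(0)
print(first_four)  # b'Caf\xc3', only the first of the two bytes of "é"

# The result is no longer valid UTF-8; a lenient decode exposes the damage.
broken = first_four.decode("utf-8", errors="replace")
print(broken)  # "Caf" followed by U+FFFD, the Unicode replacement character

# The same pattern on decoded text matches four whole characters.
chars = re.match(r"^.{4}", text).group(0)
print(chars)  # Café
```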

Everything else is just the facts of life in OS X. For example, if you
plan on writing UTF8 text (such as a file name) to a file and you want to
read it back in later, you'd better be prepared to think about text
encodings. RB can't know whether a given string, read from a file, is
UTF8 or ASCII or something else unless you tell it. If you write UTF8
text to a file, you're going to need to use a TextConverter (or the
TextUtilities module's AssertEncoding method) to tell RB that it's UTF8
when you read it back in. This may seem like a bug to those who, like
me, have never had to think about the encoding of text before. But it isn't.
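A minimal sketch of the write-then-read-back situation, again in Python as a stand-in: here bytes.decode plays the role that a TextConverter or AssertEncoding plays in RB, and round_trip and the file name are hypothetical. The file only stores bytes; whoever reads them has to say which encoding they're in.

```python
import os
import tempfile

def round_trip(text: str, read_as: str) -> str:
    """Write text as UTF-8 bytes, read the raw bytes back, decode them as read_as."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "names.dat")
        with open(path, "wb") as f:
            f.write(text.encode("utf-8"))  # only bytes are stored, no encoding tag
        with open(path, "rb") as f:
            raw = f.read()
    # Nothing in `raw` says "this is UTF-8"; the caller has to supply that.
    return raw.decode(read_as)

name = "Résumé.txt"
print(round_trip(name, "utf-8"))      # Résumé.txt
print(round_trip(name, "mac_roman"))  # garbled: each "é" comes back as two characters
```

Note that the wrong guess doesn't raise an error: every byte in this file decodes to some MacRoman character, so you get garbage rather than a warning. That's why the program has to be told the encoding rather than figuring it out.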

In summary, there are problems you'll encounter with UTF8 in RB 4.5,
since this is the first time we've been forced to deal with UTF8 in RB,
but almost all of them are NOT RB BUGS! In many cases, additional notes
in the documentation would suffice. Perhaps a chapter on text
conversions, written for the average American who has never dealt with
anything but ASCII text, would help.

Anyway, hope this clears things up.


Personal web page: http://home.earthlink.net/~thomasareed/
My shareware: http://home.earthlink.net/~thomasareed/shareware/
REALbasic page: http://home.earthlink.net/~thomasareed/realbasic/
Pixel Pen web pub. guide: http://home.earthlink.net/~thomasareed/pixelpen/

A conclusion is simply the place where you got tired of thinking.
