Re: Need a little more string detail (Real Studio Plugins Mailinglist archive)



Re: Plugins website   -   Troy A. Dix
  Re: Need a little more string detail   -   Theodore H. Smith
    Need a little more string detail   -   Einhugur Software
     Re: Need a little more string detail   -   Joseph J. Strout
    Re: Need a little more string detail   -   Dimitri
    Re: Need a little more string detail   -   Theodore H. Smith

Re: Need a little more string detail
Date: 07.06.02 12:03 (Fri, 7 Jun 2002 13:03:56 +0200)
From: Theodore H. Smith
> Hello Joe / and others
>
> I have been thinking about all this and how I can optimize things with all
> this new info. I'm currently adding ATSUI support in the Grid rendering
> engine to properly support all this.
>
> But there are a few things that I need more understanding in:
>
> 1. You say existing Apps will not break. So I'm guessing here that
> REALCString will always return ASCII, converting if needed,
> and myString->CString() will return the text in its
> native encoding. For example, if it's Unicode text in the String then
> myString->CString() would give you the Unicode text and REALCString
> would give the ASCII text.
>
> Is this understanding correct ??
> (If not then how do I get the Unicode text)

myString->CString() will always return a char*.

You can specify that your plugin only takes UTF8, and process only
UTF8.

I think you should read up on UTF8. It's a great format. UTF8, UTF16
and UTF32 are just different ways of encoding the same data. It's a
bit like using gzip, sit, or tar on the same XML file: same data,
just encoded differently. It's just a string of characters anyhow,
so once you get a character "code point" out of the encoding, you
can do with it what you like.
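
To illustrate the "same data, different encoding" point, here is a minimal sketch in Python (chosen only for brevity; it is not the plugin SDK, and the sample text and the helper name code_points are made up for the example). It encodes one string as UTF-8, UTF-16 and UTF-32 and checks that the same code points come back out of each.

```python
# One piece of text, three encodings of it.
text = "Björn €"

encoded = {
    "utf-8": text.encode("utf-8"),
    "utf-16-be": text.encode("utf-16-be"),
    "utf-32-be": text.encode("utf-32-be"),
}


def code_points(data, encoding):
    """Decode bytes in the given encoding and return the Unicode code points."""
    return [ord(ch) for ch in data.decode(encoding)]


for name, data in encoded.items():
    # The byte counts differ, the code points do not.
    print(name, len(data), "bytes ->", [hex(cp) for cp in code_points(data, name)])
```

The byte lengths differ per encoding, but the list of code points is identical, which is why code that works on code points does not need to care which encoding the text arrived in.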

Here's a good page on UTF8, although I have a lot more:

http://www.sun.com/developers/gadc/technicalpublications/articles/utf8.html

"

UTF-8 is an important encoding because of the following reasons:

* ASCII compatible
* easily supported
* compact and efficient for most scripts
* easily processed, unlike other multibyte encodings

At the recent Unicode Conference in Hong Kong, one company said
that their move to Unicode was simplified by the adoption of
UTF-8. Instead of changing their products' code to support
16-bit or 32-bit wide Unicode characters, they chose UTF-8
instead. What was their reason? They said that their system had
lots of hard-coded comparisons to find specific ASCII characters
in text. Instead of modifying their code everywhere, they simply
changed their character encoding to UTF-8, which is compatible
with ASCII. In other words, single byte ASCII characters retain
their encoded value in UTF-8. For example, code that checks for
a '\' can continue checking for the byte value 0x5C instead of
changing the code to check for 0x005C. Modifying hundreds of
lines of text processing code scattered throughout thousands of
lines of miscellaneous code can be time consuming and error
prone. Sometimes selecting the UTF-8 encoding can provide the
easiest and most cost-effective way to get a basic level of
Unicode support in a legacy application.

"
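
The ASCII-compatibility point in the quote can be checked with a small Python sketch (the sample path is invented for illustration). A legacy-style scan for the byte 0x5C finds exactly the real backslashes in UTF-8 text, whereas in Shift-JIS, used here as one example of an older multibyte encoding, 0x5C can also turn up as the second byte of a two-byte character.

```python
# A made-up path containing non-ASCII characters.
path = "C:\\Grüße\\日本語.txt"
data = path.encode("utf-8")

# Legacy-style byte scan: look for the ASCII backslash, byte value 0x5C.
backslash_offsets = [i for i, b in enumerate(data) if b == 0x5C]
print(backslash_offsets, "matches for", path.count("\\"), "real backslashes")

# In UTF-8 every byte of a non-ASCII character has its high bit set,
# so it can never be mistaken for an ASCII byte.
non_ascii_bytes = [b for ch in path if ord(ch) > 0x7F for b in ch.encode("utf-8")]
print(all(b >= 0x80 for b in non_ascii_bytes))

# Contrast: the Shift-JIS encoding of this kanji ends in the byte 0x5C.
sjis = "表".encode("shift_jis")
print(0x5C in sjis, 0x5C in "表".encode("utf-8"))
```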

UTF8 is also sortable using normal byte sorting code (the result comes
out in code point order). A lot of byte-oriented code can work on UTF8.
Not all, but a lot of it.
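
A quick way to see the sorting property, sketched in Python with an invented word list: sorting the UTF-8 bytes gives the same order as sorting by code point, while sorting UTF-16 bytes does not once characters outside the 16-bit range are involved. Note that this is code point order only, not the alphabetical order a user of a given language would expect.

```python
# Sample words, including a character above U+FFFF and one near the top
# of the 16-bit range (U+FF21), where UTF-16 byte order goes wrong.
words = ["zebra", "Ärger", "éclair", "日本", "apple", "\U0001F600", "\uFF21"]

by_code_point = sorted(words)  # Python compares str values by code point
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
by_utf16_bytes = sorted(words, key=lambda w: w.encode("utf-16-be"))

print("UTF-8 byte order matches code point order: ", by_utf8_bytes == by_code_point)
print("UTF-16 byte order matches code point order:", by_utf16_bytes == by_code_point)
```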

The reason I'm not using UTF8 is simply speed. :o) But for ease of use,
go for UTF8. It's got more properties that make it great to use.

> 2. How common is each type under OS X RB at this time ? (This is very
> important, so I can optimize the Grid rendering engine properly so it
> won't spend most of its time converting strings)
>
> (Only need a rough estimate)

If you want to do things easily, just say you'll only take UTF8.

Although you might want to use UTF16 or UTF32 instead. You'll have
to read up on the types and their advantages and disadvantages.

Telling us what your grid rendering engine does will help us.

For example, does it do string manipulation? Or just drawing?

If you want, I could add REALentry support to my String Stuff
plugin. It's got all the Unicode functions you'll ever need!
And more too :o)

But that's only if plugin developers find REALentry support for
my plugin necessary. If you consider this, then you might want
to download my plugin to see what functions it has.
www.elfdata.com/programmer/downloadindex.html

> 3. Does Win32 use Unicode (probably then only under NT based systems of
> course). If so then how common are the Unicode strings in current RB
> under NT based systems ?
> Always ?? (or on some strings only, or maybe only for Some languages ??)

Win32 definitely uses Unicode. In fact, Win95 does! I was reading a
discussion about users asking how to install Unicode support on their
Win32 machines, and the situation was that they already had it installed
(it's always installed), but they just didn't know how to access it.

As for RB apps on Win32 using Unicode, I don't know. But if it doesn't,
it will do.

Unicode is really the way to go with international text.

Need a little more string detail
Date: 07.06.02 13:31 (Fri, 07 Jun 2002 12:31:32 +0000)
From: Einhugur Software
Hello Joe / and others

I have been thinking about all this and how I can optimize things with all
this new info. I'm currently adding ATSUI support in the Grid rendering
engine to properly support all this.

But there are a few things that I need more understanding in:

1. You say existing Apps will not break. So I'm guessing here that
REALCString will always return ASCII, converting if needed,
and myString->CString() will return the text in its
native encoding. For example, if it's Unicode text in the String then
myString->CString() would give you the Unicode text and REALCString
would give the ASCII text.

Is this understanding correct ??
(If not then how do I get the Unicode text)

2. How common is each type under OS X RB at this time ? (This is very
important, so I can optimize the Grid rendering engine properly so it
won't spend most of its time converting strings)

(Only need a rough estimate)

3. Does Win32 use Unicode (probably then only under NT based systems of
course). If so then how common are the Unicode strings in current RB
under NT based systems ?
Always ?? (or on some strings only, or maybe only for Some languages ??)

Thanks

--  
______________________________________________________________________
Björn Eiríksson <email address removed>
Einhugur Software <email address removed>
http://www.einhugur.com
______________________________________________________________________
Einhugur Software has sold its products in 39 countries world wide.
______________________________________________________________________
For support: <email address removed>
For bug reports: <email address removed>
To post on the maillist: <email address removed>




- - - - - - - - - -
For list commands, send "Help" in the body of a message to
<<email address removed>>
Unsubscribe:
<mailto:<email address removed>>

Re: Need a little more string detail
Date: 07.06.02 15:18 (Fri, 7 Jun 2002 07:18:15 -0700)
From: Joseph J. Strout
At 12:31 PM +0000 6/7/02, Einhugur Software wrote:

>I have been thinking about all this and how I can optimize things with all
>this new info. I'm currently adding ATSUI support in the Grid rendering
>engine to properly support all this.

Sounds great.

>But there are a few things that I need more understanding in:
>
>1. You say existing Apps will not break. So I'm guessing here that
> REALCString will always return ASCII, converting
> if needed.

No. What I mean is, all that's new in 4.5 is that we now keep track
of the encoding when we can. In 4.0, a string might be in MacRoman
or MacJapanese or UCS-16 or UTF-8 or something else, depending on
where it came from, and you'd have no way of knowing. You'd have to
just take a guess and hope for the best.

What's new in 4.5 is that we now keep track of the encoding as much
as we can, and provide this information to you if you want it. You
can ignore this, and continue to take a guess and hope for the best
if you like. This will often work, just as it usually did in the
past. But now you have more options if you want to do a bit more
work.

The ONLY new behavior in the SDK with regard to strings is that you
can now get and set the encoding (which sets a number on the string
-- it doesn't do any conversion).
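
The difference between setting the encoding number and actually converting can be sketched as follows. This is Python, not the plugin SDK: TaggedString, set_encoding and converted_to are hypothetical names invented for the illustration, and Python codec names stand in for the SDK's encoding numbers.

```python
class TaggedString:
    """Hypothetical stand-in for a string that carries an encoding label."""

    def __init__(self, data, encoding=None):
        self.data = data          # raw bytes, never touched by set_encoding
        self.encoding = encoding  # label only; None means "unknown"

    def set_encoding(self, encoding):
        """Relabel the bytes without converting them."""
        self.encoding = encoding

    def converted_to(self, encoding):
        """Return a new string whose bytes really are transcoded."""
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(encoding), encoding)


# "café" in MacRoman: the é is the single byte 0x8E.
original = TaggedString(b"caf\x8e", "mac_roman")

# Relabelling changes only the tag; the bytes are now mislabelled.
relabelled = TaggedString(original.data, original.encoding)
relabelled.set_encoding("utf-8")

# Converting produces different bytes that match the new tag.
converted = original.converted_to("utf-8")

print(relabelled.encoding, relabelled.data)
print(converted.encoding, converted.data)
```

Setting the label is only correct when the bytes already are in that encoding; otherwise a real conversion step is needed.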

>2. How common is each type under OS X RB at this time ? (This is very
> important, so I can optimize the Grid rendering engine properly so it
> won't spend most of its time converting strings)

I'd guess that most strings are in ASCII, with some strings coming
through in the platform encoding (e.g. MacRoman or MacJapanese).
UTF-8 strings will appear when the user has gotten them from a file
name, and a few other minor places (such as the KeyDown event of an
EditField). Of course the user may get other things from other
places (e.g., may grab the 'utxt' data from the clipboard or some
such).
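
If most strings really are plain ASCII, one possible way to avoid spending time on conversion is a fast path that hands back the original bytes whenever transcoding would be a no-op. The Python sketch below only illustrates the idea; the function name to_utf8 and the use of Python codec names as encoding labels are assumptions, and it treats only MacRoman as ASCII-compatible in its lower half.

```python
def to_utf8(data, encoding):
    """Return UTF-8 bytes, skipping the transcoding step when it is a no-op."""
    if encoding in ("ascii", "utf-8"):
        return data  # already usable as UTF-8
    if encoding == "mac_roman" and data.isascii():
        return data  # pure ASCII bytes are identical in MacRoman and UTF-8
    # Slow path: decode to code points, then re-encode.
    return data.decode(encoding).encode("utf-8")


plain = b"Total"
accented = b"caf\x8e"  # "café" in MacRoman

print(to_utf8(plain, "mac_roman") is plain)  # fast path returns the same object
print(to_utf8(accented, "mac_roman"))        # slow path transcodes
```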

>3. Does Win32 use Unicode (probably then only under NT based systems of
> course). If so then how common are the Unicode strings in current RB
> under NT based systems ?
> Always ?? (or on some strings only, or maybe only for Some languages ??)

Win32 can use Unicode, but I don't know to what extent (if at all) RB
takes advantage of that yet.

HTH,
- Joe

Re: Need a little more string detail
Date: 07.06.02 19:58 (Fri, 07 Jun 2002 20:58:54 +0200)
From: Dimitri
At 13:03 6/7/2002 +0200, you wrote:

>>3. Does Win32 use Unicode (probably then only under NT based systems of
>> course). If so then how common are the Unicode strings in current RB
>> under NT based systems ?
>> Always ?? (or on some strings only, or maybe only for Some languages ??)
>
>Win32 definitely uses Unicode. In fact, Win95 does! I was reading a
>discussion about users asking how to install Unicode support on their
>Win32 machines, and the situation was that they already had it installed
>(it's always installed), but they just didn't know how to access it.

? Perhaps you are confusing it with Multi Byte Character Set (MBCS) ?
9x/ME doesn't have Unicode; one needs to install Unicows to support it.
See http://www.microsoft.com/globaldev/articles/mslu_announce.asp

Regards,
Dimitri

Re: Need a little more string detail
Date: 08.06.02 12:10 (Sat, 8 Jun 2002 13:10:01 +0200)
From: Theodore H. Smith
>>> 3. Does Win32 use Unicode (probably then only under NT based systems of
>>> course). If so then how common are the Unicode strings in current RB
>>> under NT based systems ?
>>> Always ?? (or on some strings only, or maybe only for Some languages ??)
>>
>> Win32 definitely uses Unicode. In fact, Win95 does! I was reading a
>> discussion about users asking how to install Unicode support on their
>> Win32 machines, and the situation was that they already had it installed
>> (it's always installed), but they just didn't know how to access it.
>
> ? Perhaps you are confusing it with Multi Byte Character Set (MBCS) ?
> 9x/ME doesn't have Unicode, one needs to install Unicows to support it.
> See http://www.microsoft.com/globaldev/articles/mslu_announce.asp

I heard that NotePad understands UTF8 and UTF16 just fine. If
NotePad can understand it, it might be because it's using the OS
support for it. If you have Windows, try downloading a file from

http://www.cl.cam.ac.uk/~mgk25/ucs/examples/

and open it in NotePad.

Well I don't use Windows (thankfully) so I can't test this out for you!