Big unicode problem (Real Studio network user group Mailinglist archive)


  Big unicode problem   -   Christian Schmitz
   Re: Big unicode problem   -   Joseph J. Strout
    Re: Big unicode problem   -   Kevin Ballard
     Re: Big unicode problem   -   Joseph J. Strout
    Re: Big unicode problem   -   Christian Schmitz
     Re: Big unicode problem   -   Joseph J. Strout
      Re: Big unicode problem   -   Christian Schmitz
       Re: Big unicode problem   -   Mars Saxman
        Re: Big unicode problem   -   Christian Schmitz
       Re: Big unicode problem   -   Thomas Reed
        Re: Big unicode problem   -   Charles Yeomans
         Re: Big unicode problem   -   Joseph J. Strout
         Re: Big unicode problem   -   Steve Schacht
       Re: Big unicode problem   -   Joseph J. Strout
       Re: Big unicode problem   -   Joseph J. Strout
       Re: Big unicode problem   -   Thomas Reed
        Re: Big unicode problem   -   Jörg Pressel
         Re: Big unicode problem   -   Christian Schmitz
          Re: Big unicode problem   -   Kevin Ballard
           Re: Big unicode problem   -   Christian Schmitz

Big unicode problem
Date: 28.08.02 00:07 (Wed, 28 Aug 2002 01:07:37 +0200)
From: Christian Schmitz
Hi,

I have a big problem with REALbasic. A UTF-8 string and a string with
Unicode chars are not concatenated correctly.

Just try the example attached. RB will fail.

I strongly suggest that RS check for these cases:

if a.encoding=b.encoding then
  return concat(a,b)
elseif a.encoding=Unicode and b.encoding=Unicode then
  return concatunicode(a,b)
elseif a.encoding=Unicode then
  return concatunicode(a,b.makeunicode)
elseif b.encoding=Unicode then
  return concatunicode(a.makeunicode,b)
else
  return concatunicode(a.makeunicode,b.makeunicode)
end if

Or something like this.

PS: Realbugs report # 87241.

Regards,
Christian

dim s,t,u as string
dim c as textconverter

c=gettextconverter(gettextencoding(0),gettextencoding(256))

s=" " // UTF8
t=c.convert("Hallo")

if len(t)=5 and lenb(t)=10 then // we have real unicode

if len(s)and lenb(s)then // UTF8

u=s+t

if len(u) then
// RB 4.5
msgBox "Error: unicode and UTF8 in one string treated as UTF8"
else
msgBox str(len(u))
end if
else
msgBox "s is no UTF8 string."
end if
else
// RB 4.0.2 and RB 3.5
msgBox "t is no unicode string."
end if
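
As a rough illustration of the dispatch proposed above, here is a minimal sketch in Python rather than REALbasic. It is not how RB implements strings: the `TaggedString` type, the `to` and `concat` names, the choice of big-endian UTF-16 as the common format, and the sample text "äöü" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Raw bytes plus the name of the encoding they are stored in (hypothetical type)."""
    data: bytes
    encoding: str

    def to(self, encoding: str) -> "TaggedString":
        # Transcode: decode with the current encoding, re-encode with the target.
        if encoding == self.encoding:
            return self
        return TaggedString(self.data.decode(self.encoding).encode(encoding), encoding)


def concat(a: TaggedString, b: TaggedString, common: str = "utf-16-be") -> TaggedString:
    """Join two tagged strings, converting to a common encoding only when they differ."""
    if a.encoding == b.encoding:
        # Same encoding: the raw bytes can simply be appended.
        return TaggedString(a.data + b.data, a.encoding)
    # Different encodings: bring both sides to the common format first.
    return TaggedString(a.to(common).data + b.to(common).data, common)


s = TaggedString("äöü".encode("utf-8"), "utf-8")
t = TaggedString("Hallo".encode("utf-16-be"), "utf-16-be")
u = concat(s, t)
print(u.encoding, u.data.decode(u.encoding))
```

The point of the sketch is only that the encoding has to travel with the bytes; once it does, the concatenation can pick a conversion without the caller's help.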

--

Re: Big unicode problem
Date: 28.08.02 04:00 (Tue, 27 Aug 2002 20:00:56 -0700)
From: Joseph J. Strout
At 1:07 AM +0200 8/28/02, Christian Schmitz wrote:

>I have a big problem with REALbasic. A UTF-8 string and a string with
>Unicode chars are not concatenated correctly.

That's right. You'll have to convert them to a common format. The
only concatenations that currently work correctly are when one
string's encoding is a subset of the other (e.g., an ASCII string and
a UTF-8 string).

Cheers,
- Joe
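
To make the subset rule concrete, the following Python sketch (an illustration at the byte level, not REALbasic code) joins raw bytes the way a naive concatenation would. The sample strings are assumptions chosen for the example.

```python
# Sample text is an assumption for illustration.
utf8_part = "äöü".encode("utf-8")         # 3 characters, 6 bytes
utf16_part = "Hallo".encode("utf-16-be")  # 5 characters, 10 bytes
ascii_part = "Hallo".encode("ascii")      # 5 characters, 5 bytes

# ASCII is a subset of UTF-8, so joining the raw bytes is safe.
subset_ok = (ascii_part + utf8_part).decode("utf-8")

# UTF-16 is not a subset of UTF-8. The joined bytes still decode as UTF-8,
# but each UTF-16 code unit of "Hallo" contributes a stray NUL character.
mixed = (utf8_part + utf16_part).decode("utf-8")

# Converting to a common format first gives the intended text.
common = utf8_part + utf16_part.decode("utf-16-be").encode("utf-8")
fixed = common.decode("utf-8")

print(repr(subset_ok))
print(len(mixed), repr(mixed))
print(repr(fixed))
```

Treating the mixed bytes as UTF-8 yields 13 characters instead of 8, which is the kind of wrong length the test project in the first message checks for.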

Re: Big unicode problem
Date: 28.08.02 04:37 (Tue, 27 Aug 2002 23:37:16 -0400)
From: Kevin Ballard
Ouch, really? How come that isn't publicly known? I assume that's one
thing that's going to be fixed quickly in v5?

On Tuesday, August 27, 2002, at 11:00 PM, Joseph J. Strout wrote:

> At 1:07 AM +0200 8/28/02, Christian Schmitz wrote:
>
>> I have a big problem with REALbasic. A UTF-8 string and a string
>> with Unicode chars are not concatenated correctly.
>
> That's right. You'll have to convert them to a common format. The
> only concatenations that currently work correctly are when one
> string's encoding is a subset of the other (e.g., an ASCII string and
> a UTF-8 string).

Re: Big unicode problem
Date: 28.08.02 15:10 (Wed, 28 Aug 2002 07:10:27 -0700)
From: Joseph J. Strout
At 11:37 PM -0400 8/27/02, Kevin Ballard wrote:

>Ouch, really? How come that isn't publicly known?

I thought it was.

Cheers,
- Joe

Re: Big unicode problem
Date: 28.08.02 18:24 (Wed, 28 Aug 2002 19:24:30 +0200)
From: Christian Schmitz
> At 1:07 AM +0200 8/28/02, Christian Schmitz wrote:
>
> >I have a big problem with REALbasic. A UTF-8 string and a string with
> >Unicode chars are not concatenated correctly.
>
> That's right. You'll have to convert them to a common format. The only
> concatenations that currently work correctly are when one string's
> encoding is a subset of the other (e.g., an ASCII string and a UTF-8
> string).

RB should handle such a basic thing automatically, just as it converts
between integer and double.

So is it possible to make a function to return UTF8 out of Unicode16?

Regards,
Christian

Re: Big unicode problem
Date: 28.08.02 18:27 (Wed, 28 Aug 2002 10:27:54 -0700)
From: Joseph J. Strout
At 7:24 PM +0200 8/28/02, Christian Schmitz wrote:

>So is it possible to make a function to return UTF8 out of Unicode16?

But of course! That's what text encoding converters are for...

Cheers,
- Joe
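
For readers who want to see what such a conversion amounts to, here is a small Python sketch of a UTF-16 to UTF-8 function. It does not use RB's TextConverter; the function name `utf16_to_utf8` and the big-endian default for input without a byte order mark are assumptions.

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Return UTF-8 bytes for UTF-16 input.

    A leading byte order mark is honored and dropped; without one the
    input is assumed to be big-endian (an assumption of this sketch).
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    else:
        text = data.decode("utf-16-be")
    return text.encode("utf-8")


plain = utf16_to_utf8("Grüße".encode("utf-16-be"))
with_bom = utf16_to_utf8(codecs.BOM_UTF16_LE + "Hallo".encode("utf-16-le"))
print(plain, with_bom)
```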

Re: Big unicode problem
Date: 28.08.02 18:35 (Wed, 28 Aug 2002 19:35:18 +0200)
From: Christian Schmitz
> At 7:24 PM +0200 8/28/02, Christian Schmitz wrote:
>
> >So is it possible to make a function to return UTF8 out of Unicode16?
>
> But of course! That's what text encoding converters are for...

But then use it in your concat function.
RB must be able to concatenate two strings with different encodings.

Regards,
Christian

Re: Big unicode problem
Date: 28.08.02 18:36 (Wed, 28 Aug 2002 10:36:25 -0700)
From: Mars Saxman
<email address removed> wrote:

> But then use it in your concat function.
> RB must be able to concatenate two strings with different encodings.

If our development resources were unlimited, I'm sure it would do so
already.

Mars Saxman
REAL Software

---
Subscribe to the digest:
<mailto:<email address removed>>
Unsubscribe:
<mailto:<email address removed>>

Re: Big unicode problem
Date: 28.08.02 21:27 (Wed, 28 Aug 2002 22:27:03 +0200)
From: Christian Schmitz
> <email address removed> wrote:
>
> > But then use it in your concat function.
> > RB must be able to concatenate two strings with different encodings.
>
> If our development resources were unlimited, I'm sure it would do so
> already.

Oh. But please remember this if you have some time.

Regards,
Christian

Re: Big unicode problem
Date: 28.08.02 19:36 (Wed, 28 Aug 2002 13:36:23 -0500)
From: Thomas Reed
> > >So is it possible to make a function to return UTF8 out of Unicode16?
>>
>> But of course! That's what text encoding converters are for...
>
>But then use it in your concat function.
>RB must be able to concatenate two strings with different encodings.

The problem is, how does RB know what encoding your two strings have?
I'm not sure if RB tracks this sort of thing internally or not, but
regardless you'll have trouble when reading text from a file or
something like that. What encoding is it? Letting RB guess isn't
really a good idea -- I've had UTF-8 text interpreted as Japanese
when I've done this.

I'm afraid that, until the entire industry agrees on one single
encoding, these issues are going to be something we're all going to
have to think about. TextConverters are going to become part of your
everyday life! Even saving a filename to a file and retrieving it
later requires a little extra thought in RB 4.5 (where filenames are
given in UTF-8).
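
The guessing problem is easy to reproduce outside RB. This Python sketch (an illustration only; the sample text, the file name `recent.txt`, and the stored name are assumptions) decodes UTF-8 bytes with a Japanese codec, then shows a filename surviving a round trip when the encoding is stated on both the write and the read.

```python
import os
import tempfile

data = "ä".encode("utf-8")  # two bytes: C3 A4

# A wrong guess: the same bytes read as Shift_JIS give different characters.
guessed = data.decode("shift_jis", errors="replace")
# The right answer needs the encoding to be known, not guessed.
known = data.decode("utf-8")
print(repr(guessed), repr(known))

# Saving a filename and reading it back with the encoding stated explicitly.
name = "Übersicht.txt"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "recent.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(name)
    with open(path, "r", encoding="utf-8") as f:
        restored = f.read()
print(restored == name)
```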

Re: Big unicode problem
Date: 28.08.02 20:58 (Wed, 28 Aug 2002 15:58:58 -0400)
From: Charles Yeomans
Where is this TextUtilities module? Apparently, it was included with
one of the prerelease versions, but it is nowhere to be found on the RS
web site.

Charles Yeomans



Re: Big unicode problem
Date: 28.08.02 21:07 (Wed, 28 Aug 2002 13:07:09 -0700)
From: Joseph J. Strout
At 3:58 PM -0400 8/28/02, Charles Yeomans wrote:

>Where is this TextUtilities module? Apparently, it was included
>with one of the prerelease versions, but it is nowhere to be found
>on the RS web site.

It's also on the 4.5 CD. We're working to get it posted to the web
site somewhere, but I can't say when that will actually happen.
Meanwhile, if anybody needs it and can't find it, send me mail and
I'll send it to you.

Best,
- Joe

Re: Big unicode problem
Date: 28.08.02 21:17 (Wed, 28 Aug 2002 14:17:22 -0600)
From: Steve Schacht
On 8-28-2002 1:58 PM, Charles Yeomans wrote:

> Where is this TextUtilities module? Apparently, it was included with
> one of the prerelease versions, but it is nowhere to be found on the RS
> web site.

Take this guy's advice... (I did.)

> Try asking Dave Grogono for a copy.
>
> Charles Yeomans

;-)

But seriously, yes it was included with some of the 4.5 alphas, but I think
it (and everything else on the RB CD) should be made available for download
via FTP or the Web.

---
Steve Schacht
<email address removed>


Re: Big unicode problem
Date: 28.08.02 20:17 (Wed, 28 Aug 2002 12:17:04 -0700)
From: Joseph J. Strout
At 7:35 PM +0200 8/28/02, Christian Schmitz wrote:

> > >So is it possible to make a function to return UTF8 out of Unicode16?
>>
>> But of course! That's what text encoding converters are for...
>
>But then use it in your concat function.
>RB must be able to concatenate two strings with different encodings.

There was not time to add all possible and desirable Unicode support
functionality in 4.5. But we know what we need to do for 5.0.

Thanks,
- Joe

Re: Big unicode problem
Date: 28.08.02 20:28 (Wed, 28 Aug 2002 12:28:54 -0700)
From: Joseph J. Strout
At 1:36 PM -0500 8/28/02, Thomas Reed wrote:

>The problem is, how does RB know what encoding your two strings
>have? I'm not sure if RB tracks this sort of thing internally or
>not, but regardless you'll have trouble when reading text from a
>file or something like that. What encoding is it? Letting RB guess
>isn't really a good idea -- I've had UTF-8 text interpreted as
>Japanese when I've done this.

Right, but you can tell RB what encoding it is by passing it through
a TextConverter where the source and dest are the same (or by using
the AssertEncoding function in TextUtilities, which does exactly
that).

(And RB does keep track of the encoding internally; that's what it
means to tell RB what encoding a string is.)

Of course if you don't know what encoding it is either, then you're
still out of luck, as you say.

Cheers,
- Joe
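
The difference between telling the runtime what encoding a string has and converting it can be shown with a short Python sketch. It does not model RB's AssertEncoding or TextConverter; the tuples pairing bytes with an encoding name, and the sample text, are assumptions made for the example.

```python
raw = "Grüße".encode("utf-8")  # bytes as they might arrive from a file

# Asserting an encoding: the bytes stay the same, only the label is attached.
asserted = (raw, "utf-8")

# Converting: the text stays the same, the bytes and the label both change.
converted = (raw.decode("utf-8").encode("utf-16-be"), "utf-16-be")

# Asserting the wrong encoding does not fail here, it silently yields other text.
mislabeled = raw.decode("mac_roman")

for data, encoding in (asserted, converted):
    print(encoding, len(data), data.decode(encoding))
print(repr(mislabeled))
```

Both labeled pairs decode to the same text even though their byte lengths differ, while the mislabeled decode produces one character per byte.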

Re: Big unicode problem
Date: 28.08.02 20:48 (Wed, 28 Aug 2002 14:48:10 -0500)
From: Thomas Reed
>Right, but you can tell RB what encoding it is by passing it through
>a TextConverter where the source and dest are the same (or by using
>the AssertEncoding function in TextUtilities, which does exactly
>that).

Right -- which means we're going to have to be thinking about these
issues, rather than just assuming strings are just strings and will
work fine no matter what we plan to do with them.

Re: Big unicode problem
Date: 28.08.02 20:53 (Wed, 28 Aug 2002 21:53:43 +0200)
From: Jörg Pressel
Theo's StringStuff Plugin is really the answer to all of your problems.
Use its SetStringEncoding and GetStringEncoding to affect how RB interprets
your string... works great, I use it all the time.

Jörg

____________________________________________
three-2-one interaktive Medien GmbH

<email address removed>
http://www.three-2-one.com
fon: +49 2151 319450


Re: Big unicode problem
Date: 28.08.02 22:16 (Wed, 28 Aug 2002 23:16:27 +0200)
From: Christian Schmitz
> Theo's StringStuff Plugin is really the answer to all of your problems.
> Use its SetStringEncoding and GetStringEncoding to affect how RB interprets
> your string... works great, I use it all the time.

MBS plugin has the same functions...

Regards,
Christian

Re: Big unicode problem
Date: 28.08.02 23:17 (Wed, 28 Aug 2002 18:17:11 -0400)
From: Kevin Ballard
MBS costs money...

On Wednesday, August 28, 2002, at 05:16 PM, Christian Schmitz wrote:

>> Theo's StringStuff Plugin is really the answer to all of your
>> problems.
>> Use its SetStringEncoding and GetStringEncoding to affect how RB
>> interprets
>> your string... works great, I use it all the time.
>
> MBS plugin has the same functions...

Re: Big unicode problem
Date: 28.08.02 23:31 (Thu, 29 Aug 2002 00:31:52 +0200)
From: Christian Schmitz
> MBS costs money...

But quite a few people here have paid for it,
so I mention it for them.

Regards,
Christian