another encoding headhache (Real Studio network user group Mailinglist archive)


another encoding headhache   -   Giulio
  Re: another encoding headhache   -   Joseph J. Strout

another encoding headhache
Date: 18.08.05 19:09 (Thu, 18 Aug 2005 20:09:56 +0200)
From: Giulio
I have an application that uses the FTPSuite classes, and I see
strange behaviour when receiving a directory listing that contains
file names with accented characters.

I must compare the file names I receive from the server with names
contained in variables, and here comes the strange thing:

I parse the name list, and when a name contains accented characters
and I compare it with the same value contained in a variable, they
don't match!

Both are UTF-8, as tested using encoding(variablename).internetname.
If I put the two values (the parsed name and the name in the
variable) in two different EditFields, they look OK and identical.

But if I test the length of the variable containing the parsed
name, it is longer than the other (one additional character for every
accented character), and if I loop over the variable and show each
character in a MsgBox, every accented character appears as its
non-accented equivalent followed by a space (or a non-displayable
character).

I don't know if I should post this on the FTPSuite list, but the
question is:

How can it happen that a UTF-8 variable in REALbasic is corrupted this
way but still displayed correctly, and is there a way to fix it with
some kind of conversion?

thank you

Re: another encoding headhache
Date: 18.08.05 20:09 (Thu, 18 Aug 2005 13:09:47 -0600)
From: Joseph J. Strout
At 8:09 PM +0200 8/18/05, Giulio wrote:

>I parse the name list, and when a name contains accented
>characters and I compare it with the same value contained in a
>variable, they don't match!

Well, they're not the same value then. But they may display the same
way. Accented characters can be represented in two ways: composed or
decomposed. When composed, an accented letter is a single character
(one code point). If decomposed, it's two characters: the base letter
followed by a combining accent mark. That matches what you see: one
extra character per accented letter, showing up as the plain letter
plus something that doesn't display on its own. This is the primary
wart on the whole Unicode system, that there are two ways to
represent the same text.
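
To make the composed/decomposed distinction concrete, here is a small
sketch in Python (not REALbasic; the thread itself contains no code, so
the language and the names `composed`, `decomposed` and `describe` are
chosen only for illustration). It builds the same accented letter both
ways and shows that the two strings differ in content and length even
though they render identically.

```python
import unicodedata

# The same visible letter, built two ways.
composed = "\u00e8"      # one code point: LATIN SMALL LETTER E WITH GRAVE
decomposed = "e\u0300"   # two code points: 'e' + COMBINING GRAVE ACCENT


def describe(s):
    """Return the Unicode name of every code point in s."""
    return [unicodedata.name(ch) for ch in s]


# Character counts differ: one extra character per accented letter.
lengths = (len(composed), len(decomposed))

# Byte counts in UTF-8 differ as well; both byte sequences are valid UTF-8.
utf8_lengths = (len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))

print(composed == decomposed)   # the strings are not equal
print(lengths, utf8_lengths)
print(describe(composed))
print(describe(decomposed))
```

Walking the decomposed string character by character gives the plain
base letter and then the combining mark, which has nothing to attach to
when shown alone.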

>How can it happen that a UTF-8 variable in REALbasic is corrupted this
>way but still displayed correctly, and is there a way to fix it with
>some kind of conversion?

Nothing is corrupted; this is (regrettably) perfectly valid UTF-8
text, in either form. If you know that the text you're dealing with
can be represented in some other encoding, you can convert to that,
and you should find that both strings convert to the same thing. At
a guess, ISO-Latin-1 would be a good assumption for most FTP servers.
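
As a rough illustration of getting both strings into one common form
before comparing, here is a Python sketch (again not REALbasic). It is
a swapped-in technique: it uses Unicode normalization (NFC, which
composes base letter plus accent into one code point) instead of
relying on an encoding converter. The helper names `same_name` and
`to_latin1` are hypothetical. One assumption to note: Python's
`latin-1` codec does not compose on its own, so the sketch normalizes
first; whether a given REALbasic conversion composes for you is not
something this example establishes.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Compare two file names after bringing both to composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def to_latin1(name: str) -> bytes:
    """Compose first, then encode as ISO-8859-1 (ISO-Latin-1).

    Raises UnicodeEncodeError if the name contains characters that
    Latin-1 cannot represent.
    """
    return unicodedata.normalize("NFC", name).encode("latin-1")


# Hypothetical file names: one as typed locally, one as a server might send it.
local_name = "caff\u00e8.txt"     # composed
server_name = "caffe\u0300.txt"   # decomposed

print(local_name == server_name)            # raw comparison fails
print(same_name(local_name, server_name))   # normalized comparison succeeds
print(to_latin1(local_name) == to_latin1(server_name))
```

Normalizing both sides keeps the comparison working for names that fall
outside Latin-1, which a conversion to a single-byte encoding cannot
handle.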

HTH,
- Joe