
SOT: Premature Optimization (Real Studio network user group mailing list archive)



Win32 API Declares   -   Berg, Heath
  SOT: Premature Optimization   -   Joseph
   Re: SOT: Premature Optimization   -   Phil M
   Re: SOT: Premature Optimization   -   Charles Yeomans
    RE: SOT: Premature Optimization   -   Joseph
    Re: SOT: Premature Optimization   -   joe strout.net
     Re: SOT: Premature Optimization   -   Joe Huber
     Re: SOT: Premature Optimization   -   Travis Hill
     Re: SOT: Premature Optimization   -   Mike Woodworth
      Re: SOT: Premature Optimization   -   Matthew Williamson
       RE: SOT: Premature Optimization   -   Paul Mathews
   Re: SOT: Premature Optimization   -   Theodore H. Smith
    Re: SOT: Premature Optimization   -   joe strout.net
   Re: SOT: Premature Optimization   -   Brendan Murphy
    Re: SOT: Premature Optimization   -   joe strout.net
   Re: SOT: Premature Optimization   -   Theodore H. Smith
   Re: SOT: Premature Optimization   -   Brad Rhine
   Re: SOT: Premature Optimization   -   Theodore H. Smith
   Re: SOT: Premature Optimization   -   Theodore H. Smith
   Re: SOT: Premature Optimization   -   Theodore H. Smith
   Re: SOT: Premature Optimization   -   Brad Rhine
   Re: SOT: Premature Optimization   -   Charles Yeomans
    RE: SOT: Premature Optimization   -   Walter Purvis
     Re: SOT: Premature Optimization   -   Charles Yeomans
     Re: SOT: Premature Optimization   -   Norman Palardy
   Re: SOT: Premature Optimization   -   Charles Yeomans
   Re: SOT: Premature Optimization   -   Theodore H. Smith
   Re: SOT: Premature Optimization   -   Theodore H. Smith
   Re: SOT: Premature Optimization   -   Marcus Bointon
   Re: SOT: Premature Optimization   -   Ruslan Zasukhin
   Re: SOT: Premature Optimization   -   Theodore H. Smith
   Re: SOT: Premature Optimization   -   Theodore H. Smith
   Re: SOT: Premature Optimization   -   Ruslan Zasukhin

SOT: Premature Optimization
Date: 01.08.06 14:45 (Tue, 1 Aug 2006 08:45:40 -0500)
From: Joseph
This was just too good not to share. (Slightly Off Topic)

http://www.acm.org/ubiquity/views/v7i24_fallacy.html

~joe


Re: SOT: Premature Optimization
Date: 01.08.06 15:30 (Tue, 1 Aug 2006 10:30:08 -0400)
From: Phil M
On Aug 1, 2006, at 9:45 AM, Joseph wrote:

> This was just too good not to share. (Slightly Off Topic)
>
> http://www.acm.org/ubiquity/views/v7i24_fallacy.html

I liked the article.

I don't think that I have fallen into this viewpoint (I believe in
performance optimization), except in new topics that I am learning or
with optimization techniques that I am not aware of. The funniest
part of the article (to me) is where he wrote: "they were taught in
their college Data Structures and Algorithm Analysis course, that
each statement in a program takes one unit of time to execute". In
my class, this is exactly how the book was written. And although the
professor did mention that each operation has different execution
times, he did not go into detail, and I don't think that most of the
students understood his point.

For REALbasic, I think that low-level optimization is almost out of
our hands (unless you write a plugin). However, I believe it is
important to understand exactly how expensive the functions you call
are, and their advantages and disadvantages.

For example, using the "B" functions (such as InStrB) when
appropriate can increase performance. The advantage of InStr() is
that it performs case-insensitive searches and converts encodings
(if necessary) during the search. These extra features are
convenient, but they are not necessary if you are doing case-
sensitive searches and you know that all of the strings use the same
text encoding.
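
To make the tradeoff concrete, here is a rough C++ analogue (a
sketch, not RB code; the names byte_find and ci_find are invented).
The first does what InStrB does, a raw byte comparison; the second
folds case on every comparison, the kind of extra per-character work
InStr signs up for:

#include <algorithm>
#include <cctype>
#include <string>

// Raw byte search, InStrB-style: no case folding, no conversions.
std::size_t byte_find(const std::string& hay, const std::string& needle) {
    return hay.find(needle);
}

// Case-insensitive search, InStr-style: extra work per character.
std::size_t ci_find(const std::string& hay, const std::string& needle) {
    auto it = std::search(hay.begin(), hay.end(),
                          needle.begin(), needle.end(),
                          [](unsigned char a, unsigned char b) {
                              return std::tolower(a) == std::tolower(b);
                          });
    return it == hay.end() ? std::string::npos
                           : static_cast<std::size_t>(it - hay.begin());
}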

But the important point from the article is that there is *no* reason
why REALbasic programmers cannot be optimizing code as they write
it. It would be nice if someone more experienced than I wrote an
article about practical REALbasic optimization techniques.


Re: SOT: Premature Optimization
Date: 01.08.06 15:48 (Tue, 1 Aug 2006 10:48:36 -0400)
From: Charles Yeomans

On Aug 1, 2006, at 9:45 AM, Joseph wrote:

> This was just too good not to share. (Slightly Off Topic)
>
> http://www.acm.org/ubiquity/views/v7i24_fallacy.html
>

While this article is not without some worth, I wouldn't trust it too
far. Hyde appears to be yet another of those who quotes sources he
has not read. The quotes about premature optimization he attributes
to Hoare are actually those of Knuth, from his paper "Structured
Programming with go to Statements".

I was also struck by his

Observation #5: Software engineers have been led to believe that they
are incapable of predicting where their applications spend most of
their execution time.

Perhaps they have been led to believe this by Knuth, who writes

"It is often a mistake to make a priori judgments about what parts of
a program are really critical, since the universal experience of
programmers who have been using measurement tools has been that their
guesses fail."

Knuth's article can be found on the web, and I suggest that anyone
wanting to learn about optimization and program design read it.

Charles Yeomans

RE: SOT: Premature Optimization
Date: 01.08.06 16:08 (Tue, 1 Aug 2006 10:08:51 -0500)
From: Joseph
In the first paragraph he notes that the quote was popularized by Knuth but
cites Hoare as the author.

"Every programmer with a few years' experience or education has heard the
phrase 'premature optimization is the root of all evil.' This famous quote
by Sir Tony Hoare (popularized by Donald Knuth) has become a best practice
among software engineers."

As for the apparent contradiction between Hyde and Knuth, I think the
complete quote makes it clear that if the code is obviously awful,
then it is wise to clean it up. That's different from simply making
wild speculative guesses about where a bottleneck would likely occur.
Here is the whole paragraph:

"Observation #5: Software engineers have been led to believe that they are
incapable of predicting where their applications spend most of their
execution time. Therefore, they don't bother improving performance of
sections of code that are obviously bad because they have no proof that the
bad section of code will hurt overall program performance."

This idea is further developed in the following paragraph:

"One thing nice about optimization is that if you optimize a section of code
that doesn't need it, you've not done much damage to the application. Other
than possible maintenance issues, all you've really lost is some time
optimizing code that doesn't need it. Though it might seem that you've lost
some valuable time unnecessarily optimizing code, don't forget that you have
gained valuable experience so you are less likely to make that same mistake
in a future project."

I'm not trying to defend the article, just clarifying what I see as
the author's main point: that we have a lot of clunky applications
on the market that seem to be poorly optimized. I'll refrain from
naming some of my favorite clunky apps to avoid a tangential thread.

~joe

Re: SOT: Premature Optimization
Date: 01.08.06 18:26 (Tue, 1 Aug 2006 11:26:49 -0600)
From: joe strout.net
On Aug 01, 2006, at 14:48 UTC, Charles Yeomans wrote:

> I was also struck by his
>
> Observation #5: Software engineers have been led to believe that they
> are incapable of predicting where their applications spend most of
> their execution time.
>
> Perhaps they have been led to believe this by Knuth, who writes
>
> "It is often a mistake to make a priori judgments about what parts of
> a program are really critical, since the universal experience of
> programmers who have been using measurement tools has been that their
> guesses fail."

They may also have been led to this by years of experience, as I have. I don't write poor-performing code when I can just as easily and cleanly write efficient code, and I have a lot of experience doing optimizations -- yet a little effort with a profiler almost always turns up surprises in real applications. This is so common, in fact, that I'm greatly surprised on those rare occasions when the profiler has no surprises for me!

Best,
- Joe

Re: SOT: Premature Optimization
Date: 01.08.06 18:48 (Tue, 1 Aug 2006 10:48:21 -0700)
From: Joe Huber
>a little effort with a profiler almost always turns up surprises in
>real applications. This is so common, in fact, that I'm greatly
>surprised on those rare occasions when the profiler has no surprises
>for me!

I agree and think part of the reason is that it's not always clear
which parts of a framework or external API calls are expensive and
which are not.

We may be familiar with our code and have a good sense of where our
own bottlenecks are. But we're at the mercy of profilers to learn
where bottlenecks are in external code, and they often provide
surprising results.

Regards,
Joe Huber

Re: SOT: Premature Optimization
Date: 01.08.06 18:58 (Tue, 1 Aug 2006 11:58:55 -0600)
From: Travis Hill

On Aug 1, 2006, at 11:48 AM, Joe Huber wrote:

> We may be familiar with our code and have a good sense of where our
> own bottlenecks are. But we're at the mercy of profilers to learn
> where bottlencks are in external code and they often provide
> surprising results.

This brings up something I've been running into lately when trying
to profile: is there any way to use Shark with a PowerPC binary on
an Intel Mac? Every time I try, the symbol names are messed up...
I'm guessing it is just that Rosetta mangles them somehow.

--Travis

Re: SOT: Premature Optimization
Date: 01.08.06 19:33 (Tue, 1 Aug 2006 14:33:49 -0400)
From: Mike Woodworth
I'm seeing the same thing on a client's MacBook... the same is true
with crash reports. This is the only thing stopping me from buying a
shiny new MacBook Pro, so I hope RB solves this ASAP (or UB
compiling would be fine too, I guess :)

For the time being, I'm back to using Pulp for profiling on Intel
(man, this product has legs: I bought it 4 years ago, it hasn't been
updated since, but it still works as well as the day I bought it).
It's a utility for RB that embeds timer-based profiling code in
every method and then writes out a text file that the Metrowerks
profiler app recognizes... simple, elegant, effective.

mike

Re: SOT: Premature Optimization
Date: 01.08.06 19:57 (Tue, 01 Aug 2006 13:57:25 -0500)
From: Matthew Williamson
Haven't followed this thread closely, so maybe someone has already mentioned
this, but there are doctors who specialize in these sorts of problems. ;->

-Matt



RE: SOT: Premature Optimization
Date: 01.08.06 20:08 (Tue, 1 Aug 2006 12:08:04 -0700)
From: Paul Mathews
I've found that thinking of mundane things while designing code prevents
premature optimization....

Paul Mathews


Re: SOT: Premature Optimization
Date: 02.08.06 12:05 (Wed, 2 Aug 2006 12:05:58 +0100)
From: Theodore H. Smith
> From: Joe Huber <<email address removed>>
> Date: Tue, 1 Aug 2006 10:48:21 -0700
>
>> a little effort with a profiler almost always turns up surprises in
>> real applications. This is so common, in fact, that I'm greatly
>> surprised on those rare occasions when the profiler has no surprises
>> for me!
>
> I agree and think part of the reason is that it's not always clear
> which parts of a framework or external API calls are expensive and
> which are not.

The thing I don't understand, which no one mentioned, not even the
article, is that fast code is usually maintainable code and code
that's quicker to write.

That's because less code = faster code, and less code = quicker to
write and maintain.

Usually, anyhow.

Re: SOT: Premature Optimization
Date: 02.08.06 15:25 (Wed, 2 Aug 2006 08:25:09 -0600)
From: joe strout.net
On Aug 02, 2006, at 11:05 UTC, Theodore H. Smith wrote:

> The thing I don't understand that no one mentioned, not even the
> article, is that fast code is usually maintainable code and code
> that's quicker to write.

That's not mentioned because, in general, it is patently untrue. Quite the opposite: speed-optimised code is usually longer, more complex, and far more obtuse than unoptimised code.

Code that's short, clean, and efficient doesn't count as optimised code; that's just good code. We all write that way whenever we can. But serious optimization will require much more complex tricks, like loop reordering for data locality, changing to a more complex (but efficient) algorithm, etc.
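
To make "loop reordering for data locality" concrete, here is a
minimal C++ sketch (the 1024x1024 matrix is invented). Both
functions compute the same sum, but the second walks the row-major
array in memory order and is typically far faster on large data:

const int N = 1024;
double m[N][N];

// Strides N*8 bytes between accesses: cache-hostile.
double sum_column_major() {
    double s = 0;
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            s += m[i][j];
    return s;
}

// Touches memory sequentially: cache-friendly.
double sum_row_major() {
    double s = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            s += m[i][j];
    return s;
}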

Re: SOT: Premature Optimization
Date: 02.08.06 19:03 (Wed, 2 Aug 2006 13:03:30 -0500)
From: Brendan Murphy
Joe wrote:
> On Aug 02, 2006, at 11:05 UTC, Theodore H. Smith wrote:
>> The thing I don't understand that no one mentioned, not even the
>> article, is that fast code is usually maintainable code and code
>> that's quicker to write.
>>
>> That's because less code = faster code. less code = quicker to
>> write and maintain.
>
> That's not mentioned because, in general, it is patently untrue.
> Quite the opposite: speed-optimised code is usually longer, more
> complex, and far more obtuse than unoptimised code.
>
> Code that's short, clean, and efficient doesn't count as optimised
> code; that's just good code. We all write that way whenever we
> can. But serious optimization will require much more complex
> tricks, like loop reordering for data locality, changing to a more
> complex (but efficient) algorithm, etc.

Like Theo's statement, your statement is just as incorrect in the
opposite direction. Your assumption that there exists a linear
relationship between code complexity and the state of optimization
is false. There is no explicit relationship between the complexity
of the code and its optimized state. If you were handed a piece of
code and told to optimize it, sometimes making the code simpler is
the optimal solution; other times adding code that allows you to
skip steps is the optimal solution. Since cases exist in both
directions, the two variables of complexity and optimization are
not related. If you take Theo's point of view, you will miss a set
of optimizations that could be applied. If you take your point of
view, you will also miss a set of optimizations that could be
applied. Optimization is a function of the forethought put into the
code; the complexity or simplicity of the code is just the result
of that forethought.

Though Theo's assertion is not correct in the context in which he
presents it, it does have some truth to it in a bizarre, twisted
way. People who create maintainable code are far more likely to
write optimized code! People who are sloppy in writing their code
are less likely to write optimized code, since their code is more
difficult to manipulate.


Re: SOT: Premature Optimization
Date: 02.08.06 20:12 (Wed, 2 Aug 2006 13:12:38 -0600)
From: joe strout.net
On Aug 02, 2006, at 18:03 UTC, Brendan Murphy wrote:

> > That's not mentioned because, in general, it is patently untrue.
> > Quite the opposite: speed-optimised code is usually longer, more
> > complex, and far more obtuse than unoptimised code.
> >
> > Code that's short, clean, and efficient doesn't count as optimised
> > code; that's just good code. We all write that way whenever we
> > can. But serious optimization will require much more complex
> > tricks, like loop reordering for data locality, changing to a more
> > complex (but efficient) algorithm, etc.
>
> Like Theo's statement, your statement is just as incorrect in the
> opposite direction. Your assumption that there exists a linear
> relationship between code complexity and the state of optimization
> is false.

I make no such assumption, and you've set up a straw man -- an incorrect statement which is not, in fact, what I was saying. See the key phrases "in general" and "usually" in my statements above.

I stand by my statement that highly optimised code is usually (and in general) longer and more complex than unoptimised (but well-written) code, both from my own experience and because I've closely followed the work of my spouse, who is a computer scientist who specializes in advanced optimization (and the very difficult problem of having a compiler do things like managing data locality for you).

But *of course* your straw man about a linear relationship between code complexity and optimization is false. Of course one could write complex code that performs poorly.

> People who create maintainable code are far more likely to
> write optimized code! People who are sloppy in writing their code
> are more likely to not write optimized code since their code is
> more difficult to manipulate.

Well, I'm sure that's true too. But when the people who create good maintainable code find a need to optimize it, usually maintainability is reduced. Example: a Mandelbrot program in RB was first written using a nice neat complex number class. Profiling and optimization led to eliminating that class and inlining all the calculations, resulting in a dramatic speedup but more complex, less general, and harder-to-maintain code.
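
A minimal sketch of that transformation, in C++ rather than the
original RB: the first version uses a complex type and reads
cleanly; the second inlines the same arithmetic on raw doubles,
which is the direction the optimization took.

#include <complex>

// Clean version: a complex number type keeps the math readable.
int iterate_clean(std::complex<double> c, int maxIter) {
    std::complex<double> z = 0;
    int n = 0;
    while (n < maxIter && std::norm(z) <= 4.0) {  // norm(z) is |z|^2
        z = z * z + c;
        ++n;
    }
    return n;
}

// Inlined version: the complex multiply written out by hand.
int iterate_inlined(double cr, double ci, int maxIter) {
    double zr = 0, zi = 0;
    int n = 0;
    while (n < maxIter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2 * zr * zi + ci;
        zr = t;
        ++n;
    }
    return n;
}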

Short, maintainable code, and optimization for speed, are orthogonal aspects -- but still incompatible, in most cases.

Best,
- Joe

Re: SOT: Premature Optimization
Date: 02.08.06 19:37 (Wed, 2 Aug 2006 19:37:11 +0100)
From: Theodore H. Smith
> From: <email address removed>
> Date: Wed, 2 Aug 2006 08:25:09 -0600
>
> On Aug 02, 2006, at 11:05 UTC, Theodore H. Smith wrote:
>
>> The thing I don't understand that no one mentioned, not even the
>> article, is that fast code is usually maintainable code and code
>> that's quicker to write.
>
> That's not mentioned because, in general, it is patently untrue.
> Quite the opposite: speed-optimised code is usually longer, more
> complex, and far more obtuse than unoptimised code.

Nope.

Bloat, a.k.a. features, is well known for slowing things down. And
it is more complex to make.

For example, why write a regex parser class, when all you really
wanted was a character set searcher?

The character set searcher is simpler, and faster. And it can be used
for 80% of the cases that you might use a regex.
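
A sketch of what such a searcher can look like in C++
(find_first_of_set is an invented name; the C library's strcspn does
a similar job): one table build, then a single probe per byte.

#include <cstddef>

std::size_t find_first_of_set(const char* text, std::size_t len,
                              const char* set) {
    bool in_set[256] = {false};           // 256-entry membership table
    for (const char* p = set; *p; ++p)
        in_set[static_cast<unsigned char>(*p)] = true;

    for (std::size_t i = 0; i < len; ++i)
        if (in_set[static_cast<unsigned char>(text[i])])
            return i;                     // first byte in the set
    return static_cast<std::size_t>(-1);  // not found
}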

Why make your code do all sorts of awkward tricks with encodings,
(including but not limited to auto-convert on append), when you can
just assume all your data is UTF-8? Once again you get simpler and
faster code.

> Code that's short, clean, and efficient doesn't count as optimised
> code; that's just good code. We all write that way whenever we can.

Apparently not :(

"all"? That's a wild statement, and it's not even "most".

I suppose you've never heard of copy/paste coders. People who copy/
paste stuff instead of refactoring.

> But serious optimization will require much more complex tricks,
> like loop reordering for data locality, changing to a more complex
> (but efficient) algorithm, etc.

Those can help.

Re: SOT: Premature Optimization
Date: 02.08.06 19:40 (Wed, 2 Aug 2006 14:40:43 -0400)
From: Brad Rhine
On Aug 2, 2006, at 2:37 PM, Theodore H. Smith wrote:

> Why make your code do all sorts of awkward tricks with encodings,
> (including but not limited to auto-convert on append), when you can
> just assume all your data is UTF-8?

Because assumptions are dangerous. ;)

Re: SOT: Premature Optimization
Date: 03.08.06 00:11 (Thu, 3 Aug 2006 00:11:52 +0100)
From: Theodore H. Smith
> From: Brendan Murphy <<email address removed>>
> Date: Wed, 2 Aug 2006 13:03:30 -0500
>
> Joe wrote:
>> On Aug 02, 2006, at 11:05 UTC, Theodore H. Smith wrote:
>>> The thing I don't understand that no one mentioned, not even the
>>> article, is that fast code is usually maintainable code and code
>>> that's quicker to write.
>>>
>>> That's because less code = faster code. less code = quicker to
>>> write and maintain.
>
> Though Theo's assertion is not correct in the context in which he
> presents it,

Actually, it is.

Because I said "usually". Not "always".

Here's a great example. Some project I've been working on recently
does a lot of this kind of thing:

if (ItemListArray[ItemListArray[GetSelectedIndex()].Parent].Name ==
"Fred") {
ItemListArray[ItemListArray[GetSelectedIndex()].Parent].Type = Maybe;
ItemListArray[ItemListArray[GetSelectedIndex()].Parent].Class = This;
ItemListArray[ItemListArray[GetSelectedIndex()].Parent].Thing = That;
}

etc etc etc.

It's not my code, which is why it's written like that.

Now, I would do it like this:

int i = GetSelectedIndex();
Item* item = &ItemListArray[i];
Item* parent = &ItemListArray[item->Parent];
if (parent->Name == "Fred") {
parent->Type = Maybe;
parent->Class = This;
parent->Thing = That;
}

As you can see: Less code = faster code.

Basically what I'm talking about is invariant code. Most coders don't
realise just how much invariant code there is in their code. I have a
good eye for that sort of thing :)

Re: SOT: Premature Optimization
Date: 03.08.06 00:19 (Thu, 3 Aug 2006 00:19:45 +0100)
From: Theodore H. Smith

> Well, I'm sure that's true too. But when the people who create
> good maintainable code find a need to optimize it, usually
> maintainability is reduced. Example: a Mandlebrot program in RB
> was first written using a nice neat complex number class.
> Profiling and optimization led to eliminating that class and
> inlining all the calculations, resulting in a dramatic speedup but
> more complex, less general, and harder-to-maintain code.

I remember working on that Mandelbrot and speeding it up.

I remember eliminating a lot of code, by storing values instead of
repeating calculations. So I reduced the code size and sped it up :)

You say that "we all write simple code at first", but even my
experience of RS projects shows this not to be true. You do write a
lot of invariant code, code which is basically doing something
already done on a previous line. By storing intermediates you'd get
faster speeds.

I don't deny that a nice neat complex number class looks simpler.
It's just that I wouldn't do it in RB due to object management
overhead. A decision so instinctive that I normally wouldn't even
consider it necessary to explain or justify; I'd just do it. I just
have a feel for which things are fast and slow, usually anyhow.

In fact, I'd even be very wary of doing it with classes in C++ even
if every method was inlined and we didn't have memory allocation/
deallocation overhead (stack based allocation). You can't trust the
compiler to optimise things for you properly. Well, unless you know
your compiler well.

If you've seen some good tests of your favourite C++ compiler, and
it proves to handle things like complex numbers as a class just as
fast as doing it with two floats, then it's OK to use it as a class.

Re: SOT: Premature Optimization
Date: 03.08.06 00:28 (Thu, 3 Aug 2006 00:28:33 +0100)
From: Theodore H. Smith
> From: Brad Rhine <<email address removed>>
> Date: Wed, 2 Aug 2006 14:40:43 -0400
>
> On Aug 2, 2006, at 2:37 PM, Theodore H. Smith wrote:
>
>> Why make your code do all sorts of awkward tricks with encodings,
>> (including but not limited to auto-convert on append), when you can
>> just assume all your data is UTF-8?
>
> Because assumptions are dangerous. ;)

What if it's a guideline and not an assumption? Something like "use
utf-8 for most data processing, and utf-16 simply for input/output"?

Part of the speed increase my FastString class gets over Charles's
class-based approach is that I don't do anything with encodings. Why
should I? I've never had a problem with it, and no users have
reported one to me.

By eliminating a case which might occur less than 1% of the time, I
can get maybe 30% extra speed. And even that 1% of the time only
proves to be a design error on the developer's part, because he'd get
faster speed by using UTF-8 all throughout his app.

If there's anything I've learnt about string processing, it's that
it's really best to use one model for your data. Whether that's C++
or RB or anything.

In C++ we have so many string classes: CString (via MFC), the STL's
string, char*, and then most libraries tend to have their own string
class, like CFString or NSString. Then you need to write an app
using libraries, some of which use char*, others string, others
NSString... doing all the interconversion becomes a mess, complex
and slow.

Far quicker to just use one model, where possible.

Just the same for encodings. UTF-8 does everything so there's no
advantage in using anything other than UTF-8 except for input and
output.

It should be considered a design error to be processing strings in
more than one encoding, except to convert it to and from the dominant
encoding.

Well, I think you can assume that people should stick to your
suggested design principles.

Re: SOT: Premature Optimization
Date: 03.08.06 02:51 (Wed, 2 Aug 2006 21:51:28 -0400)
From: Brad Rhine
On Aug 2, 2006, at 7:28 PM, Theodore H. Smith wrote:

>>> Why make your code do all sorts of awkward tricks with encodings,
>>> (including but not limited to auto-convert on append), when you can
>>> just assume all your data is UTF-8?
>>
>> Because assumptions are dangerous. ;)
>
> What if it's a guideline and not an assumption?

Well, then, it's not an assumption anymore, is it? Sounds more like a
spec. Even so...

> Well, I think you can assume that people should stick to your
> suggested design principles.

"Should" and "will" are very different beasts.

Re: SOT: Premature Optimization
Date: 03.08.06 04:53 (Wed, 2 Aug 2006 23:53:26 -0400)
From: Charles Yeomans

On Aug 2, 2006, at 7:05 AM, Theodore H. Smith wrote:

>> From: Joe Huber <<email address removed>>
>> Date: Tue, 1 Aug 2006 10:48:21 -0700
>>
>>> a little effort with a profiler almost always turns up surprises in
>>> real applications. This is so common, in fact, that I'm greatly
>>> surprised on those rare occasions when the profiler has no surprises
>>> for me!
>>
>> I agree and think part of the reason is that it's not always clear
>> which parts of a framework or external API calls are expensive and
>> which are not.
>
> The thing I don't understand that no one mentioned, not even the
> article, is that fast code is usually maintainable code and code
> that's quicker to write.
>
> That's because less code = faster code. less code = quicker to
> write and maintain.
>
> Usually, anyhow.

Not really. Loop unrolling is a standard example of a speed
optimization that increases code size and reduces maintainability and
reliability in exchange for execution speed (sometimes). Quicksort
trades a more complex algorithm, vastly harder maintainability, and
more code for significantly improved execution.
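
A sketch of the unrolling tradeoff in C++: the same sum with four
accumulators, trading extra code for fewer loop tests. Note that
reordering the additions can change the floating-point result in the
last bits, one of the reliability costs mentioned above, and that
modern compilers often do this themselves.

#include <cstddef>

double sum(const double* a, std::size_t n) {
    double s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

double sum_unrolled(const double* a, std::size_t n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {          // four elements per loop test
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; ++i)                    // leftover elements
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}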

Charles Yeomans

RE: SOT: Premature Optimization
Date: 03.08.06 05:05 (Thu, 3 Aug 2006 00:05:40 -0400)
From: Walter Purvis
Charles,

I agree with your overall point, but I'm curious about the quicksort
comment. Seeing as how the entire quicksort algorithm fits easily on a
single page, how much "vastly harder" can its maintainability be? Just
curious what you meant...


Re: SOT: Premature Optimization
Date: 03.08.06 05:46 (Thu, 3 Aug 2006 00:46:11 -0400)
From: Charles Yeomans
The idea of quicksort is extremely simple. But the direct
implementations of it are toy sorts, not suitable for real use. It's
quite tricky to do a real implementation, and apparently minor
changes can kill performance. Once you get it working, then of
course there isn't so much to maintain, so perhaps it's more accurate
to say that it's hard to get working. You can see the difference in
my SortLibrary code; compare the insertion sort code to the quicksort
code.

Charles Yeomans


Re: SOT: Premature Optimization
Date: 03.08.06 07:03 (Thu, 3 Aug 2006 00:03:10 -0600)
From: Norman Palardy

On Aug 02, 2006, at 10:46 PM, Charles Yeomans wrote:

> The idea of quicksort is extremely simple. But the direct
> implementations of it are toy sorts, not suitable for real use.
> It's quite tricky to do a real implementation, and apparently minor
> changes can kill performance. Once you get it working, then of
> course there isn't so much to maintain, so perhaps it's more
> accurate to say that it's hard to get working. You can see the
> difference in my SortLibrary code; compare the insertion sort code
> to the quicksort code.
>
> Charles Yeomans

And some of the optimizations that you can do to make quicksort run
quicker do add to the overall algorithm complexity.
One that comes to mind is not using quicksort to sort sublists
smaller than a certain size.
Another is median-of-3 partitioning.
Neither is part of the original quicksort algorithm; both are
additions that improve performance but add to overall complexity.
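
A sketch of those two refinements in C++ (not code from Charles's
SortLibrary; the cutoff of 16 is a typical guess that real
implementations tune):

#include <algorithm>

static void insertion_sort(int a[], int lo, int hi) {
    for (int i = lo + 1; i <= hi; ++i)
        for (int j = i; j > lo && a[j] < a[j - 1]; --j)
            std::swap(a[j], a[j - 1]);
}

void quicksort(int a[], int lo, int hi) {
    if (hi - lo < 16) {                   // small sublist: simpler sort wins
        insertion_sort(a, lo, hi);
        return;
    }
    int mid = lo + (hi - lo) / 2;         // median-of-3: order lo, mid, hi
    if (a[mid] < a[lo]) std::swap(a[mid], a[lo]);
    if (a[hi] < a[lo]) std::swap(a[hi], a[lo]);
    if (a[hi] < a[mid]) std::swap(a[hi], a[mid]);
    int pivot = a[mid];

    int i = lo, j = hi;                   // Hoare-style partition
    while (i <= j) {
        while (a[i] < pivot) ++i;
        while (pivot < a[j]) --j;
        if (i <= j) std::swap(a[i++], a[j--]);
    }
    quicksort(a, lo, j);
    quicksort(a, i, hi);
}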


Re: SOT: Premature Optimization
Date: 03.08.06 05:02 (Thu, 3 Aug 2006 00:02:45 -0400)
From: Charles Yeomans

On Aug 2, 2006, at 7:28 PM, Theodore H. Smith wrote:

>> From: Brad Rhine <<email address removed>>
>> Date: Wed, 2 Aug 2006 14:40:43 -0400
>>
>> On Aug 2, 2006, at 2:37 PM, Theodore H. Smith wrote:
>>
>>> Why make your code do all sorts of awkward tricks with encodings,
>>> (including but not limited to auto-convert on append), when you can
>>> just assume all your data is UTF-8?
>>
>> Because assumptions are dangerous. ;)
>
> What if it's a guideline and not an assumption? Something like "use
> utf-8 for most data processing, and utf-16 simply for input/output"?
>
> Part of the speed increase my FastString class gets over Charles's
> class based approach is that I don't do anything with encodings.
> Why should I? I've never had a problem with it and no users have
> reported it to me.

And you're writing in C, while I'm writing mostly in REALbasic. Now I
find it fastest to use Split and Join.

>
> By eliminating a case which might occur less than 1% of the time, I
> can get maybe 30% extra speed. And even that 1% of the time only
> proves to be a design error on the developer's part, because he'd
> get faster speed by using UTF-8 all throughout his app.
>
> If there's anything I've learnt about string processing, it's that
> it's really best to use one model for your data. Whether that's C++
> or RB or anything.
>
> In C++ we have so many string classes, CString (via MFC), stl's
> string, char*, and then most libraries tend to have their own
> string class, like CFString, or NSString. Then you need to write an
> app using libraries, some which use char*, others using string,
> others using NSString... it becomes a mess, complex, and slow, to
> do all the interconversion.
>
> Far quicker to just use one model, where possible.

Sure, but RS has chosen to opt for convenience, and it works pretty
well for most situations.

>
> Just the same for encodings. UTF-8 does everything so there's no
> advantage in using anything other than UTF-8 except for input and
> output.
>
> It should be considered a design error to be processing strings in
> more than one encoding, except to convert it to and from the
> dominant encoding.
>
> Well, I think you can assume that people should stick to your
> suggested design principles.

Probably Apple has some good developers, and they think UTF-16 is the
better choice.

Charles Yeomans



Re: SOT: Premature Optimization
Date: 03.08.06 11:53 (Thu, 3 Aug 2006 11:53:49 +0100)
From: Theodore H. Smith
> From: Charles Yeomans <<email address removed>>
> Date: Wed, 2 Aug 2006 23:53:26 -0400
>
> Not really. Loop unrolling is a standard example of a speed
> optimization that increases code size and reduces maintainability and
> reliability in exchange for execution speed (sometimes). Quicksort
> trades a more complex algorithm, vastly harder maintainability, and
> more code for significantly improved execution.

Yeah, this is true. Which is why I said "usually". I'm sure even a
quicksort can be sped up by eliminating repeated code.

As for loop unrolling, that's a perfect candidate for more code
being faster. That sort of thing is why I didn't say "always".

The thing about loop unrolling is that it is the kind of
optimisation that can be applied in so many cases, only 1% of which
will really benefit. It's only good for low-level stuff, where the
loop overhead outweighs the size of the code in the loop. In RB I
doubt it'll help much. In C, I know for sure that most of my loops
don't need it; only a few do.

Unfortunately, there is no way to tell my compiler which loops to
unroll.

Re: SOT: Premature Optimization
Date: 03.08.06 12:03 (Thu, 3 Aug 2006 12:03:55 +0100)
From: Theodore H. Smith
> From: Charles Yeomans <<email address removed>>
> Date: Thu, 3 Aug 2006 00:02:45 -0400

>> Just the same for encodings. UTF-8 does everything so there's no
>> advantage in using anything other than UTF-8 except for input and
>> output.
>>
>> It should be considered a design error to be processing strings in
>> more than one encoding, except to convert it to and from the
>> dominant encoding.
>>
>> Well, I think you can assume that people should stick to your
>> suggested design principles.
>
> Probably Apple has some good developers, and they think UTF-16 is the
> better choice.

I think NSString was made before they were aware of UTF-8, and
definitely before UTF-8 became popular. UTF-16's inclusion could
just be a historical quirk. After all, it's a variable-width
encoding that takes up far more space for most purposes. UTF-16
takes less space for pure CJK, however.

But a Japanese HTML page will take up less space in UTF-8, due to
all the English tags and the spaces and returns.

UTF-16 also has a historical quirk: it was originally proposed as a
fixed-size encoding, and it later turned out not to be. While people
still believed it was fixed-size, I think it was more attractive
than it is now. I've seen this referred to as an unintentional "bait
and switch" tactic: people were opposed to variable-length
characters, so we gave them UTF-16. Then it turned out that UTF-16
was variable-length anyhow, therefore you may as well use the UTF-8
that was originally proposed.
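
The worked numbers behind that: anything above U+FFFF needs a
surrogate pair in UTF-16, i.e. two 16-bit code units. These byte
values are the standard encodings and easy to verify:

// U+1D11E MUSICAL SYMBOL G CLEF, a character outside the BMP.
static const unsigned short gclef_utf16[] = {0xD834, 0xDD1E};        // 2 units, 4 bytes
static const unsigned char gclef_utf8[] = {0xF0, 0x9D, 0x84, 0x9E};  // 4 bytes
// So UTF-16 is variable-width too: one code point, two code units.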

Also, Apple can return strings via NSString's UTF8String method, and
there is a stringWithUTF8String: method too. These are really handy
for processing strings.

Re: SOT: Premature Optimization
Date: 03.08.06 13:17 (Thu, 3 Aug 2006 13:17:42 +0100)
From: Marcus Bointon
I may be wrong here, but I'm fairly sure that the dominant unicode
library (IBM's ICU) is centred around UTF-16. That sounds like a good
reason for using it. Generally I've got the impression that UTF-8 is
much better for web use as it's more space-efficient, but it's also
apparently slower to process than UTF-16, which would explain the
choice in a library.

I know that Valentina went UTF-16 for precisely this reason.

Marcus

Re: SOT: Premature Optimization
Date: 03.08.06 16:50 (Thu, 03 Aug 2006 18:50:20 +0300)
From: Ruslan Zasukhin
> From: Marcus Bointon <<email address removed>>
> Date: Thu, 3 Aug 2006 13:17:42 +0100
>
> I may be wrong here, but I'm fairly sure that the dominant unicode
> library (IBM's ICU) is centred around UTF-16. That sounds like a good
> reason for using it. Generally I've got the impression that UTF-8 is
> much better for web use as it's more space-efficient, but it's also

Correction, Marcus.

UTF-8 is space-efficient only for languages of the Roman group.

If you store Cyrillic-Win or Cyrillic-Mac text, which uses 1 byte
per Russian character, in UTF-8, you start to eat 2 bytes per
character.

For the Japanese language, one character that uses 2 bytes in UTF-16
will eat 4 bytes in UTF-8.

So UTF-8 is good only for a small set of languages.

> apparently slower to process than UTF-16, which would explain the
> choice in a library.
>
> I know that Valentina went UTF-16 for precisely this reason.

Re: SOT: Premature Optimization
Date: 03.08.06 17:11 (Thu, 3 Aug 2006 17:11:14 +0100)
From: Theodore H. Smith
> From: Marcus Bointon <<email address removed>>
> Date: Thu, 3 Aug 2006 13:17:42 +0100
>
> I may be wrong here, but I'm fairly sure that the dominant unicode
> library (IBM's ICU) is centred around UTF-16.

ElfData isn't too bad at doing Unicode stuff. It's maybe nowhere
near as rich, but it still does a lot. It even does NFD and NFC.

Also, I do NFD and NFC on UTF-8, directly.

I've been told over and over that this isn't possible.

I knew before I wrote this code that it was possible, and also that
it would be fast and simple to implement (for me).

They told me it wasn't possible, still.

I went and built it, and showed them.

Then they shut up :)

> That sounds like a good
> reason for using it. Generally I've got the impression that UTF-8 is
> much better for web use as it's more space-efficient, but it's also
> apparently slower to process than UTF-16, which would explain the
> choice in a library.

Not necessarily. I haven't seen any evidence that processing it is
slower, and I know that because of its compactness it could even be
quicker. The fact that we don't have to interconvert to UTF-8 also
speeds things up.

UTF-16 has endian issues too, which UTF-8 does not.

And it's very reliable to detect if text is valid UTF-8 even without
a BOM. I have such a detection function in my ElfData plugin. You
can't reliably detect if text is UTF-16, without a BOM, unfortunately.
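
A sketch of the kind of check being described (not the actual
ElfData code): a strict UTF-8 validity scan doubles as a detector,
because random non-UTF-8 bytes almost never happen to form valid
sequences. A production check would also reject overlong forms and
surrogate values.

#include <cstddef>

bool looks_like_utf8(const unsigned char* s, std::size_t len) {
    for (std::size_t i = 0; i < len; ) {
        unsigned char b = s[i++];
        int follow;
        if (b < 0x80)                follow = 0;  // ASCII byte
        else if ((b & 0xE0) == 0xC0) follow = 1;  // 2-byte sequence lead
        else if ((b & 0xF0) == 0xE0) follow = 2;  // 3-byte sequence lead
        else if ((b & 0xF8) == 0xF0) follow = 3;  // 4-byte sequence lead
        else return false;                        // stray continuation byte
        while (follow-- > 0)
            if (i >= len || (s[i++] & 0xC0) != 0x80)
                return false;                     // truncated or bad trail
    }
    return true;
}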

> I know that Valentina went UTF-16 for precisely this reason.

Could be a mistake :( Is he processing the full code points? If he
is, then the variable width of UTF-16 kills off the advantage over
UTF-8.

RB's regex requires UTF-8, btw. If UTF-16 is so much easier, then why
is it using UTF-8?

Re: SOT: Premature Optimization
Date: 03.08.06 18:23 (Thu, 3 Aug 2006 18:23:00 +0100)
From: Theodore H. Smith
> From: Ruslan Zasukhin <<email address removed>>
> Date: Thu, 03 Aug 2006 18:50:20 +0300
>
>> From: Marcus Bointon <<email address removed>>
>> Date: Thu, 3 Aug 2006 13:17:42 +0100
>>
>> I may be wrong here, but I'm fairly sure that the dominant unicode
>> library (IBM's ICU) is centred around UTF-16. That sounds like a good
>> reason for using it. Generally I've got the impression that UTF-8 is
>> much better for web use as it's more space-efficient, but it's also
>
> Correction, Marcus.
>
> UTF8 is space-efficient only for languages of ROMAN group.
>
> If you try store Cyrillic-win or Cyrillic-mac that use 1 byte per
> Russian
> char, into UTF8 you start eat 2 bytes.
>
> For Japan language one char that use 2 bytes in UTF16,
> will eat 4 bytes in UTf8.

Three bytes, usually, actually, for CJK. It's still more than 2,
though, so UTF-8 can be less good for CJK than UTF-16.
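
Worked byte counts for the characters under discussion (standard
encodings, easy to verify):

static const char ascii_a[] = "\x61";             // 'a'    : 1 byte in UTF-8
static const char cyrillic_a[] = "\xD0\xB0";      // U+0430 : 2 bytes in UTF-8
static const char hiragana_a[] = "\xE3\x81\x82";  // U+3042 : 3 bytes in UTF-8
// Each of these three takes exactly 2 bytes in UTF-16, so UTF-8 wins
// for ASCII, ties for Cyrillic, and loses (3 vs. 2) for the kana.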

Also, what about spaces? Those take up 1 byte instead of 2, even in
Japanese. When you start talking about HTML, then the space savings
really add up.

Also, if you really need space savings beyond UTF-8, you can look at
BOCU. http://www.unicode.org/notes/tn6/tn6-1.html The algorithm is
very simple. Maybe useful for stuff like databases, perhaps.

Also, if you are worried about space savings, it might be an idea to
put your text into NFC. My ElfData plugin has some NFC code. Some CJK
characters can take 3 code points instead of just one... Those are
the "conjoining jamo" letters.

However, I don't have any native experience of oriental languages, so
I may be missing something big. Like perhaps these letters are quite
rare...

> So UTF8 is good only for small set of languages.

Good for all languages in my opinion :)

Re: SOT: Premature Optimization
Date: 03.08.06 18:30 (Thu, 03 Aug 2006 20:30:23 +0300)
From: Ruslan Zasukhin
On 8/3/06 8:00 PM, "<email address removed>"
<<email address removed>> wrote:

>> I know that Valentina went UTF-16 for precisely this reason.
>
> Could be a mistake :( Is he processing the full code points? If he
> is, then the variable widthness of UTF-16 kills off the advantage
> over utf-8.
>
> RB's regex requires UTF-8, btw. If UTF-16 is so much easier, then why
> is it using UTF-8?

I think I know the answer why.

If you look around, software projects that were not Unicode-safe at
some point reach the moment when they need to get Unicode.

Old C/C++ software projects are based on char*.
UTF-8 fits this, but UTF-16 does not.

When we switched Valentina to UTF-16, we were lucky that we had
decided at the same time to rewrite the whole engine from scratch
using many new, modern C++ techniques.

And we switched our code to UChar*, which for many compilers is
wchar_t, 2 bytes.

Many big old projects just cannot allow themselves to be totally
rewritten using new string types and methods.

-----------
About regex... I know of only one regex library that works with
UTF-16, and that is the ICU library...

Apple uses ICU, but they opened access to ICU's regex only in 10.4.

So REALbasic probably uses some other third-party regex which can
work only with UTF-8.