July-September 2005 (oldest-to-newest)
So I'm sitting here at work listening to the Defender
2000 soundtrack. One of the tracks is a remix of
Beethoven's 5th. You know, the one in that level of
Dragon's Lair 2 where you're a mini-Dirk being chased by
a cat while carried through the air on a flying piano.
Erm. Anyway, I'm sitting there trying to imagine what
Beethoven would think if he heard his masterpiece played
to a decidedly hip-hop beat. Makes me smile :)
Okay so the secret's out.
I am in the process of porting two Yaroze games to NUON
using the library I created for Decaying Orbit. Why
go to the trouble of making the library and only port one
game? The first game is
Ben James' "Katapila". The second is Philippe-Andre Lorin's "Invs". Both gents have graciously offered
help as I take their respective source code and get the
games running on NUON.
Katapila is nearly done. Just a few details left to
include. Invs is ported, but not optimized. I need to
spend some time improving the frame rate. No guess on a
release date for either one yet.
I fixed the last known bug with my library's handling of
Invs. Now I need to optimize the hell out of it so that it
runs at 30fps throughout. I'm actually looking forward to
this as I find it fun.
One of the major time sinks is that, rather than
clearing the screen every frame, the background is copied
over from another buffer. *PLUS* the background buffer
is faded slightly every frame. Certain sprites are drawn
to the background buffer in addition to the main buffer
and they leave trails as it fades out. Very cool effect,
but the current brute force implementation is too slow.
As it stands now the copy is done on MPE3 in C code
rather than on the renderers. This is bad because the
DMAs are serialized and blocking. The fade is applied
some time later by drawing a translucent box over the
background buffer. This part is done on the renderers,
but requires another set of read/modify/write. In fact
it does two reads because it needs to read both the
background buffer and the sprite image, which is just
a box of solid color.
I plan to speed this up tremendously by performing all
the operations in one pass on the renderers. I will read
in the background buffer, write it to the destination,
then fade the pixels and write it back to the background
buffer. Only one read and two writes are necessary, which
is the bare minimum.
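A minimal sketch of that single pass, written as plain C rather than as a NUON renderer (the real thing would be MPE code driving DMA, which is not shown here). The names `pixel_t`, `fade_fn`, `copy_and_fade` and `halve_channels` are made up for illustration, and `halve_channels` is only a stand-in fade so the example has something to call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel type: one packed 32-bit pixel, four 8-bit channels. */
typedef uint32_t pixel_t;

/* Hypothetical fade callback: returns the faded version of one pixel. */
typedef pixel_t (*fade_fn)(pixel_t);

/* Stand-in fade for the example: halves each 8-bit channel. */
pixel_t halve_channels(pixel_t p)
{
    return (p >> 1) & 0x7F7F7F7Fu;
}

/* One pass over the screen: each background pixel is read once, written
 * unchanged to the frame buffer, then written back faded. */
void copy_and_fade(pixel_t *background, pixel_t *frame, size_t count,
                   fade_fn fade)
{
    for (size_t i = 0; i < count; i++) {
        pixel_t p = background[i];   /* the single read */
        frame[i] = p;                /* write 1: destination frame */
        background[i] = fade(p);     /* write 2: faded background */
    }
}
```

Compared with a separate copy followed by a translucent box, the fade here never re-reads the background and never reads a source image at all.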
If you remember ("An intriguing idea"),
I modified the sprite library to use overlays for the
sprite renderers. Each sprite type can have its own
renderer that is loaded when needed. That way I can
have a virtually unlimited number of renderers without
taking up precious local RAM.
Anyway, I plan to do the same thing with the code that does background clearing. At the moment the clear routine
is fixed to clear to the given color. I will change that
to perform the steps described two paragraphs up. It will
use overlays as well so that any number of screen clearing
methods can be created as the need arises.
I'm excited to see what kind of speed boost this will give.
I timed the various portions of the
Invs main loop. The initial
frames of the first level take about 80ms. As suspected
the background handling occupies a tremendous amount of
that time. Both the copy and fade take over 20ms
each. To hit 30fps each frame needs to complete
in 33ms so you can see how optimizing this will help.
Another optimization is to allow the previous frame to
render while the current frame is computing. At the
moment they are done serially. That should buy another
I borrowed a friend's DVD set of Firefly. You can
guess where this is going. I've been hooked and feel the
need to complete my addiction post haste. At least there's
only 14 episodes so it won't take long.
I'm just starting to test out the new clear routine as
described earlier. At the moment it just copies the
background buffer to the current frame buffer. Once I get
that working I'll make the pixels fade out.
An unexpected sick day yesterday let me finish watching
the rest of Firefly. What a great series. I hope the
movie does well enough to bring it back. Knowing how these
things typically go, however, I'm not holding my breath.
Oh yes, the background renderer is quite the speedup. It
went from 80ms per frame on the opening level to about 65ms.
Then I realized I was only using two of the three MPEs
for rendering. Adding the third dropped that to just under 50ms.
That 50ms includes about 25ms of rendering time and 20ms
of main loop execution. Once I overlap those two the
smaller one should hide inside the bigger. So a sub-33ms
frame should be very doable. And that's without optimizing
the background renderer.
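The "smaller one hides inside the bigger" reasoning can be written down as a tiny model. This is only back-of-the-envelope arithmetic with hypothetical helper names; it ignores any synchronization cost between the main loop and the renderers.

```c
/* Hypothetical helpers; all times are in milliseconds. */

/* Rendering and main loop run one after the other. */
unsigned frame_time_serial(unsigned render_ms, unsigned main_ms)
{
    return render_ms + main_ms;
}

/* The previous frame renders while the current one computes, so the
 * frame takes as long as the slower of the two. */
unsigned frame_time_overlapped(unsigned render_ms, unsigned main_ms)
{
    return render_ms > main_ms ? render_ms : main_ms;
}

/* 30fps leaves roughly 33ms per frame. */
int hits_30fps(unsigned frame_ms)
{
    return frame_ms <= 33u;
}
```

With 25ms of rendering and 20ms of main loop, the serial model gives 45ms and the overlapped model gives 25ms, which fits inside the 33ms budget.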
I'm doing things the hard (but accurate) way right now.
The original Invs code calls for
the background to be faded by 8 on R, G and B every frame.
Because NUON does everything in YCrCb (curses) I must first
convert each pixel to RGB, subtract 8 from each, saturate
to zero in case they go negative, and convert back to
YCrCb. It takes about 29 cycles per pixel, which is just
an atrocious amount given I have to do this for the
entire screen. At 320x240 and using 3 rendering
MPEs that's 742k cycles per MPE (29 * 320 * 240 / 3).
Given that there's only 1800k cycles to play with in a
given frame (54,000,000 / 30) that means the clear
renderer takes 41% of the allowed time. Not ideal.
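The budget arithmetic above, spelled out in C so other cycle counts can be plugged in. The function names are made up for this example; the 54,000,000 clock figure, the 320x240 screen and the three MPEs come from the numbers quoted in the text.

```c
#include <stdint.h>

/* Clock rate used in the text: 54,000,000 cycles per second. */
#define MPE_CLOCK_HZ 54000000u

/* Cycles available per frame at a given frame rate. */
uint32_t cycles_per_frame(uint32_t fps)
{
    return MPE_CLOCK_HZ / fps;
}

/* Cycles each MPE spends clearing when the screen is split evenly. */
uint32_t clear_cycles_per_mpe(uint32_t cycles_per_pixel,
                              uint32_t width, uint32_t height,
                              uint32_t mpes)
{
    return cycles_per_pixel * width * height / mpes;
}

/* Share of the frame budget, as a whole percentage (rounded down). */
uint32_t percent_of_budget(uint32_t cycles, uint32_t budget)
{
    return (uint32_t)(((uint64_t)cycles * 100u) / budget);
}
```

For 29 cycles per pixel this gives 742,400 cycles per MPE against a budget of 1,800,000, which is the 41% quoted above.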
I can optimize what I have now and get it down to 18
cycles pretty easily (25%). However, I'm thinking of
changing the fade to a multiplicative effect rather than
subtractive (fading by X% rather than X brightness
levels). That would let me use matrices to combine all
the above steps into one, which would be significantly
faster. I'm guessing on the order of 7 cycles per pixel.
Let's see if I can get it at full speed using the
accurate method first.
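A rough sketch of why the multiplicative fade is cheaper. It rests on an assumption the text does not confirm: that the pixel format is a full-range YCrCb with no luma offset and chroma centered on 128. Under that assumption, scaling R, G and B by the same factor is the same as scaling Y and scaling each chroma channel's distance from 128, so no round trip through RGB is needed. The type and function names are hypothetical, and this is the scalar equivalent rather than the matrix form.

```c
#include <stdint.h>

/* Hypothetical pixel layout: 8 bits each of Y, Cr, Cb.
 * Assumes full-range luma (black = 0) and chroma centered on 128. */
typedef struct {
    uint8_t y;
    uint8_t cr;
    uint8_t cb;
} ycrcb_t;

/* Fade by scale/256: 256 leaves the pixel alone, 0 gives black. */
ycrcb_t fade_multiplicative(ycrcb_t p, unsigned scale)
{
    ycrcb_t out;
    /* Luma scales directly. */
    out.y = (uint8_t)((p.y * scale) >> 8);
    /* Chroma scales around its neutral value of 128. */
    out.cr = (uint8_t)(128 + (((int)p.cr - 128) * (int)scale) / 256);
    out.cb = (uint8_t)(128 + (((int)p.cb - 128) * (int)scale) / 256);
    return out;
}
```

If the format does carry a luma offset, Y would need the same "subtract the offset, scale, add it back" treatment as the chroma channels.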
Using the multiplicative method sped up rendering by 40%
(from 25ms to 15ms). Overlapping rendering with the main
code then hid that 15ms behind the 20ms of processing.
w00t as they say. That put the game at 30fps during
the opening seconds of the first wave. I played for a
while and it did dip down below 30fps when things got more
I have some ideas on speeding up drawing of points
and lines, which Invs uses
quite a bit. Hopefully that will help.
I hit a crash bug, but I think I'm just running out of
sprites. Time to bump up that limit methinks.
The crash I saw earlier was a bug in the Yaroze library.
Not too difficult to find and fix in the end. Turns out
I was running out of sprites, but not because of
the hardcoded limit. In some cases the sprite library
was losing track of a sprite. Eventually it lost track
of so many that it couldn't render a screen.
I created a special sprite renderer to handle points.
Invs uses them a lot to represent
energy that floats down for you to pick up. Before, each
one was an individual sprite and went through the whole
sprite pipeline, including reading a source image of
I changed it so that most of the sprite pipeline is
skipped. What's the point of rotating a single pixel?
I also allowed twelve points to be packed together in a
single sprite structure. That leads to fewer sprites,
which is less memory overhead and DMAs. Now the screen
can be littered with them and the frame rate doesn't
take a hit.
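One way the packing could look, as a sketch only. The field layout, the names, and the choice of a 32-bit color per point are all assumptions; the only detail taken from the text is the limit of twelve points per sprite structure.

```c
#include <stdint.h>

#define POINTS_PER_SPRITE 12   /* from the text: twelve points per sprite */

/* Hypothetical per-point data: a position and a color, nothing else. */
typedef struct {
    int16_t x;
    int16_t y;
    uint32_t color;
} point_t;

/* Hypothetical packed sprite: one structure carries up to twelve points. */
typedef struct {
    uint8_t count;
    point_t points[POINTS_PER_SPRITE];
} point_sprite_t;

/* Adds a point; returns 1 on success, 0 if the structure is full. */
int point_sprite_add(point_sprite_t *s, int16_t x, int16_t y, uint32_t color)
{
    if (s->count >= POINTS_PER_SPRITE)
        return 0;
    s->points[s->count].x = x;
    s->points[s->count].y = y;
    s->points[s->count].color = color;
    s->count++;
    return 1;
}

/* Sprite structures needed for n points (rounded up). */
unsigned point_sprites_needed(unsigned n)
{
    return (n + POINTS_PER_SPRITE - 1) / POINTS_PER_SPRITE;
}
```

Under this layout, 100 points on screen need 9 sprite structures instead of 100.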
I need to implement something similar for lines as
the game uses them a lot too. They work as is, but the
current method has never gotten them exactly right.
Due to a bug in my timing/profiling functions the time to
render a frame was more optimistic than reality. I am
getting some slowdown pretty early on. I went back and
played the original Yaroze version and noticed how much
faster it seemed. Turns out that was true.
On the other hand, I fixed a couple remaining bugs.
One dealt with the pause screen and how it copies the
current frame over to another buffer. The other wasn't
so much a bug as an unimplemented feature. The Yaroze
has the ability to draw lines that transition between two
different colors from end-to-end. Until I can write a
renderer to handle that properly I just make it draw a
normal line using the average of the two colors.
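The stopgap amounts to a per-channel average of the two endpoint colors. A minimal sketch, with a hypothetical three-channel `color_t`:

```c
#include <stdint.h>

/* Hypothetical color: three 8-bit channels (R, G, B or Y, Cr, Cb). */
typedef struct {
    uint8_t c0;
    uint8_t c1;
    uint8_t c2;
} color_t;

/* Per-channel average of the two endpoint colors, rounded down. */
color_t average_color(color_t a, color_t b)
{
    color_t out;
    out.c0 = (uint8_t)((a.c0 + b.c0) / 2);
    out.c1 = (uint8_t)((a.c1 + b.c1) / 2);
    out.c2 = (uint8_t)((a.c2 + b.c2) / 2);
    return out;
}
```

Averaging channel by channel should give the same color whether it is done on RGB or on YCrCb values, give or take rounding, since the conversion between the two is a matrix multiply plus a constant offset.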
I got the line renderer going yesterday. I need to do
some profiling to find out how much it helps speed-wise.
At least now lines are drawn more accurately than the old
sprite-as-line approximation. I need to add another
renderer to handle gradated lines, but that shouldn't
take long since I can just modify the one I have.
I fixed the code that times code segments and can see
where the game is slowing down. Some of the one-off cases
can be fixed by changing from double to quad buffering.
For the rest I need to find out exactly what is the major
time sink. It seems to be in the processing rather than
the rendering. That gives me hope that it can be
Gradated lines are in. Next I'm thinking about creating
a special renderer for non-rotating, non-scaling sprites.
They basically just copy pixels from the source image to
the frame buffer. Transparency and translucency make
things complicated, but I can start simple.
Invs doesn't use translucency
on the invaders anyway.
The big oops I discovered is that the original game
runs at 60fps, not 30 as I suspected. Well since the game
is PAL technically it's 50fps, but regardless it spells
trouble. I need to do some heavy-duty optimizing to have
a chance at hitting full speed.
The above sprite renderer will help a bit I think.
Especially if I can group sprites that use the same source
image together in the same sprite structure. Even better
would be if the sprite is small enough that I can load
the whole thing into local RAM and blit it out multiple
times. The invaders would definitely benefit from
It seems that function calls are really expensive. Adding
~40 sprites to the display list was taking longer than
expected. I put my timing routines around a function call
and got about 7 to 8ms per frame (cumulative over all
I put the same routines just inside that function and
it drops to 3 to 4ms.
Next I tried unrolling everything I could, inlining
all the functions. I got the time down to around 2ms.
I tell you this shocked the hell out of me. I knew
there was some overhead involved. Heck I'm willing to
live with it if it means my code is cleaner and easier to
maintain. But 4x the time? Sorry, that's too much.
So now my goal is to inline as much of the sprite
library within my Yaroze library as possible. That means
I won't be able to use libsprite2.a any more. I need to
actually inline the sprite lib into my code so they get
compiled together in one function. Hopefully this will
remove the C code as the bottleneck and put the burden
back on the renderers.
I'm working hard to optimize things as much as possible.
One thing I added, that I should have done before, is have
the Yaroze sprite structure keep a link to the NUON image
data. Before, each sprite had to go searching for the
appropriate image every frame. I did that because in
Yaroze-land there's no guarantee that the sprite structure
will persist for any length of time; it could be
destroyed immediately after the call to insert it into the display list.
In Invs, most sprites are
semi-permanent. I can take advantage of that by keeping
a link to the image data as described above. If a sprite
hasn't changed from frame-to-frame then it doesn't need
to go searching for the image data.
The wrinkle is when the library's automatic memory
recovery kicks in. If a game uses a lot of images then
the lib deallocates old, unused images to make room for
the new ones. In that case the image data might not be
there. I added a check to make sure that the image data
is still valid.
This helped, but I took things a step further. I can
flag certain image data as "permanent" - ones that I know
are needed quite often. This tells the lib to never
deallocate them. That way I can guarantee that the image
data is always present and valid and just return it
immediately. This offers a tremendous speed boost
when inserting sprites to the display list.
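A sketch of the cached link, the validity check and the "permanent" flag, with every name invented for the example (the real library's structures are not shown in this log). `search_count` exists only so the effect of the cache can be observed.

```c
#include <stddef.h>

/* Hypothetical image record kept by the library. */
typedef struct {
    int id;
    int valid;      /* 0 once the library has deallocated the pixels */
    int permanent;  /* 1 = never deallocate */
} image_t;

/* Hypothetical sprite: remembers the image it used last frame. */
typedef struct {
    int image_id;
    image_t *cached;
} sprite_t;

/* Counts slow-path lookups, for illustration only. */
unsigned search_count = 0;

/* Slow path: search the whole table for a live image with this id. */
image_t *find_image(image_t *table, size_t n, int id)
{
    search_count++;
    for (size_t i = 0; i < n; i++) {
        if (table[i].valid && table[i].id == id)
            return &table[i];
    }
    return NULL;
}

/* Memory recovery: refuses to touch permanent images. */
int evict_image(image_t *img)
{
    if (img->permanent)
        return 0;
    img->valid = 0;
    return 1;
}

/* Fast path: reuse the cached link if it still points at the right,
 * still-present image; otherwise fall back to the search. */
image_t *sprite_image(sprite_t *s, image_t *table, size_t n)
{
    image_t *img = s->cached;
    if (img != NULL && img->id == s->image_id &&
        (img->permanent || img->valid))
        return img;
    s->cached = find_image(table, n, s->image_id);
    return s->cached;
}
```

A sprite that keeps the same image from frame to frame pays for the search once; a permanent image can never be evicted, so its cached link never goes stale.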
I'm starting to wonder if the optimizations I'm making
for Invs would help get Decaying
Orbit to run at 60fps. Something to investigate
Good progress, as the current round of optimizations has
pushed the bottleneck onto rendering rather than processing.
I'm going to create a special renderer that handles
non-rotated, non-scaled sprites more quickly. That should
make me processing-bound again, but hopefully further
optimization is not necessary. It's getting tougher to
find places to speed up the main loop.
It's been a while since my last update. There are a
couple reasons for that. And another as to why it might
be a while to the next.
First, work has been very busy. There's been a lot of
pressure to get a task done. That has sucked up much of
my free time lately. Thankfully it's pretty much wrapped up.
Second, my hard drive went kaput. Fortunately I heard
it making noise before this happened and had a replacement
on hand. So I really only lost maybe a day to that
annoyance. Still, I'm happy to now have 100GB instead of
60 and that the new drive is quiet as a mouse. It's just
a pain reinstalling everything again.
The reason for future productivity loss is that my wife
is in the early stages of labor at the moment. Our son
will be born either today or tomorrow most likely. I'll
be understandably involved in diapers, spit-ups, and
sleep loss again for a while.
I did get the renderer finished that I talked about
earlier. It's hard to tell how much it helps, but I
believe it does. Next I want to find a way to pin the
frame rate at 50fps since that's the speed of the
original Invs. Getting 50fps
on NTSC involves skipping every sixth frame. I need to
figure out a way to do that.
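One possible way to do it, sketched under the assumption that there is a counter that ticks once per 60Hz display frame: run the game logic on five ticks out of every six and sit out the sixth. The function names are hypothetical, and this says nothing about what gets displayed on the skipped tick.

```c
/* Returns 1 if game logic should run on this 60Hz tick, 0 if it is
 * the skipped sixth tick. */
int should_update(unsigned tick)
{
    return (tick % 6u) != 5u;
}

/* Counts how many updates run over a span of ticks starting at 0. */
unsigned updates_in(unsigned ticks)
{
    unsigned n = 0;
    for (unsigned t = 0; t < ticks; t++) {
        if (should_update(t))
            n++;
    }
    return n;
}
```

Over one second of 60 ticks this runs exactly 50 updates, matching the PAL original's rate.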
Also, I'm at the point where I'm sick of optimizing
to eke out another one or two fps. I'll probably get it
running at a decent speed for most of it, allow some
slowdown to happen on the busier screens, and just
release the thing.
This web page and all other pages on this site are
© 1999-2007 Scott Cartier