Well, basically, it's because, if you're reading this document, you're probably not the kind of coder who is content to just slap something together through the nearest available API and leave it at that. You believe that that approach is best left to the shovelware merchants. You've got some neat voxel routine, or raycasting technique, or you've been eating polar sprouts, and you're frustrated with these graphics accelerator cards and consoles that are very good at doing certain things, but they're not quite what you really want. You don't want your game to look just like all the other games, with all the same effects all done the same way. You've got this mad vision, this wild algorithm that you're just burning to implement, and BIPS of hardware acceleration are no good at all if you can't harness that power to your own ends. You probably think polygons are past it. You want to try out some cool new technique that the hardware-assisted systems just flat-out can't do. Maybe you think our APIs are all a pile of wank and you want to show us how it's really done :-).
Well, if you're that kind of a coder, if you're a true hacker, then you're going to love Merlin. Because in Merlin we have an architecture that does not constrain you to any one way of doing things. Sure, load up the appropriate API and we can do polygons and sprites and all the rest of it, but if you can't get that killer effect or special technique from the APIs, you can write the code yourself - and unleash the awesome power of four VLIW processors tanking along in parallel to do precisely your bidding. Believe me, when you get into it, you'll love it, and you'll be totally gobsmacked at just how much amazingly cool stuff you can get up to.
Best of all, it really isn't very difficult to do. If you're already an assembler coder, you'll find that it's actually a piece of piss. Our variable-length VLIW architecture is considerably easier and more efficient to program than most exotic architectures. You can write the bulk of your setup and outer loop code just as you would on any other processor, using an instruction-set that's really rather nice. If you then take the trouble to learn how to polish and optimise that all-important inner loop, then you will be rewarded with code that just does an insane amount of stuff in not very much time at all.
Of course, when you're sat there with a new machine to learn, it's helpful to have some example code to pull apart, something that you can tweak and twiddle, something that actually assembles, loads and runs straight away, and which gets you up and running with something on the screen immediately. The aim of this document and these examples is to provide you with a worked example of creating a nice effect on the MPE, from initial coding to final optimisation, and to explain to you what is going on, and why we're doing it.
So, here goes....
Okay, you may be thinking, here we are on this weird processor, we've got bugger all instruction RAM, bugger all data RAM, and we can't even get directly at the display RAM. How the hell are we going to have any fun with that?
Well, we can have a hell of a lot of fun. Sure, the MPE is a little bit weird if you're used to massive flat address spaces and sixteen gazillion addressing modes, but for a competent assembler hacker, it's really not that bad, and nowhere near as weird as it might seem at first glance. And you'll totally get off on the amazing speed with which you can pull off outrageously cool stuff. Trust me. You're gonna love it.
So what are we gonna do?
There comes a time when you want to lay down a screen background that's just outrageously psychedelic, technically impressive, extremely pretty, and which impresses the punters no end and just screams "this ain't no Sega Saturn, matey". You might want to put such a display behind a title screen or a hi-score table, or even have it as a funky background behind a game screen if you really want to trip people out. So it would be nice to have a trippy warping pattern generator that:
In designing anything graphical on the MPE, it behooves one to remember Merlin Commandment Numbers One and Two, namely
When you do go out to DMA, you want to make it a nice, big, hearty DMA. The optimal DMA size is 64 32-bit pixels. It's no good doing a foontling couple of pixels here and a couple there - you spend way too much time waiting for bus grant as opposed to actually writing pixels. This is going to be a full-screen effect, so ideally we want to be generating 64 output pixels in a buffer in on-chip RAM, then hurling them out the DMA all at once when necessary.
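To make the "one big hearty DMA" idea concrete, here's a rough Python model (not MPE code) of collecting pixels in a local buffer and only going out to DMA when 64 have piled up. The `dma_write` callback is a hypothetical stand-in for a real transfer: it just records what would have been sent, and ignores destination addresses, bus grant and everything else the real hardware cares about.

```python
DMA_BATCH = 64  # the optimal DMA size quoted above: 64 32-bit pixels


class PixelBatcher:
    """Collect 32-bit pixels in a local buffer; flush 64 at a time.

    dma_write is a hypothetical callback standing in for a real DMA
    transfer. It receives a list of pixels per transfer.
    """

    def __init__(self, dma_write, batch=DMA_BATCH):
        self.dma_write = dma_write
        self.batch = batch
        self.buffer = []

    def put(self, pixel):
        # Keep only 32 bits, since the buffer holds 32-bit pixels.
        self.buffer.append(pixel & 0xFFFFFFFF)
        if len(self.buffer) == self.batch:
            self.flush()

    def flush(self):
        # One transfer for the whole buffer, not one per pixel.
        if self.buffer:
            self.dma_write(list(self.buffer))
            self.buffer.clear()


# Usage: push one hypothetical 320-pixel scanline through the batcher.
transfers = []
batcher = PixelBatcher(transfers.append)
for i in range(320):
    batcher.put(i)
batcher.flush()  # send any leftover partial batch
print(len(transfers), [len(t) for t in transfers])  # prints 5 [64, 64, 64, 64, 64]
```

Five bus transactions instead of 320 is the whole point: the waiting-for-bus-grant overhead gets paid once per 64 pixels rather than once per pixel.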
So the first thing we are going to need is a 64-pixel buffer for collecting up output pixels in local RAM.
Those output pixels have to come from somewhere... in order to generate the pattern, what are we gonna do? I intend to use an extension of the old blitting formula - maintain two bilinear address generators, and as we traverse the destination rectangle with one AG, we will also traverse a source rectangle containing pattern data with another, in a cool and interesting manner.
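Here's a sketch of the two-address-generator idea in Python, assuming 16.16 fixed point for the source coordinates (an assumption for illustration, not a statement of how the MPE's bilinear AGs store their values). The destination is walked one whole pixel at a time, while the source position steps by `du_dx`/`dv_dx` per pixel and by `du_dy`/`dv_dy` per line, which is enough to get scaling, rotation and shearing out of the same loop. All names here are hypothetical.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # 1.0 in 16.16 fixed point


def to_fixed(x):
    """Convert a float to a 16.16 fixed-point integer."""
    return int(round(x * ONE))


def walk(dest_w, dest_h, u0, v0, du_dx, dv_dx, du_dy, dv_dy):
    """Yield ((x, y), (u, v)): a destination pixel and the integer
    source coordinates the second address generator points at.

    u0, v0 and all the step values are 16.16 fixed-point integers.
    """
    row_u, row_v = u0, v0
    for y in range(dest_h):
        u, v = row_u, row_v
        for x in range(dest_w):
            # Integer part of the source position (floor, even if negative).
            yield (x, y), (u >> FRAC_BITS, v >> FRAC_BITS)
            u += du_dx  # per-pixel step across the source
            v += dv_dx
        row_u += du_dy  # per-line step down the source
        row_v += dv_dy


# Usage: a 2x zoom, i.e. the source moves half a pixel per destination pixel.
half = to_fixed(0.5)
for (x, y), (u, v) in walk(4, 2, 0, 0, half, 0, 0, half):
    print((x, y), "<-", (u, v))
```

Note that a rotation will quite happily walk the source coordinates off into negative numbers or past the edge of the source; keeping them inside the source tile is a separate job.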
So, for my source rectangle, I am going to specify an 8x8-pixel buffer in local RAM.
Now, you may think that an 8x8 buffer is way too small to yield any interesting or cool patterns, but you're wrong, you wait and see. There are various reasons for choosing that size. It doesn't take up a huge amount of my precious local RAM, for one. And if I were to one day extend my warper to, say, do arbitrary warping of images of any size, then a buffer of that size would be ideal as cache RAM: there is room enough in local RAM to declare more than one for double-buffering, and, again, it's 64 pixels big (8 x 8), the optimal size for DMA.
Also, using a rectangle that is a power-of-2 on a side is good, because we can use the XY_TILE and UV_TILE functions of the bilinear addressing modes on the MPE to constrain the source address generator to within the source tile.
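Why power-of-2 helps: wrapping an integer coordinate into an 8-pixel span is just an AND with 7, with no divide needed, and it works for negative coordinates too on two's-complement arithmetic. The Python below models that wrap for an 8x8 tile stored as a flat 64-entry array. My assumption is that XY_TILE/UV_TILE amount to this kind of low-bit wrapping of the integer coordinate; the names and the row-major layout here are hypothetical, not the hardware's actual register semantics.

```python
TILE_LOG2 = 3               # 8x8 tile: 2**3 = 8 pixels on a side
TILE_SIZE = 1 << TILE_LOG2  # 8
TILE_MASK = TILE_SIZE - 1   # 0b111


def tile_index(u, v):
    """Wrap integer source coords into the 8x8 tile and return a 0..63
    index into a row-major (v * 8 + u) pixel array."""
    return ((v & TILE_MASK) << TILE_LOG2) | (u & TILE_MASK)


# Usage: coordinates off the edge of the tile wrap around to the other side.
tile = list(range(64))           # stand-in pattern data: pixel value == index
print(tile[tile_index(9, -1)])   # prints 57  (u wraps to 1, v wraps to 7)
```

For a non-power-of-2 tile you'd need a proper modulo (or compare-and-subtract) per coordinate per pixel, which is exactly the sort of thing you don't want in an inner loop.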
Right. So the basis of this effect is going to be that we are gonna walk over the destination rectangle, and as we do, we are going to traverse a source tile in a manner that is interesting, picking up pixels for output as we go, and maybe doing something interesting to them along the way. Time to write some code.
    jmp next
    rts
    nop
    nop