To assemble and run this example, use the batch file "m4" in the Warpcode directory.
Here's the pixel-generation loop, after having spent just a few minutes doing 'obvious' packing of instructions:
pixel_gen:

; This is the pixel generation function.  It collects *bilerped* pixels from the 8x8 pattern buffer and
; deposits them in the linear destination buffer for output to external RAM.

    st_s    x,ru                        ;Initialise bilinear X pointer
    st_s    y,rv                        ;Initialise bilinear Y pointer

; Here is the bilerp part.

{
    ld_p    (uv),pixel                  ;Grab a pixel from the source
    addr    #1,ru                       ;go to next horiz pixel
}
{
    ld_p    (uv),pixel2                 ;Get a second pixel
    addr    #1,rv                       ;go to next vert pixel
}
{
    ld_p    (uv),pixel4                 ;get a third pixel
    addr    #-1,ru                      ;go to prev horizontal pixel
}
{
    ld_p    (uv),pixel3                 ;get a fourth pixel
    addr    #-1,rv                      ;go back to original pixel
    sub_sv  pixel,pixel2                ;make vector between first 2 pixels
}

Here we have just folded the addr instructions in with the pixel loads, and we also start the first ALU operation as soon as the first pixel pair is loaded.
{
    dec     rc0                         ;Decrement the counter
    mul_p   ru,pixel2,>>#14,pixel2      ;scale according to fractional part of ru
    add     yi,y                        ;Add the y_increment
}
    sub_sv  pixel3,pixel4               ;make vector between second 2 pixels
{
    mul_p   ru,pixel4,>>#14,pixel4      ;scale according to fractional part of ru
    add     xi,x                        ;Add the x-increment
}

Here you can see that we have folded some more instructions together, but look what happens next:
    add_sv  pixel2,pixel                ;get first intermediate pixel
    add_sv  pixel4,pixel3               ;get second intermediate pixel
    sub_sv  pixel,pixel3                ;get vector to final value

There is a big pile-up of ALU instructions here that we have to get through before we can start the final multiply to get the result.
    mul_p   rv,pixel3,>>#14,pixel3      ;scale with fractional part of rv
    bra     c0ne,pixel_gen              ;Loop for the length of the dest buffer
    add_sv  pixel3,pixel                ;make final pixel value
{
    st_p    pixel,(xy)                  ;Deposit the pixel in the dest buffer
    addr    #1,rx                       ;increment the dest buffer pointer
}

And finally we are done, for a final inner loop count of 16 ticks per pixel. Now this is still a lot better than the original, naive coding of the bilerp, but it's not nearly as good as we might hope. Next up, we'll do some serious optimisation.
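For readers who want to check the arithmetic away from the hardware, here is a rough Python model of what the loop computes for one pixel. It is only a sketch under some assumptions: the function name bilerp and its parameters are made up for illustration, pixels are treated as plain tuples of numbers, the fractional parts of ru and rv are passed in as floats between 0 and 1 instead of being taken from fixed-point registers, and the two-operand vector instructions are read as "combine the first operand into the second", which is what the comments in the listing suggest.

```python
def bilerp(p00, p10, p01, p11, fx, fy):
    """Blend four neighbouring pixels, following the loop's register steps.

    p00 = pixel  (u, v)      p10 = pixel2 (u+1, v)
    p01 = pixel3 (u, v+1)    p11 = pixel4 (u+1, v+1)
    fx, fy stand in for the fractional parts of ru and rv (assumed floats).
    """
    # sub_sv pixel,pixel2 then mul_p ru,pixel2: scaled vector along the top row
    top_vec = [(b - a) * fx for a, b in zip(p00, p10)]
    # sub_sv pixel3,pixel4 then mul_p ru,pixel4: scaled vector along the bottom row
    bot_vec = [(b - a) * fx for a, b in zip(p01, p11)]
    # add_sv pixel2,pixel and add_sv pixel4,pixel3: the two intermediate pixels
    top = [a + d for a, d in zip(p00, top_vec)]
    bot = [a + d for a, d in zip(p01, bot_vec)]
    # sub_sv pixel,pixel3 then mul_p rv,pixel3: scaled vector between the rows
    vert = [(b - a) * fy for a, b in zip(top, bot)]
    # add_sv pixel3,pixel: the final pixel value
    return [a + d for a, d in zip(top, vert)]


# Halfway between four corner pixels, each component is the average of the corners.
result = bilerp((0, 0, 0), (100, 0, 0), (0, 100, 0), (100, 100, 0), 0.5, 0.5)
print(result)
```

The model makes the dependency chain easy to see: the vertical vector cannot be formed until both intermediate pixels exist, which is why the three ALU instructions pile up before the final multiply.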