FamiCube PPU Dev

FC3 PPU Dev

July 2017

It's been a while since I last worked on the Famicube PPU. Looking at the old code, some of it was written just to make things go, and I had also begun writing a windowed GUI.

Not-Workbench. These 1-bit font tables might see some changes but they'll do for now. This old version uses nibbles rather than bitplanes, which means 8 VRAM reads per tile width instead of 3, though only a (read and) write to change a pixel. Tiles are not good for pixel-level updates, so I decided to scrap this format and go for bitplanes. The way I rendered tiles onto the pixmap would not have worked for video signal generation either, and had to go.

Mode 2 (Screen/Pixel/Nibble)

Refactoring the core of the PPU took some careful poking and prodding around the perimeter of the fortress. It's easy to mess up and simply abandon a project in frustration if you recklessly cross the point of no return. I tore out the tiled mode and windowed GUI and implemented a 4-bit (16 colors) screen mode as a simple replacement to test my new video output structure. Sprites are not implemented anywhere yet. Perhaps this mode is more blitter oriented? Nice to have support for a non-destructive mouse cursor though.

Turns out the Chaos Angel crab girl is 16 colours. The upper-left palette strips are for my 24-bit to 4-bit nibble converter (a byte is two 16 colour pixels), as it needs reference for what colour is what index. The converted data file is then just injected into VRAM and the PPU draws whatever is there.
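A minimal sketch of what such a nibble converter could look like. It is not the actual converter: the function names are made up, matching by squared RGB distance is an assumption, and so is putting the left pixel in the high nibble.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index (0-15) of the palette colour closest to r,g,b.
   pal holds 16 entries of 3 bytes each (R, G, B).
   Assumption: "closest" means smallest squared distance in RGB space. */
uint8_t nearest_index(const uint8_t *pal, uint8_t r, uint8_t g, uint8_t b)
{
    uint8_t best = 0;
    long best_d = -1;
    for (int i = 0; i < 16; i++) {
        long dr = (long)r - pal[i * 3 + 0];
        long dg = (long)g - pal[i * 3 + 1];
        long db = (long)b - pal[i * 3 + 2];
        long d = dr * dr + dg * dg + db * db;
        if (best_d < 0 || d < best_d) {
            best_d = d;
            best = (uint8_t)i;
        }
    }
    return best;
}

/* Convert n_pixels 24-bit pixels (R, G, B bytes) into packed nibbles,
   two pixels per output byte.
   Assumption: the left pixel of each pair goes in the high nibble. */
void pack_nibbles(const uint8_t *rgb, size_t n_pixels,
                  const uint8_t *pal, uint8_t *out)
{
    assert(n_pixels % 2 == 0);
    for (size_t i = 0; i < n_pixels; i += 2) {
        const uint8_t *left = &rgb[i * 3];
        const uint8_t *right = left + 3;
        uint8_t hi = nearest_index(pal, left[0], left[1], left[2]);
        uint8_t lo = nearest_index(pal, right[0], right[1], right[2]);
        out[i / 2] = (uint8_t)((hi << 4) | lo);
    }
}
```

The packed output can then be copied into VRAM as-is, since the PPU only ever sees indices.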

A mode like this is good for games like Lemmings or Elite, though I don't know if VRAM access can be made fast enough (blit/burst/stream port?), or whether I'll be using VRAM. My core idea is that this system was scaled up over the years (CPU, RAM), but the PPU remains a hard-limit ASIC that can't be upgraded further. It helps to define the look and feel of the system.

After a detour, I went for 640x480 for output and not 16:9. Many TVs should be able to take VGA and go into 4:3. My actual resolution is 320x240 but that might blur in an unnatural fashion when upscaled. I'm thinking 2x pixels will be a bit more crisp looking.

I've thought a bit about this. Back in the 80s and 90s, the pixel jaggies seen in low resolutions were an obstacle to overcome, using anti-aliasing and dithering to soften curves and gradients. Pixel art now is a style deliberately chosen, but our modern displays make jaggies stand out even more, like buzz-saw edges. This has led to a style with more square, flat characters. AA has also fallen out of grace as single pixels become detail-like when very crisp. We already have higher resolutions, so there is less incentive to try to smooth and blur a low resolution with AA.

Some 5-15 years ago, blurred pixels and fake scanlines bothered me. I wanted crisp output, even from older hardware. When pixel art gets scaled up just a little it looks bad, so that position made sense back then, but now we have mega resolution monitors and it's hard to see the scaling artefacts at times. Some TVs are not so good at upscaling lowrez video though. Anyways, it's sort of easy to see that a lot of pixel art from the 80s was made for older displays.

Quartet, Gryzor, Space Harrier and Alien Syndrome in MAME, all having a foggy pastel look. The gamma curve and black-points are completely off. The artist likely saw darker, more saturated colors, but now the black is really distanced from the rest. A bold S curve and scanline overlay (to bring darks closer together) could perhaps fix it.

Chaos Angels (1989) on the MSX 2, in dithered tall-pixels. Dither effects (especially on the PC-88) can easily become an eyesore on modern displays. I borrowed this character for one of my many Famicube port/mockups, wanting to see if it worked in square pixels and with my palette.

There are more examples with more subtle effects I could list, but my conclusion was to try scanlines for this. It implements well as I have 64 colors, allowing me to make a 64*4=256 colour scanline effect palette with hand-tweaked values (though haphazard at the moment). A plain overlay shading effect tends to muddy colours.

The PPU now uses a single scanline buffer (320+8 bytes (not in VRAM)) onto which 0-63 color values are rendered by the mode 0-3 renderers. This buffer is then referenced twice as it is streamed out to the pixmap in a linear fashion (640*2 pixels). But wait, am I wasting the last two bits with only 64 colours? Not quite - they are used for the scanline shading.
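Roughly, the streaming step could be sketched like this. The names are hypothetical, and it assumes that the shade value (0-3) simply occupies bits 6-7 of the index into the 256-entry scanline palette, with one shade per output row.

```c
#include <assert.h>
#include <stdint.h>

enum { LINE_W = 320, LINE_PAD = 8, OUT_W = LINE_W * 2 };

/* Expand one scanline buffer (colour values 0-63) into two 640-pixel rows.
   pal256 is the 64*4 scanline-effect palette as output pixel values.
   Assumption: index = (shade << 6) | colour, one shade per output row. */
void stream_scanline(const uint8_t *line, const uint32_t *pal256,
                     unsigned shade_top, unsigned shade_bottom,
                     uint32_t *row_top, uint32_t *row_bottom)
{
    assert(shade_top < 4 && shade_bottom < 4);
    for (int x = 0; x < LINE_W; x++) {
        unsigned c = line[x] & 0x3Fu;              /* keep the 6 colour bits */
        uint32_t top = pal256[(shade_top << 6) | c];
        uint32_t bottom = pal256[(shade_bottom << 6) | c];
        /* The buffer is referenced twice: 2x wide and 2x tall. */
        row_top[2 * x] = row_top[2 * x + 1] = top;
        row_bottom[2 * x] = row_bottom[2 * x + 1] = bottom;
    }
}
```

Calling it with shade 0 for the top row and a darker shade for the bottom row gives the scanline look without a separate overlay pass.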

I've experimented a bit with moving every other scanline sideways each frame (half a 2x pixel) to create a blur effect that makes dithering look a bit better. Sort of works, but disabled for now.

The PPU will be able to change mode and/or mode parameters after a full scanline, allowing for some fun effects. The scanline buffer being over-provisioned simplifies bounds checking when rendering tiles at the subtile offsets needed for smooth scrolling. I can simply render the tiles onto the buffer and scroll by reading from a different point on the buffer later. Using a buffer might also make pixel clock timings less complex but the buffer has to be ready on time of course.
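The scrolling trick can be sketched as below, under some simplifying assumptions: tiles are 8 pixels wide, the current pixel row of every tile is already decoded to one byte per pixel in a hypothetical tile_rows array, and the buffer is 320+8 bytes so that 41 whole tiles fit without clipping.

```c
#include <assert.h>
#include <stdint.h>

enum { VISIBLE_W = 320, TILE_W = 8, BUF_W = VISIBLE_W + TILE_W };

/* Render 41 whole tiles into the padded buffer, with no per-pixel clipping.
   names: one row of the name table (tile ids).
   tile_rows: hypothetical array with 8 decoded pixels per tile for the
   current pixel row, tile t at tile_rows[t * 8]. */
void render_tile_line(const uint8_t *names, int first_tile,
                      const uint8_t *tile_rows, uint8_t *buf)
{
    for (int t = 0; t < BUF_W / TILE_W; t++) {
        const uint8_t *src = &tile_rows[names[first_tile + t] * TILE_W];
        for (int p = 0; p < TILE_W; p++)
            buf[t * TILE_W + p] = src[p];
    }
}

/* Smooth scrolling: the visible 320 pixels start sub_x (0-7) pixels in. */
const uint8_t *visible_start(const uint8_t *buf, int sub_x)
{
    assert(sub_x >= 0 && sub_x < TILE_W);
    return buf + sub_x;
}
```

For a scroll position of 13 pixels this means first_tile = 13 / 8 = 1 and sub_x = 13 % 8 = 5. The last visible pixel then lands at buffer index 324, still inside the 328 bytes.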

Mode 3 (Text6)

A wild text mode draws near. The rainbow effect was a debug test, but interestingly the gradients lined up with the character rows by chance. A similar effect might make it into the final version. This is still the crab girl image, but seen in text mode.

The MSX has a rather neat 6px text mode that I decided to copy. It actually wastes the 2 bits to the right of each character but it's not a huge loss in 1-bit mode. 53.33 x 30 characters can fit in this mode. Maybe the mode should have support for line references to assist text reflow (line removal/insertion)? Maybe not, as larger text sources will be stored in regular work memory, not in VRAM.

Currently, there are only 128 characters and characters over 127 are inverted. The colour attribute byte uses 6 bits to select foreground colour, and 2 bits for background selection, but these colours are taken somewhere from the palette table which has 32 colours (32/4 is 8 so I could do copper bars this way).
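Drawing one pixel row of a Text6 cell might look something like this sketch. The bit positions are assumptions, as the text only gives the field widths: foreground in bits 0-5 of the attribute byte, background select in bits 6-7, and the glyph's leftmost pixel in bit 7 of the font byte. The bg_colours table stands in for wherever the four background colours end up in the palette table.

```c
#include <assert.h>
#include <stdint.h>

enum { CHAR_W = 6, TEXT_LINE_W = 320 };

/* Draw one pixel row of one character cell into the scanline buffer.
   glyph_row: 1-bit font byte; bits 7..2 are the 6 pixels, bits 1..0 unused.
   code: character code; codes over 127 are drawn inverted.
   attr: assumption: bits 0-5 foreground colour, bits 6-7 background select.
   bg_colours: hypothetical 4-entry table indexed by the 2-bit field. */
void draw_text6_cell(uint8_t *line, int column, uint8_t code,
                     uint8_t glyph_row, uint8_t attr,
                     const uint8_t *bg_colours)
{
    assert(column >= 0);
    uint8_t fg = attr & 0x3F;
    uint8_t bg = bg_colours[attr >> 6];
    if (code & 0x80)
        glyph_row = (uint8_t)~glyph_row;        /* inverted upper half */
    for (int p = 0; p < CHAR_W; p++) {
        int x = column * CHAR_W + p;
        if (x >= TEXT_LINE_W)
            break;                              /* the 54th column is cut short */
        line[x] = (glyph_row & (0x80 >> p)) ? fg : bg;
    }
}
```

Since 320 / 6 = 53.33, columns 0-52 are complete and column 53 only shows its first 2 pixels, which is where the clipping test comes in.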

Scanline registers

The PPU will have the ability to change mode on some scanlines and do splitscreen or info bars like in Zelda 1 or Castlevania. This means that between scanlines the PPU might have to read entries with info for setting up the new mode. It would be a waste to store and read entries for every scanline, so each entry should have a height or target. Also, having to read too large entries might mess with video sync timings (I remember the Amiga had a blank scanline on screen change). Cycles could be saved by making some VRAM locations static. Dynamic locations (tile tables?) and also playfield widths and heights can be multiples of e.g. 256.

Mode register	Type	Entry read	Entry store	Mode 0 (Tiles)	Mode 1 (Blank)	Mode 2 (Nibble)	Mode 3 (Text6)
		VRAM->PPU		Range to store (exclusive)
Mode:	val	Byte	Flags for what to read?	4	4	4	4
Mode cursor:	counter	-	Const + Byte	<240	<240	<240	<240
Palette (32+32):	Static address	-	Const	64 colours in 84+84 bytes	-	64 colours in 16 bytes	4
4x Graphics tables:	Dynamic address	Bytes	Shorts	6 or 2KB 256 base?	-	-	1KB 256 base?
Name table:	Dynamic address	Byte	Short	Many KB 256 base?	-	-	54*30 256 base?
Attribute table:	Dynamic address	Byte	Short	Many KB 256 base?	-	-	54*30 256 base?
Playfield Width:	val	Byte	Short?	40-720 tiles	-	160-320 nibbles	-
Playfield Height:	val	Byte	Short?	30-560 tiles	-	<400 lines	-
Scroll Offset:	X&Y baked val (address)	Short	X,Y Shorts?	184030 tiles	-	>30K nibbles	-
Sub-Offset:	val	Byte	X,Y Bytes?	8 pixels	-	2 pixels	-
Mode Target/Height:	val	Byte	Byte	240 scanlines	240 scanlines	240 scanlines	240 scanlines

Tentative, will likely revise as things develop. Mode entries are likely <16 bytes, and stored one after another at a set location. Right after the palette at beginning of VRAM? By using bit flags in the Mode variable, I can perhaps set which registers need to be updated, minimising VRAM access. I suspect most of the time only the scrolling registers will be changed.
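To make the bit flag idea concrete, here is one possible sketch. The layout is entirely hypothetical, since nothing above fixes it: bits 0-1 of the first byte select the mode, bit 2 says a 16-bit scroll offset follows, bit 3 says a sub-offset byte follows, and the last byte is always the target scanline.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits in the first byte of a mode entry. */
enum { F_SCROLL = 0x04, F_SUBOFFSET = 0x08 };

typedef struct {
    uint8_t  mode;        /* 0-3 */
    uint16_t scroll;      /* baked X&Y scroll offset (address) */
    uint8_t  sub_offset;  /* fine scroll within a tile or byte */
    uint8_t  target;      /* scanline where the next entry takes over */
} ModeRegs;

/* Read one entry at addr, updating only the registers its flags mention.
   vram must be the full 64K, as addr wraps as a 16-bit value.
   Returns the address of the next entry. */
uint16_t read_mode_entry(const uint8_t *vram, uint16_t addr, ModeRegs *r)
{
    assert(vram && r);
    uint8_t head = vram[addr++];
    r->mode = head & 0x03;
    if (head & F_SCROLL) {
        uint8_t lo = vram[addr++];
        uint8_t hi = vram[addr++];
        r->scroll = (uint16_t)(lo | (hi << 8));
    }
    if (head & F_SUBOFFSET)
        r->sub_offset = vram[addr++];
    r->target = vram[addr++];
    return addr;
}
```

An entry that only changes scrolling costs 4 VRAM reads here, and one that changes nothing but the mode costs 2.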

Sprites

Currently a bit of a mystery. They'll have to be rendered onto the scanline buffer as columns, and thus need some sort of entries for setting x position for multiplexing (column shearing) and a graphics pointer. Will have to solve off/partial-screen bounds checks neatly. Unsure about z-sorting.

Safety concerns

The nice thing with a 64K PPU like this is that it can't crash as long as I'm careful to use a short (16-bit unsigned integer) for internal addressing. A short will just wrap around if abused. Other nonsense put into VRAM will hopefully just be rendered as glitch graphics. Also, the various modes all render onto the single scanline buffer, which is then sourced by an agnostic video generation function with hardwired bounds.
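The wrap-around can be sketched in a few lines, assuming VRAM is a plain 65536-byte array and using a hypothetical vram_read_run helper. Every possible uint16_t value is a valid index, so no bounds check is needed.

```c
#include <assert.h>
#include <stdint.h>

enum { VRAM_SIZE = 0x10000 };

/* Copy n bytes out of VRAM starting at addr.
   addr is 16 bits wide, so stepping past 0xFFFF wraps to 0x0000
   instead of running off the end of the array. */
void vram_read_run(const uint8_t *vram, uint16_t addr, uint8_t *out, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = vram[addr++];
}
```

A run that starts at 0xFFFF reads the last byte of VRAM and then continues from the first, which at worst shows up as glitch graphics.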

Engineering

I'm not an engineer, but a project like this is fun when learning.

An R2R resistor ladder might not work at high frequencies... possibly 20 MHz/50 ns if I output VGA 640*480 (2x pixels). 0.1% tolerance resistors might be required. Routing might be critical... there are issues like deflection, capacitance, crosstalk, but I don't know much about it. There might have to be a buffer on each channel, for stability. After drawing this, I looked into using a ROM as a combinational logic device, but it's hard to find an ideal fit. Smaller ones are serial. Also, ROMs generally have like 16 address lines and 8 data lines, and I need 6 -> 12.

Further investigation, and this solution came to mind (here simplified in terms of I/O pins). I don't know if existing PAL/PLAs are what I need. The address decoder (left) is full range, and the palette values are stored in the matrix intersections (right). Another thing: If I'm to output scan lines, I need a way to dim the outputs, though perhaps this can be a single line intercepting post R2R and... doing something.

I don't mind having basic 74 series logic chips on board, but 64 colours will require a lot of gates. 16 colours (4-bit input) and 3+3+3 bit (output) might be more feasible for something like an Arduino game system on perfboard. If I were to guess... 4x NOT gates (1 IC), 16x quad-input AND gates (8 ICs), then the amount of intersections (bits set) in the colour definition matrix determines OR gate count... 3+ bits set per colour on average is... 16*3 = 48 so quad-input OR gate ICs makes that /8 then. 6-8 OR gate ICs. 17 total... well, a sizeable graphics board then.
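A software model of that decoder-plus-matrix idea, as a rough sketch with made-up names. It assumes 16 colours with 9 output bits (3+3+3) each, stored one row per colour. Counting the set bits gives the number of OR gate inputs needed.

```c
#include <assert.h>
#include <stdint.h>

/* Count the intersections (set bits) in a 16-row, 9-bit colour matrix.
   Each set bit is one input on some OR gate. */
int count_intersections(const uint16_t *matrix)
{
    int n = 0;
    for (int line = 0; line < 16; line++)
        for (int bit = 0; bit < 9; bit++)
            n += (matrix[line] >> bit) & 1;
    return n;
}

/* Model of the hardware: the decoder raises exactly one of 16 lines
   (a quad-input AND of true/inverted index bits), and each output bit
   is the OR of all raised lines that have that bit set in the matrix. */
uint16_t decode_colour(const uint16_t *matrix, uint8_t index)
{
    assert(index < 16);
    uint16_t out = 0;
    for (int line = 0; line < 16; line++) {
        int selected = (line == index);
        if (selected)
            out |= matrix[line] & 0x1FF;
    }
    return out;
}
```

With 3 bits set in every row, count_intersections returns 16*3 = 48, matching the estimate above. At 8 OR inputs per IC that is 6 ICs.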

Cheap 80s computers without a graphics chip often didn't use hand-tuned palettes, instead choosing to output colours which were full on, sometimes dimmed. It's possible to send something like 8 bits direct to a DAC and get a 3+3+2 bit (RGB) 256-colour space, but it's a bit of a waste as many colours are not usable/subtle.

Oh, right. The palette will lose a little accuracy when truncated to 4+4+4 bits. The Amiga had 16 step RGB sliders and it looked reasonably good. It's not quite trivial to convert the 8+8+8 bit palette, as there are several viable slider options for matching colours to the original. I ended up with like 4 pages of code just to generate these three versions (original version 8ra included over each). The brown is having some difficulties. This is not the final palette, I will perhaps rearrange it a bit into 8-aligned copper bars/ramps, and use that brightest pastel green for something fresh.
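For reference, the most naive conversion is per-channel rounding to the nearest of 16 steps, sketched below. This is only a baseline and not the 4 pages of code mentioned above. It treats each channel on its own, which is exactly why it can't weigh up the several viable slider options for a colour. Function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Round an 8-bit channel to the nearest of 16 steps.
   Step n corresponds to the 8-bit value n * 17 (0, 17, 34, ... 255). */
uint8_t to4(uint8_t v)
{
    return (uint8_t)((v + 8) / 17);
}

/* Expand a 4-bit step back to 8 bits for previewing the result. */
uint8_t from4(uint8_t n)
{
    assert(n <= 15);
    return (uint8_t)(n * 17);
}

/* Pack three rounded channels as 0xRGB, 4 bits each. */
uint16_t rgb888_to_rgb444(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)((to4(r) << 8) | (to4(g) << 4) | to4(b));
}
```

Comparing from4(to4(v)) against v for each channel shows how far a colour drifts, which is a quick way to spot the ones that need hand-tweaking.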

Code/Art by Arne Niklas Jansson

AndroidArts.com