Retrochallenge 2022/10

For the Retrochallenge 2022/10, I am picking up where I left off in RC 2018 and 2019: the CAT-644 microcomputer. I have not touched this project since 2019, other than to move the box it is stored in.

Cat-644 Hardware

The Cat-644 is a computer I've been developing around an Atmega 644 microcontroller:

Started in 2013 - This might become a vintage computer before I complete it!
Atmega 644, 20 MHz, 4K internal SRAM (possibly upgradable to an Atmega 1284 with 16K SRAM)
128k bitbanged SRAM: used as VRAM and XRAM
VGA output, maximum of 64 colors at 512x240, software cycle-counted race-the-beam
PS/2 keyboard
RS-232 serial port
SD Card
11 kHz mono 8-bit sound output
Expandable SPI bus: some experiments have been done with an SPI-based Ethernet shield originally intended for Arduino
It has been 'hardware complete' for a long time, and I have test programs for each piece of hardware, and combinations of hardware. (VGA + sound + keyboard all together was hard)
In previous retrochallenges, I started writing an OS for this thing, in a mix of C and assembly.

Up and Running 10/09/2022

I took the computer board out of storage and hooked it up to a screen and keyboard. It works just as it did when I put it away... with one issue. Look closely: the left side of the screen is cut off...

When I was developing the latest version of the VGA library, I was connected to a full-size monitor, not this mini one. I guess the mini one is stricter about timings... my other monitors, even an older 4:3 LCD, adjusted to the timing just fine. From the looks of it, the cut-off area is almost 2 characters wide... about 16 pixels, so that is 32 clocks. I need almost 32 clocks between H-sync and pixels for this monitor... I'm looking at the assembly code to see what options there are.

Of course, when I put it on my nice big monitor, it looks perfect.

After some turning-it-off-and-on-again, the tiny monitor syncs fine now, and I can't even get it to sync poorly. I have nothing to fix. Maybe it just wasn't connected well: I couldn't get it to sync properly the first time, but after unplugging it and plugging it into the big monitor, it synced fine... on both monitors. I'm gonna blame the breadboard connections for now. (Or the mechanical elves/faeries fixed it when I wasn't looking.)

Next Steps

I think I want to rebuild and flash the software I have now... just to make sure I'm starting from the right place. And then, I have to put this thing in a physical case... A nice box with good connections. Then, this thing can sit on my desk, plugged into my KVM like a 'real' computer, and I can work more on the OS.

Midterm Update 10/15/2022

So, since the computer ended up not needing any adjustments to the VGA code at all, the next step was getting my development environment running again.

Serial Communication (Logging and Bootloader): I tried connecting the CAT644 to my Windows computer (which runs Atmel Studio), but I couldn't get the serial connection to work... turns out a wire broke in the connector... soldering iron time. Once it was fixed, I used HyperTerminal in Windows and was back to seeing the debugging log come through on the serial port. I also tried out the bootloader, and it reported connecting OK. I use Chip45 for my bootloader, and I highly recommend it if you are developing for AVR in a Windows environment. It lets me reflash the AVR through the actual DB9 serial port, instead of using the ISP pins or an Arduino bootloader over USB.

Rebuilding Code: I had to make sure the OS still compiles from where I left off a few years ago... and it does! The new build does everything the old build did, but has today's date on the startup screen.

Rebuilding Assembler: The CAT-644 is meant to ultimately run user-written programs... In a previous retrochallenge, I wrote a high-performance 16-bit bytecode interpreter in AVR assembly language. This bytecode interpreter needed an assembler, so I needed to make sure it still worked. I had lost whatever script or bat file actually runs the assembler, so it took a little time looking at the assembler's source code, trying to remember how to run it, what command-line parameters it took, etc. I got it to assemble a slightly different program and include it in a new build of the OS. (At boot, the OS loads a bytecode program from ROM... currently it's a simple Hello World program that also takes some keyboard input.)

Now that the hardware and development environment are back to working, it's time to code up some new features for this OS:

Block device support: Previous work on the SDCard was pretty simple: write a driver that can initialize the card, and then read or write a given block. That's it. There is no filesystem. I don't think I want a completely traditional filesystem either. I'm currently thinking of starting with a simple block allocator which can allocate or free individual blocks. Then, on top of that, an API that lets applications create LISP-like data structures... trees and lists... out of those blocks. A 'file' wouldn't be a linear stream of bytes, but instead a completely custom data structure defined by the application. The filesystem is more like a RAM heap with the granularity of a block.

Executable handles: Previous work also created a handle-based memory allocator for the bitbanged external RAM. Data can be swapped between the AVR's internal SRAM and the external SRAM using a grab/release API on handles; each grab swaps the data in and may give the user a different pointer each time. This allows much larger data structures (a 64K external RAM heap) than the 4K of internal RAM the AVR contains. The bytecode program is interpreted out of internal SRAM... and I want code to also be able to be bigger than the internal SRAM. This will require adding either an extra instruction or a syscall to 'call' code, not by a real SRAM address, but by handle. This would trigger swapping code in from external memory if it isn't already internal.
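
To make the grab/release contract concrete, here is a minimal sketch that runs on a PC. XRAM and internal SRAM are simulated with plain arrays, and handle_alloc, handle_grab and handle_release are hypothetical names, not the actual KittyOS API. A grab copies the object into the small internal buffer and returns a pointer that is only valid until the matching release, which writes the object back to XRAM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: 64K of simulated XRAM and a small "internal SRAM" cache. */
#define XRAM_SIZE   65536u
#define CACHE_SIZE  256u
#define MAX_HANDLES 16

static uint8_t xram[XRAM_SIZE];   /* stand-in for the bitbanged external SRAM */
static uint8_t cache[CACHE_SIZE]; /* stand-in for the AVR's internal SRAM */

typedef struct {
    uint16_t xaddr;    /* home address of the object in XRAM */
    uint16_t size;     /* object size in bytes */
    int16_t  cacheOff; /* offset into cache while grabbed, -1 when swapped out */
} handleEntry;

static handleEntry handles[MAX_HANDLES];
static int      numHandles = 0;
static uint32_t xramTop = 0;  /* bump allocator for XRAM (no free in this sketch) */
static uint16_t cacheTop = 0; /* bump allocator for the cache */
static int      grabbed = 0;  /* number of handles currently grabbed */

/* Reserve space in XRAM; returns a handle, or -1 if out of space. */
int handle_alloc(uint16_t size)
{
    if (numHandles >= MAX_HANDLES || xramTop + size > XRAM_SIZE) return -1;
    handles[numHandles].xaddr = (uint16_t)xramTop;
    handles[numHandles].size = size;
    handles[numHandles].cacheOff = -1;
    xramTop += size;
    return numHandles++;
}

/* Swap the object into internal memory and return a pointer to it.
   The pointer may differ between grabs and is only valid until release. */
uint8_t *handle_grab(int h)
{
    assert(h >= 0 && h < numHandles);
    handleEntry *e = &handles[h];
    if (e->cacheOff >= 0) return &cache[e->cacheOff];  /* already internal */
    if (cacheTop + e->size > CACHE_SIZE) return NULL;  /* no room: release something first */
    memcpy(&cache[cacheTop], &xram[e->xaddr], e->size);
    e->cacheOff = (int16_t)cacheTop;
    cacheTop += e->size;
    grabbed++;
    return &cache[e->cacheOff];
}

/* Write the object back to XRAM and give up the internal copy. */
void handle_release(int h)
{
    assert(h >= 0 && h < numHandles);
    handleEntry *e = &handles[h];
    if (e->cacheOff < 0) return;                       /* not grabbed */
    memcpy(&xram[e->xaddr], &cache[e->cacheOff], e->size);
    e->cacheOff = -1;
    if (--grabbed == 0) cacheTop = 0;                  /* nothing grabbed: reclaim the whole cache */
}
```

The cache here is a simple bump region that is only reclaimed when nothing is grabbed; a real allocator would be smarter, but the contract is the same: grab, use the pointer, release, and never hold on to the pointer afterwards.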

Graphics Support: The VGA scanline driver took a lot of work previously, and I probably won't be touching it for a while. But, apart from drawing characters to the screen through the VGA driver's chardevice interface, there is no real graphics support. I did write a draw_sprite function a few years ago when I was testing out graphics, so I am going to expose some basic graphics functionality through syscalls. The bytecode application will fill out an array of sprite parameters, and then a single syscall will draw several sprites at once. An actual game might be a little challenging because there currently isn't any double-buffering support, so screen updates have to either happen during vertical blanking or race the beam. I think I do want a double-buffered display at some point, but that will require using more memory or lowering the resolution... for many games half resolution would be fine. The original Game Boy is only 160x144, and 256x120 or 128x240 is more pixels than that, and I have more colors. Modifying the graphics driver to do 1/2 resolution in either the vertical or horizontal direction without changing any of the instruction timings would be very easy, and is one of the few changes to the VGA driver I wouldn't mind doing.
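
A rough sketch of the batched sprite syscall, assuming (purely for illustration) a 256x240 framebuffer with one byte per pixel, color 0 as transparent, and a hypothetical spriteParam layout. sys_draw_sprites and blit_sprite are made-up names; blit_sprite is not the existing draw_sprite function.

```c
#include <stdint.h>

/* Illustrative framebuffer: 256x240, one byte (one color) per pixel. */
#define FB_W 256
#define FB_H 240
static uint8_t framebuffer[FB_H][FB_W];

/* Hypothetical parameters the bytecode program fills in, one entry per sprite. */
typedef struct {
    int16_t x, y;           /* top-left corner; may be partly off-screen */
    uint8_t w, h;           /* sprite size in pixels */
    const uint8_t *pixels;  /* w*h bytes, row-major; 0 = transparent (assumed) */
} spriteParam;

/* Draw one sprite, clipping to the screen and skipping transparent pixels. */
static void blit_sprite(const spriteParam *s)
{
    for (int row = 0; row < s->h; row++) {
        int sy = s->y + row;
        if (sy < 0 || sy >= FB_H) continue;
        for (int col = 0; col < s->w; col++) {
            int sx = s->x + col;
            if (sx < 0 || sx >= FB_W) continue;
            uint8_t c = s->pixels[row * s->w + col];
            if (c != 0) framebuffer[sy][sx] = c;
        }
    }
}

/* One syscall draws the whole batch. */
void sys_draw_sprites(const spriteParam *list, uint8_t count)
{
    for (uint8_t i = 0; i < count; i++) blit_sprite(&list[i]);
}
```

Batching means the interpreter pays the syscall overhead once per batch rather than once per sprite, and clipping inside the OS keeps a misbehaving program from writing outside the framebuffer.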

Filesystem Alternative

This project provided an opportunity to explore alternatives to the traditional filesystem. I came up with a tree structure that closely resembles Lists as they exist in Lisp or Scheme. LLFS (Linked List File System) is an on-disk data structure that is not explicitly organized into directories.

The primitive datatype in LLFS is a block. A block contains optional data bytes, plus optional links to a next block and a first child block (childFirst).


#include <stdint.h>

typedef uint32_t blocknum;      //assumed: 32-bit block numbers, since a 1GB card has about 2 million 512-byte blocks

typedef struct blockHeaderS{
	blocknum        next;           //next block at this level
	blocknum        childFirst;     //first block on next level
	uint16_t        used;           //number of bytes used in this block
	uint8_t         flags;          //not decided what is needed yet
	uint8_t         datastart;      //offset into block to the data itself
} blockHeader;

This method makes creating complex data structures on disk similar to creating linked lists or trees in memory. The system has a root block, and everything else is, directly or indirectly, a child of it. This is not implemented yet, but in the future, the system's main process will keep a data structure that divides the system into different subtrees. All executable programs will be on one branch of the tree, and the data format for a program will include names. Each program then has a subtree of its own containing that program's data, such as user-data 'files' it has created. Each program will probably also name at least some of its data structures... I may make 'name' a common, optional field. But within what would normally be considered a 'file' on a normal filesystem, there can be both data bytes AND a collection of child subtrees. When creating large, editable structures on disk, this has certain advantages. Blocks can be inserted into the middle of a list of child blocks, which means into the middle of a file. If a file has sections for different types of data, each type can grow/append independently without rewriting the whole file. I suppose this isn't that different from having a folder of several files on a regular filesystem; however, in this system it is kind of treated as one 'big' multi-forked file. Files in a common data format (like text files) will probably be kept in common areas.
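
The next/childFirst pair is the classic first-child/next-sibling encoding of a tree, much like nested Lisp lists. The sketch below walks such a tree in memory; block 0 meaning "no block", block 1 as the root, the toy disk[] array and the label[] strings standing in for data sections are all assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t blocknum;   /* assumed width */
#define NO_BLOCK 0           /* assumed "no block" value; the root is block 1 here */

typedef struct blockHeaderS {
    blocknum next;        /* next block at this level */
    blocknum childFirst;  /* first block on next level */
    uint16_t used;
    uint8_t  flags;
    uint8_t  datastart;
} blockHeader;

#define NBLOCKS 8
static blockHeader disk[NBLOCKS];     /* toy in-memory "disk" of headers */
static const char *label[NBLOCKS];    /* stands in for each block's data section */

/* Builds: root -> (programs -> (editor, game), data -> (notes.txt)) */
void build_example(void)
{
    label[1] = "root";      disk[1].childFirst = 2;
    label[2] = "programs";  disk[2].childFirst = 4;  disk[2].next = 3;
    label[3] = "data";      disk[3].childFirst = 6;
    label[4] = "editor";    disk[4].next = 5;
    label[5] = "game";
    label[6] = "notes.txt";
}

/* Depth-first walk: follow childFirst to go down a level, next to move along it.
   Prints each block indented by depth and returns how many blocks were visited. */
int walk(blocknum b, int depth)
{
    int count = 0;
    for (; b != NO_BLOCK; b = disk[b].next) {
        printf("%*s%s\n", depth * 2, "", label[b]);
        count += 1 + walk(disk[b].childFirst, depth + 1);
    }
    return count;
}
```

Calling build_example() and then walk(1, 0) prints root, then programs, editor and game, then data and notes.txt, each indented two spaces per level.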

Overview of LLFS API

To interact with the filesystem, a blockInfo structure must be used. It is used to track the current position in the tree that is being explored. There can be multiple blockInfo structures in use at once, but if two of them are editing the same block, results are currently undefined. In the future, there could be enforcement of this rule, especially if blockInfos are allocated (and thus trackable) by the OS itself.

blockRoot(&pos); - Initializes pos to the root block

blockAddFirstChild(&pos, void* data, int len); - Creates a child block under the current block, at the front of its child list, ahead of any child blocks already there. Pos is updated to point to the new child block.

blockInsert(&pos, void* data, int len); - Creates a new block after the current block. If the current block is in the middle of a list, this inserts into the list; if it is at the end of a list, it appends to the list. Pos is updated to point to the new block... repeated blockInsert commands continue to grow the list.

blockWriteData(&pos, void* data, int len); - Writes new data into an existing block, replacing what is there. This does not change any next/child pointers and does not alter the connectivity of the structure.

blockAppendData(&pos, void* data, int len); - Appends new data to the data section of an existing block. If the current block is full, a new block is created after the current block. Pos is updated to point to the new block, if necessary. This is the closest to writing a traditional, linear file.

blockNext(&pos); - Advances pos to point to the next block (to traverse list structures). Returns false if there is no next block.

blockFirstChild(&pos); - Makes pos point to pos's first child (to traverse tree structures). Returns false if there are no children.

uint16_t blockRead(&pos, void* data, uint16_t maxlen); - Reads a block's data section. The return value is the number of bytes stored in the block; up to maxlen bytes are written to the passed pointer.

In the future, there will be APIs to delete and move blocks.
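
The API above can be prototyped on a PC before touching the SD card. This is a hypothetical RAM-backed model: the blockInfo layout (just the current block number), the 32-byte toy payload, block 0 meaning "no block", block 1 as the root, and the 1/0 return values of the creating calls are all assumptions. blockWriteData is left out for brevity; it would overwrite data and used without touching the links.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t blocknum;     /* assumed width */
#define NO_BLOCK   0           /* assumed "no block" value */
#define ROOT_BLOCK 1
#define NBLOCKS    64          /* toy disk size */
#define DATA_SIZE  32          /* toy payload; a real block is 512 bytes minus its header */

typedef struct {
    blocknum next, childFirst;
    uint16_t used;
    uint8_t  data[DATA_SIZE];
} ramBlock;

typedef struct { blocknum cur; } blockInfo;   /* assumed: just the current position */

static ramBlock disk[NBLOCKS];                /* stand-in for the SD card */
static blocknum nextFree = ROOT_BLOCK + 1;    /* sequential allocator */

static blocknum allocBlock(const void *data, int len)
{
    if (nextFree >= NBLOCKS || len < 0 || len > DATA_SIZE) return NO_BLOCK;
    blocknum b = nextFree++;
    memset(&disk[b], 0, sizeof disk[b]);
    if (len > 0) memcpy(disk[b].data, data, (size_t)len);
    disk[b].used = (uint16_t)len;
    return b;
}

void blockRoot(blockInfo *pos) { pos->cur = ROOT_BLOCK; }

/* New block at the front of pos's child list; pos moves to it. */
int blockAddFirstChild(blockInfo *pos, const void *data, int len)
{
    blocknum b = allocBlock(data, len);
    if (b == NO_BLOCK) return 0;
    disk[b].next = disk[pos->cur].childFirst;  /* old first child becomes the new block's sibling */
    disk[pos->cur].childFirst = b;
    pos->cur = b;
    return 1;
}

/* New block right after pos: inserts mid-list, or appends at the end; pos moves to it. */
int blockInsert(blockInfo *pos, const void *data, int len)
{
    blocknum b = allocBlock(data, len);
    if (b == NO_BLOCK) return 0;
    disk[b].next = disk[pos->cur].next;
    disk[pos->cur].next = b;
    pos->cur = b;
    return 1;
}

/* Append bytes; when the block fills up, continue in new blocks inserted after it. */
int blockAppendData(blockInfo *pos, const void *data, int len)
{
    const uint8_t *src = data;
    while (len > 0) {
        ramBlock *blk = &disk[pos->cur];
        int room = DATA_SIZE - blk->used;
        if (room == 0) {
            if (!blockInsert(pos, NULL, 0)) return 0;   /* disk full */
            continue;
        }
        int n = len < room ? len : room;
        memcpy(blk->data + blk->used, src, (size_t)n);
        blk->used = (uint16_t)(blk->used + n);
        src += n;
        len -= n;
    }
    return 1;
}

int blockNext(blockInfo *pos)          /* returns 0 at the end of a list; pos unchanged */
{
    if (disk[pos->cur].next == NO_BLOCK) return 0;
    pos->cur = disk[pos->cur].next;
    return 1;
}

int blockFirstChild(blockInfo *pos)    /* returns 0 if there are no children */
{
    if (disk[pos->cur].childFirst == NO_BLOCK) return 0;
    pos->cur = disk[pos->cur].childFirst;
    return 1;
}

/* Copies up to maxlen bytes; always returns the number of bytes stored in the block. */
uint16_t blockRead(blockInfo *pos, void *data, uint16_t maxlen)
{
    uint16_t used = disk[pos->cur].used;
    memcpy(data, disk[pos->cur].data, used < maxlen ? used : maxlen);
    return used;
}
```

Because these calls only move pos and relink next/childFirst, the same logic could presumably sit on top of the SD card block driver later by replacing disk[] with block reads and writes.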

Advantages of a list-based filesystem

I will use the example of a text file to show cases that are inefficient or difficult on a standard filesystem.

Blocks do not need to be full. Deleting a few bytes in the middle of a text file will just shrink a block to be less full. Deleting a lot of bytes in a text file will also just remove blocks from the file's block list.
Large pieces of files are easily moved. A large block of text can be cut from one file and inserted into another; this only involves swapping some 'next' pointers and inserting or shrinking blocks to deal with the boundaries. 100MB could be moved from one file to another with no problem. Files can also be split and combined in a similar manner.
Files can be truncated from either end. A pipe-like FIFO or queue can be implemented on disk.
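
Here is the "swap some next pointers" step as a sketch, using a toy array nextOf[] in place of each blockHeader's next field (hypothetical) and ignoring the partial blocks at the boundaries. spliceRun moves the run of blocks first..last, which directly follows block before, so that it follows block dest instead.

```c
#include <stdint.h>

typedef uint32_t blocknum;
#define NO_BLOCK 0            /* assumed "no block" value */
#define NBLOCKS  16

static blocknum nextOf[NBLOCKS];   /* toy stand-in for each blockHeader.next */

/* Move the run first..last (where first == nextOf[before]) to sit right after dest.
   Exactly three links are written, no matter how many blocks the run contains.
   dest must not be inside the run. */
void spliceRun(blocknum before, blocknum last, blocknum dest)
{
    blocknum first = nextOf[before];
    nextOf[before] = nextOf[last];  /* 1: close the gap in the source list */
    nextOf[last] = nextOf[dest];    /* 2: the run's tail points at dest's old successor */
    nextOf[dest] = first;           /* 3: dest now leads into the run */
}
```

If the run starts at the head of a child list, it is the parent's childFirst that changes instead of nextOf[before].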

Sample Intended Use

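	//assuming 'pos' is already pointing to some block
	//and that block has a list of strings I just want to print like I'm reading a linear file
	char myString[64];	//buffer size is arbitrary

	if (blockFirstChild(&pos)){
		do{
			uint16_t len = blockRead(&pos, myString, sizeof(myString));
			if (len > sizeof(myString)) len = sizeof(myString);	//blockRead returns the stored length, which can exceed what fit in the buffer
			printf("%.*s", (int)len, myString);
		} while (blockNext(&pos));
	}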

Limitations

Random Access: Finding the end of a 'file' to append to it could be a long process... O(n). Appending is a common operation, so having a childLast field in a block would make that fast. But that also means every block appended to the end of a file will result in a write to the parent as well... For now, linear files and trees should be enough. Random access to data could be accelerated by keeping the data in a balanced tree.
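
The trade-off can be sketched with simulated I/O counters. childLast is the proposed field (not implemented); disk[], reads and writes are toy stand-ins for block I/O, and block 0 meaning "no block" is assumed.

```c
#include <stdint.h>

typedef uint32_t blocknum;
#define NO_BLOCK 0          /* assumed "no block" value */
#define NBLOCKS  64

typedef struct {
    blocknum next, childFirst;
    blocknum childLast;     /* hypothetical extra field */
} node;

static node disk[NBLOCKS];
static int reads, writes;   /* simulated block I/O counters */

/* Without childLast: read the parent, then every child, to find the tail. O(n). */
blocknum findLastChildByWalking(blocknum parent)
{
    reads++;                                   /* read the parent */
    blocknum b = disk[parent].childFirst;
    if (b == NO_BLOCK) return NO_BLOCK;
    for (;;) {
        reads++;                               /* read one child */
        if (disk[b].next == NO_BLOCK) return b;
        b = disk[b].next;
    }
}

/* With childLast: O(1) to find the tail, but every append also rewrites the parent. */
void appendChild(blocknum parent, blocknum newBlock)
{
    reads++;                                   /* read the parent to get childLast */
    blocknum last = disk[parent].childLast;
    disk[newBlock].next = NO_BLOCK;
    writes++;                                  /* write the new block */
    if (last == NO_BLOCK) {
        disk[parent].childFirst = newBlock;
    } else {
        disk[last].next = newBlock;
        writes++;                              /* rewrite the old tail */
    }
    disk[parent].childLast = newBlock;
    writes++;                                  /* rewrite the parent: the extra cost */
}
```

With childLast, each append costs one extra write (the parent); without it, each append first needs a walk of the whole child list, which is 11 reads for the 10-block list built in the test.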

Wear Leveling: No attempts have been made to wear-level the block device. The assumption is that the SDCard does reasonable wear leveling of its own. I plan to add a 'revision' field to blockHeaders, so that when a new version of a block is to be written, it is instead written to a new block, and then the revision field in the previous block is updated to refer to it. Any low-level call that reads the older block will notice the revision field and follow the chain of revisions until the newest version is found. This means each block is written only twice... the first time, and a second time to refer to a new version. There will have to be a pruning process to update next and child pointers when these chains become too long.
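
A sketch of the planned revision chains. The revision field, the toy verBlock layout and the helper names are hypothetical; writeCount[] only tracks how often each physical block is written, to show that no block gets more than two writes.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t blocknum;
#define NO_BLOCK 0            /* assumed "no block" value */
#define NBLOCKS  32
#define PAYLOAD  16           /* toy data size */

typedef struct {
    blocknum revision;        /* hypothetical: newer copy of this block, or NO_BLOCK */
    uint8_t  data[PAYLOAD];
} verBlock;

static verBlock disk[NBLOCKS];
static uint8_t  writeCount[NBLOCKS];  /* physical writes per block */
static blocknum nextFree = 1;         /* sequential allocator */

static void writeBlock(blocknum b, const verBlock *v)
{
    disk[b] = *v;
    writeCount[b]++;
}

/* Low-level read path: follow revision links until the newest copy. */
blocknum resolve(blocknum b)
{
    while (disk[b].revision != NO_BLOCK) b = disk[b].revision;
    return b;
}

/* First write of a brand-new block. */
blocknum createBlock(const uint8_t *data)
{
    if (nextFree >= NBLOCKS) return NO_BLOCK;
    verBlock v = { NO_BLOCK, {0} };
    memcpy(v.data, data, PAYLOAD);
    blocknum b = nextFree++;
    writeBlock(b, &v);
    return b;
}

/* "Overwrite" b: new contents go to a fresh block, and the current newest copy
   gets its second (and last) write, which only records where the new copy is. */
blocknum updateBlock(blocknum b, const uint8_t *data)
{
    blocknum newest = resolve(b);
    blocknum fresh = createBlock(data);
    if (fresh == NO_BLOCK) return NO_BLOCK;
    verBlock old = disk[newest];
    old.revision = fresh;
    writeBlock(newest, &old);
    return fresh;
}
```

Every hop in a chain costs an extra read, which is why the pruning pass that repoints next/child links at the newest copy would be needed.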

Potentially Inefficient Use of Space: Since applications allocate blocks as they want, the OS has no control over how 'full' they are. Simple linear 'files' that are just appended bytes have space efficiency similar to existing filesystems, but if an application has a tree of data objects, and each object is much smaller than a block, a lot of space will be wasted. For this project, I simply don't care. I am using a 1GB SD Card on an 8-bit computer; this is thousands of times larger than a reasonable disk for the 8-bit era. Even if each block held only 1 byte of usable data, there are about 2 million blocks on the card... and 2MB is still a big disk compared to old floppy disks. In the future, perhaps some of the standard-size disk blocks can be broken into multiple smaller virtual blocks.

Reclaiming Free Space: The current block allocator is very simple: each time a new block is needed, a sequential counter is incremented. In the interest of wear leveling, it makes the most sense to write each block at least once before reusing deleted space. There are a couple of ways to implement this. Deleted blocks could just be marked as 'deleted', and then when it is necessary to reuse space, the disk can be scanned for deleted blocks. This could be made faster by placing each deleted node, at delete time, as a child of a designated garbage node.
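
A sketch of the allocator described above: the sequential counter hands out never-written blocks first, and deleted blocks are chained under a garbage node so they can be reused without scanning the disk. garbageFirst stands in for the garbage node's child list, and the 8-block toy card is illustrative; these names and sizes are not from the real code.

```c
#include <stdint.h>

typedef uint32_t blocknum;
#define NO_BLOCK 0                 /* assumed "no block" value */
#define NBLOCKS  8                 /* toy card size */

static blocknum nextLink[NBLOCKS];         /* stand-in for blockHeader.next */
static blocknum counter = 2;               /* sequential allocator; block 1 = root here */
static blocknum garbageFirst = NO_BLOCK;   /* children of the designated garbage node */

/* Prefer never-written blocks; only recycle deleted ones once the card has been filled. */
blocknum allocBlock(void)
{
    if (counter < NBLOCKS) return counter++;
    if (garbageFirst != NO_BLOCK) {
        blocknum b = garbageFirst;
        garbageFirst = nextLink[b];    /* pop from the garbage list */
        nextLink[b] = NO_BLOCK;
        return b;
    }
    return NO_BLOCK;                   /* card is full */
}

/* Deleting just links the block under the garbage node: O(1), no scan needed. */
void freeBlock(blocknum b)
{
    nextLink[b] = garbageFirst;
    garbageFirst = b;
}
```

This version ignores a deleted block's own children; a real delete would either carry the whole subtree under the garbage node and deal with it on reuse, or free the children first.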

Happy Halloween

Today is the last day for the retrochallenge. I ended up having a lot of unexpected things going on this month, but still managed to get out a tangible increment of KittyOS for the Cat-644. This is what got implemented:

Block device support

Previously, only character devices, which read and write 1 byte at a time, were supported. This included the serial port, the SPI bus, the VGA drawchar function, and reading the keyboard. Block devices allow reading and writing discrete blocks of data, such as on a disk. The SDCard block device driver actually uses the SPI character device to talk to the SDCard. Writing to the disk involves locking the SPI bus for exclusive use for the duration of the write. Theoretically, multiple SPI devices or even multiple SD Cards could be supported, as long as each SPI device driver uses the SPI lock API:

dev_spi0.chardev.ioctl(&dev_spi0, IOCTL_LOCK, SPI_MASTER | SPI_CLK_8); //the SDCard driver locks the SPI bus, also asking that the bus be put in a particular configuration

low(SDCARD_CS);  //activate SD card

for (uint16_t i = 0; i < len; i++){	//read/write whatever is needed; here, send len bytes from buf (illustrative names)
	dev_spi0.chardev.write1(&dev_spi0, buf[i]);
}
high(SDCARD_CS); //deactivate SD card

dev_spi0.chardev.ioctl(&dev_spi0, IOCTL_UNLOCK, 0);  //unlock the SPI bus; this returns the shared pins to the state they were in before the SPI session
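
To illustrate what the lock/unlock ioctl pair has to do, here is a PC-runnable mock. IOCTL_LOCK, IOCTL_UNLOCK, SPI_MASTER and SPI_CLK_8 come from the driver code above, but their values, the chardevOps layout, and PORTB_sim (a variable standing in for the real PORTB) are assumptions.

```c
#include <stdint.h>

/* Names taken from the driver calls above; the values are made up for this sketch. */
enum { IOCTL_LOCK = 1, IOCTL_UNLOCK = 2 };
#define SPI_MASTER 0x10
#define SPI_CLK_8  0x01

static uint8_t PORTB_sim = 0xA5;   /* simulated shared port: XRAM address lines + SPI pins */

typedef struct spiDev spiDev;
typedef struct {
    int  (*ioctl)(spiDev *dev, int cmd, int arg);
    void (*write1)(spiDev *dev, uint8_t c);
} chardevOps;

struct spiDev {
    chardevOps chardev;
    int        locked;     /* 1 while some driver owns the bus */
    uint8_t    savedPort;  /* port state to restore on unlock */
    int        config;     /* mode/clock requested by the lock holder */
};

static int spi_ioctl(spiDev *dev, int cmd, int arg)
{
    switch (cmd) {
    case IOCTL_LOCK:
        if (dev->locked) return -1;   /* bus busy: caller must retry later */
        dev->locked = 1;
        dev->savedPort = PORTB_sim;   /* remember what the shared pins were doing */
        dev->config = arg;            /* a real driver would program SPCR/SPSR here */
        return 0;
    case IOCTL_UNLOCK:
        if (!dev->locked) return -1;
        PORTB_sim = dev->savedPort;   /* hand the pins back exactly as we found them */
        dev->locked = 0;
        return 0;
    default:
        return -1;
    }
}

static void spi_write1(spiDev *dev, uint8_t c)
{
    if (!dev->locked) return;         /* refuse to touch the bus without the lock */
    PORTB_sim = c;                    /* stand-in for writing SPDR and waiting for SPIF */
}

spiDev dev_spi0 = { { spi_ioctl, spi_write1 }, 0, 0, 0 };
```

Refusing a second lock, rather than waiting, keeps the sketch simple; the important part is that unlock restores the shared pins, which is what lets the XRAM and VGA code keep using PORTB afterwards.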

Hardware Conflict

The SPI bus and the external memory address lines both use PORTB. The external memory address lines are used both for XRAM access and for drawing the screen. This means two things: 1) SDCard transactions must read/write only internal AVR SRAM, not XRAM. This isn't too bad: if we are reading or writing something, we probably need it in internal memory anyway, and if it is for later, copying to XRAM is much faster than the SPI transaction was. 2) If the VGA driver timer goes off during an SDCard transaction, the SDCard driver keeps those bits of PORTB overridden... making the display show some 'garbage' for a few scanlines at a time. This can be mitigated by reading and writing blocks during the vertical blanking interval. This seems limiting, but there is enough time to read or write a whole block or two. At 1 block per interval, the maximum speed without interrupting the display is 512 bytes per frame, or 30,720 bytes/sec (30 KB/sec) at 60 frames per second. This sounds slow, but the Commodore 64's Epyx Fast Load only reads at 2500 bytes a second... and the original protocol shipped by Commodore is only 300 bytes/second. I don't consider this a serious limitation for this platform.
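
The vertical-blank throughput estimate can be checked with a little arithmetic, assuming one 512-byte block per vertical blanking interval and a 60 Hz frame rate: 512 x 60 = 30,720 bytes per second, roughly 30 KB/s. The function names below are only for illustration.

```c
#include <stdint.h>

#define BLOCK_BYTES 512u   /* SD card block size */
#define FRAME_HZ    60u    /* assumed VGA frame rate */

/* Bytes per second that fit in the vertical blanking intervals alone. */
uint32_t vblank_bytes_per_sec(uint32_t blocksPerVblank)
{
    return blocksPerVblank * BLOCK_BYTES * FRAME_HZ;
}

/* How many times faster than another loader (e.g. 2500 B/s or 300 B/s), rounded down. */
uint32_t speedup_over(uint32_t otherBytesPerSec, uint32_t blocksPerVblank)
{
    return vblank_bytes_per_sec(blocksPerVblank) / otherBytesPerSec;
}
```

With one block per frame that is about 12 times Epyx Fast Load's 2,500 bytes/s and about 100 times the stock 300 bytes/s; fitting two blocks into each interval would double it.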

Filesystem Alternative

The OS contains functions to read and write linked lists and trees of blocks. A traditional filesystem is not provided, but creating linear lists of bytes that look like a flat file is a use case easily supported by this scheme, as are transformations on these files that are not possible in standard filesystems: removing bytes from the middle of a file, directly appending two files together, etc. This OS will be an interesting experiment as it develops further. I feel like it will go one of two ways: either 1) as features are added, it will eventually end up looking like a traditional filesystem, or 2) it will become something very different. One thing I want to add is the ability for the handle-based allocator introduced in a previous retrochallenge to have handles that represent blocks on disk. This would allow navigating disk structures as if they were XRAM objects. This is starting to look like some kind of virtual memory scheme.

Filesystem features not finished

The LLFS tree functions are only implemented in C and are not yet exposed to the VM interpreter. This would just involve adding calls to the syscall() C function's switch block; I simply ran out of time. I also want to add block move and delete routines. Currently, code written in C can create structures on disk, grow them, or edit their data, but not prune them.
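
A sketch of what exposing LLFS to bytecode programs through the syscall() switch might look like. vm_syscall stands in for the OS's syscall() function; the syscall numbers, the slot-index convention, and the toy stubs for blockRoot/blockNext/blockFirstChild are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical syscall numbers for the LLFS calls. */
enum { SYS_BLOCK_ROOT = 0x40, SYS_BLOCK_NEXT, SYS_BLOCK_FIRST_CHILD };

typedef struct { uint32_t cur; } blockInfo;   /* assumed: current block number only */

/* Toy stubs standing in for the real C implementations of the LLFS tree functions. */
static uint32_t toyNext[4]  = { 0, 0, 3, 0 };  /* block 2 -> 3 */
static uint32_t toyChild[4] = { 0, 2, 0, 0 };  /* root (1) -> first child 2 */
static void blockRoot(blockInfo *p) { p->cur = 1; }
static int blockNext(blockInfo *p)
{
    if (!toyNext[p->cur]) return 0;
    p->cur = toyNext[p->cur];
    return 1;
}
static int blockFirstChild(blockInfo *p)
{
    if (!toyChild[p->cur]) return 0;
    p->cur = toyChild[p->cur];
    return 1;
}

/* Assumed: bytecode programs refer to blockInfo slots by small index, not raw pointer,
   so a buggy program cannot hand the OS an arbitrary address. */
#define MAX_POS 4
static blockInfo vmPos[MAX_POS];

/* Returns the value the VM would place in its result register. */
int16_t vm_syscall(uint8_t num, uint16_t slot)
{
    if (slot >= MAX_POS) return -1;            /* reject bad slot indices from the VM */
    blockInfo *pos = &vmPos[slot];
    switch (num) {
    case SYS_BLOCK_ROOT:        blockRoot(pos); return 0;
    case SYS_BLOCK_NEXT:        return (int16_t)blockNext(pos);
    case SYS_BLOCK_FIRST_CHILD: return (int16_t)blockFirstChild(pos);
    default:                    return -1;     /* unknown syscall */
    }
}
```

Handing the VM small slot indices instead of raw blockInfo pointers is one design choice among several; it would also give the OS the bookkeeping needed to enforce the rule about two blockInfos editing the same block.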

So, that's the end of my retrochallenge entry. I am still quite excited about this project, and I got a late start this month, so I'm going to keep working on it and posting updates, but everything that happens after now isn't really meant for the competition; it's just me continuing to play around.

SD Card Bugs

(RC2022/10 Overtime) Nov 7, 2022

I was having trouble getting the SDCard to work reliably... sometimes it would complete several block reads/writes in a row, sometimes it would time out waiting for a command response... I spent a few days after the retrochallenge looking into it and found a small bug (the fix is included in the GitHub repository): the SD Card CS line was not being released after a block transfer. The line that released it had been commented out! I had commented it out while debugging the SDCard in isolation, with the VGA driver not running, and it causes no problems when the screen isn't being drawn! But... when the display is being refreshed, some 'garbage' was being sent to the SD Card, and it was returning error responses. If the garbage contains something that accidentally looks like the beginning of a command, the real command gets lost in it (perhaps misframed as an argument to the garbage command)... the SD Card never sees the command that was sent, so it never responds. After fixing that, the SDCard works perfectly fine, even with the display being drawn.