
Retrochallenge 2022/10

Filesystem Alternative

This project provided an opportunity to explore alternatives to the traditional filesystem. I came up with a tree structure that closely resembles Lists as they exist in Lisp or Scheme. LLFS (Linked List File System) is an on-disk data structure that is not explicitly organized into directories.

The primitive datatype in LLFS is a block. A block contains optional data bytes, plus optional next and childFirst block references.

typedef struct blockHeaderS {
	blocknum	next;		//next block at this level
	blocknum	childFirst;	//first block on next level
	uint16_t	used;		//number of bytes used in this block
	uint8_t		flags;		//not decided what is needed yet
	uint8_t		datastart;	//offset into block to the data itself
} blockHeader;

This method makes creating complex data structures on disk similar to creating linked lists or trees in memory. The system has a root block that everything else is, directly or indirectly, a child of.

This is not implemented yet, but in the future the system main process will keep a data structure that divides the system into different subtrees. All executable programs will be on one branch of the tree, and the data format for a program will include names. Each program then has a subtree of its own, containing that program's data, such as user-data 'files' it has created. Each program will probably also name at least some of its data structures... I may make 'name' a common, optional field.

Within what would normally be considered a 'file' on a normal filesystem, there can be both data bytes AND a collection of child subtrees. When creating large, editable structures on disk, this has certain advantages. Blocks can be inserted into the middle of a file's list of child blocks. If a file has sections for different types of data, each type can grow and append independently without rewriting the whole file. I suppose this isn't that different from having a folder of several files on a regular filesystem, but in this system it is kinda treated as one 'big' multiforked file. Files in a common data format (like text files, etc.) will probably be kept in common areas.

Overview of LLFS API

To interact with the filesystem, a blockInfo structure must be used. It is used to track the current position in the tree that is being explored. There can be multiple blockInfo structures in use at once, but if two of them are editing the same block, results are currently undefined. In the future, there could be enforcement of this rule, especially if blockInfos are allocated (and thus trackable) by the OS itself.

In the future, there will be APIs to delete and move blocks.

Advantages of a list-based filesystem

I will use the example of a text file to show cases that are inefficient or difficult on a standard filesystem.

Sample Intended Use

	//assuming 'pos' is already pointing to some block,
	//and that block has a list of strings I just want to print
	//like I'm reading a linear file

	if (blockFirstChild(&pos)) {
		do {
			int len = blockRead(&pos, myString, sizeof(myString));
			printf("%.*s", len, myString);
		} while (blockNext(&pos));
	}


Random Access

Finding the end of a 'file' in order to append to it could be a potentially long process... O(n). Appending is a common operation, so having a childLast field in a block would make it fast. But that also means every block appended to the end of a file would result in a write to the parent as well... For now, linear files and trees should be enough. Random access to data could be accelerated by keeping the data in a balanced tree.

Wear Leveling

No attempt has been made to wear-level the block device; the assumption is that the SD card has adequate wear-leveling of its own. I plan to add a 'revision' field to blockHeaders, so that when a new version of a block is to be written, it is instead written to a new block, and the revision number in the previous block is updated. Any low-level call to read the older block will notice the revision number and follow the chain of revisions until the newest version is found. This means each block is written only twice... once the first time, and once more to refer to a new version. There will have to be a pruning process to update next and child pointers when these chains become too long.

Potential Inefficient Use of Space

Since applications allocate blocks as they want, the OS has no control over how 'full' they are. Simple linear 'files' that are just appended bytes have space efficiency similar to existing filesystems, but if an application has a tree of data objects, and each object is much smaller than a block, a lot of space will be wasted. For this project, I simply don't care. I am using a 1GB SD card on an 8-bit computer; this is thousands of times larger than a reasonable disk for the 8-bit era. Even if each block only held 1 byte of usable data, there are about 2 million blocks on the card... 2MB is still a big disk compared to old floppy disks. In the future, perhaps some of the standard-size disk blocks can be broken into multiple smaller virtual blocks.

Reclaiming of Free Space

The current block allocator is very simple: each time a new block is needed, a sequential counter is incremented. In the interest of wear leveling, it makes the most sense to write each block at least once before reusing deleted space. There are a couple of ways to implement this. Deleted blocks could just be marked as 'deleted', and then when it is necessary to reuse space, the disk can be scanned for deleted blocks. This could be made faster by, at delete time, placing the node as a child of a designated garbage node.
