Brainfuck CPU
You can access the design files for this project here. They’re not very good, though (I didn’t exactly know Verilog when I started…)
In Spring 2023, I had a very exciting opportunity: CMU began offering a new tapeout course, with minimal prerequisites, which would give me the chance to design and tape out my own chip! There were just two problems, though: I didn’t know Verilog yet, and I didn’t really know what I wanted to make.
I was taking Intro to Digital Systems at the same time (which would later be made a prerequisite for the course), so that would teach me Verilog, but the question of what chip to create was more elusive. After much waffling over potentially making an ASIC version of the computer Ben Eater builds in his 8-bit Computer series, I decided that it would be more interesting to create a new design of my own. I wanted an architecture simple enough to implement with my then-limited knowledge of computer architecture, and small enough to fit in the small chip area given to us by the course - and I eventually settled upon Brainfuck.
For those who don’t know what Brainfuck is, it’s probably the most notable example of an "esoteric" programming language - a programming language designed not to be useful, but to be somehow weird or funny or both. Brainfuck programs consist of only 8 different single-character commands, +-<>.,[], which allow for doing "arithmetic" (only incrementing and decrementing by 1), looking at different bytes of memory (only moving by 1 at a time), printing and reading single bytes from I/O, and looping. There’s no way to do normal arithmetic or use any other kind of construct except through these 8 commands… so programs end up looking pretty incomprehensible, and doing anything interesting is difficult. However, one thing that is simple about Brainfuck is implementing the language itself - I realized that while in the realm of programming languages we refer to Brainfuck’s operations as "commands", they could just as easily be imagined as instructions for a super duper garbage CPU. So that’s what I set out to implement, armed with around half a semester of Verilog by that point.
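To give a sense of just how little machinery the language needs - this is only an illustrative sketch of mine, not anything from the chip - a complete Brainfuck interpreter fits comfortably in a few dozen lines of Python:

```python
import sys

def run(program: str, tape_size: int = 30000):
    """A bare-bones Brainfuck interpreter: 8 commands, a byte tape, one pointer."""
    tape = bytearray(tape_size)
    ptr = 0   # data pointer
    pc = 0    # program counter
    while pc < len(program):
        c = program[pc]
        if c == "+":
            tape[ptr] = (tape[ptr] + 1) & 0xFF
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) & 0xFF
        elif c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == ".":
            sys.stdout.write(chr(tape[ptr]))
        elif c == ",":
            data = sys.stdin.read(1)
            tape[ptr] = ord(data) & 0xFF if data else 0
        elif c == "[" and tape[ptr] == 0:
            # Skip forward to the matching ]
            depth = 1
            while depth:
                pc += 1
                depth += {"[": 1, "]": -1}.get(program[pc], 0)
        elif c == "]" and tape[ptr] != 0:
            # Jump back to just past the matching [
            depth = 1
            while depth:
                pc -= 1
                depth += {"]": 1, "[": -1}.get(program[pc], 0)
        pc += 1
    return tape
```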
The design I ended up creating that semester was mainly inspired by Ben Eater’s computer, mixed with the simple "RISC-240" CPU from Intro to Digital Systems. It’s heavily FSM-driven, and is extremely clock-inefficient: each memory operation, including fetches, takes 4 cycles, and the ] instruction takes something like 8 cycles at minimum, not counting memory.
And, well, it worked! The course thankfully required us to do a certain amount of verification on our designs before we taped out, and when I finally received my chip more than a year later (man….), I was able to run all the same test programs as I had on the FPGA!
Future work ….?
As of writing, I made this project 2.5 years ago. Since then, I’ve taken the entire main computer architecture curriculum at CMU and done two internships in CPU design, so I have Quite a Lot more knowledge than when I did this project. As such, I want to return to this at some point… and I have some plans to make it much, much better (faster mainly):
The biggest observation to take from Brainfuck is that it has extremely low information density… simple operations like moving data around and addition take many, many commands, and those commands get repeated more and more when you have, for example, data scattered all over memory. In addition, each of those endlessly repeated commands does a trivial operation that can easily be combined with its neighbors. Therefore, just like how modern CPUs translate ISA instructions into micro-ops that perform simpler operations, I can translate from Brainfuck into some intermediate representation that combines the work of many Brainfuck commands into one operation, but which can still execute quickly.
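As a rough software sketch of the idea (the IR below is hypothetical - just (opcode, argument) pairs I made up for illustration, not the format the hardware would actually use), run-length encoding alone collapses long runs of +, -, <, and > into single operations:

```python
def to_ir(program: str):
    """Collapse runs of repeated Brainfuck commands into single IR ops."""
    ir = []
    i = 0
    while i < len(program):
        c = program[i]
        if c in "+-<>":
            # Run-length encode: "+++++" becomes one ADD op with argument 5.
            j = i
            while j < len(program) and program[j] == c:
                j += 1
            count = j - i
            ir.append({"+": ("ADD", count), "-": ("ADD", -count),
                       ">": ("MOVE", count), "<": ("MOVE", -count)}[c])
            i = j
        elif c in ".,[]":
            # Jump targets for JZ/JNZ are left unresolved in this sketch.
            ir.append(({".": "OUT", ",": "IN", "[": "JZ", "]": "JNZ"}[c], 0))
            i += 1
        else:
            i += 1  # Brainfuck ignores every other character (comments)
    return ir

print(to_ir("+++++[->>+<<]"))
# [('ADD', 5), ('JZ', 0), ('ADD', -1), ('MOVE', 2), ('ADD', 1), ('MOVE', -2), ('JNZ', 0)]
```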
It’s not just "combining repeated operations", though - there’s a specific pattern that’s very common in Brainfuck: a loop whose body contains a single decrement of the loop cell plus some number of moves and increments/decrements on other cells, ending back at the same cell it started on. That whole loop can just be fused into a set of additions! There are certainly more patterns like this, but this is the first one I’ve looked into closely.
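Here’s a hedged sketch of detecting that pattern (often called a "copy loop" or "multiply loop" in Brainfuck optimizer write-ups); the function name and the offset-to-delta representation are my own inventions for illustration:

```python
def fuse_copy_loop(body: str):
    """Return {offset: delta} if `body` (the text between [ and ]) is fusable,
    else None. Offsets are relative to the loop cell."""
    if any(c not in "+-<>" for c in body):
        return None                      # I/O or nested loops: give up
    deltas, offset = {}, 0
    for c in body:
        if c == ">":
            offset += 1
        elif c == "<":
            offset -= 1
        else:
            deltas[offset] = deltas.get(offset, 0) + (1 if c == "+" else -1)
    if offset != 0 or deltas.get(0) != -1:
        return None                      # must end where it started and step the loop cell by -1
    return {k: v for k, v in deltas.items() if k != 0}

# [->>+++<<] adds 3 * cell[0] to cell[2], then zeroes cell[0]:
print(fuse_copy_loop("->>+++<<"))   # {2: 3}
```

Each entry in the result means "add delta × (the loop cell’s starting value) to the cell at that offset", and the loop cell itself ends at zero - so the whole loop executes as a handful of additions instead of iterating.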
There are lots of other, more traditional improvements to make beyond this, of course: switching from a micro-coded/FSM-driven design to a single-cycle or pipelined design would bring a big boost, and adding a cache would (like in real CPUs) solve the memory access latency issue.