In the Nand to Tetris course you have to do many things. It's quite a fun course. In project 8 you implement a compiler from VM code to assembly code. Most of the compiler is straightforward, but there's one part that requires a clever trick, one sufficiently interesting to make me want to write about it.
There are two parts to this puzzle: the assembly language, and the architecture on which it operates. We'll define each in turn before going into what the actual challenge is.
The architecture used in Nand to Tetris is incredibly simple, as you'd expect from something designed for teaching purposes. We won't go into too much detail, as most of it isn't relevant here; all you need to know is that The CPU consists of two registers, named A and D (for address and data), and then off to the side there's the main memory. Pretty simple.
In the assembly language D
refers to the data in the D register and A
refers to the data in the A register. In addition, M
refers to the data being stored in the main memory at the address stored in A (hence the name A, it's an address in the main memory).
On top of this, there are also segments in the main memory which are specially allocated for certain purposes. In order to access these there are special commands that set the A register equal to the address of the area in memory where the base addresses for these regions are stored. A
becomes a pointer to a pointer. For our purposes here we'll just use one of the several segments, called LCL
, the local segment. The special command to access this region is @LCL
. What this does is set the A register equal to the address in memory where the address of the base of the LCL segment is stored. The A register points to the LCL pointer.
The @
commands can also be used to set A equal to arbitrary values, as in @21
sets the A register equal to 21. All the values in this architecture are 16 bit integers, so all the addresses are also just integers. @LCL
is actually just a code for @1
, 1
being the address of the pointer to the base of the local register.
A very special segment is the stack, which we access with the command @SP
, SP meaning stack pointer. Like most stack implementations the pointer points at the next unused piece of memory, so @SP
doesn't make A point at the pointer to the base of the stack, but rather upon the invocation of @SP
A points at the pointer that points just above the top of the stack.
So that's one command. @
followed by a segment makes the A register store the address of the address of the base of the segment, and @
followed by a number sets A equal to that number. The only other type of command is a computation command.
The computation commands are of the form destination = computation; jump
. We won't need the jump part of this command (it's optional anyway), so we'll just look at the destination
and computation
parts. The destination can be A
, M
, D
, or any combination thereof, like MD
or AMD
. The computation
can be things like A + D
or M - 1
or simply 0
. In the compuation you can use the numbers 1
or 0
, the "variables" A
, M
, or D
, and the operations +
, -
, !
(bitwise not), &
(bitwise and), and |
(bitwise or).
That's all a lot of information, so here's an example. The push command pushes the value from a given segment onto the stack. This is how to implement pushing the value stored in LCL[16]
:
@LCL // A = address of the address of the LCL segment D = M // D = address of the LCL segment @16 // A = 16 A = D + A // A = address of LCL + 16 = LCL[16] D = M // D = data in LCL[16] @SP // A = address of the address of the top of the stack A = M // A = address of the top of the stack M = D // top of the stack = D = data in LCL[16] @SP // this line and the next increment the stack pointer M = M + 1
Hopefully with the comments that makes sense. It can be tough in this language to distinguish whether you're looking at the data you want or a pointer to that data (or a pointer to a pointer to that data). If the above code makes sense, then you're ready to tackle the real puzzle. If not, try going back through line by line, or just give up, whatever floats your boat.
We saw above that the push command wasn't so bad. I mean, it took quite a few lines, but this is an assembly language, what're you gonna do? We might assume pop would be a similar level of complexity, but it turns out to be a bit more difficult (it actually ends up being only three more lines than push, but it's conceptually quite trickier).
Where the push command takes data from a given segment and pushes it on the stack, the pop command takes data from the top of the stack and loads it onto somewhere in a given memory segment.
The problem is that with push, we can get the address of the data we want, put the data in D, then just use the A register to look up the stack pointer, set M = D
, and then increment the stack pointer. With pop, think about the general steps we want to do. We might think we would want to
The problem comes between steps 2 and 3. While D stores the data, we can only set A equal to a pointer to the pointer to the base of a segment or an offset. What we need is to simultaneously store the data, the pointer to the pointer, and the offset. This wouldn't be too bad if we could arbitrarily set the D register to a value, but we can't. The @
commands can only be used to set the A register to given values. The D register can only be set to some computation involving A
, M
, D
, 1
, or 0
.
We could try finding some part of memory to store the temporary data, but then we end up back where we started, trying to juggle the address of that memory. No, we need to store multiple pieces of information on a single register.
To get your mind in the right mood, consider this C function for swapping two integers without using a temporary variable:
void swap(int* x, int* y) { *x = *x + *y; *y = *x - *y; *x = *x - *y; }
Now, this code runs the risk of overflow, and in practice you'd want to use an xor, and in real practice you'd just write it with a temp variable and trust the compiler to do the optimization, but the idea of this code is what we'll be using. The idea that we can use a variable to hold two pieces of information at once, and then access each piece of information by using the proper operation.
Regardless, we can still do the first two proposed steps of decrementing the stack pointer and loading the data into D fairly easily. That code looks like this:
@SP // access stack pointer M = M - 1 // decrement stack pointer A = M // A points to the top of the stack D = M // D = value at top of stack (pop value)
Going from here is where it gets a little tricky. To get going we might try the trick from that C code. We'll add the address where we want to pop to to the D register:
@LCL D = D + M // D = pop value + base address of LCL @16 D = D + A // D = pop value + address of LCL[16]
Now we're getting somewhere. Is it where we want to go? Only time will tell.
What we want now is for the A register to have the address of LCL[16]
so that we can move the data into memory at that location. What we'd normally do is set it equal to 16, store 16 in D, then set A to LCL and add the value in D (which would be 16) to A. The problem with doing that now is that we'd disturb the data in D, which we can't afford to do. What we're going to do instead is set A equal not to the address we're popping to, but rather the data to be popped:
@SP // A = address of stack pointer A = M // A = stack pointer A = M // A = pop value
We do this because it allows us to set A to the proper address by using the data in D:
A = D - A // A = pop value + address of LCL[16] - pop value
At this point A equals the proper address and D equals the sum of that address and the data we want to enter at that address. We're almost there. All we want is for the place in memory where A points to to equal the data. We can't just do M = D
, because then the place in memory would be the aforementioned sum. What we want is that sum minus the part we don't want. And what we don't want is the address, which is stored in A. So we want something like this:
M = D - A // M = pop value
Which is our solution! It's almost amazing how everything just fits together perfectly on this line. The A register is simultaneously controlling where M
references and enabling us to get the right data from D
, and D
, in turn, is what allowed us in the previous line to load the proper address into A
. It's all beautifully self-referential.
All together it looks like this:
@SP M = M - 1 A = M D = M // D = pop value @LCL D = D + M // D = pop value + base memory address @16 D = D + A // D = pop value + address to pop to @SP A = M A = M // A = pop value A = D - A // A = address to pop to M = D - A // M = pop value
And that's how we implement the pop command.