Intel introduced the 8086 microprocessor in 1978 and it had a huge influence on computing.
I’m reverse-engineering the
8086 by examining the circuitry on its silicon die and in this blog post I take a look at how conditional jumps
are implemented.
Conditional jumps are an important part of any instruction set, changing the flow of execution based on
a condition.
Although this instruction may seem simple, it involves many parts of the CPU:
the 8086 uses microcode along with special-purpose condition logic.

The die photo below shows the 8086 microprocessor under a microscope.
The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath.
Around the edges of the die, bond wires connect pads to the chip’s 40 external pins.
I’ve labeled the key functional blocks; the ones that are important to this discussion are darker and will be discussed in detail below.
Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below.
The BIU handles memory accesses, while the EU executes instructions.
Most of the relevant circuitry is in the Execution Unit, such as the condition evaluation circuitry near the center,
and the microcode in the lower right. But the Bus Interface Unit plays a part too, holding and modifying the program counter.

The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip’s single metal layer; the polysilicon and silicon are underneath.

Microcode

Most people think of machine instructions as the basic steps that a computer performs.
However, many processors (including the 8086) have another layer of software underneath: microcode.
One of the hardest parts of computer design is creating the control logic that directs the processor for each step of an instruction.
The straightforward approach is to build a circuit from flip-flops and gates that moves through the various steps and generates the control signals.
However, this circuitry is complicated, error-prone, and hard to design.

The alternative is microcode: instead of building the control circuitry from complex logic gates, the control logic is largely replaced with code.
To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode.
In other words, microcode forms another layer between the machine instructions and the hardware.
The main advantage of microcode is that it turns design of control circuitry into a programming task instead of a difficult logic design task.

The 8086 uses a hybrid approach: although the 8086 uses microcode, much of the instruction functionality is implemented with gate logic.
This approach removed duplication from the microcode and kept the microcode small enough for 1978 technology.
In a sense, the microcode is parameterized.
For instance, the microcode can specify a generic Arithmetic/Logic Unit (ALU) operation, and the gate logic determines from the instruction which ALU operation to perform.
More relevant to this blog post, the microcode can specify a generic conditional test and the gate logic determines which condition to use.
Although this made the 8086’s gate logic more complicated, the tradeoff was worthwhile.

Microcode for conditional jumps

The 8086 processor has six status flags:
carry, parity, auxiliary carry, zero, sign, and overflow.1
These flags are updated by arithmetic and logic operations based on the result.
The 8086 has sixteen different conditional jump instructions2
that test status flags and jump if conditions are satisfied, such as zero, less than, or odd parity.
These instructions are very important since they permit if statements, loops, comparisons, and so forth.

In machine language, a conditional jump opcode is followed by a signed offset byte which specifies
a location relative to the current program counter, from 127 bytes ahead
to 128 bytes back.
This is a fairly small range, but the benefit is that the offset fits in a single byte, reducing the code size.3
For typical applications such as loops or conditional code, jumps usually stay in the same neighborhood of code,
so the tradeoff is worthwhile.
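
To make the offset arithmetic concrete, here is a minimal Python sketch of how a signed offset byte maps to a jump target. The function name and the `next_ip` parameter are my own for illustration; the sketch assumes the offset is applied to the address of the byte following the two-byte jump instruction and that the result wraps to 16 bits.

```python
def jump_target(next_ip: int, offset_byte: int) -> int:
    """Return the 16-bit target of a short conditional jump.

    next_ip is assumed to be the address of the byte after the jump
    instruction; offset_byte is the raw offset byte (0..255).
    """
    # Sign-extend the byte: 0x80..0xFF represent -128..-1.
    signed = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    # The program counter is 16 bits wide, so the sum wraps around.
    return (next_ip + signed) & 0xFFFF


farthest_ahead = jump_target(0x1000, 0x7F)  # 127 bytes ahead
farthest_back = jump_target(0x1000, 0x80)   # 128 bytes back
```

The two calls at the end show the extremes of the range: an offset byte of 0x7F reaches 127 bytes ahead, while 0x80 reaches 128 bytes back.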

The 8086’s microcode was disassembled by Andrew Jenner from my die photos, so we can see exactly what micro-instructions the 8086 is running for each machine instruction.
The microcode below implements conditional jumps.
In brief, the conditional jump code (Jcond) gets the branch offset byte.
It tests the appropriate condition and, if satisfied, jumps to the relative jump microcode (RELJMP).
The RELJMP code adds the offset to the program counter.
In either case, the microcode routine ends when it runs the next instruction (RNI).

   move       action
Jcond:
1 Q→tmpBL
2          XC    RELJMP                    
3          RNI                       

RELJMP:
4          SUSP
5          CORR                      
6 PC→tmpA  ADD   tmpA
7 Σ→PC     FLUSH RNI                       

In more detail, micro-instruction 1 (arbitrary numbering) moves a byte from the prefetch queue (Q) across the queue bus to the
ALU’s temporary B register.4 (Arguments for ALU operations are first stored in temporary registers, invisible to the programmer.)
Instruction 2 tests the appropriate condition with XC, and jumps to the RELJMP routine if the condition is satisfied.5
Otherwise, RNI (Run Next Instruction) ends this sequence and loads the next machine instruction without jumping.

If the condition is satisfied, the relative jump routine starts with instruction 4, which suspends prefetching.6
Instruction 5 corrects the program counter value, since it normally points to the next byte to prefetch,
not the next byte to execute.
Instruction 6 moves the corrected program counter address to the ALU’s temporary A register.
It also starts an ALU operation to add temporary A and temporary B.
Instruction 7 moves the sum (Σ) to the program counter.
It flushes the prefetch queue, which starts up prefetching from the new PC value.
Finally, RNI runs the next instruction, from the updated address.
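
The control flow of this microcode can be traced with a small Python model. This is only a sketch under simplifying assumptions: the function and variable names are mine, the condition is passed in as a boolean rather than computed from flags, the offset byte is treated as sign-extended, and `next_ip` stands for the corrected program counter value that CORR produces.

```python
def run_jcond(condition_met: bool, next_ip: int, offset_byte: int):
    """Trace the Jcond microcode; returns (micro-instruction numbers, new PC)."""
    trace = [1]                       # 1: Q→tmpBL, offset byte goes to tmpB
    tmp_b = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    trace.append(2)                   # 2: XC RELJMP, test the condition
    if not condition_met:
        trace.append(3)               # 3: RNI, continue without jumping
        return trace, next_ip
    trace.append(4)                   # 4: SUSP, suspend prefetching
    trace.append(5)                   # 5: CORR, PC now points at the next byte to execute
    tmp_a = next_ip                   # 6: PC→tmpA, start the ADD
    trace.append(6)
    pc = (tmp_a + tmp_b) & 0xFFFF     # 7: Σ→PC, FLUSH, RNI
    trace.append(7)
    return trace, pc


not_taken = run_jcond(False, 0x0100, 0x10)
taken = run_jcond(True, 0x0100, 0xFE)
```

The untaken case runs micro-instructions 1 through 3 and leaves the program counter alone, while the taken case skips instruction 3 and runs 4 through 7.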

This code supports all 16 conditional jumps because the microcode tests the generic
“XC” condition.
This indicates that the specific test depends on the four low bits of the opcode, and the hardware determines
exactly what to test.
It’s important to keep the two levels straight: the machine instruction is doing a conditional jump to a different memory address, while the microcode that implements this instruction is performing a conditional jump to a different micro-address.

The timing for conditional jumps

The RNI (Run Next Instruction) micro-operation initiates processing of the next machine instruction.
However, it takes a clock cycle to get the next instruction from the prefetch queue, decode it, and start the
appropriate micro-instruction.
This causes a wasted clock cycle before the next micro-instruction executes.
To avoid this delay, most microcode routines issue a NXT micro-operation one cycle before they end.
This gives the 8086 time to decode the next machine instruction so micro-instructions can run uninterrupted.

Unfortunately, the conditional jump instructions can’t take advantage of NXT.
The problem is that the control flow in the microcode depends on whether the conditional jump is taken or not.
By the time the microcode knows it is not taking the branch, it’s too late to issue NXT.

The datasheet gives the timing of a conditional jump as 4 clock cycles if the jump is not taken, and 16 clock
cycles if the jump is taken.
Looking at the microcode helps explain these timings. If the jump is not taken, 3 micro-instructions execute (1 through 3), and the RNI wastes one clock cycle, resulting in the documented 4 cycles in total.
If the jump is taken, the microcode path is longer (1, 2, and 4 through 7), and the FLUSH empties the prefetch queue, so the next instruction must be fetched from memory before it can run, which adds to the total.

The conditions

At this point I will review the 8086’s conditional jumps.
The 8086 implements 16 conditional jumps. (This is a large number compared to earlier CPUs:
the 8080, 6502, and Z80 all had 8 conditional jumps, specified by 3 bits.)
The table below shows which flags are tested for each condition, specified by the low four bits of the opcode.
Some jump instructions have multiple names depending on the programmer’s interpretation, but they map to the same machine instruction.7

Condition             Bits  Condition true                                     Condition false
Overflow Flag (OF)=1  000x  overflow (JO)                                      not overflow (JNO)
Carry Flag (CF)=1     001x  carry (JC), below (JB), not above or equal (JNAE)  not carry (JNC), not below (JNB), above or equal (JAE)
Zero Flag (ZF)=1      010x  zero (JZ), equal (JE)                              not zero (JNZ), not equal (JNE)
CF=1 or ZF=1          011x  below or equal (JBE), not above (JNA)              not below or equal (JNBE), above (JA)
Sign Flag (SF)=1      100x  sign (JS)                                          not sign (JNS)
Parity Flag (PF)=1    101x  parity (JP), parity even (JPE)                     not parity (JNP), parity odd (JPO)
SF ≠ OF               110x  less (JL), not greater or equal (JNGE)             not less (JNL), greater or equal (JGE)
ZF=1 or SF ≠ OF       111x  less or equal (JLE), not greater (JNG)             not less or equal (JNLE), greater (JG)

From the hardware perspective, the important thing is that there are eight different condition flag tests.
Each test has two jump instructions associated with it: one that jumps if the condition is true, and one that jumps
if the condition is false.
The low bit of the opcode selects “if true” or “if false”.
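
The table can be expressed as a short Python function. This is a behavioral sketch, not a description of the gates: the function name and keyword arguments are mine. It also assumes the standard 8086 opcode assignment in which a low bit of 0 means "jump if true" and 1 means "jump if false" (for example, 0x70 is JO and 0x71 is JNO), which the text above doesn't spell out.

```python
def condition_satisfied(opcode: int, cf=0, pf=0, zf=0, sf=0, of=0) -> bool:
    """Decide whether a conditional jump opcode would jump, given the flags."""
    # The eight flag tests, indexed by bits 3..1 of the opcode.
    tests = [
        of == 1,                # 000x: overflow
        cf == 1,                # 001x: carry / below
        zf == 1,                # 010x: zero / equal
        cf == 1 or zf == 1,     # 011x: below or equal
        sf == 1,                # 100x: sign
        pf == 1,                # 101x: parity
        sf != of,               # 110x: less
        zf == 1 or sf != of,    # 111x: less or equal
    ]
    result = tests[(opcode >> 1) & 0b111]
    # Assumed polarity: low bit 0 jumps if the test is true, 1 if false.
    return result != bool(opcode & 1)


jz_taken = condition_satisfied(0x74, zf=1)   # JZ with ZF set
jnz_taken = condition_satisfied(0x75, zf=1)  # JNZ with ZF set
```

Only the low four bits of the opcode matter here, so passing either the full opcode byte or just the four condition bits gives the same answer.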

The image below shows the condition evaluation circuitry as it appears on the die. There isn’t much structure to it; it’s just
a bunch of gates.
This image shows the doped silicon regions that form transistors. The numerous small polygons with a circle inside
are connections between the metal layer and the polysilicon layer. Many of these connections use the silicon layer
to optimize the layout.

The circuitry to compute conditions as it appears on the die. The metal and polysilicon layers have been removed for this image, showing the silicon underneath.

This circuitry evaluates each condition by getting the instruction bits from the Instruction Register,
checking the bits to match each condition, and testing if the condition is satisfied.
For instance, the overflow condition (with instruction bits 000x) is computed by a NOR gate: NOR(IR3, IR2, IR1, OF'), which will be true if instruction register bits 3, 2, and 1 are zero and the Overflow Flag is 1.

The results from the individual condition tests are combined with a 7-input NOR gate, producing a result that is 0 if the specified 3-bit condition is satisfied.
Finally, the “if true” and “if false” cases are handled by flipping this signal depending on the low bit of the instruction.
This final result indicates if the 4-bit condition in the instruction is satisfied, and this signal is passed on
to the microcode control circuitry.

One unexpected feature of the implementation is that a 7-input NOR gate combines the various conditions to
test if the selected condition is satisfied.
You’d expect that with eight conditions, there would be eight inputs to the NOR gate.
However, there is a clever optimization that takes advantage of
conditions that are combinations of clauses, for example, “less or equal”.
Specifically, the zero flag is tested for bit pattern 01xx (where x indicates a 0 or 1), which covers two conditions with one gate.
Likewise, SF≠OF is tested for bit pattern 11xx and CF=1 is tested for bit pattern 0x1x.
With these optimizations, the eight conditions are covered with seven checks.
(This shows that the opcodes weren’t assigned arbitrarily: the bit patterns needed to be carefully assigned for this to work.)
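
A quick way to convince yourself that seven checks are enough is to compare them against the eight conditions exhaustively. The Python sketch below does this; the helper names are mine, and the seventh term (the zero flag for pattern 111x) is my inference from the description rather than something stated above. The low opcode bit is ignored since the "if true"/"if false" flip happens afterwards.

```python
from itertools import product


def matches(bits: int, pattern: str) -> bool:
    """True if the 4-bit value matches a pattern such as '0x1x' (MSB first)."""
    return all(p == "x" or int(p) == (bits >> (3 - i)) & 1
               for i, p in enumerate(pattern))


def seven_checks(bits, cf, pf, zf, sf, of):
    """Combine seven pattern/flag terms, mirroring the merged gate inputs."""
    lt = sf != of
    terms = [
        ("000x", of), ("0x1x", cf), ("01xx", zf), ("100x", sf),
        ("101x", pf), ("11xx", lt),
        ("111x", zf),  # inferred term, not stated in the text
    ]
    return any(matches(bits, pat) and bool(flag) for pat, flag in terms)


def eight_conditions(bits, cf, pf, zf, sf, of):
    """Reference: look up one of the eight conditions by opcode bits 3..1."""
    lt = sf != of
    table = [of, cf, zf, cf or zf, sf, pf, lt, zf or lt]
    return bool(table[bits >> 1])


# Compare both versions for every opcode pattern and flag combination.
ALL_AGREE = all(
    seven_checks(b, *f) == eight_conditions(b, *f)
    for b in range(16)
    for f in product((0, 1), repeat=5)
)
```

If any pattern were assigned differently, for instance swapping the opcodes for "sign" and "below or equal", the shared patterns would no longer line up and more terms would be needed.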

Back to the microcode

Before explaining how the microcode jump circuitry works, I’ll briefly discuss the microcode format.
A micro-instruction is encoded into 21 bits as shown below.
Every micro-instruction contains a move from a source register to a destination register, each specified with 5 bits.
The meaning of the remaining bits is a bit tricky since it depends on the type field, which is two or three bits long.
The “short jump” (type 0) is a conditional jump within the current block of 16 micro-instructions.
The ALU operation (type 1) sets up the arithmetic-logic unit to perform an operation.
Bookkeeping operations (type 4) are anything from flushing the prefetch queue to ending the current instruction.
A memory read or write is type 6.
A “long jump” (type 5) is a conditional jump to any of 16 fixed microcode locations (specified in an external table).
Finally, a “long call” (type 7) is a conditional subroutine call to one of 16 locations (different from the jump targets).

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

I’m going to focus on the XC RELJMP micro-instruction that we saw in the microcode earlier.
This is a “long jump” with XC as the condition and RELJMP as the target tag.
Another layer of hardware is required to implement the microcode conditions.
The microcode supports 16 conditions, which are completely different from the 16 programmer-level conditions.8
Some microcode conditions test special-purpose internal flags, while others test conditions such as an interrupt, the chip’s TEST pin,
bit 3 of the opcode, or if the instruction has a one-byte address offset.
The XC condition is one of these 16 conditions, number 15 specifically.

The conditions are evaluated by the condition PLA (Programmable Logic Array, a grid of gates), shown below.
The four condition bits from the micro-instruction, along with their complements, are fed into the columns.
The PLA has 16 rows, one for each condition.
Each row is a NOR gate matching one bit combination (i.e. selecting a condition) and the corresponding signal value to
test.9
Thus, if a particular condition is specified and is satisfied, that row will be 1.
The 16 row outputs are combined by the 16-input NOR gate at the left.
Thus, if the specified condition is satisfied, this output will be 0, and if the condition is unsatisfied, the
output will be 1.
This signal controls the jump or call micro-instruction:
if the condition is satisfied, the new micro-address is loaded into the microcode address register.
If the condition is not satisfied, the microcode proceeds sequentially.
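
The PLA's behavior can be modeled in a few lines of Python. This is a logical sketch, not the actual row contents: the function name is mine, and the `signals` list is a hypothetical stand-in for the 16 tested signals, of which only XC's position (number 15) comes from the text.

```python
def condition_pla(cond_bits: int, signals: list) -> int:
    """Model the condition PLA.

    Returns 0 if the selected condition is satisfied and 1 otherwise,
    matching the output polarity described above.
    """
    # Each of the 16 rows is 1 only when its condition number is selected
    # and its associated signal is true.
    rows = [int(cond_bits == n and bool(signals[n])) for n in range(16)]
    # The 16 row outputs feed a single NOR gate.
    return int(not any(rows))


XC = 15                    # condition number for XC, per the text
signals = [False] * 16     # hypothetical signal values for illustration
signals[XC] = True         # pretend the flag test for the opcode succeeded

xc_output = condition_pla(XC, signals)     # selected and satisfied
other_output = condition_pla(3, signals)   # selected but not satisfied
```

An output of 0 corresponds to loading the new micro-address, while an output of 1 lets the microcode proceed sequentially.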

The condition PLA evaluates microcode conditionals.

Conclusions

To summarize, the 8086 processor implements 16 conditional jump instructions.
One piece of microcode efficiently implements all 16 instructions, with gate logic determining which flags to test, depending
on bits in the machine instruction.
The result of this test is used by the microcode XC conditional jump, one of 16 completely different microcode-level
conditions. If the XC condition is satisfied, the program counter is updated by adding the offset,
jumping to the new location.

Conditional jumps are relatively straightforward instructions from the programmer’s perspective,
but they interact with most parts of the 8086 processor, including
the prefetch queue, the address adder, the ALU, microcode, and the Translation ROM.
The diagram below shows the interactions for each step of the jump.

The conditional jump involves many parts of the die, shown in this diagram.

I’ve written multiple posts on the 8086 so far and
plan to continue reverse-engineering the 8086 die, so
follow me on Twitter @kenshirriff or RSS for updates.
I’ve also started experimenting with Mastodon recently.

Notes and references
