RISC-V Core: M0 Validation - No Pipeline

Alex Johnson

M0 is the project's first functional milestone: validating the RISC-V processor core before any pipeline stages are added. This article covers the architecture of the foundational modules and how each one was validated.

The objective of M0 is to ensure that the fundamental elements of the processor function correctly in combination, within a single cycle: Fetch → Decode → Execute → Write-back.


1. General Objective of M0

The primary goal of M0 is to build and validate the "heart" of the processor: a single-cycle core in which every component can be tested and debugged before more complex features are introduced. It consists of:

  • A PC that advances in 4-byte steps, keeping instruction fetches aligned and in sequence.
  • A 32-bit instruction memory loaded from an external file, so different test programs can be run on the core without changing the design.
  • A RISC-V decoder that recognizes R-type and I-type instructions and extracts the opcode, register indices, and immediate value used to control the ALU and the other components.
  • A regfile of 32 registers, each 32 bits wide, with double read and synchronous write. Reading two registers at once is required by every R-type instruction.
  • A 32-bit ALU providing the arithmetic and logical operations needed by the subsequent pipeline.
  • A complete execution cycle without pipelining, to validate the data flow.

This milestone ensures that the most critical modules of the processor are correct before the IF/ID/EX/MEM/WB stages are introduced.


2. Designed and Verified Components

2.1 Program Counter (pc.v)

The program counter (PC) holds the address of the instruction currently being executed. The design is intentionally simple so that it can serve as the starting point for the later pipelined versions: the PC advances by one instruction per cycle, and a synchronous reset puts the processor in a known state for initialization and debugging.

  • 32-bit register.
  • Increments by 4 each cycle (PC = PC + 4).
  • Synchronous reset to zero.
  • Allows receiving PC_branch and PC_jump in later stages.
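
As a rough illustration, the PC behavior listed above can be modeled in a few lines of Python. This is a behavioral sketch, not the contents of pc.v; the class name, the clock() method, and the reset argument are hypothetical names chosen for the example.

```python
class ProgramCounter:
    """Behavioral model of a 32-bit PC with synchronous reset."""

    MASK32 = 0xFFFFFFFF

    def __init__(self):
        self.value = 0

    def clock(self, reset=False):
        """Model one rising clock edge and return the new PC value."""
        if reset:
            self.value = 0  # synchronous reset to zero
        else:
            self.value = (self.value + 4) & self.MASK32  # PC = PC + 4, 32-bit wrap
        return self.value


pc = ProgramCounter()
trace = [pc.value]
for _ in range(3):
    trace.append(pc.clock())
print([f"0x{v:08X}" for v in trace])
```

The model only changes state inside clock(), which mirrors the fact that both the increment and the reset take effect on a clock edge.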

2.2 Instruction Memory (instr_mem.v)

The instruction memory stores the program and is implemented as a 32-bit-wide read-only memory (ROM). Its contents are loaded from an external file with the $readmemh system task, which reads the hexadecimal words in program_m0.hex into the memory array. The memory is indexed with pc[31:2], the word address, because each instruction occupies 4 bytes and the two low bits of the PC are always zero. In simulation, the instructions were fetched and decoded as expected, confirming that the memory is loaded and addressed correctly.

In the simulation, three test instructions were loaded to validate the instruction memory's functionality:

ADDI x1, x0, 5
ADDI x2, x0, 10
ADD  x3, x1, x2
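
The word addressing can be sketched in Python as follows. This is a simplified stand-in for what $readmemh and the pc[31:2] index do, not the author's instr_mem.v; the function names are hypothetical, and returning 0 for an out-of-range address is an assumption made only for the example.

```python
HEX_TEXT = """\
00500093
00A00113
002081B3
"""


def load_hex_words(text):
    """Parse one 32-bit hexadecimal word per line, ignoring // comments."""
    words = []
    for line in text.splitlines():
        line = line.split("//")[0].strip()
        if line:
            words.append(int(line, 16) & 0xFFFFFFFF)
    return words


def fetch(rom, pc):
    """Return the instruction at byte address pc, indexing with pc[31:2]."""
    index = (pc & 0xFFFFFFFF) >> 2  # drop the two low bits: byte to word address
    return rom[index] if index < len(rom) else 0  # assumption: unmapped reads give 0


rom = load_hex_words(HEX_TEXT)
for pc in (0, 4, 8):
    print(f"PC = 0x{pc:08X}   instr = 0x{fetch(rom, pc):08X}")
```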

2.3 General-Purpose Register File (regfile32.v)

The register file (regfile32.v) is the processor's primary working storage: 32 registers, each 32 bits wide. It supports double read, so both source operands of an instruction can be read at the same time, and synchronous write, so results are committed on the clock edge. Register x0 is fixed to zero, as the RISC-V standard requires, which simplifies several common operations.

  • 32 registers × 32 bits.
  • Double combinational read.
  • Synchronous write.
  • Register x0 is fixed to zero (standard RISC-V behavior).
  • Validated through isolated testbench and M0 testbench.
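
A minimal behavioral sketch of these properties in Python is shown below. It is not the regfile32.v source; the class and method names are hypothetical, and the write-enable argument is named we only to match the signal mentioned later in the waveform section.

```python
class RegFile32:
    """Behavioral model: 32 x 32-bit registers, two reads, one clocked write."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1, rs2):
        """Combinational double read: returns (rdata1, rdata2) immediately."""
        return self.regs[rs1 & 0x1F], self.regs[rs2 & 0x1F]

    def clock(self, we, rd, wdata):
        """Model one rising clock edge; a write to x0 is silently ignored."""
        rd &= 0x1F
        if we and rd != 0:
            self.regs[rd] = wdata & 0xFFFFFFFF


rf = RegFile32()
rf.clock(we=True, rd=1, wdata=5)
rf.clock(we=True, rd=0, wdata=123)  # x0 stays zero
print(rf.read(1, 0))
```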

2.4 32-bit ALU (alu32.v)

The 32-bit arithmetic logic unit (ALU) performs all arithmetic and logical operations required by the supported R-type and I-type instructions. Its operation select input is driven by the decoder signals, so it integrates directly into the single-cycle datapath.

  • Implemented operations:

    • ADD, SUB, AND, OR, XOR
    • SLL, SRL, SRA
    • ADDI, ANDI, ORI, XORI (the same operations, with the immediate as the second operand)
  • Compatible with decoder signals.
  • Support for R-type and I-type operations.
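
The operations above can be modeled as a single Python function. This is a sketch of the arithmetic only; the operation is selected by a string name because the actual alu_op encoding used by alu32.v is not given in this article, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu32(op, a, b):
    """Return the 32-bit result of applying op to operands a and b."""
    a &= MASK32
    b &= MASK32
    shamt = b & 0x1F  # RV32 shifts use only the low 5 bits of b
    if op == "ADD":
        result = a + b
    elif op == "SUB":
        result = a - b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    elif op == "SLL":
        result = a << shamt
    elif op == "SRL":
        result = a >> shamt  # logical shift: zeros enter from the left
    elif op == "SRA":
        result = to_signed(a) >> shamt  # arithmetic shift: sign bit is replicated
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    return result & MASK32  # wrap to 32 bits


print(hex(alu32("ADD", 5, 10)))
```

For an I-type instruction such as ADDI, the same "ADD" path would be used with the sign-extended immediate passed as b.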


2.5 RISC-V Decoder (rv_decoder.v)

The RISC-V decoder (rv_decoder.v) interprets each instruction fetched from memory and prepares it for execution. It identifies the instruction type, extracts the register operands and the immediate value, and generates the alu_op signal that selects the ALU operation. In M0 the decoder does the following:

  • Identifies instructions:

    • R-type: ADD, SUB, AND, OR, XOR, SLL, SRL, SRA
    • I-type: ADDI, ANDI, ORI, XORI
  • Extracts correctly:

    • rs1, rs2, rd
    • imm_i (12 bits)
    • funct3, funct7
    • opcode
  • Generates alu_op signal compatible with alu32.
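
The field extraction can be illustrated with the standard RV32I bit positions. The Python sketch below is not the rv_decoder.v source; the function name and the dictionary layout are hypothetical, and sign-extending imm_i is an assumption based on how the ISA defines I-type immediates. The alu_op encoding is omitted because the article does not specify it.

```python
OPCODE_OP = 0x33      # R-type register-register operations
OPCODE_OP_IMM = 0x13  # I-type register-immediate operations


def decode(instr):
    """Split a 32-bit instruction word into its RV32I fields."""
    imm_i = (instr >> 20) & 0xFFF  # instr[31:20]
    if imm_i & 0x800:
        imm_i -= 0x1000  # sign-extend the 12-bit immediate
    opcode = instr & 0x7F
    return {
        "opcode": opcode,                 # instr[6:0]
        "rd": (instr >> 7) & 0x1F,        # instr[11:7]
        "funct3": (instr >> 12) & 0x7,    # instr[14:12]
        "rs1": (instr >> 15) & 0x1F,      # instr[19:15]
        "rs2": (instr >> 20) & 0x1F,      # instr[24:20]
        "funct7": (instr >> 25) & 0x7F,   # instr[31:25]
        "imm_i": imm_i,
        "is_rtype": opcode == OPCODE_OP,
        "is_itype": opcode == OPCODE_OP_IMM,
    }


fields = decode(0x002081B3)  # ADD x3, x1, x2
print(fields["rd"], fields["rs1"], fields["rs2"])
```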


3. Validated Execution Flow in M0

M0 executes each instruction completely within a single cycle. Single-cycle execution keeps the data flow easy to observe, so each step can be checked before pipelining is introduced.

  1. Fetch: The PC supplies the address of the current instruction, which is read from instr_mem.
  2. Decode: The decoder extracts rs1, rs2, rd, the immediate value (if any), and the ALU operation. This step translates the instruction into control signals.
  3. Read: The regfile delivers the values of the registers named by the instruction, and they are passed to the ALU.
  4. Execute: The ALU performs the arithmetic or logic operation on its operands.
  5. Write-back: The ALU result is written to register rd in the regfile, unless rd = x0, which is always zero and is never written.

This flow was verified in simulation by comparing the observed signals with the expected values.
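
The five steps can be put together in one short Python function that models a single cycle. This is a deliberately reduced sketch under stated assumptions: it handles only ADD, SUB, and ADDI (funct3 = 000), the function name step() is hypothetical, and it is not the author's top-level Verilog module.

```python
MASK32 = 0xFFFFFFFF


def step(pc, regs, rom):
    """Execute one instruction in a single cycle and return the next PC."""
    # 1. Fetch: index the ROM with the word address pc[31:2].
    instr = rom[pc >> 2]
    # 2. Decode: extract the fields used by R-type and I-type instructions.
    opcode = instr & 0x7F
    rd = (instr >> 7) & 0x1F
    funct3 = (instr >> 12) & 0x7
    rs1 = (instr >> 15) & 0x1F
    rs2 = (instr >> 20) & 0x1F
    funct7 = (instr >> 25) & 0x7F
    imm = (instr >> 20) & 0xFFF
    if imm & 0x800:
        imm -= 0x1000  # sign-extend the 12-bit immediate
    if opcode not in (0x13, 0x33) or funct3 != 0:
        raise NotImplementedError("this sketch handles only ADD, SUB, ADDI")
    # 3. Read: second operand is a register (R-type) or the immediate (I-type).
    a = regs[rs1]
    b = regs[rs2] if opcode == 0x33 else imm & MASK32
    # 4. Execute: SUB is the R-type encoding with funct7 = 0x20.
    if opcode == 0x33 and funct7 == 0x20:
        result = (a - b) & MASK32
    else:
        result = (a + b) & MASK32
    # 5. Write-back: x0 is never written.
    if rd != 0:
        regs[rd] = result
    return (pc + 4) & MASK32


rom = [0x00500093, 0x00A00113, 0x002081B3]  # ADDI, ADDI, ADD
regs = [0] * 32
pc = 0
for _ in rom:
    pc = step(pc, regs, rom)
print(regs[1], regs[2], regs[3])
```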


4. Simulation Results

The simulation output confirms that the instructions are fetched, decoded, and executed in the expected order. The register values observed after each instruction verify the operation of the ALU and the register file.

The functional output obtained is as follows:

=== CORE M0 Simulation Start ===
PC = 0x00000000   instr = 0x00500093   ; ADDI x1, x0, 5
PC = 0x00000004   instr = 0x00A00113   ; ADDI x2, x0, 10
PC = 0x00000008   instr = 0x002081B3   ; ADD  x3, x1, x2
...

Observed results in the registers:

  • x1 = 5
  • x2 = 10
  • x3 = 15

The regfile correctly received the writes:

  • In cycle 1: write x1 = 0x0000_0005
  • In cycle 2: write x2 = 0x0000_000A
  • In cycle 3: write x3 = 0x0000_000F

The ALU generated the appropriate values, and the PC advanced correctly.


5. Waveform Analysis

[Figure: waveform of the M0 simulation]

The .hex file contains:

00500093    // ADDI x1, x0, 5
00A00113    // ADDI x2, x0, 10
002081B3    // ADD  x3, x1, x2
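
These three words follow directly from the RV32I instruction formats, which can be checked with a small Python sketch. The helper names encode_i and encode_r are hypothetical; the field positions and the opcodes 0x13 (register-immediate) and 0x33 (register-register) are those of the base ISA.

```python
def encode_i(imm, rs1, funct3, rd, opcode=0x13):
    """Assemble an I-type word: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_r(funct7, rs2, rs1, funct3, rd, opcode=0x33):
    """Assemble an R-type word: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


program = [
    encode_i(imm=5, rs1=0, funct3=0, rd=1),             # ADDI x1, x0, 5
    encode_i(imm=10, rs1=0, funct3=0, rd=2),            # ADDI x2, x0, 10
    encode_r(funct7=0, rs2=2, rs1=1, funct3=0, rd=3),   # ADD  x3, x1, x2
]
hex_lines = [f"{word:08X}" for word in program]
print("\n".join(hex_lines))
```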

The waveform shows these three instructions executing in order.

The following aspects are validated in the waveform:

  • The PC increments correctly by 4 bytes per cycle.
  • The decoder correctly delivers rs1, rs2, rd, imm_i, is_rtype, and is_itype.
  • The regfile returns the expected operands (rdata1, rdata2).
  • The ALU produces valid results (alu_result).
  • The write enable (we) signal is activated only when writing is necessary.
  • The regfile effectively updates registers x1, x2, and x3 with the expected values.

This confirms that all modules interact correctly.


6. Conclusion of M0

M0 is a fully functional core without pipelining in which all the base elements of the processor have been designed, integrated, and verified. Validating each component and the overall data flow gives a solid foundation for the more complex features to come.

With this milestone completed, we have:

  • A stable datapath
  • A reliable decoder
  • A validated ALU
  • Registers functioning correctly
  • Operational instruction memory
  • Validated complete instruction flow

This prepares us to advance to M1, where we will begin implementing the IF/ID/EX/MEM/WB pipeline stages, hazard handling, and forwarding, the next step toward a fully functional pipelined RISC-V processor. For further reading on the RISC-V architecture, visit the RISC-V Foundation website.
