Architectural Overview
In this tutorial, we will build a simple SNAX system supporting one simple ALU accelerator. The figure below shows the target architecture we will build:
We annotate some notable characteristics:
(1) - It has a memory that is 128 kB large, with 32 banks where each bank has a 64-bit data width. We call this memory the tightly coupled data memory (TCDM).
(2) - There is a complex TCDM interconnect that handles data transfers from the Snitch CPU cores, the accelerator, the DMA, and an AXI port to the entire TCDM memory. It can support both 64-bit and 512-bit transfers.
(3) - There are Snitch CPU cores that control the accelerator. The Snitch is a lightweight RV32I core for dispatching commands to the accelerator.
(4) - Any accelerator sits on a shell marked by the yellow highlight. This shell provides control and data interfaces to the accelerator (SNAX ALU). The SNAX shell consists of a control and status register (CSR) manager and a data streamer.
(5) - The control and status register (CSR) manager handles CSR requests from a CPU core to the accelerator.
(6) - The data streamers provide flexible data access for the accelerators. These streamers are configurable at both design time and run time, which streamlines data delivery to and from the accelerator.
(7) - There is a DMA to transfer data from the outside memory into the local TCDM. A dedicated Snitch core is given to a DMA to allow parallel operations. The programmer has full control of this DMA.
(8) - There are shared instruction caches for the CPU cores.
(9) - AXI narrow and wide interconnects for data transactions towards the outside of the SNAX cluster.
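To make the TCDM numbers in item (1) and the transfer widths in item (2) concrete, the following minimal sketch derives the per-bank capacity and the number of banks a single transfer spans. It assumes that 128 kB means 128 x 1024 bytes and that a wide transfer is spread over consecutive 64-bit banks, one word per bank. The function names are illustrative and not part of the SNAX code base.

```python
# Minimal sketch: derive TCDM bank geometry from the figures quoted above.
# Assumption: "kB" is taken as 1024 bytes; function names are illustrative only.

def tcdm_geometry(size_kib: int, num_banks: int, bank_width_bits: int) -> dict:
    """Return the total size and per-bank capacity of a banked memory."""
    total_bytes = size_kib * 1024
    bank_width_bytes = bank_width_bits // 8
    bytes_per_bank = total_bytes // num_banks
    return {
        "total_bytes": total_bytes,
        "bytes_per_bank": bytes_per_bank,
        "words_per_bank": bytes_per_bank // bank_width_bytes,
    }


def banks_per_transfer(transfer_bits: int, bank_width_bits: int) -> int:
    """Number of banks one transfer touches, assuming one word per bank."""
    return -(-transfer_bits // bank_width_bits)  # ceiling division


geometry = tcdm_geometry(size_kib=128, num_banks=32, bank_width_bits=64)
print(geometry)
print("banks per 64-bit transfer:", banks_per_transfer(64, 64))
print("banks per 512-bit transfer:", banks_per_transfer(512, 64))
```

Under these assumptions each bank holds 4 KiB (512 words of 64 bits), and a 512-bit transfer occupies 8 of the 32 banks at once.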
As we go through the tutorial, you will see that several of these components are design-time configurable. A user only needs to modify a configuration file to get these components in place.
Any user, with their own custom accelerator, connects their design within the broken lines of the SNAX shell. They need to comply with the control interface coming from a CSR manager and a streamer interface for accessing data from memory.
What Do You Need to Build This System?
These are the major steps that you will see as you go through this tutorial. We emphasize that they make it easy for someone new to integrate their accelerator into the system. To give users a perspective on what they need to work on, we only need the following:
Building Your Accelerator Shell
The first step is to build your accelerator shell to comply with the control and data interfaces. The Accelerator Design section discusses our example SNAX ALU and the required interfaces. You need to focus on making a shell that connects the appropriate signals for the control and data ports. Getting these connections right is probably the most challenging part.
Configuring the SNAX CSR Manager
A CSR manager is available to handle control requests from the Snitch cores and to dispatch the configuration registers to the accelerator. The accelerator needs to get the set of configured registers through a decoupled interface. More details are in CSR Manager Design.
Configuring the SNAX Streamer
A streamer is available for providing reconfigurable and flexible data access from the L1 TCDM to the accelerator. The accelerator needs to comply with the decoupled data interfaces. More details are in Streamer Design.
Building Your System
Building your system is fully automated by the scripts and makefiles in this platform. The only input that this system needs is a configuration .hjson file. In the directory ./target/snitch_cluster/cfg/ you will find several configurations describing different systems.
The snax-alu.hjson file contains the configurations we have for the SNAX ALU system (refer to the figure above). This configuration file describes several system customizations. It includes the size of the memory, the connection for the accelerators to the TCDM interconnect, the configurations for the CSR manager and streamer, and so on.
For example, you will find the cluster or system configuration in the first part:
cluster: {
  boot_addr: 4096, // 0x1000
  cluster_base_addr: 268435456, // 0x1000_0000
  cluster_base_offset: 0,
  cluster_base_hartid: 0,
  addr_width: 48,
  data_width: 64,
  tcdm: {
    size: 128,
    banks: 32,
  },
  // Other things below
Parameters like the tcdm configurations indicate the TCDM memory size in kB and the number of memory banks. These settings automatically adjust the system. You can find more details in the Hardware Schema section.
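The addresses in this excerpt are written in decimal, with the hexadecimal value given in a comment. As a small sanity check, the sketch below copies the values into a plain Python dictionary (rather than parsing the .hjson file) and confirms that the decimal and hexadecimal forms agree and that the base address fits within addr_width bits. The helper name fits_in_width is hypothetical.

```python
# Sketch: cross-check the decimal values in the cluster excerpt against the
# hexadecimal values given in its comments. The dictionary is a hand-copied
# subset of the excerpt, not the result of parsing the .hjson file.

cluster = {
    "boot_addr": 4096,               # commented as 0x1000
    "cluster_base_addr": 268435456,  # commented as 0x1000_0000
    "addr_width": 48,
    "data_width": 64,
    "tcdm": {"size": 128, "banks": 32},
}


def fits_in_width(value: int, width_bits: int) -> bool:
    """True if a non-negative value is representable in width_bits bits."""
    return 0 <= value < (1 << width_bits)


print(hex(cluster["boot_addr"]))
print(hex(cluster["cluster_base_addr"]))
print(fits_in_width(cluster["cluster_base_addr"], cluster["addr_width"]))
```

Writing the values this way makes it easy to spot a mismatch between a decimal entry and its hexadecimal comment when you edit the configuration by hand.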
You can add your accelerator configurations in a configuration file. The SNAX ALU configuration snax-alu.hjson has core templates which configure the Snitch core and how it connects to an accelerator. The snax-alu core template is:
// SNAX Accelerator Core Templates
snax_alu_core_template: {
  isa: "rv32imafd",
  xssr: true,
  xfrep: true,
  xdma: false,
  xf16: true,
  xf16alt: true,
  xf8: true,
  xf8alt: true,
  xfdotp: true,
  xfvec: true,
  snax_acc_cfg: {
    snax_acc_name: "snax_alu",
    bender_target: ["snax_alu"],
    snax_tcdm_ports: 16,
    snax_num_rw_csr: 3,
    snax_num_ro_csr: 2,
    snax_streamer_cfg: {$ref: "#/snax_alu_csr_streamer_template" }
  },
  num_int_outstanding_loads: 1,
  num_int_outstanding_mem: 4,
  num_fp_outstanding_loads: 4,
  num_fp_outstanding_mem: 4,
  num_sequencer_instructions: 16,
  num_dtlb_entries: 1,
  num_itlb_entries: 1,
  // Enable division/square root unit
  // Xdiv_sqrt: true,
},
The fields outside snax_acc_cfg (isa, xssr, xfrep, and so on) pertain to the Snitch core configuration, in particular which ISA to use and which additional features it includes. You would usually keep these at their defaults.
The snax_acc_cfg contains the configurations for the accelerator. The configuration definitions are:
snax_acc_name: The name appended to the different wrappers discussed in the Building the System section.
bender_target: The bender target name that you will use later in the Building the System section.
snax_tcdm_ports: The number of tightly coupled data memory (TCDM) ports that your accelerator needs.
snax_num_rw_csr: The number of read-write (RW) registers your accelerator has. This affects the connection ports of the CSR manager. More details are in SNAX CSR Manager.
snax_num_ro_csr: The number of read-only (RO) registers your accelerator has. This affects the connection ports of the CSR manager. More details are in SNAX CSR Manager.
snax_streamer_cfg: Contains the settings for your streamer. More details are in SNAX Streamer.
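To illustrate what the two register counts mean, here is a toy software model of a register file with snax_num_rw_csr read-write and snax_num_ro_csr read-only entries. It is only a sketch: the class name, the index layout (RW registers first, RO registers after them), the 32-bit register width, and the choice to ignore CPU writes to RO registers are all assumptions made for illustration, not a description of the actual CSR manager.

```python
# Toy model of RW vs. RO register semantics. Everything here (class name,
# index layout, register width, ignored writes to RO registers) is an
# illustrative assumption and does not describe the real SNAX CSR manager.

class ToyCsrFile:
    def __init__(self, num_rw: int, num_ro: int) -> None:
        self.num_rw = num_rw
        self.num_ro = num_ro
        self.regs = [0] * (num_rw + num_ro)  # assumed layout: RW first, then RO

    def cpu_write(self, index: int, value: int) -> bool:
        """Write from the CPU side; only RW registers accept the value."""
        if 0 <= index < self.num_rw:
            self.regs[index] = value & 0xFFFFFFFF  # 32-bit registers assumed
            return True
        return False

    def cpu_read(self, index: int) -> int:
        """Read from the CPU side; both RW and RO registers are readable."""
        return self.regs[index]

    def accel_update(self, ro_offset: int, value: int) -> None:
        """Update an RO register from the accelerator side (e.g. a status value)."""
        if not 0 <= ro_offset < self.num_ro:
            raise IndexError("RO register offset out of range")
        self.regs[self.num_rw + ro_offset] = value & 0xFFFFFFFF


csrs = ToyCsrFile(num_rw=3, num_ro=2)  # counts taken from snax_acc_cfg above
csrs.cpu_write(0, 42)            # accepted: index 0 is an RW register
accepted = csrs.cpu_write(3, 7)  # not accepted: index 3 is the first RO register
csrs.accel_update(0, 1)          # accelerator side sets the first RO register
print(csrs.cpu_read(0), accepted, csrs.cpu_read(3))
```

The point of the model is the asymmetry: the CPU configures the accelerator through the RW registers, while the RO registers carry values produced by the accelerator that the CPU can only observe.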
Note
At the top of the configuration file, you will also see the cluster bender target name: bender_target: ["snax_alu_cluster"],
You need to set this at the top of your own configuration file too, so that your cluster has a unique name and the generated bender targets match.
You can find more details in the Hardware Schema file.
As long as you configure your system accordingly, the entire build process can be executed with a single make command. More details are in the Building the System section.
Programming Your System
After building the system, you can immediately test and profile your work through a C program. The program configures the accelerator by reading and writing its CSRs. We provide a detailed tutorial in Programming Your Design. We also provide some useful tools for debugging and profiling your design in Other Tools.
General Directory Structure
It helps to familiarize yourself with the directory structure of the platform. There are several files and hooks, but the important directories are described below.
The project is organized as a monolithic repository. Both hardware and software are co-located.
The file tree is visualized as follows:
├── hw
│   ├── chisel
│   │   ├── csr_manager
│   │   └── streamer
│   ├── snax_accelerator_1
│   ├── snax_accelerator_2
│   ├── snitch_stuff
│   └── templates
├── sw
├── target
│   └── snitch_cluster
│       ├── config
│       ├── generated
│       └── sw
│           ├── apps
│           │   ├── snax_system_1
│           │   └── snax_system_2
│           └── snax_lib
│               ├── snax_system_1
│               └── snax_system_2
└── util
    ├── clustergen
    └── wrappergen
The top-level is structured as follows:
docs: Documentation of the generator and software. Contains additional user guides.
hw: All hardware IP components. The source files are specified in SystemVerilog or Chisel, or as templates that generate these files.
sw: Hardware-independent software, libraries, runtimes, etc.
target: Contains the testbench setup and the cluster-configuration-specific hardware and software, libraries, runtimes, etc.
util: Utility and helper scripts.
We will revisit these things later on but first let's explore and understand the first step: Building the System.