Other Tools
There are two useful tools for debugging and verifying your work. (1) Dumping waveform VCD files and (2) program stack tracing.
Dumping Waveforms with VCD Files
To dump VCD files through simulation we only need to add the --vcd
when running simulations.
bin/snitch_cluster.vlt sw/apps/snax-alu/build/snax-alu.elf --vcd
sim.vcd
. You can view this with your favorite waveform viewing tool. Taking GTKWave
as an example:
gtkwave sim.vcd &
You can download gtkwave
with this link.
Note
If you're using the codespace, you need to download the sim.vcd
file first and open it locally in your personal work space. The built-in waveform viewer in VS code has a hard time loading the system's signals.
Program Tracing with Spike
Spike is a nice tool for converting disassembly files into traces of the simulation. When your run the simulations with:
bin/snitch_cluster.vlt sw/apps/snax-alu/build/snax-alu.elf
./target/snitch_cluster/logs/.
directory. Then you should see .dasm
files:
trace_hart_00000.dasm
trace_hart_00001.dasm
These are disassembly files for each core. trace_hart_00000.dasm
is for core 0 which is the core with the SNAX ALU accelerator, while trace_hart_00001.dasm
is for core 1 which is the core with the DMA. We would like to convert these files into traces that are more readable. We will use spike
for this.
Installing Spike
1 - Navigate to ./target/snitch_cluster/work-vlt/riscv-isa-sim/.
2 - Configure the spike
installation with:
./configure --prefix=/opt/spike/
3 - Install spike
and wait for a while for this to finish. This will take quite some time.
make install -j
4 - Add to path environment:
export PATH="/opt/spike/bin:$PATH"
5 - Check if spike
is correctly installed:
spike -h
Running Traces
We can now generate traces. Make sure you are in the ./target/snitch_cluster/
directory.
1 - Run simulations first:
bin/snitch_cluster.vlt sw/apps/snax-alu/build/snax-alu.elf
2 - Make traces:
make traces
3 - This file should generated trace files:
trace_hart_00000.txt
trace_hart_00001.txt
Investigating Traces
The generated traces are a bit more readable from here. Open trace_hart_00000.txt
and search for the first instance of the mcycle
instruction (do a text search like ctrl+f
).
663000 660 M 0x8000020c csrr a4, mcycle #; mcycle = 659, (wrb) a4 <-- 659
664000 661 M 0x80000210 auipc a4, 0x2 #; (wrb) a4 <-- 0x80002210
665000 662 M 0x80000214 addi a4, a4, 1816 #; a4 = 0x80002210, (wrb) a4 <-- 0x80002928
666000 663 M 0x80000218 lw a6, 4(a4) #; a4 = 0x80002928, a6 <~~ Word[0x8000292c]
677000 674 M #; (lsu) a6 <-- 0
678000 675 M 0x8000021c lw a7, 0(a4) #; a4 = 0x80002928, a7 <~~ Word[0x80002928]
679000 676 M 0x80000220 li t0, 0 #; (wrb) t0 <-- 0
689000 686 M #; (lsu) a7 <-- 80
690000 687 M 0x80000224 csrw unknown_3c0, a7 #; a7 = 80
691000 688 M 0x80000228 add a3, a5, a3 #; a5 = 0x10000a00, a3 = 2560, (wrb) a3 <-- 0x10001400
692000 689 M 0x8000022c li a6, 32 #; (wrb) a6 <-- 32
693000 690 M 0x80000230 csrw unknown_3c1, a6 #; a6 = 32
694000 691 M 0x80000234 csrw unknown_3c2, a6 #; a6 = 32
695000 692 M 0x80000238 li a6, 64 #; (wrb) a6 <-- 64
696000 693 M 0x8000023c csrw unknown_3c3, a6 #; a6 = 64
733000 730 M 0x80000240 csrwi unknown_3c4, 8 #;
734000 731 M 0x80000244 csrwi unknown_3c5, 8 #;
735000 732 M 0x80000248 csrwi unknown_3c6, 8 #;
736000 733 M 0x8000024c csrw unknown_3c7, a0 #; a0 = 0x10000000
737000 734 M 0x80000250 csrw unknown_3c8, a5 #; a5 = 0x10000a00
738000 735 M 0x80000254 csrw unknown_3c9, a3 #; a3 = 0x10001400
739000 736 M 0x80000258 csrwi unknown_3ca, 1 #;
740000 737 M 0x8000025c auipc a3, 0x4 #; (wrb) a3 <-- 0x8000425c
757000 754 M 0x80000260 addi a3, a3, 1412 #; a3 = 0x8000425c, (wrb) a3 <-- 0x800047e0
758000 755 M 0x80000264 lw a5, 4(a3) #; a3 = 0x800047e0, a5 <~~ Word[0x800047e4]
769000 766 M #; (lsu) a5 <-- 0
770000 767 M 0x80000268 lw a3, 0(a3) #; a3 = 0x800047e0, a3 <~~ Word[0x800047e0]
781000 778 M #; (lsu) a3 <-- 0
782000 779 M 0x8000026c csrw unknown_3cc, a3 #; a3 = 0
783000 780 M 0x80000270 lw a3, 4(a4) #; a4 = 0x80002928, a3 <~~ Word[0x8000292c]
794000 791 M #; (lsu) a3 <-- 0
795000 792 M 0x80000274 lw a4, 0(a4) #; a4 = 0x80002928, a4 <~~ Word[0x80002928]
806000 803 M #; (lsu) a4 <-- 80
807000 804 M 0x80000278 csrw unknown_3cd, a4 #; a4 = 80
808000 805 M 0x8000027c csrwi unknown_3ce, 1 #;
809000 806 M 0x80000280 csrr a3, mcycle #; mcycle = 805, (wrb) a3 <-- 805
trace_hart_00000.txt
is the trace file for core 0 which is also controlling the SNAX ALU accelerator. The columns are arranged as follows:
- 1st column is the time in ns.
- 2nd column is the clock cycle count.
- 3rd column is the instruction address.
- 4th column is the instruction.
- 5th column is the arguments for the instruction.
- 6th column is the comments section to indicate what has happened.
You can see comments that indicate load operations of the load-store unit of the Snitch core. For example:
#; (lsu) a4 <-- 80
This shows that the value 80 was loaded into a4 at this specific cycle.
Recall, from our snax-alu.c
program, there is a compute core assignment where we tag the mcycle
count for the CSR setup cycles. The first instance of the mcycle
for the computer happens before the CSR setup. The second time the mcycle
appears when the CSR setup is finished. You could visibly locate this in the trace.
Moreover, it is interesting to see the consistency of the CSR write instructions. For example, clock cycles 690 and 691 pertain to the part of the snax-alu.c
program.
write_csr(0x3c1, 32);
write_csr(0x3c2, 32);
Some Exercise Questions
At what clock cycle was the loop bound register set?
Clock cycle 687How long does it take to set the CSR cycles from the start of the streamer up to the start of the accelerator?
Starts at clock cycle `mcycle=660` and ends at clock cycle `mcycle=806`. A total of 146 clock cycles. This is different from the performance counter we measured: 102 clock cycles.Where can I find the `mcycle` tags for the DMA core?
We need to check `trace_hart_00001.txt`.How many cycles does it take to preload the data with the DMA? (Assuming the default settings for `snax-alu.c`)
Starts at `mcycle=620` and ends at `mcycle=657`. A total of 37 clock cycles.Questasim Simulation
We also support simulations using QuestaSim. This tool is more powerful than Verilator, because you can also trace signals that may have been unconnected. Moreoever, it offers a more useful sanity-checking of the connections or designs you made. If you opt to use QuestaSim, the steps below will guide you through.
Note
There are options where you can mount the QuestaSim to the container, or you have installed, the necessary packages to make the Makefiles work without a container. The steps below assume you don't do either of this, but if ever you do, it's okay to just run the commands directly.
Steps 1, 2, 3, and 4 need to run inside the container first. Make sure to navigate to /target/snitch_cluster/
1 - Generate all necessary RTL files
make CFG_OVERRIDE=cfg/snax-alu.hjson rtl-gen
2 - Build the bootdata.cc
. Note that to change the path accordingly.
make /repo/target/snitch_cluster/generated/bootdata.cc
3 - Build the libfesvr.a
make work/lib/libfesvr.a
4 - While still in the container, you can build the software too:
make sw CFG_OVERRIDE=cfg/snax-alu.hjson SELECT_RUNTIME=rtl-generic SELECT_TOOLCHAIN=llvm-generic
5 - Exit the container. (This step is not needed if you can mount QuestaSim to the container or you can run everything without a container.)
6 - Build the hardware with QuestaSim
make CFG_OVERRIDE=snax-alu.hjson bin/snitch_cluster.vsim
Done! This builds the system for QuestaSim. To simulate some built binaries (.elf
files) you can just do:
make bin/snitch_cluster.vsim sw/apps/snax-alu/build/snax-alu.elf