ZigZag - Deep Learning Hardware Design Space Exploration
This repository presents the new version of our tried-and-tested hardware Architecture-Mapping Design Space Exploration (DSE) framework for Deep Learning (DL) accelerators. ZigZag bridges the gap between algorithmic DL decisions and their acceleration cost on specialized accelerators through fast and accurate hardware cost estimation.
Definition of the general initialization function for digital and analog in-memory computing (D/AIMC).
Public Member Functions
__init__(self, is_analog_imc: bool, bit_serial_precision: int, input_precision: list[int], adc_resolution: int, cells_size: int, cells_area: float | None, dimension_sizes: dict[OADimension, int], auto_cost_extraction: bool = False)
get_1b_adder_energy(self) -> float: Energy of a 1-bit full adder, assuming it consists of 3 ND2 (2-input NAND) gates and 2 XOR2 gates.
get_1b_adder_energy_half_activated(self) -> float: Energy of a 1-bit full adder when one of its inputs is 0.
get_1b_multiplier_energy(self) -> float: Energy of a 1-bit multiplier. A 1-bit multiplier consists of one NOR gate, whose cost is assumed equal to that of an ND2 gate. A factor of 0.5 is applied because the weight stays constant during multiplication.
get_1b_reg_energy(self) -> float: Energy of a 1-bit D flip-flop (DFF).
get_1b_adder_area(self) -> float: Area of a 1-bit full adder, assuming it consists of 3 ND2 gates and 2 XOR2 gates.
get_1b_multiplier_area(self) -> float: Area of a 1-bit multiplier, which consists of one NOR gate costed as one ND2 gate.
get_1b_reg_area(self) -> float: Area of a 1-bit DFF.
get_1b_adder_dly_in2sum(self) -> float: Delay of a 1-bit adder from input to sum output.
get_1b_adder_dly_in2cout(self) -> float: Delay of a 1-bit adder from input to carry output.
get_1b_adder_dly_cin2cout(self) -> float: Delay of a 1-bit adder from carry input to carry output.
get_1b_multiplier_dly(self) -> float: Delay of a 1-bit multiplier, which consists of one NOR gate costed as one ND2 gate.
get_mapped_oa_dim(self, layer: LayerNode, wl_dim: OADimension, bl_dim: OADimension) -> tuple[float, float, float, float]: Get the mapped operational array (OA) dimensions in the current mapping.
get_precharge_energy(self, tech_param: dict[str, float], layer: LayerNode, mapping: Mapping) -> tuple[float, float]
get_regular_adder_trees_energy(self, adder_input_precision: int, active_inputs_number: float, physical_inputs_number: float) -> float: Get the energy spent on regular ripple-carry adder (RCA) trees without place values.
Static Public Member Functions
get_single_cell_array_cost_from_cacti(tech_node: float, wordline_dim_size: float, bitline_dim_size: float, cells_size: float, weight_precision: int) -> tuple[float, float, float, float]: Get the area and energy cost of a single macro (cell array) using CACTI. This function is called when CACTI is required for cost estimation.
calculate_mapped_rows_when_diagonal_mapping_found(layer: LayerNode, layer_const_operand: LayerOperand, layer_act_operand: LayerOperand, spatial_mapping_on_wordline_dim: MappingSingleOADim, spatial_mapping_on_bitline_dim: MappingSingleOADim) -> tuple[float, float]: Calculate the total number of mapped rows when an OX or OY unroll is found, which requires a diagonal data mapping.
Static Public Attributes
TECH_PARAM_28NM (dict)
__init__(self, is_analog_imc: bool, bit_serial_precision: int, input_precision: list[int], adc_resolution: int, cells_size: int, cells_area: float | None, dimension_sizes: dict[OADimension, int], auto_cost_extraction: bool = False)
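calculate_mapped_rows_when_diagonal_mapping_found(layer: LayerNode, layer_const_operand: LayerOperand, layer_act_operand: LayerOperand, spatial_mapping_on_wordline_dim: MappingSingleOADim, spatial_mapping_on_bitline_dim: MappingSingleOADim) -> tuple[float, float]  (static)
Calculates the total number of mapped rows when an OX or OY unroll is found, which requires a diagonal data mapping.
If no OX/OY unroll exists, this function can still be used to calculate the total number of mapped rows; the only drawback is a longer simulation time.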
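To make the diagonal mapping concrete, here is a simplified model, not ZigZag's implementation. It assumes OX is unrolled by ox_unroll across output columns that share input rows, FX and C are accumulated along each column, and the convolution has stride 1 and no dilation. Column ox then reads input positions ox to ox + FX - 1, so the weights sit on a diagonal and neighbouring columns reuse rows. All function names are hypothetical. The enumeration variant also works without any OX unroll, at the cost of visiting every (c, ox, fx) combination, which mirrors the simulation-time trade-off described above.

```python
def mapped_rows_closed_form(ox_unroll: int, fx_unroll: int, c_unroll: int = 1) -> int:
    """Rows needed when ox_unroll output columns share a sliding input window
    (stride 1, no dilation): each input channel occupies ox_unroll + fx_unroll - 1 rows."""
    return c_unroll * (ox_unroll + fx_unroll - 1)


def mapped_rows_enumerated(ox_unroll: int, fx_unroll: int, c_unroll: int = 1) -> int:
    """Same quantity, found by collecting every (channel, input position)
    that any column touches. General, but slower for large unrollings."""
    rows = {
        (c, ox + fx)
        for c in range(c_unroll)
        for ox in range(ox_unroll)
        for fx in range(fx_unroll)
    }
    return len(rows)


# Compare both methods; with ox_unroll == 1 this reduces to fx_unroll * c_unroll rows.
for ox, fx, c in [(1, 3, 4), (4, 3, 2), (8, 1, 1)]:
    print(ox, fx, c, mapped_rows_closed_form(ox, fx, c), mapped_rows_enumerated(ox, fx, c))
```

Without the diagonal placement, each column would need its own fx_unroll * c_unroll rows, so sharing rows is what keeps the mapped row count low.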
get_1b_adder_area(self) -> float
Area of a 1-bit full adder, assuming it consists of 3 ND2 gates and 2 XOR2 gates.
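As a rough illustration of the gate-count model behind get_1b_adder_area, the sketch below sums per-gate areas. The GATE_AREA values and helper names are hypothetical placeholders, not the contents of TECH_PARAM_28NM, and the n-bit ripple-carry helper only illustrates how the 1-bit figure might be reused.

```python
# Hypothetical per-gate areas (arbitrary units); placeholders, not ZigZag's values.
GATE_AREA = {"nd2": 1.0, "xor2": 2.4}


def full_adder_area(gate_area: dict[str, float]) -> float:
    """1-bit full adder area, modeled as 3 ND2 gates + 2 XOR2 gates."""
    return 3 * gate_area["nd2"] + 2 * gate_area["xor2"]


def ripple_carry_adder_area(n_bits: int, gate_area: dict[str, float]) -> float:
    """Illustrative n-bit ripple-carry adder: n chained 1-bit full adders."""
    return n_bits * full_adder_area(gate_area)


print(full_adder_area(GATE_AREA), ripple_carry_adder_area(8, GATE_AREA))
```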
get_1b_adder_dly_cin2cout(self) -> float
Delay of a 1-bit adder from carry input to carry output.
get_1b_adder_dly_in2cout(self) -> float
Delay of a 1-bit adder from input to carry output.
get_1b_adder_dly_in2sum(self) -> float
Delay of a 1-bit adder from input to sum output.
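The three adder delays (get_1b_adder_dly_in2sum, get_1b_adder_dly_in2cout, get_1b_adder_dly_cin2cout) can be derived from the 3 ND2 + 2 XOR2 composition if one assumes the textbook full-adder structure: sum = a XOR b XOR cin, and cout = NAND(NAND(a, b), NAND(a XOR b, cin)). That structure is an assumption; the source only gives the gate counts. The gate delays and function names below are hypothetical placeholders.

```python
from itertools import product

# Hypothetical per-gate delays in ns; placeholders, not ZigZag's values.
GATE_DELAY = {"nd2": 0.05, "xor2": 0.12}


def nand(x: int, y: int) -> int:
    return 1 - (x & y)


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Textbook full adder built from 2 XOR2 and 3 ND2 gates (assumed structure)."""
    p = a ^ b                                # XOR2 #1
    s = p ^ cin                              # XOR2 #2
    cout = nand(nand(a, b), nand(p, cin))    # ND2 x3
    return s, cout


# Sanity check: this gate structure really computes a + b + cin.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin


def dly_in2sum(d: dict[str, float]) -> float:
    return 2 * d["xor2"]                 # a -> XOR2 -> XOR2 -> sum


def dly_in2cout(d: dict[str, float]) -> float:
    # Longest input path: a -> XOR2 -> ND2 -> ND2 -> cout
    # (the path a -> ND2 -> ND2 -> cout is shorter).
    return d["xor2"] + 2 * d["nd2"]


def dly_cin2cout(d: dict[str, float]) -> float:
    return 2 * d["nd2"]                  # cin -> ND2 -> ND2 -> cout
```

The carry-in to carry-out path is the one repeated along a ripple-carry chain, which is why it is modeled separately from the input-to-carry path.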
get_1b_adder_energy(self) -> float
Energy of a 1-bit full adder, assuming it consists of 3 ND2 gates and 2 XOR2 gates.
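A minimal sketch of the same gate-count model applied to energy, assuming a first-order switching model in which each gate dissipates C·VDD² per evaluation (activity factor 1). The capacitance values, VDD, and function name are hypothetical; the source specifies only the 3 ND2 + 2 XOR2 composition.

```python
# Hypothetical technology parameters (placeholders, not TECH_PARAM_28NM).
VDD = 0.9                                        # supply voltage, V
GATE_CAP_PF = {"nd2": 0.0007, "xor2": 0.00105}   # switched capacitance per gate, pF


def full_adder_energy_pj(cap_pf: dict[str, float], vdd: float) -> float:
    """Energy of one 1-bit full adder evaluation in pJ (pF * V^2 = pJ)."""
    adder_cap_pf = 3 * cap_pf["nd2"] + 2 * cap_pf["xor2"]
    return adder_cap_pf * vdd**2


print(f"{full_adder_energy_pj(GATE_CAP_PF, VDD):.6f} pJ")
```

Because energy scales with VDD², halving the supply voltage in this model cuts the adder energy by a factor of four.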
get_1b_adder_energy_half_activated(self) -> float
Energy of a 1-bit full adder when one of its inputs is 0.
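The source does not state how much cheaper the half-activated case is, but the logic shows why it is cheaper: with one input held at 0, a full adder reduces to a half adder, so part of its gates never toggle. The exhaustive check below is plain Python written for illustration, not ZigZag code.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) of a 1-bit full adder."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry_out) of a 1-bit half adder."""
    return a ^ b, a & b


# With input b tied to 0, the full adder behaves exactly like a half adder on (a, cin).
for a, cin in product((0, 1), repeat=2):
    assert full_adder(a, 0, cin) == half_adder(a, cin)
print("full adder with one input at 0 == half adder")
```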
get_1b_multiplier_area(self) -> float
Area of a 1-bit multiplier. A 1-bit multiplier consists of one NOR gate, whose cost is assumed equal to that of an ND2 gate.
get_1b_multiplier_dly(self) -> float
Delay of a 1-bit multiplier. A 1-bit multiplier consists of one NOR gate, whose cost is assumed equal to that of an ND2 gate.
get_1b_multiplier_energy(self) -> float
Energy of a 1-bit multiplier. A 1-bit multiplier consists of one NOR gate, whose cost is assumed equal to that of an ND2 gate. A factor of 0.5 is applied because the weight stays constant during multiplication.
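A 1-bit product a·w is simply a AND w, and a single NOR gate computes it from the inverted operands, which explains the one-gate model (the availability of inverted operands is an assumption; the source does not say). The energy function below combines the documented pieces, one gate costed as ND2 and the 0.5 factor, with a hypothetical C·VDD² switching model; its name and parameters are illustrative only.

```python
from itertools import product


def nor(x: int, y: int) -> int:
    return 1 - (x | y)


# A 1-bit product a*w equals a AND w, which one NOR gate computes
# from the inverted operands: NOR(~a, ~w) == a & w.
for a, w in product((0, 1), repeat=2):
    assert nor(1 - a, 1 - w) == a * w


def multiplier_1b_energy_pj(nd2_cap_pf: float, vdd: float) -> float:
    """1-bit multiplier energy: one NOR costed as one ND2, scaled by 0.5
    because the weight operand stays constant during multiplication."""
    return 0.5 * nd2_cap_pf * vdd**2
```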
get_1b_reg_area(self) -> float
Area of a 1-bit D flip-flop (DFF).
get_1b_reg_energy(self) -> float
Energy of a 1-bit D flip-flop (DFF).
get_mapped_oa_dim(self, layer: LayerNode, wl_dim: OADimension, bl_dim: OADimension) -> tuple[float, float, float, float]
Get the mapped operational array (OA) dimensions in the current mapping.
The energy of unmapped OA dimensions is set to 0.
get_precharge_energy(self, tech_param: dict[str, float], layer: LayerNode, mapping: Mapping) -> tuple[float, float]
get_regular_adder_trees_energy(self, adder_input_precision: int, active_inputs_number: float, physical_inputs_number: float) -> float
Get the energy spent on regular ripple-carry adder (RCA) trees without place values.
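A regular adder tree without place values sums equally weighted inputs in a binary tree, and each stage's operands grow by one bit. The sketch below counts the 1-bit full adders such a tree needs; for N = 2^d inputs of precision p the loop equals the closed form N(p + 1) - (p + d + 1). The energy function's linear scaling by active_inputs_number / physical_inputs_number is an assumption made for illustration, not confirmed by the source, and all names here are hypothetical.

```python
import math


def regular_adder_tree_1b_adder_count(adder_input_precision: int, physical_inputs_number: int) -> int:
    """Number of 1-bit full adders in a binary RCA tree with no place values."""
    depth = int(math.log2(physical_inputs_number))
    if 2**depth != physical_inputs_number:
        raise ValueError("this sketch assumes a power-of-two number of inputs")
    count = 0
    for stage in range(1, depth + 1):
        adders_in_stage = physical_inputs_number >> stage   # N / 2^stage adders
        width = adder_input_precision + stage - 1           # operands grow 1 bit per stage
        count += adders_in_stage * width
    return count


def regular_adder_tree_energy(adder_input_precision: int, active_inputs_number: float,
                              physical_inputs_number: int, energy_1b_adder: float) -> float:
    n_adders = regular_adder_tree_1b_adder_count(adder_input_precision, physical_inputs_number)
    # ASSUMPTION: energy scales linearly with the fraction of active inputs.
    return n_adders * energy_1b_adder * active_inputs_number / physical_inputs_number
```

For example, a 64-input tree of 8-bit operands needs 64 × 9 - (8 + 6 + 1) 1-bit adders under this model.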
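get_single_cell_array_cost_from_cacti(tech_node: float, wordline_dim_size: float, bitline_dim_size: float, cells_size: float, weight_precision: int) -> tuple[float, float, float, float]  (static)
Get the area and energy cost of a single macro (cell array) using CACTI. This function is called when CACTI is required for cost estimation.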
Parameters:
tech_node: the technology node, in µm (e.g. 0.028, 0.032, 0.022, ...).
wordline_dim_size: the size of the dimension where the wordline is.
bitline_dim_size: the size of the dimension where the bitline (adder tree) is.
cells_size: the size of each cell group, in bits.
weight_precision: the weight precision, i.e. the number of SRAM cells required to store an operand.
Member Data Documentation
activation_precision
adc_resolution
area
area_breakdown
auto_cost_extraction
bit_serial_precision
bitline_dim_size
bl_dim
cells_area
cells_size
cells_w_cost
delay
delay_breakdown
energy
energy_breakdown
is_aimc
mapped_group_depth
mapped_rows_total_per_macro
nb_of_banks
tech_param
TECH_PARAM_28NM (static)
total_unit_count
weight_precision
wl_dim
wordline_dim_size