ZigZag - Deep Learning Hardware Design Space Exploration
This repository presents the new version of our tried-and-tested hardware Architecture-Mapping Design Space Exploration (DSE) framework for Deep Learning (DL) accelerators. ZigZag bridges the gap between algorithmic DL decisions and their acceleration cost on specialized accelerators through fast and accurate hardware cost estimation.
ImcUnit Class Reference

Definition of the general initialization function for digital and analog in-memory computing (D/AIMC).


Public Member Functions

def __init__ (self, bool is_analog_imc, int bit_serial_precision, list[int] input_precision, int adc_resolution, int cells_size, float|None cells_area, dict[OADimension, int] dimension_sizes, bool auto_cost_extraction=False)
 
float get_1b_adder_energy (self)
 Energy of a 1b full adder. Assumes a 1b adder consists of 3 ND2 gates and 2 XOR2 gates.
 
float get_1b_adder_energy_half_activated (self)
 Energy of a 1b full adder when one input is 0.
 
float get_1b_multiplier_energy (self)
 Energy of a 1b multiplier. A 1b multiplier includes 1 NOR gate, which is assumed to cost the same as an ND2 gate. The 0.5 factor accounts for the weight staying constant during multiplication.
 
float get_1b_reg_energy (self)
 Energy of a 1b D flip-flop (DFF).
 
float get_1b_adder_area (self)
 Area of a 1b full adder. Assumes a 1b adder consists of 3 ND2 gates and 2 XOR2 gates.
 
float get_1b_multiplier_area (self)
 Area of a 1b multiplier. A 1b multiplier includes 1 NOR gate, which is assumed to cost the same as an ND2 gate.
 
float get_1b_reg_area (self)
 Area of a 1b D flip-flop (DFF).
 
float get_1b_adder_dly_in2sum (self)
 Delay of a 1b adder: input to sum-out.
 
float get_1b_adder_dly_in2cout (self)
 Delay of a 1b adder: input to carry-out.
 
float get_1b_adder_dly_cin2cout (self)
 Delay of a 1b adder: carry-in to carry-out.
 
float get_1b_multiplier_dly (self)
 Delay of a 1b multiplier. A 1b multiplier includes 1 NOR gate, which is assumed to cost the same as an ND2 gate.
 
tuple[float, float, float, float] get_mapped_oa_dim (self, LayerNode layer, OADimension wl_dim, OADimension bl_dim)
 Get the mapped oa_dim in the current mapping.
 
tuple[float, float] get_precharge_energy (self, dict[str, float] tech_param, LayerNode layer, Mapping mapping)
 
float get_regular_adder_trees_energy (self, int adder_input_precision, float active_inputs_number, float physical_inputs_number)
 Get the energy spent on regular ripple-carry adder (RCA) trees without place values.
 

Static Public Member Functions

tuple[float, float, float, float] get_single_cell_array_cost_from_cacti (float tech_node, float wordline_dim_size, float bitline_dim_size, float cells_size, int weight_precision)
 Get the area and energy cost of a single macro (cell array) using CACTI. This function is called when CACTI is required for cost estimation.
 
tuple[float, float] calculate_mapped_rows_when_diagonal_mapping_found (LayerNode layer, LayerOperand layer_const_operand, LayerOperand layer_act_operand, MappingSingleOADim spatial_mapping_on_wordline_dim, MappingSingleOADim spatial_mapping_on_bitline_dim)
 Calculates the total number of mapped rows when an OX, OY unroll is found, which requires a diagonal data mapping.
 

Public Attributes

 tech_param
 
 is_aimc
 
 bit_serial_precision
 
 adc_resolution
 
 cells_size
 
 cells_area
 
 auto_cost_extraction
 
 activation_precision
 
 weight_precision
 
 total_unit_count
 
 wl_dim
 
 wordline_dim_size
 
 bl_dim
 
 bitline_dim_size
 
 nb_of_banks
 
 energy
 
 energy_breakdown
 
 area
 
 area_breakdown
 
 delay
 
 delay_breakdown
 
 mapped_rows_total_per_macro
 
 mapped_group_depth
 
 cells_w_cost
 

Static Public Attributes

dictionary TECH_PARAM_28NM
 

Detailed Description

Definition of the general initialization function for digital and analog in-memory computing (D/AIMC).

Constructor & Destructor Documentation

◆ __init__()

def __init__ (   self,
bool  is_analog_imc,
int  bit_serial_precision,
list[int]  input_precision,
int  adc_resolution,
int  cells_size,
float | None  cells_area,
dict[OADimension, int]  dimension_sizes,
bool   auto_cost_extraction = False 
)

Member Function Documentation

◆ calculate_mapped_rows_when_diagonal_mapping_found()

tuple[float, float] calculate_mapped_rows_when_diagonal_mapping_found ( LayerNode  layer,
LayerOperand  layer_const_operand,
LayerOperand  layer_act_operand,
MappingSingleOADim  spatial_mapping_on_wordline_dim,
MappingSingleOADim  spatial_mapping_on_bitline_dim 
)
static

This function calculates the total number of mapped rows when an OX, OY unroll is found, which requires a diagonal data mapping.

If no OX, OY unroll exists, this function can still be used to calculate the total number of mapped rows; the only drawback is a longer simulation time.

◆ get_1b_adder_area()

float get_1b_adder_area (   self)

Area of a 1b full adder. Assumes a 1b adder consists of 3 ND2 gates and 2 XOR2 gates.
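
The gate count above can be turned into a number with the 28 nm gate areas listed under TECH_PARAM_28NM on this page. The following is a minimal sketch, not the ZigZag implementation: the helper name `one_bit_adder_area` is hypothetical, and the plain sum of gate areas (no routing or placement overhead) is an assumption.

```python
# Gate areas copied from TECH_PARAM_28NM on this page (unit: mm^2).
ND2_AREA = 0.614 / 1e6
XOR2_AREA = 0.614 * 2.4 / 1e6


def one_bit_adder_area(nd2_area: float = ND2_AREA, xor2_area: float = XOR2_AREA) -> float:
    """Hypothetical helper: area of a 1b full adder made of 3 ND2 and 2 XOR2 gates.

    ASSUMPTION: the adder area is the plain sum of its gate areas.
    """
    return 3 * nd2_area + 2 * xor2_area


adder_area = one_bit_adder_area()
print(f"1b adder area: {adder_area:.4e} mm^2")
```

Passing different gate areas to the helper gives a quick estimate for another technology node, provided the same 3 ND2 + 2 XOR2 decomposition is kept.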

◆ get_1b_adder_dly_cin2cout()

float get_1b_adder_dly_cin2cout (   self)

Delay of a 1b adder: carry-in to carry-out.

◆ get_1b_adder_dly_in2cout()

float get_1b_adder_dly_in2cout (   self)

Delay of a 1b adder: input to carry-out.

◆ get_1b_adder_dly_in2sum()

float get_1b_adder_dly_in2sum (   self)

Delay of a 1b adder: input to sum-out.

◆ get_1b_adder_energy()

float get_1b_adder_energy (   self)

Energy of a 1b full adder. Assumes a 1b adder consists of 3 ND2 gates and 2 XOR2 gates.
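
As an illustration of the gate-count assumption, the sketch below estimates the adder energy from the capacitance and supply voltage entries of TECH_PARAM_28NM. It is not taken from the ZigZag source: the helper name `one_bit_adder_energy` is hypothetical, and modeling switching energy as C * Vdd^2 is an assumption made for this example.

```python
# Values copied from TECH_PARAM_28NM on this page.
VDD = 0.9                   # unit: V
ND2_CAP = 0.7 / 1e3         # unit: pF
XOR2_CAP = 0.7 * 1.5 / 1e3  # unit: pF


def one_bit_adder_energy(
    vdd: float = VDD, nd2_cap: float = ND2_CAP, xor2_cap: float = XOR2_CAP
) -> float:
    """Hypothetical helper, not the ZigZag method itself.

    ASSUMPTION: switching energy is modeled as C * Vdd^2, so pF * V^2 yields pJ.
    """
    adder_cap = 3 * nd2_cap + 2 * xor2_cap  # 3 ND2 gates + 2 XOR2 gates
    return adder_cap * vdd**2


adder_energy_pj = one_bit_adder_energy()
print(f"1b adder energy: {adder_energy_pj:.4e} pJ")
```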


◆ get_1b_adder_energy_half_activated()

float get_1b_adder_energy_half_activated (   self)

Energy of a 1b full adder when one input is 0.

◆ get_1b_multiplier_area()

float get_1b_multiplier_area (   self)

Area of a 1b multiplier. A 1b multiplier includes 1 NOR gate, which is assumed to cost the same as an ND2 gate.

◆ get_1b_multiplier_dly()

float get_1b_multiplier_dly (   self)

Delay of a 1b multiplier. A 1b multiplier includes 1 NOR gate, which is assumed to cost the same as an ND2 gate.

◆ get_1b_multiplier_energy()

float get_1b_multiplier_energy (   self)

Energy of a 1b multiplier. A 1b multiplier includes 1 NOR gate, which is assumed to cost the same as an ND2 gate. The 0.5 factor accounts for the weight staying constant during multiplication.
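
To make the role of the 0.5 factor concrete, the sketch below exposes it as a parameter and compares it with a gate whose inputs both toggle. This is an illustration under stated assumptions, not the ZigZag code: the helper name `one_bit_multiplier_energy` and its `activity` parameter are hypothetical, and the C * Vdd^2 energy model is assumed.

```python
# Values copied from TECH_PARAM_28NM on this page.
VDD = 0.9            # unit: V
ND2_CAP = 0.7 / 1e3  # unit: pF (the NOR gate is costed as an ND2 gate)


def one_bit_multiplier_energy(
    vdd: float = VDD, nd2_cap: float = ND2_CAP, activity: float = 0.5
) -> float:
    """Hypothetical helper: energy of a 1b multiplier in pJ.

    ASSUMPTION: energy is activity * C * Vdd^2. The default activity of 0.5
    mirrors the documented factor for a weight input that stays constant.
    """
    return activity * nd2_cap * vdd**2


mult_energy_pj = one_bit_multiplier_energy()
full_toggle_energy_pj = one_bit_multiplier_energy(activity=1.0)
print(f"weight held constant: {mult_energy_pj:.4e} pJ")
print(f"both inputs toggling: {full_toggle_energy_pj:.4e} pJ")
```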


◆ get_1b_reg_area()

float get_1b_reg_area (   self)

Area of a 1b D flip-flop (DFF).

◆ get_1b_reg_energy()

float get_1b_reg_energy (   self)

Energy of a 1b D flip-flop (DFF).

◆ get_mapped_oa_dim()

tuple[float, float, float, float] get_mapped_oa_dim (   self,
LayerNode  layer,
OADimension  wl_dim,
OADimension   bl_dim 
)

Get the mapped oa_dim in the current mapping.

The energy of an unmapped oa_dim is set to 0.

◆ get_precharge_energy()

tuple[float, float] get_precharge_energy (   self,
dict[str, float]  tech_param,
LayerNode  layer,
Mapping   mapping 
)

◆ get_regular_adder_trees_energy()

float get_regular_adder_trees_energy (   self,
int  adder_input_precision,
float  active_inputs_number,
float  physical_inputs_number 
)

Get the energy spent on regular ripple-carry adder (RCA) trees without place values.
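
For intuition about how such an estimate scales, the sketch below counts the 1b full adders in a balanced adder tree whose inputs all share the same place value, so the operand width grows by one bit per level. It is a generic model and not the ZigZag formula: the helper name `adder_tree_1b_adder_count` is hypothetical, the number of inputs is assumed to be a power of two, and the distinction between active and physical inputs made by the real method is not modeled.

```python
def adder_tree_1b_adder_count(input_precision: int, nb_inputs: int) -> int:
    """Hypothetical helper: number of 1b full adders in a balanced RCA tree.

    ASSUMPTIONS: nb_inputs is a power of two, every input has the same place
    value, and each ripple-carry adder uses one 1b full adder per operand bit.
    """
    if nb_inputs < 1 or nb_inputs & (nb_inputs - 1) != 0:
        raise ValueError("nb_inputs must be a power of two")
    depth = nb_inputs.bit_length() - 1  # log2(nb_inputs)
    total = 0
    for level in range(depth):
        adders_at_level = nb_inputs >> (level + 1)  # halves at every level
        operand_width = input_precision + level     # grows by 1 bit per level
        total += adders_at_level * operand_width
    return total


# Example: 8 inputs of 4 bits each -> 4*4 + 2*5 + 1*6 one-bit adders.
count = adder_tree_1b_adder_count(input_precision=4, nb_inputs=8)
print(count)
```

Multiplying the count by a per-adder energy, such as the value returned by get_1b_adder_energy, gives a rough upper bound for the case in which every adder in the tree is active.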


◆ get_single_cell_array_cost_from_cacti()

tuple[float, float, float, float] get_single_cell_array_cost_from_cacti ( float  tech_node,
float  wordline_dim_size,
float  bitline_dim_size,
float  cells_size,
int  weight_precision 
)
static

Get the area and energy cost of a single macro (cell array) using CACTI. This function is called when CACTI is required for cost estimation.

Parameters
tech_node: the technology node (e.g. 0.028, 0.032, 0.022 ... unit: um)
wordline_dim_size: the size of the dimension where the wordline is
bitline_dim_size: the size of the dimension where the bitline (adder tree) is
cells_size: the size of each cell group (unit: bit)
weight_precision: the weight precision (number of SRAM cells required to store an operand)

Member Data Documentation

◆ activation_precision

activation_precision

◆ adc_resolution

adc_resolution

◆ area

area

◆ area_breakdown

area_breakdown

◆ auto_cost_extraction

auto_cost_extraction

◆ bit_serial_precision

bit_serial_precision

◆ bitline_dim_size

bitline_dim_size

◆ bl_dim

bl_dim

◆ cells_area

cells_area

◆ cells_size

cells_size

◆ cells_w_cost

cells_w_cost

◆ delay

delay

◆ delay_breakdown

delay_breakdown

◆ energy

energy

◆ energy_breakdown

energy_breakdown

◆ is_aimc

is_aimc

◆ mapped_group_depth

mapped_group_depth

◆ mapped_rows_total_per_macro

mapped_rows_total_per_macro

◆ nb_of_banks

nb_of_banks

◆ tech_param

tech_param

◆ TECH_PARAM_28NM

dictionary TECH_PARAM_28NM
static
Initial value:
= {
    "tech_node": 0.028,  # unit: um
    "vdd": 0.9,  # unit: V
    "nd2_cap": 0.7 / 1e3,  # unit: pF
    "wl_cap": 0.7 / 2 / 1e3,  # unit: pF (wordline cap of each SRAM cell is treated as NAND2_cap/2)
    "bl_cap": 0.7 / 2 / 1e3,  # unit: pF (bitline cap of each SRAM cell is treated as NAND2_cap/2)
    "xor2_cap": 0.7 * 1.5 / 1e3,  # unit: pF
    "dff_cap": 0.7 * 3 / 1e3,  # unit: pF
    "nd2_area": 0.614 / 1e6,  # unit: mm^2
    "xor2_area": 0.614 * 2.4 / 1e6,  # unit: mm^2
    "dff_area": 0.614 * 6 / 1e6,  # unit: mm^2
    "nd2_dly": 0.0478,  # unit: ns
    "xor2_dly": 0.0478 * 2.4,  # unit: ns
}
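
The units in this dictionary combine conveniently: a capacitance in pF multiplied by a squared voltage in V gives an energy in pJ. The sketch below copies the dictionary and tabulates a per-element switching energy in fJ. It is only an illustration: the C * Vdd^2 energy model and the variable name `energy_fj` are assumptions made for this example, not part of the documented API.

```python
# Copied from the initial value shown above.
TECH_PARAM_28NM = {
    "tech_node": 0.028,  # unit: um
    "vdd": 0.9,  # unit: V
    "nd2_cap": 0.7 / 1e3,  # unit: pF
    "wl_cap": 0.7 / 2 / 1e3,  # unit: pF
    "bl_cap": 0.7 / 2 / 1e3,  # unit: pF
    "xor2_cap": 0.7 * 1.5 / 1e3,  # unit: pF
    "dff_cap": 0.7 * 3 / 1e3,  # unit: pF
    "nd2_area": 0.614 / 1e6,  # unit: mm^2
    "xor2_area": 0.614 * 2.4 / 1e6,  # unit: mm^2
    "dff_area": 0.614 * 6 / 1e6,  # unit: mm^2
    "nd2_dly": 0.0478,  # unit: ns
    "xor2_dly": 0.0478 * 2.4,  # unit: ns
}

vdd = TECH_PARAM_28NM["vdd"]

# ASSUMPTION: energy = C * Vdd^2. pF * V^2 = pJ, and 1 pJ = 1e3 fJ.
energy_fj = {
    key: cap * vdd**2 * 1e3
    for key, cap in TECH_PARAM_28NM.items()
    if key.endswith("_cap")
}

for key, value in energy_fj.items():
    print(f"{key:>9}: {value:.4f} fJ")
```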

◆ total_unit_count

total_unit_count

◆ weight_precision

weight_precision

◆ wl_dim

wl_dim

◆ wordline_dim_size

wordline_dim_size

The documentation for this class was generated from the following file: