ZigZag - Deep Learning Hardware Design Space Exploration
This repository presents the novel version of our tried-and-tested hardware Architecture-Mapping Design Space Exploration (DSE) Framework for Deep Learning (DL) accelerators. ZigZag bridges the gap between algorithmic DL decisions and their acceleration cost on specialized accelerators through a fast and accurate hardware cost estimation.
Collect information of a complete mapping (spatial and temporal).
Public Member Functions
def __init__(self, Accelerator accelerator, SpatialMappingPerMemLvl | SpatialMappingInternal spatial_mapping, TemporalMapping temporal_mapping, LayerNode layer_node, bool access_same_data_considered_as_no_access = False)
def combine_spatial_temporal_mapping_dict(self)
Combine the spatial and temporal mapping dictionaries into combined_mapping_dict by inserting spatial loops above temporal loops at each level.
list[bool] get_psum_flags(self)
This function generates a list "psum_flag" that identifies whether each output memory level holds partial or final output.
def gen_data_precision_dict(self)
This function generates a dictionary that collects the data precision of each operand at each architecture level.
def gen_r_ir_loop_list(self)
Given the combined mapping, generate the r/ir loop size list at each level for each operand.
def calc_data_size(self)
Based on the r loop size list, calculate the data size held by each architectural level.
def calc_effective_data_size(self)
Calculate the effective data size used to obtain the allowed memory updating window in the latency calculation.
def calc_data_access(self)
Based on the ir loop size list and the total MAC operation count, calculate the data access at each memory level in a bottom-up way.
def calc_req_mem_bw_and_data_transfer_rate(self)
This function calculates the average and instantaneous required memory bandwidth and the periodic data transfer pattern.
def disable_data_traffic_external(self)
This function sets all data traffic between the top-level memory and the external world to 0 in unit_mem_data_movement.
Collect information of a complete mapping (spatial and temporal)
NOTE: Mapping is HW-unaware, i.e. it does not take in hardware information such as memory bandwidth, access cost, or memory size.
def __init__(self, Accelerator accelerator, SpatialMappingPerMemLvl | SpatialMappingInternal spatial_mapping, TemporalMapping temporal_mapping, LayerNode layer_node, bool access_same_data_considered_as_no_access = False)
def calc_data_access(self)
Based on the ir loop size list and the total MAC Op count, calculate the data access at each memory level in a bottom-up way.
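One way to read this bottom-up calculation is sketched below. It assumes that every irrelevant (ir) loop at a level reuses data already held at that level, so the number of element transfers to the level above shrinks by that level's ir loop size. The function name, the list layout (innermost level first), and the example values are hypothetical and are not taken from ZigZag's implementation.

```python
def accesses_per_level(total_mac_count, ir_loop_size_per_level):
    """Sketch: element transfers between each level and the level above it.

    ir_loop_size_per_level[i] is the product of the irrelevant loop sizes at
    level i, innermost level first (hypothetical layout).
    """
    accesses = []
    remaining = total_mac_count  # at the bottom, every MAC touches the operand once
    for ir_size in ir_loop_size_per_level:
        if remaining % ir_size != 0:
            raise ValueError("ir loop size must divide the remaining access count")
        remaining //= ir_size  # reuse at this level filters the traffic going up
        accesses.append(remaining)
    return accesses


example_accesses = accesses_per_level(1024, [4, 2, 1])
```

With 1024 MAC operations and ir loop sizes of 4, 2, and 1, the traffic drops by 4x after the first level and by a further 2x after the second; the third level has no reuse and passes the same count upward.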
def calc_data_size(self)
Based on the r loop size list, calculate the data size held by each architectural level.
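As a rough illustration, the data held at a level can be modelled as the product of all relevant (r) loop sizes at that level and below it. The sketch below makes that assumption and ignores partially relevant loops (for example, sliding-window input dimensions). The function names and list layout are hypothetical, not ZigZag's own.

```python
from math import prod


def data_elems_per_level(r_loop_sizes_per_level):
    """Sketch: element count held at each level, innermost level first.

    r_loop_sizes_per_level[i] lists the relevant loop sizes at level i
    (hypothetical layout).
    """
    elems = []
    running = 1
    for level_sizes in r_loop_sizes_per_level:
        running *= prod(level_sizes)  # a level also covers everything below it
        elems.append(running)
    return elems


def data_bits_per_level(elems, precision_bits):
    """Convert element counts to bits using a per-level precision."""
    return [count * bits for count, bits in zip(elems, precision_bits)]


example_elems = data_elems_per_level([[2], [3, 4], [5]])
example_bits = data_bits_per_level(example_elems, [8, 8, 8])
```

The split into elements and bits mirrors the data_elem_per_level and data_bit_per_level attributes listed for this class, but the mapping between the sketch and those attributes is an assumption.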
def calc_effective_data_size(self)
Calculate the effective data size for getting the allowed memory updating window in latency calculation.
The effective data size is obtained by dividing data_elem_per_level_unrolled by the top r loops.
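The division can be sketched as follows, assuming that "top r loops" means the uninterrupted run of relevant loops at the outer end of a level's loop list. That interpretation, the function name, and the (kind, size) loop encoding are assumptions made for illustration only.

```python
from math import prod


def effective_data_elem(data_elem_unrolled, level_loops):
    """Sketch: divide a level's data size by its top relevant loops.

    level_loops is a list of (kind, size) pairs, innermost loop first, where
    kind is "r" (relevant) or "ir" (irrelevant). Hypothetical encoding.
    """
    top_r_sizes = []
    for kind, size in reversed(level_loops):  # walk down from the outermost loop
        if kind != "r":
            break  # stop at the first irrelevant loop
        top_r_sizes.append(size)
    return data_elem_unrolled // prod(top_r_sizes)


loops = [("r", 2), ("ir", 3), ("r", 4), ("r", 2)]
example_effective = effective_data_elem(96, loops)
```

Here the two outermost loops are relevant (sizes 2 and 4), so the 96 elements are divided by 8. If the outermost loop is irrelevant, nothing is divided out and the effective size equals the full size.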
def calc_req_mem_bw_and_data_transfer_rate(self)
This function calculates the average and instantaneous required memory bandwidth and the periodic data transfer pattern.
def combine_spatial_temporal_mapping_dict(self)
Combine spatial and temporal mapping dictionary into combined_mapping_dict by inserting spatial loops above temporal loops at each level.
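A minimal sketch of such a merge is shown below. It assumes both mappings are dictionaries keyed by operand, each holding one loop list per level, and that a level's list is written innermost loop first, so "above" means "later in the list". The data layout and the function name are hypothetical; ZigZag's actual dictionaries may be ordered differently.

```python
def combine_mappings(spatial, temporal):
    """Sketch: per operand and level, place spatial loops above temporal loops.

    spatial and temporal map operand -> list of levels, and each level is a
    list of (dimension, size) loops, innermost first (hypothetical layout).
    """
    combined = {}
    for operand, temporal_levels in temporal.items():
        spatial_levels = spatial[operand]
        if len(spatial_levels) != len(temporal_levels):
            raise ValueError(f"level count mismatch for operand {operand!r}")
        combined[operand] = [
            list(t_loops) + list(s_loops)  # temporal first, spatial on top
            for s_loops, t_loops in zip(spatial_levels, temporal_levels)
        ]
    return combined


spatial = {"O": [[("K", 4)], []]}
temporal = {"O": [[("C", 2)], [("OX", 8)]]}
combined = combine_mappings(spatial, temporal)
```

In this example the spatial loop over K is stacked on top of the temporal loop over C at the first level, while the second level has no spatial loops and keeps only its temporal loop.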
def disable_data_traffic_external(self)
This function sets all data traffic between the top-level memory and the external world to 0 in unit_mem_data_movement.
def gen_data_precision_dict(self)
This function generates a dictionary that collects the data precision of each operand at each architecture level.
def gen_r_ir_loop_list(self)
Given the combined mapping, generate r/ir loop size list at each level for each operand.
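The idea can be illustrated with a small sketch that splits each level's loops into relevant (r) and irrelevant (ir) ones for a given operand and multiplies their sizes. The relevance sets below follow the usual convolution convention (weights depend on K, C, FX, FY; outputs on B, K, OX, OY); the function name and data layout are hypothetical, and partially relevant input loops are left out.

```python
from math import prod

# Dimensions each operand depends on in a standard convolution layer.
RELEVANT_DIMS = {
    "W": {"K", "C", "FX", "FY"},
    "O": {"B", "K", "OX", "OY"},
}


def r_ir_loop_sizes(levels, relevant_dims):
    """Sketch: per-level products of relevant and irrelevant loop sizes.

    levels is a list of levels, each a list of (dimension, size) loops
    (hypothetical layout).
    """
    r_sizes, ir_sizes = [], []
    for loops in levels:
        r_sizes.append(prod(size for dim, size in loops if dim in relevant_dims))
        ir_sizes.append(prod(size for dim, size in loops if dim not in relevant_dims))
    return r_sizes, ir_sizes


levels = [[("K", 2), ("C", 4)], [("OX", 8), ("K", 2)]]
r_out, ir_out = r_ir_loop_sizes(levels, RELEVANT_DIMS["O"])
r_wgt, ir_wgt = r_ir_loop_sizes(levels, RELEVANT_DIMS["W"])
```

The same loop nest yields different r/ir splits per operand: the C loop is irrelevant for the output but relevant for the weights, and the OX loop is the reverse.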
list[bool] get_psum_flags(self)
This function generates a list "psum_flag" that identifies whether each output memory level holds partial or final output.
E.g., psum_flag = [True, True, False] means that there are 3 memory levels for the output and only the outermost memory level holds the final output; the 1st and 2nd memory levels need to store partial output for some time. For indexing convenience, an extra False is added to the end of the psum_flag list.
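The flag logic can be sketched as follows, assuming an output level still holds partial sums whenever a reduction (output-irrelevant) loop with size greater than 1 iterates above it. The function name, its input layout, and the pad_extra_false switch are hypothetical and only illustrate the description above.

```python
def psum_flags(ir_loops_above, pad_extra_false=False):
    """Sketch: True where an output memory level holds partial sums.

    ir_loops_above[i] lists the sizes of the output-irrelevant loops that sit
    above output memory level i, innermost level first (hypothetical input).
    """
    flags = []
    for sizes in ir_loops_above:
        # Accumulation is unfinished while any reduction loop above still iterates.
        flags.append(any(size > 1 for size in sizes))
    if pad_extra_false:
        flags.append(False)  # extra trailing entry for indexing convenience
    return flags


example_flags = psum_flags([[4, 2], [2], []])
example_padded = psum_flags([[4, 2], [2], []], pad_extra_false=True)
```

For three output levels with reduction loops above the first two, the unpadded result matches the [True, True, False] example; enabling the padding appends one more False.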
accelerator
access_same_data_considered_as_no_access
combined_mapping_dict_1s1t
combined_mapping_dict_1s1t_reform
combined_mapping_dict_1s2t
combined_mapping_dict_1s2t_reform
data_access_raw
data_access_raw2
data_bit_per_level
data_bit_per_level_unrolled
data_elem_per_level
data_elem_per_level_unrolled
data_precision_dict
effective_data_bit
effective_data_elem
ir_loop_size_cabl
ir_loop_size_cabl2
ir_loop_size_per_level
layer_node
mem_level
operand_list
output_ir_loop_size_caal
psum_flag
r_loop_size_cabl
r_loop_size_cabl2
r_loop_size_per_level
r_loop_size_per_level2
spatial_mapping
temporal_mapping