ZigZag - Deep Learning Hardware Design Space Exploration
This repository presents the novel version of our tried-and-tested hardware Architecture-Mapping Design Space Exploration (DSE) Framework for Deep Learning (DL) accelerators. ZigZag bridges the gap between algorithmic DL decisions and their acceleration cost on specialized accelerators through a fast and accurate hardware cost estimation.
|
Class that stores inputs and runs them through the zigzag cost model. More...
Public Member Functions | |
def | __init__ (self, *Accelerator accelerator, LayerNode layer, SpatialMappingInternal spatial_mapping, SpatialMappingInternal spatial_mapping_int, TemporalMapping temporal_mapping, bool access_same_data_considered_as_no_access=True, float cycles_per_op=1.0) |
After initialization, the cost model evaluation is run. More... | |
None | run (self) |
Run the cost model evaluation. More... | |
None | calc_memory_utilization (self) |
None | calc_memory_word_access (self) |
None | calc_energy (self) |
Calculates the energy cost of this cost model evaluation by calculating the memory reading/writing energy. More... | |
None | calc_mac_energy_cost (self) |
Calculate the dynamic MAC energy. More... | |
def | calc_memory_energy_cost (self) |
Computes the memories reading/writing energy by converting the access patterns in self.mapping to energy breakdown using the memory hierarchy of the core on which the layer is mapped. More... | |
None | calc_latency (self) |
Calculate latency in 4 steps. More... | |
None | calc_double_buffer_flag (self) |
This function checks the double-buffer possibility for each operand at each memory level (minimal memory BW requirement case) by comparing the physical memory size with the effective data size, taking into account the memory sharing between operands. More... | |
def | calc_allowed_and_real_data_transfer_cycle_per_data_transfer_link (self) |
Construct a 4-way data transfer pattern for each unit mem, calculate {allowed_mem_updating_cycle, real_data_trans_cycle, DTL_SS_cycle} per period. More... | |
MemoryAccesses | calculate_real_data_transfer_cycles (self, LayerOperand layer_op, MemoryOperand mem_op, int mem_lvl_id) |
MemoryAccesses | calculate_allowed_transfer_cycles (self, LayerOperand layer_op, int mem_lv) |
None | combine_data_transfer_rate_per_physical_port (self) |
Consider memory sharing and port sharing, combine the data transfer activity Step 1: collect port activity per memory instance per physical memory port Step 2: calculate SS combine and MUW union parameters per physical memory port. More... | |
list[PortBeginOrEndActivity] | calc_loading_single_port_period_count_1 (self, MemoryPort port, list[tuple[MemoryOperand, int, DataDirection]] mem_op_level_direction_combs, float total_req_bw_aver_computation) |
Calculate the loading activities for the memory port movement directions with period count of 1. More... | |
list[PortBeginOrEndActivity] | calc_loading_single_port_period_count_greater_than_1 (self, MemoryPort port, list[tuple[MemoryOperand, int, DataDirection]] mem_op_level_direction_combs) |
Calculate the loading and offloading activities with mem port movement directions period > 1. More... | |
def | calc_loading_single_port (self, MemoryPort port) |
Calculate the loading and offloading activities for a single memory port. More... | |
def | calc_onloading_combined (self) |
def | calc_offloading_combined (self) |
tuple[float, float] | calc_borrowed_loading_cycles_and_bandwidth (self) |
Calculate the amount of cycles and bandwidth that are borrowed from the computation phase. More... | |
def | calc_data_loading_latency (self) |
Calculate the number of data onloading/offloading cycles. More... | |
None | calc_overall_latency (self) |
This function integrates the previous calculated SScomb, data loading and off-loading cycle to get the overall latency. More... | |
FourWayDataMoving[int] | get_inst_bandwidth (self, MemoryLevel memory_level, MemoryOperand memory_operand, float scaling=1) |
Given a memory level and a memory operand, compute the memory's instantaneous bandwidth required for the memory operand during computation. More... | |
FourWayDataMoving[int] | get_total_inst_bandwidth (self, MemoryLevel memory_level, float scaling=1) |
Given a cost model evaluation and a memory level, compute the memory's total instantaneous bandwidth required throughout the execution of the layer that corresponds to this CME. More... | |
def | __str__ (self) |
def | __repr__ (self) |
![]() | |
None | __init__ (self) |
def | core (self) |
"CumulativeCME" | __add__ (self, "CostModelEvaluationABC" other) |
def | __mul__ (self, int number) |
dict[str, float] | __simplejsonrepr__ (self) |
Simple JSON representation used for saving this object to a simple json file. More... | |
def | __jsonrepr__ (self) |
JSON representation used for saving this object to a json file. More... | |
Static Public Member Functions | |
tuple[list[float], list[float]] | reduce_balanced (list[float] c_list, list[float] m_list, float s) |
Balance c_list towards minimums m_list with a total maximum reduction of s. More... | |
Class that stores inputs and runs them through the zigzag cost model.
Initialize the cost model evaluation with the following inputs:
From these parameters, the following attributes are computed:
The following cost model attributes are also initialized:
After initialization, the cost model evaluation is run.
def __init__ | ( | self, | |
*Accelerator | accelerator, | ||
LayerNode | layer, | ||
SpatialMappingInternal | spatial_mapping, | ||
SpatialMappingInternal | spatial_mapping_int, | ||
TemporalMapping | temporal_mapping, | ||
bool | access_same_data_considered_as_no_access = True , |
||
float | cycles_per_op = 1.0 |
||
) |
After initialization, the cost model evaluation is run.
accelerator | the accelerator that includes the core on which to run the |
layer | the layer to run |
access_same_data_considered_as_no_access | (optional) |
def __repr__ | ( | self | ) |
def __str__ | ( | self | ) |
def calc_allowed_and_real_data_transfer_cycle_per_data_transfer_link | ( | self | ) |
Construct a 4-way data transfer pattern for each unit mem, calculate {allowed_mem_updating_cycle, real_data_trans_cycle, DTL_SS_cycle} per period.
tuple[float, float] calc_borrowed_loading_cycles_and_bandwidth | ( | self | ) |
Calculate the amount of cycles and bandwidth that are borrowed from the computation phase.
def calc_data_loading_latency | ( | self | ) |
Calculate the number of data onloading/offloading cycles.
This function first gathers the correct cycles per port, then combines them.
None calc_double_buffer_flag | ( | self | ) |
This function checks the double-buffer possibility for each operand at each memory level (minimal memory BW requirement case) by comparing the physical memory size with the effective data size, taking into account the memory sharing between operands.
None calc_energy | ( | self | ) |
Calculates the energy cost of this cost model evaluation by calculating the memory reading/writing energy.
None calc_latency | ( | self | ) |
Calculate latency in 4 steps.
1) As we already calculated the ideal data transfer rate in combined_mapping.py (in the Mapping class), here we start with calculating the required (or allowed) memory updating window by comparing the effective data size with the physical memory size at each level. If the effective data size is smaller than 50% of the physical memory size, then we take the whole period as the allowed memory updating window (double buffer effect); otherwise we take the the period divided by the top_ir_loop as the allowed memory updating window. 2) Then, we compute the real data transfer rate given the actual memory bw per functional port pair, assuming we have enough memory ports. 3) In reality, there is no infinite memory port to use. So, as the second step, we combine the real data transfer attributes per physical memory port. 4) Finally, we combine the stall/slack of each memory port to get the final latency.
Reimplemented in CostModelEvaluationForIMC.
def calc_loading_single_port | ( | self, | |
MemoryPort | port | ||
) |
Calculate the loading and offloading activities for a single memory port.
list[PortBeginOrEndActivity] calc_loading_single_port_period_count_1 | ( | self, | |
MemoryPort | port, | ||
list[tuple[MemoryOperand, int, DataDirection]] | mem_op_level_direction_combs, | ||
float | total_req_bw_aver_computation | ||
) |
Calculate the loading activities for the memory port movement directions with period count of 1.
list[PortBeginOrEndActivity] calc_loading_single_port_period_count_greater_than_1 | ( | self, | |
MemoryPort | port, | ||
list[tuple[MemoryOperand, int, DataDirection]] | mem_op_level_direction_combs | ||
) |
Calculate the loading and offloading activities with mem port movement directions period > 1.
None calc_mac_energy_cost | ( | self | ) |
Calculate the dynamic MAC energy.
Reimplemented in CostModelEvaluationForIMC.
def calc_memory_energy_cost | ( | self | ) |
Computes the memories reading/writing energy by converting the access patterns in self.mapping to energy breakdown using the memory hierarchy of the core on which the layer is mapped.
The energy breakdown is saved in self.mem_energy_breakdown.
The energy total consumption is saved in self.energy_total.
None calc_memory_utilization | ( | self | ) |
None calc_memory_word_access | ( | self | ) |
def calc_offloading_combined | ( | self | ) |
def calc_onloading_combined | ( | self | ) |
None calc_overall_latency | ( | self | ) |
This function integrates the previous calculated SScomb, data loading and off-loading cycle to get the overall latency.
cycles_per_op | cycle counts per mac operand (>1 for bit-serial computation) |
MemoryAccesses calculate_allowed_transfer_cycles | ( | self, | |
LayerOperand | layer_op, | ||
int | mem_lv | ||
) |
MemoryAccesses calculate_real_data_transfer_cycles | ( | self, | |
LayerOperand | layer_op, | ||
MemoryOperand | mem_op, | ||
int | mem_lvl_id | ||
) |
None combine_data_transfer_rate_per_physical_port | ( | self | ) |
Consider memory sharing and port sharing, combine the data transfer activity Step 1: collect port activity per memory instance per physical memory port Step 2: calculate SS combine and MUW union parameters per physical memory port.
FourWayDataMoving[int] get_inst_bandwidth | ( | self, | |
MemoryLevel | memory_level, | ||
MemoryOperand | memory_operand, | ||
float | scaling = 1 |
||
) |
Given a memory level and a memory operand, compute the memory's instantaneous bandwidth required for the memory operand during computation.
An additonal scaling factor can be applied to the returned bandwidth NOTE: This function is used in Stream
FourWayDataMoving[int] get_total_inst_bandwidth | ( | self, | |
MemoryLevel | memory_level, | ||
float | scaling = 1 |
||
) |
Given a cost model evaluation and a memory level, compute the memory's total instantaneous bandwidth required throughout the execution of the layer that corresponds to this CME.
Raises an assertion error if the given memory level is not included in this CME's memory hierarchy. NOTE: this function is used in Stream
|
static |
Balance c_list towards minimums m_list with a total maximum reduction of s.
None run | ( | self | ) |
Run the cost model evaluation.
Reimplemented in CostModelEvaluationForIMC.
accelerator |
access_same_data_considered_as_no_access |
active_mem_level |
allowed_mem_update_cycle |
cycles_per_op |
data_loading_cc_pair_combined_per_op |
data_loading_half_shared_part |
data_loading_individual_part |
data_loading_shared_part |
data_offloading_cc_pair_combined |
data_offloading_cycle |
data_onloading_cycle |
double_buffer_true |
effective_mem_utili_individual |
effective_mem_utili_shared |
ideal_cycle |
ideal_temporal_cycle |
latency_total0 |
latency_total1 |
latency_total2 |
loading_offloading_bandwidth_borrowed_from_computation |
loading_offloading_cycles_borrowed_from_computation |
mac_energy |
mac_spatial_utilization |
mac_utilization0 |
mac_utilization1 |
mac_utilization2 |
mapping |
mapping_int |
mem_energy |
mem_energy_breakdown |
mem_energy_breakdown_further |
mem_hierarchy_dict |
mem_level_list |
mem_sharing_tuple |
mem_size_dict |
mem_updating_window_union_collect |
mem_utili_individual |
mem_utili_shared |
memory_operand_links |
memory_word_access |
port_activity_collect |
real_data_trans_cycle |
spatial_mapping |
spatial_mapping_dict_int |
spatial_mapping_int |
stall_slack_comb |
stall_slack_comb_collect |
temporal_mapping |