Workload

The recommended way to define an algorithmic workload is through an onnx model. An onnx model can contain multiple operator types, which in ML are often called layers. ZigZag automatically recognizes and parses some of these operator types. Alternatively, the layers can be defined manually for more customization.

Onnx models

Supported onnx operators

A complete list of all onnx operators can be found in the official ONNX operator documentation.

The following operators are supported by ZigZag and will automatically be parsed into LayerNode objects when using your onnx model within the framework:

All other operators will be parsed into a DummyNode object, which is assumed not to be acceleratable and therefore incurs zero hardware cost. If you would like to see another onnx operator supported, feel free to open an issue, or add it yourself in the ONNXModelParserStage, following the Contributing guidelines.
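
To see which operators in a model would end up as LayerNode or DummyNode objects, it can help to list the operator types the model actually contains and compare them against the supported list. The snippet below is a minimal sketch of that idea, not part of ZigZag: count_op_types is a hypothetical helper, and the demo uses stand-in objects so that it runs without onnx installed.

```python
from collections import Counter
from types import SimpleNamespace


def count_op_types(nodes):
    """Count how often each operator type occurs in an iterable of graph nodes.

    Each node only needs an `op_type` attribute, as onnx NodeProto objects have.
    """
    return Counter(node.op_type for node in nodes)


# Stand-in nodes (assumption: used here only so the sketch runs without onnx).
demo_nodes = [SimpleNamespace(op_type=t) for t in ("Conv", "Relu", "Conv", "Gemm")]
demo_counts = count_op_types(demo_nodes)

for op_type, count in sorted(demo_counts.items()):
    print(f"{op_type}: {count}")
```

With onnx installed, the same helper can be applied to a real model via count_op_types(onnx.load("my_model.onnx").graph.node).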

Saving your onnx model with external data

If your onnx model is large and you want to keep it out of your ZigZag repo, you can save the original model’s weights to an external file, which can then be discarded (ZigZag only needs the layers’ shapes, not the weight values). You can do so as follows:

import onnx
# Load the model into memory as a ModelProto
model = onnx.load('my_model_with_internal_data.onnx')
# Save the model, writing its tensors to a single external data file
onnx.save_model(model, 'path/to/save/the/model.onnx', save_as_external_data=True, all_tensors_to_one_file=True, location='external_data_filename', size_threshold=1024, convert_attribute=False)
# The saved model now references its raw tensor data in 'external_data_filename', stored relative to the .onnx file

Inferring an onnx model’s shapes

ZigZag requires a shape-inferred onnx model, as it needs to know the shapes of all intermediate tensors to correctly infer the layer shapes. If your onnx model is not shape inferred, you can infer its shapes as follows:

import onnx
from onnx import shape_inference
model = onnx.load("my_model.onnx")
inferred_model = shape_inference.infer_shapes(model)
onnx.save(inferred_model, "my_inferred_model.onnx")
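
A quick way to judge whether a model still needs shape inference is to look for intermediate tensors that have no recorded type and shape entry. The following is a rough sketch under assumptions: tensors_without_shape_info is a hypothetical helper (not a ZigZag or onnx function), it only relies on the node, value_info, input, output and initializer fields of an onnx graph, and it checks that an entry exists, not that every dimension in it is concrete. The demo graph is a stand-in so the sketch runs without onnx installed.

```python
from types import SimpleNamespace


def tensors_without_shape_info(graph):
    """Return names of node outputs with no value_info/input/output/initializer entry."""
    known = {entry.name for entry in graph.value_info}
    known.update(entry.name for entry in graph.input)
    known.update(entry.name for entry in graph.output)
    known.update(entry.name for entry in graph.initializer)
    return [
        name
        for node in graph.node
        for name in node.output
        if name and name not in known
    ]


# Stand-in for a graph that has not been shape inferred: the intermediate
# tensor "conv_out" has no value_info entry.
demo_graph = SimpleNamespace(
    node=[SimpleNamespace(output=["conv_out"]), SimpleNamespace(output=["y"])],
    value_info=[],
    input=[SimpleNamespace(name="x")],
    output=[SimpleNamespace(name="y")],
    initializer=[SimpleNamespace(name="w")],
)
print(tensors_without_shape_info(demo_graph))  # prints ['conv_out']
```

With onnx installed, the helper can be called on onnx.load("my_model.onnx").graph; an empty result suggests that the intermediate tensors are already annotated.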

Manual layer definition

It is also possible to manually define your own workload layers. In that case, the main.py file should be executed instead of main_onnx.py. Moreover, the workload file should be provided as input together with the accelerator, and there is no onnx model loaded.

Each layer definition is represented as a Python dictionary, which should have the following attributes:

  • equation: The operational equation for this layer. Dimensions should be written in lowercase letters and operands in uppercase letters. ‘O’ should always be used for the output operand; the input operands can be named freely.

  • dimension_relations: The relationship between different dimensions present in the equation. This is often used in convolutional layers, where there is a relationship between the spatial input indices and the spatial output indices through the stride and with the filter indices through the dilation rate.

  • loop_dim_size: The size of each dimension present in the equation. Dimensions that are defined in dimension_relations (i.e. that appear on the left-hand side of a relation) should not be provided; they are inferred automatically.

  • operand_precision: The bit precision of the different operands present in the equation. ‘O’ should always be used, which represents the partial output precision. ‘O_final’ represents the final output precision.

  • operand_source: For each input operand of this layer, the id of the layer that produces it. This is important to correctly build the NN graph edges.

  • constant_operands: The operands of this layer which are constants and do not depend on prior computations, like the weight.

  • core_allocation: The core that will execute this layer.

  • spatial_mapping: The spatial parallelization strategy used for this layer. If none is provided, the SpatialMappingGeneratorStage should be used within ZigZag’s execution pipeline.

  • memory_operand_links: The link between the virtual memory operands and the actual algorithmic operands. For more information, read the hardware readme.
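
As a rough illustration of how these attributes fit together, the sketch below defines a single strided convolutional layer. Every value in it (layer id, sizes, precisions, core id, the spatial dimension names D1 and D2, and the memory operand names I1 and I2) is an illustrative assumption rather than a verified ZigZag input; the example file in the repository remains the authoritative reference for the exact format. REQUIRED_KEYS and missing_keys are hypothetical helpers added only to sanity-check the dictionary.

```python
# Hypothetical single-layer workload; all values are illustrative assumptions.
workload = {
    0: {  # layer id, which later layers would reference in operand_source
        "equation": "O[b][k][oy][ox]+=W[k][c][fy][fx]*I[b][c][iy][ix]",
        # stride 2 relates input to output indices, dilation 1 to filter indices
        "dimension_relations": ["ix=2*ox+1*fx", "iy=2*oy+1*fy"],
        # ix and iy are omitted: they are defined by dimension_relations
        "loop_dim_size": {"B": 1, "K": 64, "C": 3, "OY": 112, "OX": 112, "FY": 7, "FX": 7},
        "operand_precision": {"O": 16, "O_final": 8, "W": 8, "I": 8},
        "operand_source": {"W": [], "I": []},  # first layer: no producing layer
        "constant_operands": ["W"],
        "core_allocation": 1,
        "spatial_mapping": {"D1": ("K", 32), "D2": ("C", 3)},
        "memory_operand_links": {"O": "O", "W": "I2", "I": "I1"},
    }
}

# The attributes listed above, used for a simple completeness check.
REQUIRED_KEYS = {
    "equation", "dimension_relations", "loop_dim_size", "operand_precision",
    "operand_source", "constant_operands", "core_allocation",
    "spatial_mapping", "memory_operand_links",
}


def missing_keys(layer):
    """Return the listed attributes that a layer dictionary does not define."""
    return REQUIRED_KEYS - layer.keys()


print(missing_keys(workload[0]))  # prints set()
```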

The following loop notation has to be used to describe a layer of the workload (see loop notation in this paper):

  • B: Batch size

  • K: Output channels

  • C: Input channels

  • OY: Output rows

  • OX: Output columns

  • FY: Kernel rows

  • FX: Kernel columns
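
To make the notation concrete, the seven loops can be written out as a plain Python loop nest for a convolution. This is only a reference sketch for reading the notation, not how ZigZag evaluates a layer: conv_loop_nest is a hypothetical function, and the stride and dilation parameters stand for the coefficients that would appear in dimension_relations.

```python
from itertools import product
from math import prod


def conv_loop_nest(inp, wgt, sizes, stride=1, dilation=1):
    """Compute O[b][k][oy][ox] += W[k][c][fy][fx] * I[b][c][iy][ix].

    Returns the output tensor (nested lists) and the number of MAC operations.
    """
    dims = ("B", "K", "C", "OY", "OX", "FY", "FX")
    B, K, C, OY, OX, FY, FX = (sizes[d] for d in dims)
    out = [[[[0] * OX for _ in range(OY)] for _ in range(K)] for _ in range(B)]
    macs = 0
    for b, k, c, oy, ox, fy, fx in product(*(range(sizes[d]) for d in dims)):
        iy = stride * oy + dilation * fy  # input row from output row and kernel row
        ix = stride * ox + dilation * fx  # input column, same relation
        out[b][k][oy][ox] += wgt[k][c][fy][fx] * inp[b][c][iy][ix]
        macs += 1
    return out, macs


# Tiny example: a 3x3 input of ones convolved with a 2x2 kernel of ones.
demo_sizes = {"B": 1, "K": 1, "C": 1, "OY": 2, "OX": 2, "FY": 2, "FX": 2}
demo_inp = [[[[1] * 3 for _ in range(3)]]]
demo_wgt = [[[[1, 1], [1, 1]]]]
demo_out, demo_macs = conv_loop_nest(demo_inp, demo_wgt, demo_sizes)
print(demo_out, demo_macs)  # prints [[[[4, 4], [4, 4]]]] 16
```

The MAC count equals the product of the seven loop sizes, which is why loop_dim_size is enough to determine the amount of computation in a layer.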

An example of this manual layer definition can be found at: inputs/examples/workloads/resnet18.py.