Workload
A workload is the neural-network computation you want to map. Stream takes workloads as ONNX models. The ONNX parser walks the graph and turns recognised operators into the internal computation nodes that the rest of the pipeline schedules and allocates.
The parser lives in stream/parser/onnx/ (stream/parser/onnx/model.py is the dispatch table). What follows reflects exactly what that code parses today.
ONNX in, computation graph out
Stream loads an ONNX model, runs shape inference on it, and converts each node:
- A supported operator becomes a ComputationNode: it has a real cost and is placed on a core.
- A fusion-boundary operator (Flatten, Reshape) becomes a shape-only DummyNode: it does no compute and only marks where one fusion group ends and the next begins.
- An unrecognised operator raises NotImplementedError. Stream does not silently drop unknown ops; you either add a parser for it or remove it from the model.
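The three outcomes amount to a lookup on the op type. The function below is a hypothetical illustration in plain Python, not Stream's parser; node_kind, COMPUTE_OPS and BOUNDARY_OPS are names made up for this sketch, with the op-type sets copied from the supported-operators table.

```python
# Hypothetical sketch of the three-way conversion rule; not Stream's parser code.
# Op-type sets copied from the supported-operators table on this page.
COMPUTE_OPS = {"Conv", "Gemm", "MaxPool", "GlobalAveragePool", "Add",
               "Mul", "Relu", "Silu", "BatchNormalization"}
BOUNDARY_OPS = {"Flatten", "Reshape"}


def node_kind(op_type: str) -> str:
    """Name the internal node kind an ONNX op type would turn into."""
    if op_type in COMPUTE_OPS:
        return "ComputationNode"  # costed and placed on a core
    if op_type in BOUNDARY_OPS:
        return "DummyNode"  # shape only, marks a fusion-group boundary
    # Unknown op types are rejected, never silently dropped.
    raise NotImplementedError(f"no parser registered for op type {op_type!r}")
```

For example, node_kind("Reshape") returns "DummyNode", while node_kind("Softmax") raises NotImplementedError because Softmax is not in the table.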
Supported operators
The current dispatch table (ONNXModelParser.OP_TYPE_TO_PARSER) recognises:
| ONNX op | Becomes | Notes |
|---|---|---|
| Conv | ComputationNode | Convolution. |
| Gemm | ComputationNode | General matrix multiply (also used for matrix-vector). |
| MaxPool | ComputationNode | Max pooling. |
| GlobalAveragePool | ComputationNode | Global average pooling. |
| Add | ComputationNode | Element-wise add (e.g. residual). |
| Mul | ComputationNode | Element-wise multiply. |
| Relu | ComputationNode | ReLU activation. |
| Silu | ComputationNode | SiLU activation (routed through the SIMD parser). |
| BatchNormalization | ComputationNode | Batch normalisation. |
| Flatten, Reshape | DummyNode | Shape-only fusion boundary. |
Other op types are intentionally unregistered (several are present but commented out in model.py). To support a new operator, add a parser under stream/parser/onnx/ and register it in the dispatch table.
Shape inference is required
Stream needs the shape of every intermediate tensor to derive each layer's loop dimensions. The parser calls onnx.shape_inference.infer_shapes for you, but the model must carry enough type/shape information for inference to succeed. If you build a model by hand, infer shapes before saving:
```python
import onnx
from onnx import shape_inference

model = onnx.load("my_model.onnx")
# infer_shapes records intermediate tensor types and shapes in graph.value_info.
onnx.save(shape_inference.infer_shapes(model), "my_model_inferred.onnx")
```
Weights are not needed - clear them
Stream only uses tensor shapes and dtypes for cost modelling; it never reads weight values. Keep your committed ONNX small by clearing the initializer data. Note that the data may live in any of several fields depending on dtype (bf16 weights, for instance, pack into int32_data, not float_data), so clear them all:
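```python
# `model` is an onnx.ModelProto, e.g. from onnx.load("my_model.onnx")
for tensor in model.graph.initializer:
    for field in ("float_data", "double_data", "int32_data",
                  "int64_data", "uint64_data", "raw_data"):
        tensor.ClearField(field)  # dims and data_type survive
```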
This is exactly what the bundled workload builders do: the committed example ONNX files are only a few hundred bytes because their weights are cleared.
For very large models you can alternatively keep weights in an external file (onnx.save_model(..., save_as_external_data=True)) and load with load_external_data=False; Stream works fine without the external data present.
Building a workload programmatically
The repo ships small workloads as ready-to-use ONNX fixtures under stream/inputs/testing/workload/, generated by Python builders in the same directory. Running just gen-workloads regenerates them.
- make_2_conv.py: two chained Conv layers. The committed fixture 2conv_1_8_32_32_16_32_3.onnx is [1,8,32,32] → Conv(16) → Conv(32) → [1,32,32,32].
- make_swiglu.py: a 5-node SwiGLU block with two parallel Gemms, a Silu activation, an element-wise Mul, and a down-projection Gemm. The committed fixture is swiglu_1_16_32.onnx.
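The 2-conv fixture keeps its 32x32 spatial size through both layers. Assuming 3x3 kernels, stride 1, no dilation and one pixel of padding on each side (the same settings as the minimal builder below; the fixture's own padding is not spelled out here), the ONNX Conv output-size rule out = floor((in + pad_begin + pad_end - kernel) / stride) + 1 reproduces those shapes. conv_out and conv_chain_shape are hypothetical helpers for checking this by hand.

```python
def conv_out(size: int, kernel: int, pad_begin: int, pad_end: int, stride: int = 1) -> int:
    """Output length along one spatial axis of an ONNX Conv with dilation 1."""
    return (size + pad_begin + pad_end - kernel) // stride + 1


def conv_chain_shape(shape: list[int], out_channels: list[int],
                     kernel: int = 3, pad: int = 1, stride: int = 1) -> list[int]:
    """Propagate an NCHW shape through Conv layers with the given output channels."""
    n, _, h, w = shape
    for c in out_channels:
        h = conv_out(h, kernel, pad, pad, stride)
        w = conv_out(w, kernel, pad, pad, stride)
        shape = [n, c, h, w]
    return shape
```

With these assumptions, conv_chain_shape([1, 8, 32, 32], [16, 32]) gives [1, 32, 32, 32], matching the fixture; dropping the padding would shrink each layer's output by two pixels per axis.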
A minimal builder looks like this:
```python
import numpy as np
import onnx
from onnx import TensorProto, helper, shape_inference

inp = helper.make_tensor_value_info("input", TensorProto.BFLOAT16, [1, 8, 32, 32])
out = helper.make_tensor_value_info("output", TensorProto.BFLOAT16, [1, 16, 32, 32])

w = helper.make_tensor("weights", TensorProto.BFLOAT16, [16, 8, 3, 3],
                       np.zeros((16, 8, 3, 3)))
for field in ("float_data", "double_data", "int32_data",
              "int64_data", "uint64_data", "raw_data"):
    w.ClearField(field)  # keep shape + dtype, drop values

conv = helper.make_node("Conv", ["input", "weights"], ["output"],
                        name="Conv1", kernel_shape=[3, 3], pads=[1, 1, 1, 1])
graph = helper.make_graph([conv], "OneConv", [inp], [out], initializer=[w])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])
onnx.save(shape_inference.infer_shapes(model), "one_conv.onnx")
```
Once saved, run it through the pipeline like any other workload:
```bash
python scripts/main_stream_co.py \
    --hardware stream/inputs/examples/hardware/tpu_like_quad_core.yaml \
    --workload one_conv.onnx
```
See Getting Started for the full run flow and Mapping for how the operators in your workload get matched to cores.