CI run 5c62e31

  • Run: link
  • Time: Denver 2025-09-10 05:21:08 MDT • Brussels 2025-09-10 13:21:08 CEST

GEMM Deployment Summary

  • Workflow: AIE Deployment Gemm
  • Commit: 5c62e31af261e2ab427062a424caef874ef5cec5
  • Runner: venus
  • Run time: Denver 2025-09-10 05:20:51 MDT • Brussels 2025-09-10 13:20:51 CEST
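  • Run: #88; Attempt 1

| HW | M | K | N | Rows | Cols | Status | Note |
|----|---|---|---|------|------|--------|------|
| single_col | 128 | 128 | 128 | 2 | 1 | 🐬 success | |
| single_col | 128 | 128 | 256 | 2 | 1 | 🐬 success | |
| single_col | 128 | 128 | 64 | 2 | 1 | 🐬 success | |
| single_col | 128 | 256 | 128 | 2 | 1 | 🐬 success | |
| single_col | 128 | 256 | 256 | 2 | 1 | 🐬 success | |
| single_col | 128 | 256 | 64 | 2 | 1 | 🐬 success | |
| single_col | 128 | 64 | 128 | 2 | 1 | 🐬 success | |
| single_col | 128 | 64 | 256 | 2 | 1 | 🐬 success | |
| single_col | 128 | 64 | 64 | 2 | 1 | 🐬 success | |
| single_col | 256 | 128 | 128 | 2 | 1 | 🐬 success | |
| single_col | 256 | 128 | 256 | 2 | 1 | 🐬 success | |
| single_col | 256 | 128 | 64 | 2 | 1 | 🐬 success | |
| single_col | 256 | 256 | 128 | 2 | 1 | 🐬 success | |
| single_col | 256 | 256 | 256 | 2 | 1 | 🐬 success | |
| single_col | 256 | 256 | 64 | 2 | 1 | 🐬 success | |
| single_col | 256 | 64 | 128 | 2 | 1 | 🐬 success | |
| single_col | 256 | 64 | 256 | 2 | 1 | 🐬 success | |
| single_col | 256 | 64 | 64 | 2 | 1 | 🐬 success | |
| single_col | 64 | 128 | 128 | 2 | 1 | 🐬 success | |
| single_col | 64 | 128 | 256 | 2 | 1 | 🐬 success | |
| single_col | 64 | 128 | 64 | 2 | 1 | 🐬 success | |
| single_col | 64 | 256 | 128 | 2 | 1 | 🐬 success | |
| single_col | 64 | 256 | 256 | 2 | 1 | 🐬 success | |
| single_col | 64 | 256 | 64 | 2 | 1 | 🐬 success | |
| single_col | 64 | 64 | 128 | 2 | 1 | 🐬 success | |
| single_col | 64 | 64 | 256 | 2 | 1 | 🐬 success | |
| single_col | 64 | 64 | 64 | 2 | 1 | 🐬 success | |
| single_core | 128 | 128 | 128 | 1 | 1 | 🐬 success | |
| single_core | 128 | 128 | 256 | 1 | 1 | 🐬 success | |
| single_core | 128 | 128 | 64 | 1 | 1 | 🐬 success | |
| single_core | 128 | 256 | 128 | 1 | 1 | 🐬 success | |
| single_core | 128 | 256 | 256 | 1 | 1 | 🐬 success | |
| single_core | 128 | 256 | 64 | 1 | 1 | 🐬 success | |
| single_core | 128 | 64 | 128 | 1 | 1 | 🐬 success | |
| single_core | 128 | 64 | 256 | 1 | 1 | 🐬 success | |
| single_core | 128 | 64 | 64 | 1 | 1 | 🐬 success | |
| single_core | 256 | 128 | 128 | 1 | 1 | 🐬 success | |
| single_core | 256 | 128 | 256 | 1 | 1 | 🐬 success | |
| single_core | 256 | 128 | 64 | 1 | 1 | 🐬 success | |
| single_core | 256 | 256 | 128 | 1 | 1 | 🐬 success | |
| single_core | 256 | 256 | 256 | 1 | 1 | 🐬 success | |
| single_core | 256 | 256 | 64 | 1 | 1 | 🐬 success | |
| single_core | 256 | 64 | 128 | 1 | 1 | 🐬 success | |
| single_core | 256 | 64 | 256 | 1 | 1 | 🐬 success | |
| single_core | 256 | 64 | 64 | 1 | 1 | 🐬 success | |
| single_core | 64 | 128 | 128 | 1 | 1 | 🐬 success | |
| single_core | 64 | 128 | 256 | 1 | 1 | 🐬 success | |
| single_core | 64 | 128 | 64 | 1 | 1 | 🐬 success | |
| single_core | 64 | 256 | 128 | 1 | 1 | 🐬 success | |
| single_core | 64 | 256 | 256 | 1 | 1 | 🐬 success | |
| single_core | 64 | 256 | 64 | 1 | 1 | 🐬 success | |
| single_core | 64 | 64 | 128 | 1 | 1 | 🐬 success | |
| single_core | 64 | 64 | 256 | 1 | 1 | 🐬 success | |
| single_core | 64 | 64 | 64 | 1 | 1 | 🐬 success | |
| whole_array | 128 | 128 | 256 | 2 | 4 | 🐬 success | |
| whole_array | 128 | 256 | 256 | 2 | 4 | 🐬 success | |
| whole_array | 128 | 64 | 256 | 2 | 4 | 🐬 success | |
| whole_array | 256 | 128 | 256 | 2 | 4 | 🐬 success | |
| whole_array | 256 | 256 | 256 | 2 | 4 | 🐬 success | |
| whole_array | 256 | 64 | 256 | 2 | 4 | 🐬 success | |
| whole_array | 64 | 128 | 256 | 2 | 4 | 🐬 success | |
| whole_array | 64 | 256 | 256 | 2 | 4 | 🐬 success | |
| whole_array | 64 | 64 | 256 | 2 | 4 | 🐬 success | |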

Totals: 🐬 63 • ❌ 0 • All: 63

[single_col] M=128 K=128 N=128 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 32 | 1046.219 | 1022 | 1156 | 47004 | 27.885 | | tile2,1 | 1 | perf | 31 | 1047.258 | 528 | 1134 | 45974 | 28.510 | | tile2,1 | 2 | perf | 32 | 1052.531 | 1022 | 1134 | 47192 | 27.774 | | tile2,1 | 3 | perf | 32 | 1061.438 | 1022 | 1155 | 47488 | 27.601 | | tile2,1 | 4 | perf | 31 | 1078.484 | 1022 | 1412 | 46950 | 27.917 | | tile2,1 | 5 | perf | 30 | 1047.500 | 1022 | 1154 | 44917 | 29.181 | | tile3,1 | 0 | warmup | 32 | 1045.750 | 1022 | 1156 | 47004 | 27.885 | | tile3,1 | 1 | perf | 31 | 1051.097 | 648 | 1134 | 46092 | 28.437 | | tile3,1 | 2 | perf | 32 | 1052.469 | 1022 | 1134 | 47192 | 27.774 | | tile3,1 | 3 | perf | 32 | 1061.219 | 1022 | 1155 | 47488 | 27.601 | | tile3,1 | 4 | perf | 31 | 1078.581 | 1022 | 1420 | 46956 | 27.914 | | tile3,1 | 5 | perf | 30 | 1047.033 | 1022 | 1154 | 44916 | 29.182 |
[single_col] M=128 K=128 N=256 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 64 | 1057.781 | 1022 | 1164 | 97150 | 26.983 | | tile3,1 | 1 | perf | 62 | 1076.419 | 1022 | 2217 | 94360 | 27.781 | | tile3,1 | 2 | perf | 61 | 1057.000 | 934 | 1318 | 93850 | 27.932 | | tile3,1 | 3 | perf | 64 | 1061.203 | 1022 | 1134 | 97350 | 26.928 | | tile3,1 | 4 | perf | 64 | 1065.312 | 1022 | 1158 | 97644 | 26.847 | | tile3,1 | 5 | perf | 64 | 1057.500 | 1022 | 1164 | 97137 | 26.987 | | tile2,1 | 0 | warmup | 64 | 1058.094 | 1022 | 1181 | 97150 | 26.983 | | tile2,1 | 1 | perf | 62 | 1072.226 | 1022 | 1931 | 94074 | 27.866 | | tile2,1 | 2 | perf | 61 | 1052.525 | 803 | 1154 | 93549 | 28.022 | | tile2,1 | 3 | perf | 64 | 1061.125 | 1022 | 1134 | 97349 | 26.928 | | tile2,1 | 4 | perf | 64 | 1065.422 | 1022 | 1159 | 97644 | 26.847 | | tile2,1 | 5 | perf | 64 | 1057.891 | 1022 | 1176 | 97138 | 26.987 |
[single_col] M=128 K=128 N=64 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 16 | 1038.500 | 1022 | 1086 | 22203 | 29.517 | | tile2,1 | 1 | perf | 13 | 1151.308 | 1022 | 2326 | 18715 | 35.018 | | tile2,1 | 2 | perf | 16 | 1068.312 | 1022 | 1134 | 22673 | 28.905 | | tile2,1 | 3 | perf | 16 | 1066.625 | 1022 | 1134 | 22639 | 28.948 | | tile2,1 | 4 | perf | 13 | 1136.769 | 1022 | 1979 | 20314 | 32.261 | | tile2,1 | 5 | perf | 16 | 1038.500 | 1022 | 1086 | 22202 | 29.518 | | tile3,1 | 0 | warmup | 16 | 1038.500 | 1022 | 1086 | 22204 | 29.515 | | tile3,1 | 1 | perf | 13 | 1149.538 | 1022 | 2303 | 18692 | 35.061 | | tile3,1 | 2 | perf | 16 | 1068.500 | 1022 | 1134 | 22671 | 28.907 | | tile3,1 | 3 | perf | 16 | 1066.625 | 1022 | 1134 | 22639 | 28.948 | | tile3,1 | 4 | perf | 13 | 1063.308 | 1022 | 1134 | 20473 | 32.011 | | tile3,1 | 5 | perf | 16 | 1038.500 | 1022 | 1086 | 22202 | 29.518 |
[single_col] M=128 K=256 N=128 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 64 | 1107.531 | 1024 | 1186 | 81568 | 32.138 | | tile3,1 | 1 | perf | 62 | 1104.694 | 1024 | 1189 | 78857 | 33.243 | | tile3,1 | 2 | perf | 62 | 1108.661 | 1022 | 1201 | 79327 | 33.046 | | tile3,1 | 3 | perf | 61 | 1133.443 | 1022 | 1379 | 79454 | 32.993 | | tile3,1 | 4 | perf | 62 | 1114.290 | 998 | 1169 | 79657 | 32.909 | | tile3,1 | 5 | perf | 61 | 1106.230 | 942 | 1189 | 78022 | 33.599 | | tile2,1 | 0 | warmup | 64 | 1108.109 | 1024 | 1194 | 81090 | 32.328 | | tile2,1 | 1 | perf | 62 | 1104.403 | 1024 | 1199 | 78329 | 33.467 | | tile2,1 | 2 | perf | 62 | 1109.177 | 1022 | 1197 | 78845 | 33.248 | | tile2,1 | 3 | perf | 61 | 1133.164 | 1022 | 1356 | 78924 | 33.215 | | tile2,1 | 4 | perf | 62 | 1113.468 | 931 | 1172 | 79091 | 33.145 | | tile2,1 | 5 | perf | 61 | 1107.344 | 954 | 1200 | 77572 | 33.794 |
[single_col] M=128 K=256 N=256 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 128 | 1107.344 | 1024 | 1187 | 168631 | 31.091 | | tile3,1 | 1 | perf | 128 | 1118.469 | 1022 | 1191 | 170038 | 30.834 | | tile3,1 | 2 | perf | 126 | 1108.603 | 504 | 1192 | 166425 | 31.503 | | tile3,1 | 3 | perf | 128 | 1113.500 | 1022 | 1186 | 169415 | 30.947 | | tile3,1 | 4 | perf | 128 | 1117.648 | 1023 | 1183 | 169856 | 30.867 | | tile3,1 | 5 | perf | 90 | 1108.611 | 1024 | 1185 | 118302 | 44.318 | | tile2,1 | 0 | warmup | 128 | 1108.320 | 1024 | 1196 | 168144 | 31.181 | | tile2,1 | 1 | perf | 128 | 1119.250 | 1022 | 1197 | 169545 | 30.923 | | tile2,1 | 2 | perf | 126 | 1109.397 | 500 | 1202 | 165960 | 31.591 | | tile2,1 | 3 | perf | 128 | 1114.281 | 1022 | 1200 | 168921 | 31.037 | | tile2,1 | 4 | perf | 128 | 1118.211 | 1024 | 1195 | 169399 | 30.950 | | tile2,1 | 5 | perf | 91 | 1110.659 | 1024 | 1200 | 119058 | 44.036 |
[single_col] M=128 K=256 N=64 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 32 | 1109.406 | 1024 | 1169 | 37613 | 34.848 | | tile3,1 | 1 | perf | 29 | 1125.517 | 1025 | 1765 | 34388 | 38.116 | | tile3,1 | 2 | perf | 29 | 1112.345 | 1024 | 1247 | 34232 | 38.289 | | tile3,1 | 3 | perf | 29 | 1121.517 | 1025 | 1360 | 34501 | 37.991 | | tile3,1 | 4 | perf | 28 | 1107.643 | 1024 | 1185 | 33366 | 39.283 | | tile3,1 | 5 | perf | 29 | 1138.586 | 1022 | 1637 | 34767 | 37.700 | | tile2,1 | 0 | warmup | 32 | 1109.438 | 1025 | 1169 | 37614 | 34.847 | | tile2,1 | 1 | perf | 29 | 1123.172 | 1024 | 1698 | 34322 | 38.189 | | tile2,1 | 2 | perf | 29 | 1111.690 | 1024 | 1234 | 34212 | 38.312 | | tile2,1 | 3 | perf | 29 | 1119.310 | 1025 | 1296 | 34437 | 38.061 | | tile2,1 | 4 | perf | 28 | 1107.679 | 1024 | 1185 | 33347 | 39.305 | | tile2,1 | 5 | perf | 29 | 1139.655 | 1022 | 1667 | 34798 | 37.667 |
[single_col] M=128 K=64 N=128 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 16 | 1053.375 | 1022 | 1117 | 19201 | 34.132 | | tile2,1 | 1 | perf | 16 | 1053.062 | 1022 | 1086 | 19202 | 34.130 | | tile2,1 | 2 | perf | 13 | 1064.000 | 1022 | 1170 | 16193 | 40.472 | | tile2,1 | 3 | perf | 16 | 1052.500 | 1022 | 1087 | 19186 | 34.158 | | tile2,1 | 4 | perf | 14 | 1011.571 | 481 | 1168 | 16239 | 40.357 | | tile2,1 | 5 | perf | 16 | 1053.312 | 1022 | 1086 | 19193 | 34.146 | | tile3,1 | 0 | warmup | 16 | 1053.375 | 1022 | 1117 | 19201 | 34.132 | | tile3,1 | 1 | perf | 16 | 1053.062 | 1022 | 1086 | 19202 | 34.130 | | tile3,1 | 2 | perf | 13 | 1058.154 | 1022 | 1159 | 16703 | 39.236 | | tile3,1 | 3 | perf | 16 | 1052.500 | 1022 | 1087 | 19186 | 34.158 | | tile3,1 | 4 | perf | 13 | 1117.077 | 1022 | 1896 | 16376 | 40.020 | | tile3,1 | 5 | perf | 16 | 1053.312 | 1022 | 1086 | 19193 | 34.146 |
[single_col] M=128 K=64 N=256 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 32 | 1055.250 | 1022 | 1122 | 39240 | 33.403 | | tile3,1 | 1 | perf | 29 | 1047.897 | 743 | 1158 | 36319 | 36.089 | | tile3,1 | 2 | perf | 29 | 1054.000 | 1018 | 1176 | 36323 | 36.085 | | tile3,1 | 3 | perf | 32 | 1054.781 | 1022 | 1132 | 39219 | 33.421 | | tile3,1 | 4 | perf | 32 | 1056.750 | 1022 | 1171 | 39287 | 33.363 | | tile3,1 | 5 | perf | 32 | 1054.938 | 1022 | 1086 | 39235 | 33.407 | | tile2,1 | 0 | warmup | 32 | 1055.188 | 1022 | 1122 | 39237 | 33.405 | | tile2,1 | 1 | perf | 29 | 1047.897 | 764 | 1136 | 35550 | 36.870 | | tile2,1 | 2 | perf | 29 | 1054.690 | 1022 | 1177 | 35764 | 36.649 | | tile2,1 | 3 | perf | 32 | 1054.688 | 1022 | 1108 | 39214 | 33.425 | | tile2,1 | 4 | perf | 32 | 1056.875 | 1022 | 1171 | 39290 | 33.360 | | tile2,1 | 5 | perf | 32 | 1054.906 | 1022 | 1100 | 39234 | 33.408 |
[single_col] M=128 K=64 N=64 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 8 | 1052.625 | 1022 | 1086 | 9206 | 35.594 | | tile2,1 | 1 | perf | 8 | 1051.125 | 1022 | 1086 | 9196 | 35.633 | | tile2,1 | 2 | perf | 8 | 1055.250 | 1022 | 1170 | 9229 | 35.505 | | tile2,1 | 3 | perf | 8 | 1050.625 | 1022 | 1086 | 9192 | 35.648 | | tile2,1 | 4 | perf | 8 | 1054.250 | 1022 | 1159 | 9220 | 35.540 | | tile2,1 | 5 | perf | 8 | 1031.375 | 914 | 1086 | 9036 | 36.264 | | tile3,1 | 0 | warmup | 8 | 1052.625 | 1022 | 1086 | 9206 | 35.594 | | tile3,1 | 1 | perf | 8 | 1051.125 | 1022 | 1086 | 9197 | 35.629 | | tile3,1 | 2 | perf | 8 | 1055.250 | 1022 | 1170 | 9228 | 35.509 | | tile3,1 | 3 | perf | 8 | 1051.000 | 1022 | 1086 | 9193 | 35.645 | | tile3,1 | 4 | perf | 8 | 1054.250 | 1022 | 1159 | 9221 | 35.536 | | tile3,1 | 5 | perf | 8 | 1034.125 | 936 | 1086 | 9058 | 36.176 |
[single_col] M=256 K=128 N=128 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 64 | 1046.250 | 1022 | 1159 | 95230 | 27.527 | | tile3,1 | 1 | perf | 61 | 1097.279 | 1022 | 2788 | 92801 | 28.248 | | tile3,1 | 2 | perf | 63 | 1045.873 | 624 | 1134 | 94125 | 27.851 | | tile3,1 | 3 | perf | 64 | 1061.750 | 1022 | 1155 | 96231 | 27.241 | | tile3,1 | 4 | perf | 64 | 1066.531 | 1022 | 1134 | 96530 | 27.157 | | tile3,1 | 5 | perf | 64 | 1046.031 | 1022 | 1155 | 95220 | 27.530 | | tile2,1 | 0 | warmup | 64 | 1046.516 | 1022 | 1159 | 95231 | 27.527 | | tile2,1 | 1 | perf | 61 | 1088.492 | 1022 | 2250 | 92766 | 28.259 | | tile2,1 | 2 | perf | 63 | 1045.333 | 584 | 1134 | 94085 | 27.862 | | tile2,1 | 3 | perf | 64 | 1061.875 | 1022 | 1155 | 96231 | 27.241 | | tile2,1 | 4 | perf | 64 | 1066.500 | 1022 | 1134 | 96529 | 27.157 | | tile2,1 | 5 | perf | 64 | 1046.297 | 1022 | 1155 | 95220 | 27.530 |
[single_col] M=256 K=128 N=256 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 128 | 1057.875 | 1022 | 1181 | 194321 | 26.981 | | tile2,1 | 1 | perf | 126 | 1057.452 | 999 | 1267 | 193271 | 27.127 | | tile2,1 | 2 | perf | 125 | 1063.248 | 1022 | 1846 | 191723 | 27.346 | | tile2,1 | 3 | perf | 122 | 1066.205 | 1022 | 1578 | 187098 | 28.022 | | tile2,1 | 4 | perf | 127 | 1064.087 | 1022 | 1156 | 194035 | 27.020 | | tile2,1 | 5 | perf | 124 | 1071.202 | 1022 | 2721 | 189891 | 27.610 | | tile3,1 | 0 | warmup | 128 | 1057.641 | 1022 | 1162 | 194321 | 26.981 | | tile3,1 | 1 | perf | 126 | 1060.103 | 1022 | 1482 | 193636 | 27.076 | | tile3,1 | 2 | perf | 125 | 1063.232 | 1022 | 1868 | 191744 | 27.343 | | tile3,1 | 3 | perf | 122 | 1066.246 | 1022 | 1586 | 187105 | 28.021 | | tile3,1 | 4 | perf | 127 | 1063.929 | 1022 | 1156 | 194035 | 27.020 | | tile3,1 | 5 | perf | 124 | 1071.887 | 1022 | 2837 | 190006 | 27.593 |
[single_col] M=256 K=128 N=64 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 29 | 8912.621 | 2173 | 16149 | 1879974 | 0.697 | | tile3,1 | 0 | warmup | 32 | 1038.531 | 1022 | 1086 | 46186 | 28.379 | | tile3,1 | 1 | perf | 32 | 1054.375 | 1022 | 1158 | 46681 | 28.078 | | tile3,1 | 2 | perf | 32 | 1053.438 | 604 | 1134 | 46661 | 28.090 |
[single_col] M=256 K=256 N=128 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 1 | perf | 128 | 1109.727 | 1024 | 1198 | 156998 | 33.395 | | tile2,1 | 2 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 3 | perf | 128 | 1108.289 | 1024 | 1201 | 156816 | 33.433 | | tile2,1 | 4 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 5 | perf | 128 | 1111.797 | 1022 | 1197 | 157257 | 33.340 | | tile2,1 | 6 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 7 | extra | 128 | 1127.438 | 1022 | 1171 | 159257 | 32.921 | | tile2,1 | 8 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 9 | extra | 128 | 1119.312 | 1022 | 1170 | 158191 | 33.143 | | tile2,1 | 10 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 11 | extra | 87 | 1106.862 | 1024 | 1199 | 108201 | 48.455 | | tile3,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 1 | perf | 128 | 1108.648 | 1024 | 1186 | 157371 | 33.315 | | tile3,1 | 2 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 3 | perf | 128 | 1107.078 | 1024 | 1185 | 157170 | 33.358 | | tile3,1 | 4 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 5 | perf | 128 | 1110.711 | 1022 | 1191 | 157637 | 33.259 | | tile3,1 | 6 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 7 | extra | 128 | 1127.359 | 1022 | 1170 | 159762 | 32.817 | | tile3,1 | 8 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 9 | extra | 128 | 1119.086 | 1022 | 1170 | 158678 | 33.041 | | tile3,1 | 10 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 11 | extra | 87 | 1106.471 | 1024 | 1186 | 108688 | 48.238 |
[single_col] M=256 K=256 N=256 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 1 | perf | 256 | 1109.449 | 1024 | 1206 | 319846 | 32.784 | | tile2,1 | 2 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 3 | perf | 255 | 1120.255 | 1022 | 1246 | 321221 | 32.643 | | tile2,1 | 4 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 5 | perf | 215 | 1115.084 | 1024 | 1198 | 272524 | 38.476 | | tile3,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 1 | perf | 256 | 1108.129 | 1024 | 1192 | 320186 | 32.749 | | tile3,1 | 2 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 3 | perf | 255 | 1119.318 | 1022 | 1254 | 321734 | 32.591 | | tile3,1 | 4 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 5 | perf | 214 | 1114.028 | 1024 | 1189 | 271794 | 38.580 |
[single_col] M=256 K=256 N=64 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 1 | perf | 64 | 1110.828 | 1023 | 1169 | 75594 | 34.678 | | tile2,1 | 2 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 3 | perf | 61 | 1103.426 | 1024 | 1185 | 72427 | 36.194 | | tile2,1 | 4 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 5 | perf | 62 | 1102.032 | 991 | 1184 | 72510 | 36.153 | | tile2,1 | 6 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 7 | extra | 64 | 1106.891 | 1024 | 1186 | 75350 | 34.790 | | tile2,1 | 8 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 9 | extra | 64 | 1107.328 | 1024 | 1192 | 75378 | 34.777 | | tile2,1 | 10 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile2,1 | 11 | extra | 64 | 1114.562 | 1022 | 1192 | 75838 | 34.566 | | tile3,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 1 | perf | 64 | 1110.812 | 1023 | 1169 | 75593 | 34.678 | | tile3,1 | 2 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 3 | perf | 62 | 1101.742 | 998 | 1185 | 72498 | 36.159 | | tile3,1 | 4 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 5 | perf | 62 | 1101.984 | 988 | 1184 | 72507 | 36.154 | | tile3,1 | 6 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 7 | extra | 64 | 1106.891 | 1024 | 1186 | 75350 | 34.790 | | tile3,1 | 8 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 9 | extra | 64 | 1107.328 | 1024 | 1192 | 75378 | 34.777 | | tile3,1 | 10 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 | | tile3,1 | 11 | extra | 64 | 1114.562 | 1022 | 1192 | 75838 | 34.566 |
[single_col] M=256 K=64 N=128 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 32 | 1053.562 | 1022 | 1115 | 45717 | 28.670 | | tile3,1 | 1 | perf | 29 | 1099.379 | 1022 | 2326 | 42345 | 30.953 | | tile3,1 | 2 | perf | 29 | 1055.862 | 1022 | 1116 | 41743 | 31.400 | | tile3,1 | 3 | perf | 32 | 1053.719 | 1022 | 1113 | 45738 | 28.657 | | tile3,1 | 4 | perf | 32 | 1053.844 | 1022 | 1117 | 45667 | 28.702 | | tile3,1 | 5 | perf | 32 | 1052.000 | 1022 | 1116 | 45625 | 28.728 | | tile2,1 | 0 | warmup | 32 | 1053.906 | 1022 | 1116 | 45717 | 28.670 | | tile2,1 | 1 | perf | 29 | 1099.034 | 1022 | 2404 | 42353 | 30.948 | | tile2,1 | 2 | perf | 29 | 1051.207 | 1015 | 1116 | 41664 | 31.459 | | tile2,1 | 3 | perf | 32 | 1054.219 | 1022 | 1116 | 45737 | 28.658 | | tile2,1 | 4 | perf | 32 | 1053.844 | 1022 | 1117 | 45666 | 28.702 | | tile2,1 | 5 | perf | 32 | 1052.000 | 1022 | 1116 | 45625 | 28.728 |
[single_col] M=256 K=64 N=256 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 64 | 1054.188 | 1022 | 1116 | 92368 | 28.380 | | tile2,1 | 1 | perf | 60 | 1073.350 | 1022 | 2261 | 88686 | 29.559 | | tile2,1 | 2 | perf | 64 | 1053.641 | 1022 | 1117 | 92158 | 28.445 | | tile2,1 | 3 | perf | 64 | 1054.297 | 1022 | 1118 | 92382 | 28.376 | | tile2,1 | 4 | perf | 64 | 1053.984 | 1022 | 1129 | 92263 | 28.413 | | tile2,1 | 5 | perf | 64 | 1054.250 | 1022 | 1118 | 92171 | 28.441 | | tile3,1 | 0 | warmup | 64 | 1054.188 | 1022 | 1116 | 92368 | 28.380 | | tile3,1 | 1 | perf | 60 | 1074.067 | 1022 | 2309 | 88762 | 29.533 | | tile3,1 | 2 | perf | 64 | 1053.641 | 1022 | 1117 | 92158 | 28.445 | | tile3,1 | 3 | perf | 64 | 1054.297 | 1022 | 1118 | 92382 | 28.376 | | tile3,1 | 4 | perf | 64 | 1053.281 | 1022 | 1117 | 92218 | 28.427 | | tile3,1 | 5 | perf | 64 | 1054.250 | 1022 | 1118 | 92171 | 28.441 |
[single_col] M=256 K=64 N=64 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 16 | 1054.312 | 1022 | 1114 | 22451 | 29.191 | | tile2,1 | 1 | perf | 16 | 1052.000 | 1022 | 1116 | 22397 | 29.261 | | tile2,1 | 2 | perf | 16 | 1051.750 | 1022 | 1114 | 22403 | 29.253 | | tile2,1 | 3 | perf | 14 | 1054.500 | 1022 | 1117 | 19552 | 33.519 | | tile2,1 | 4 | perf | 16 | 1051.438 | 1022 | 1113 | 22432 | 29.215 | | tile2,1 | 5 | perf | 16 | 1051.562 | 1022 | 1113 | 22395 | 29.264 | | tile3,1 | 0 | warmup | 16 | 1054.312 | 1022 | 1114 | 22451 | 29.191 | | tile3,1 | 1 | perf | 16 | 1052.000 | 1022 | 1116 | 22397 | 29.261 | | tile3,1 | 2 | perf | 16 | 1051.688 | 1022 | 1114 | 22402 | 29.255 | | tile3,1 | 3 | perf | 14 | 1054.500 | 1022 | 1117 | 19552 | 33.519 | | tile3,1 | 4 | perf | 16 | 1051.438 | 1022 | 1113 | 22432 | 29.215 | | tile3,1 | 5 | perf | 16 | 1051.562 | 1022 | 1113 | 22395 | 29.264 |
[single_col] M=64 K=128 N=128 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 16 | 1044.000 | 1006 | 1159 | 19190 | 34.151 | | tile2,1 | 1 | perf | 15 | 1027.933 | 567 | 1161 | 17886 | 36.641 | | tile2,1 | 2 | perf | 16 | 1054.938 | 1006 | 1134 | 19351 | 33.867 | | tile2,1 | 3 | perf | 16 | 1036.250 | 1006 | 1134 | 19062 | 34.380 | | tile2,1 | 4 | perf | 14 | 1046.000 | 1006 | 1134 | 17427 | 37.606 | | tile2,1 | 5 | perf | 16 | 1034.750 | 1006 | 1134 | 19041 | 34.418 | | tile3,1 | 0 | warmup | 16 | 1044.000 | 1006 | 1159 | 19697 | 33.272 | | tile3,1 | 1 | perf | 15 | 1028.400 | 574 | 1161 | 18402 | 35.614 | | tile3,1 | 2 | perf | 16 | 1054.875 | 1006 | 1134 | 19856 | 33.006 | | tile3,1 | 3 | perf | 16 | 1036.250 | 1006 | 1134 | 19573 | 33.483 | | tile3,1 | 4 | perf | 14 | 1046.000 | 1006 | 1134 | 17955 | 36.500 | | tile3,1 | 5 | perf | 16 | 1034.750 | 1006 | 1134 | 19548 | 33.526 |
[single_col] M=64 K=128 N=256 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 32 | 1054.969 | 1006 | 1167 | 41025 | 31.949 | | tile3,1 | 1 | perf | 31 | 1053.258 | 1006 | 1169 | 39894 | 32.855 | | tile3,1 | 2 | perf | 31 | 1054.645 | 1006 | 1171 | 39933 | 32.823 | | tile3,1 | 3 | perf | 29 | 1051.690 | 1006 | 1134 | 39120 | 33.505 | | tile3,1 | 4 | perf | 30 | 1044.233 | 1006 | 1166 | 38561 | 33.991 | | tile3,1 | 5 | perf | 31 | 1050.903 | 1006 | 1168 | 39825 | 32.912 | | tile2,1 | 0 | warmup | 32 | 1054.844 | 1006 | 1167 | 40514 | 32.352 | | tile2,1 | 1 | perf | 31 | 1053.581 | 1006 | 1172 | 39387 | 33.278 | | tile2,1 | 2 | perf | 31 | 1054.548 | 1006 | 1171 | 39420 | 33.250 | | tile2,1 | 3 | perf | 30 | 1069.300 | 1006 | 1451 | 37017 | 35.409 | | tile2,1 | 4 | perf | 30 | 1044.200 | 1006 | 1166 | 38050 | 34.447 | | tile2,1 | 5 | perf | 31 | 1050.935 | 1006 | 1169 | 39320 | 33.335 |
[single_col] M=64 K=128 N=64 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 8 | 1038.875 | 1006 | 1134 | 8658 | 37.847 | | tile3,1 | 1 | perf | 8 | 1037.375 | 1006 | 1161 | 8645 | 37.904 | | tile3,1 | 2 | perf | 5 | 1038.600 | 1006 | 1169 | 5484 | 59.752 | | tile3,1 | 3 | perf | 7 | 1049.857 | 1006 | 1134 | 7677 | 42.683 | | tile3,1 | 4 | perf | 8 | 1047.250 | 1006 | 1134 | 8725 | 37.556 | | tile3,1 | 5 | perf | 8 | 1032.250 | 1006 | 1134 | 8603 | 38.089 | | tile2,1 | 0 | warmup | 8 | 1038.875 | 1006 | 1134 | 8657 | 37.851 | | tile2,1 | 1 | perf | 8 | 1037.375 | 1006 | 1161 | 8645 | 37.904 | | tile2,1 | 2 | perf | 5 | 1038.600 | 1006 | 1169 | 5484 | 59.752 | | tile2,1 | 3 | perf | 7 | 1049.714 | 1006 | 1134 | 7675 | 42.694 | | tile2,1 | 4 | perf | 8 | 1047.250 | 1006 | 1134 | 8723 | 37.565 | | tile2,1 | 5 | perf | 8 | 1033.625 | 1006 | 1134 | 8616 | 38.032 |
[single_col] M=64 K=256 N=128 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 32 | 1106.500 | 1024 | 1193 | 43156 | 30.372 | | tile2,1 | 1 | perf | 30 | 1102.433 | 1024 | 1200 | 41472 | 31.605 | | tile2,1 | 2 | perf | 30 | 1113.933 | 1022 | 1404 | 41072 | 31.913 | | tile2,1 | 3 | perf | 32 | 1126.031 | 1022 | 1170 | 43778 | 29.940 | | tile2,1 | 4 | perf | 28 | 1116.000 | 1025 | 1172 | 40464 | 32.392 | | tile2,1 | 5 | perf | 27 | 1111.037 | 1027 | 1187 | 38388 | 34.144 | | tile3,1 | 0 | warmup | 32 | 1106.188 | 1024 | 1177 | 43143 | 30.381 | | tile3,1 | 1 | perf | 30 | 1100.867 | 1024 | 1183 | 41420 | 31.645 | | tile3,1 | 2 | perf | 30 | 1116.133 | 1022 | 1482 | 41156 | 31.848 | | tile3,1 | 3 | perf | 32 | 1125.375 | 1022 | 1170 | 43776 | 29.942 | | tile3,1 | 4 | perf | 30 | 1094.700 | 579 | 1169 | 40488 | 32.373 | | tile3,1 | 5 | perf | 28 | 1109.107 | 1026 | 1179 | 38401 | 34.132 |
[single_col] M=64 K=256 N=256 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 64 | 1106.562 | 1025 | 1188 | 92270 | 28.411 | | tile3,1 | 1 | perf | 63 | 1118.286 | 1022 | 1185 | 92090 | 28.466 | | tile3,1 | 2 | perf | 61 | 1115.787 | 1024 | 1366 | 90037 | 29.115 | | tile3,1 | 3 | perf | 64 | 1112.312 | 1022 | 1177 | 92691 | 28.281 | | tile3,1 | 4 | perf | 64 | 1117.016 | 1024 | 1179 | 92920 | 28.212 | | tile3,1 | 5 | perf | 64 | 1107.750 | 1022 | 1186 | 92397 | 28.371 | | tile2,1 | 0 | warmup | 64 | 1107.422 | 1024 | 1207 | 92307 | 28.399 | | tile2,1 | 1 | perf | 63 | 1120.587 | 1022 | 1318 | 91763 | 28.568 | | tile2,1 | 2 | perf | 61 | 1115.262 | 1026 | 1312 | 90007 | 29.125 | | tile2,1 | 3 | perf | 64 | 1113.672 | 1022 | 1194 | 92704 | 28.278 | | tile2,1 | 4 | perf | 64 | 1117.703 | 1025 | 1195 | 92945 | 28.204 | | tile2,1 | 5 | perf | 64 | 1108.531 | 1022 | 1199 | 92409 | 28.368 |
[single_col] M=64 K=256 N=64 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 16 | 1106.250 | 1022 | 1165 | 18609 | 35.217 | | tile3,1 | 1 | perf | 16 | 1103.562 | 1024 | 1187 | 18568 | 35.295 | | tile3,1 | 2 | perf | 15 | 1108.867 | 1026 | 1253 | 17497 | 37.456 | | tile3,1 | 3 | perf | 15 | 1102.800 | 1025 | 1185 | 17404 | 37.656 | | tile3,1 | 4 | perf | 16 | 1103.625 | 1025 | 1180 | 18567 | 35.297 | | tile3,1 | 5 | perf | 12 | 1093.917 | 1022 | 1168 | 13634 | 48.068 | | tile2,1 | 0 | warmup | 16 | 1107.000 | 1022 | 1170 | 18621 | 35.195 | | tile2,1 | 1 | perf | 16 | 1104.438 | 1025 | 1199 | 18582 | 35.269 | | tile2,1 | 2 | perf | 15 | 1100.800 | 1026 | 1194 | 17376 | 37.716 | | tile2,1 | 3 | perf | 15 | 1088.867 | 894 | 1199 | 17197 | 38.109 | | tile2,1 | 4 | perf | 16 | 1104.438 | 1026 | 1186 | 18580 | 35.272 | | tile2,1 | 5 | perf | 12 | 1089.667 | 989 | 1169 | 13583 | 48.249 |
[single_col] M=64 K=64 N=128 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 8 | 1055.500 | 1022 | 1118 | 9767 | 33.550 | | tile2,1 | 1 | perf | 8 | 1061.875 | 1022 | 1141 | 9824 | 33.355 | | tile2,1 | 2 | perf | 8 | 1065.375 | 1022 | 1172 | 9840 | 33.301 | | tile2,1 | 3 | perf | 8 | 1059.125 | 1022 | 1086 | 9794 | 33.457 | | tile2,1 | 4 | perf | 8 | 1044.125 | 1022 | 1085 | 9682 | 33.844 | | tile2,1 | 5 | perf | 8 | 1051.250 | 1022 | 1086 | 12414 | 26.396 | | tile3,1 | 0 | warmup | 8 | 1056.375 | 1022 | 1118 | 9774 | 33.526 | | tile3,1 | 1 | perf | 8 | 1059.000 | 1022 | 1115 | 9798 | 33.444 | | tile3,1 | 2 | perf | 8 | 1063.875 | 1022 | 1172 | 9833 | 33.325 | | tile3,1 | 3 | perf | 8 | 1055.875 | 1022 | 1086 | 9781 | 33.502 | | tile3,1 | 4 | perf | 8 | 1041.000 | 1022 | 1085 | 9668 | 33.893 | | tile3,1 | 5 | perf | 8 | 1051.250 | 1022 | 1086 | 12931 | 25.341 |
[single_col] M=64 K=64 N=256 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 16 | 1059.000 | 1022 | 1129 | 24183 | 27.100 | | tile2,1 | 1 | perf | 14 | 1047.714 | 926 | 1122 | 22066 | 29.700 | | tile2,1 | 2 | perf | 16 | 1062.125 | 1022 | 1173 | 24709 | 26.523 | | tile2,1 | 3 | perf | 16 | 1058.250 | 1022 | 1086 | 26708 | 24.538 | | tile2,1 | 4 | perf | 16 | 1040.750 | 1022 | 1086 | 24790 | 26.436 | | tile2,1 | 5 | perf | 16 | 1059.062 | 1022 | 1124 | 24637 | 26.601 | | tile3,1 | 0 | warmup | 16 | 1059.250 | 1022 | 1129 | 24699 | 26.534 | | tile3,1 | 1 | perf | 14 | 1045.429 | 898 | 1086 | 22599 | 29.000 | | tile3,1 | 2 | perf | 16 | 1060.812 | 1022 | 1175 | 25224 | 25.982 | | tile3,1 | 3 | perf | 16 | 1055.500 | 1022 | 1086 | 27219 | 24.077 | | tile3,1 | 4 | perf | 16 | 1038.125 | 1022 | 1087 | 25306 | 25.897 | | tile3,1 | 5 | perf | 16 | 1056.938 | 1022 | 1135 | 25147 | 26.061 |
[single_col] M=64 K=64 N=64 R=2 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile3,1 | 0 | warmup | 4 | 1050.500 | 1022 | 1086 | 4475 | 36.612 | | tile3,1 | 1 | perf | 4 | 1053.000 | 1022 | 1086 | 4484 | 36.539 | | tile3,1 | 2 | perf | 4 | 1058.750 | 1022 | 1169 | 4508 | 36.344 | | tile3,1 | 3 | perf | 4 | 1051.500 | 1022 | 1086 | 4479 | 36.580 | | tile3,1 | 4 | perf | 4 | 1037.250 | 1022 | 1083 | 4422 | 37.051 | | tile3,1 | 5 | perf | 4 | 1050.500 | 1022 | 1086 | 4475 | 36.612 | | tile2,1 | 0 | warmup | 4 | 1050.500 | 1022 | 1086 | 4475 | 36.612 | | tile2,1 | 1 | perf | 4 | 1053.000 | 1022 | 1086 | 4484 | 36.539 | | tile2,1 | 2 | perf | 4 | 1058.750 | 1022 | 1169 | 4508 | 36.344 | | tile2,1 | 3 | perf | 4 | 1051.500 | 1022 | 1086 | 4479 | 36.580 | | tile2,1 | 4 | perf | 4 | 1037.250 | 1022 | 1083 | 4422 | 37.051 | | tile2,1 | 5 | perf | 4 | 1050.500 | 1022 | 1086 | 4475 | 36.612 |
[single_core] M=128 K=128 N=128 R=1 C=1 | Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s | |------|------|-------|---------|------------|-----|-----|------------|--------| | tile2,1 | 0 | warmup | 64 | 1057.328 | 1022 | 1157 | 95629 | 27.413 | | tile2,1 | 1 | perf | 64 | 1057.281 | 1022 | 1158 | 95584 | 27.426 | | tile2,1 | 2 | perf | 61 | 1068.951 | 1022 | 1785 | 93072 | 28.166 | | tile2,1 | 3 | perf | 64 | 1060.594 | 1022 | 1134 | 95814 | 27.360 | | tile2,1 | 4 | perf | 64 | 1064.609 | 1022 | 1159 | 96097 | 27.279 | | tile2,1 | 5 | perf | 63 | 1048.238 | 530 | 1156 | 93983 | 27.893 |
[single_core] M=128 K=128 N=256 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 125 | 1057.808 | 1022 | 1158 | 191647 | 27.357 |
| tile2,1 | 1 | perf | 128 | 1058.875 | 1022 | 1157 | 193293 | 27.124 |
| tile2,1 | 2 | perf | 126 | 1063.310 | 1022 | 1343 | 191713 | 27.348 |
| tile2,1 | 3 | perf | 128 | 1057.180 | 1022 | 1157 | 193057 | 27.157 |
| tile2,1 | 4 | perf | 128 | 1062.648 | 1022 | 1156 | 193803 | 27.053 |
| tile2,1 | 5 | perf | 128 | 1057.617 | 1022 | 1159 | 193161 | 27.143 |

[single_core] M=128 K=128 N=64 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 32 | 1045.906 | 1022 | 1158 | 46502 | 28.186 |
| tile2,1 | 1 | perf | 32 | 1067.812 | 1022 | 1134 | 47187 | 27.777 |
| tile2,1 | 2 | perf | 32 | 1052.469 | 1022 | 1134 | 46683 | 28.077 |
| tile2,1 | 3 | perf | 32 | 1061.625 | 1022 | 1158 | 46993 | 27.892 |
| tile2,1 | 4 | perf | 32 | 1066.625 | 1022 | 1134 | 47164 | 27.791 |
| tile2,1 | 5 | perf | 32 | 1046.156 | 1022 | 1159 | 46508 | 28.183 |

[single_core] M=128 K=256 N=128 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 128 | 1106.898 | 1024 | 1186 | 161825 | 32.398 |
| tile2,1 | 1 | perf | 126 | 1114.167 | 716 | 1185 | 160455 | 32.675 |
| tile2,1 | 2 | perf | 128 | 1112.148 | 1023 | 1185 | 162465 | 32.271 |
| tile2,1 | 3 | perf | 119 | 1109.328 | 728 | 1186 | 151747 | 34.550 |
| tile2,1 | 4 | perf | 127 | 1118.787 | 1024 | 1269 | 161942 | 32.375 |
| tile2,1 | 5 | perf | 127 | 1101.874 | 425 | 1188 | 160034 | 32.761 |

[single_core] M=128 K=256 N=256 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 256 | 1112.875 | 1022 | 1190 | 329093 | 31.863 |
| tile2,1 | 1 | perf | 255 | 1112.671 | 1022 | 1189 | 327838 | 31.985 |
| tile2,1 | 2 | perf | 255 | 1113.490 | 1022 | 1201 | 328063 | 31.963 |
| tile2,1 | 3 | perf | 256 | 1114.570 | 1024 | 1189 | 329511 | 31.822 |
| tile2,1 | 4 | perf | 256 | 1115.543 | 1023 | 1186 | 329769 | 31.797 |
| tile2,1 | 5 | perf | 166 | 1119.102 | 1024 | 1189 | 216085 | 48.526 |

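In the M=128 K=256 N=256 table, iteration 5 records 166 kernels where the other iterations record 255 or 256, and its shorter Total span yields a much higher GMAC/s (48.526). A minimal sketch for flagging such rows, assuming each kernel invocation covers a 32×32×32 block; that block size is inferred from the single_core and single_col kernel counts and is not stated in this summary. The helper names are hypothetical.

```python
KERNEL_DIM = 32  # assumed per-kernel block edge (32x32x32), inferred from counts


def expected_kernels(m, k, n, rows=1, cols=1):
    """Kernel invocations one tile would record if the trace were complete."""
    return (m * k * n) // (rows * cols * KERNEL_DIM ** 3)


def incomplete_iterations(m, k, n, kernel_counts, rows=1, cols=1):
    """Return the iteration indices whose kernel count differs from the expected one."""
    expected = expected_kernels(m, k, n, rows, cols)
    return [i for i, count in enumerate(kernel_counts) if count != expected]


# Kernels column of the single_core M=128 K=256 N=256 table, iterations 0..5
counts = [256, 255, 255, 256, 256, 166]
print(expected_kernels(128, 256, 256))               # 256
print(incomplete_iterations(128, 256, 256, counts))  # [1, 2, 5]
```

Rows flagged this way are better excluded from throughput comparisons, since their GMAC/s value still assumes the full workload over a span that covers only part of it.
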
[single_core] M=128 K=256 N=64 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 64 | 1107.766 | 1024 | 1186 | 79022 | 33.174 |
| tile2,1 | 1 | perf | 61 | 1104.541 | 868 | 1185 | 75358 | 34.786 |
| tile2,1 | 2 | perf | 60 | 1100.717 | 424 | 1185 | 73976 | 35.436 |
| tile2,1 | 3 | perf | 64 | 1119.641 | 646 | 1170 | 79777 | 32.860 |
| tile2,1 | 4 | perf | 62 | 1113.903 | 931 | 1170 | 77069 | 34.014 |
| tile2,1 | 5 | perf | 64 | 1105.859 | 1024 | 1182 | 78887 | 33.230 |

[single_core] M=128 K=64 N=128 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 32 | 1054.906 | 1022 | 1132 | 39253 | 33.392 |
| tile2,1 | 1 | perf | 32 | 1053.906 | 1022 | 1086 | 39219 | 33.421 |
| tile2,1 | 2 | perf | 31 | 1058.226 | 1022 | 1176 | 38274 | 34.246 |
| tile2,1 | 3 | perf | 32 | 1053.344 | 1022 | 1086 | 39201 | 33.436 |
| tile2,1 | 4 | perf | 30 | 1034.367 | 494 | 1171 | 36269 | 36.139 |
| tile2,1 | 5 | perf | 32 | 1053.938 | 1022 | 1086 | 39222 | 33.418 |

[single_core] M=128 K=64 N=256 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 64 | 1055.812 | 1022 | 1132 | 79319 | 33.049 |
| tile2,1 | 1 | perf | 62 | 1057.919 | 1022 | 1231 | 76542 | 34.248 |
| tile2,1 | 2 | perf | 62 | 1069.565 | 1022 | 1804 | 77257 | 33.931 |
| tile2,1 | 3 | perf | 64 | 1054.672 | 1022 | 1086 | 79231 | 33.086 |
| tile2,1 | 4 | perf | 64 | 1056.969 | 1022 | 1171 | 79424 | 33.006 |
| tile2,1 | 5 | perf | 64 | 1054.844 | 1022 | 1086 | 79259 | 33.074 |

[single_core] M=128 K=64 N=64 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 16 | 1053.250 | 1022 | 1117 | 19200 | 34.133 |
| tile2,1 | 1 | perf | 16 | 1053.062 | 1022 | 1086 | 19202 | 34.130 |
| tile2,1 | 2 | perf | 16 | 1054.938 | 1022 | 1159 | 19221 | 34.096 |
| tile2,1 | 3 | perf | 16 | 1052.500 | 1022 | 1087 | 19186 | 34.158 |
| tile2,1 | 4 | perf | 16 | 1054.812 | 1022 | 1169 | 19225 | 34.089 |
| tile2,1 | 5 | perf | 14 | 1045.929 | 935 | 1086 | 16199 | 40.457 |

[single_core] M=256 K=128 N=128 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 128 | 1057.383 | 1022 | 1156 | 192784 | 27.196 |
| tile2,1 | 1 | perf | 128 | 1057.297 | 1022 | 1156 | 192731 | 27.203 |
| tile2,1 | 2 | perf | 125 | 1062.360 | 1022 | 1765 | 190126 | 27.576 |
| tile2,1 | 3 | perf | 128 | 1060.531 | 1022 | 1134 | 193158 | 27.143 |
| tile2,1 | 4 | perf | 128 | 1064.367 | 1022 | 1157 | 193684 | 27.069 |
| tile2,1 | 5 | perf | 128 | 1057.453 | 1022 | 1156 | 192794 | 27.194 |

[single_core] M=256 K=128 N=256 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 256 | 1057.473 | 1022 | 1156 | 387455 | 27.063 |
| tile2,1 | 1 | perf | 256 | 1058.715 | 1022 | 1158 | 387735 | 27.044 |
| tile2,1 | 2 | perf | 255 | 1061.176 | 1022 | 1157 | 387567 | 27.055 |
| tile2,1 | 3 | perf | 255 | 1059.000 | 1022 | 1646 | 386714 | 27.115 |
| tile2,1 | 4 | perf | 256 | 1062.562 | 1022 | 1158 | 388758 | 26.972 |
| tile2,1 | 5 | perf | 255 | 1057.251 | 1022 | 1157 | 386317 | 27.143 |

[single_core] M=256 K=128 N=64 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 64 | 1046.391 | 1022 | 1158 | 94733 | 27.672 |
| tile2,1 | 1 | perf | 64 | 1067.906 | 1022 | 1134 | 96111 | 27.275 |
| tile2,1 | 2 | perf | 64 | 1052.656 | 1022 | 1134 | 95104 | 27.564 |
| tile2,1 | 3 | perf | 62 | 1064.581 | 1022 | 1232 | 93738 | 27.966 |
| tile2,1 | 4 | perf | 64 | 1066.594 | 1022 | 1134 | 96028 | 27.299 |
| tile2,1 | 5 | perf | 64 | 1046.281 | 1022 | 1156 | 94727 | 27.674 |

[single_core] M=256 K=256 N=128 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 1 | perf | 256 | 1107.922 | 1024 | 1186 | 313324 | 33.466 |
| tile2,1 | 2 | perf | 253 | 1118.874 | 992 | 1188 | 312630 | 33.540 |
| tile2,1 | 3 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 4 | perf | 256 | 1113.168 | 1023 | 1185 | 314650 | 33.325 |
| tile2,1 | 5 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 6 | extra | 256 | 1113.578 | 1022 | 1189 | 314781 | 33.311 |
| tile2,1 | 7 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 8 | extra | 256 | 1118.613 | 1022 | 1188 | 316047 | 33.178 |
| tile2,1 | 9 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 10 | extra | 167 | 1107.467 | 1023 | 1190 | 204412 | 51.297 |

[single_core] M=256 K=256 N=256 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 1 | perf | 512 | 1113.619 | 1022 | 1190 | 633471 | 33.106 |
| tile2,1 | 2 | perf | 511 | 1113.184 | 1020 | 1192 | 632076 | 33.179 |
| tile2,1 | 3 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 4 | perf | 421 | 1114.565 | 1022 | 1184 | 522184 | 40.161 |

[single_core] M=256 K=256 N=64 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 1 | perf | 128 | 1108.781 | 1023 | 1186 | 154828 | 33.863 |
| tile2,1 | 2 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 3 | perf | 127 | 1103.134 | 615 | 1188 | 152943 | 34.280 |
| tile2,1 | 4 | perf | 128 | 1110.078 | 1022 | 1186 | 154985 | 33.828 |
| tile2,1 | 5 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 6 | extra | 128 | 1127.062 | 1022 | 1170 | 157145 | 33.363 |
| tile2,1 | 7 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 8 | extra | 128 | 1119.297 | 1022 | 1170 | 156143 | 33.577 |
| tile2,1 | 9 | extra | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 10 | extra | 128 | 1107.000 | 1024 | 1192 | 154597 | 33.913 |

[single_core] M=256 K=64 N=128 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 64 | 1054.188 | 1022 | 1116 | 92368 | 28.380 |
| tile2,1 | 1 | perf | 62 | 1059.952 | 1022 | 1415 | 89735 | 29.213 |
| tile2,1 | 2 | perf | 62 | 1060.355 | 1022 | 1487 | 89671 | 29.234 |
| tile2,1 | 3 | perf | 64 | 1054.141 | 1022 | 1118 | 92371 | 28.379 |
| tile2,1 | 4 | perf | 64 | 1053.984 | 1022 | 1129 | 92263 | 28.413 |
| tile2,1 | 5 | perf | 64 | 1054.250 | 1022 | 1118 | 92171 | 28.441 |

[single_core] M=256 K=64 N=256 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 128 | 1054.484 | 1022 | 1118 | 185486 | 28.266 |
| tile2,1 | 1 | perf | 125 | 1054.968 | 562 | 1679 | 181523 | 28.883 |
| tile2,1 | 2 | perf | 126 | 1056.952 | 1022 | 1468 | 182958 | 28.656 |
| tile2,1 | 3 | perf | 126 | 1060.167 | 1022 | 1769 | 183334 | 28.597 |
| tile2,1 | 4 | perf | 127 | 1053.906 | 1022 | 1129 | 185417 | 28.276 |
| tile2,1 | 5 | perf | 128 | 1053.828 | 1022 | 1126 | 185416 | 28.276 |

[single_core] M=256 K=64 N=64 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 32 | 1053.844 | 1022 | 1116 | 45713 | 28.673 |
| tile2,1 | 1 | perf | 32 | 1054.094 | 1022 | 1118 | 45725 | 28.665 |
| tile2,1 | 2 | perf | 32 | 1052.281 | 1022 | 1116 | 45624 | 28.729 |
| tile2,1 | 3 | perf | 30 | 1072.967 | 1022 | 1652 | 43398 | 30.202 |
| tile2,1 | 4 | perf | 32 | 1053.344 | 1022 | 1117 | 45633 | 28.723 |
| tile2,1 | 5 | perf | 32 | 1052.000 | 1022 | 1116 | 45625 | 28.728 |

[single_core] M=64 K=128 N=128 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 32 | 1054.094 | 1006 | 1167 | 40490 | 32.371 |
| tile2,1 | 1 | perf | 32 | 1048.719 | 1006 | 1172 | 40302 | 32.522 |
| tile2,1 | 2 | perf | 32 | 1052.875 | 1006 | 1171 | 40440 | 32.411 |
| tile2,1 | 3 | perf | 32 | 1051.562 | 1006 | 1134 | 40397 | 32.446 |
| tile2,1 | 4 | perf | 32 | 1049.312 | 1006 | 1168 | 40340 | 32.492 |
| tile2,1 | 5 | perf | 32 | 1048.812 | 1006 | 1156 | 40319 | 32.509 |

[single_core] M=64 K=128 N=256 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 64 | 1053.141 | 1006 | 1167 | 82696 | 31.700 |
| tile2,1 | 1 | perf | 64 | 1056.438 | 1006 | 1172 | 82895 | 31.624 |
| tile2,1 | 2 | perf | 64 | 1049.703 | 1006 | 1171 | 82477 | 31.784 |
| tile2,1 | 3 | perf | 63 | 1042.063 | 437 | 1134 | 79145 | 33.122 |
| tile2,1 | 4 | perf | 64 | 1047.719 | 1006 | 1168 | 82356 | 31.831 |
| tile2,1 | 5 | perf | 64 | 1047.766 | 1006 | 1156 | 82355 | 31.831 |

[single_core] M=64 K=128 N=64 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 16 | 1044.000 | 1006 | 1159 | 19190 | 34.151 |
| tile2,1 | 1 | perf | 16 | 1062.000 | 1006 | 1161 | 19476 | 33.650 |
| tile2,1 | 2 | perf | 16 | 1017.750 | 539 | 1134 | 18756 | 34.941 |
| tile2,1 | 3 | perf | 16 | 1036.250 | 1006 | 1134 | 19063 | 34.379 |
| tile2,1 | 4 | perf | 16 | 1049.000 | 1006 | 1134 | 19269 | 34.011 |
| tile2,1 | 5 | perf | 16 | 1034.750 | 1006 | 1134 | 19041 | 34.418 |

[single_core] M=64 K=256 N=128 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 64 | 1106.062 | 1024 | 1185 | 86071 | 30.457 |
| tile2,1 | 1 | perf | 64 | 1117.359 | 1022 | 1186 | 86825 | 30.192 |
| tile2,1 | 2 | perf | 63 | 1115.714 | 1024 | 1298 | 85508 | 30.657 |
| tile2,1 | 3 | perf | 64 | 1111.906 | 1022 | 1179 | 86456 | 30.321 |
| tile2,1 | 4 | perf | 64 | 1115.875 | 1023 | 1180 | 86696 | 30.237 |
| tile2,1 | 5 | perf | 64 | 1107.250 | 1022 | 1184 | 86147 | 30.430 |

[single_core] M=64 K=256 N=256 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 128 | 1111.992 | 1022 | 1186 | 176809 | 29.653 |
| tile2,1 | 1 | perf | 128 | 1112.117 | 1022 | 1188 | 176794 | 29.655 |
| tile2,1 | 2 | perf | 128 | 1112.391 | 1022 | 1186 | 176836 | 29.648 |
| tile2,1 | 3 | perf | 128 | 1113.219 | 1023 | 1184 | 176953 | 29.629 |
| tile2,1 | 4 | perf | 128 | 1114.094 | 1023 | 1179 | 177076 | 29.608 |
| tile2,1 | 5 | perf | 128 | 1114.266 | 1023 | 1188 | 177133 | 29.599 |

[single_core] M=64 K=256 N=64 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 32 | 1106.062 | 1024 | 1177 | 41093 | 31.896 |
| tile2,1 | 1 | perf | 32 | 1103.719 | 1024 | 1180 | 41018 | 31.955 |
| tile2,1 | 2 | perf | 32 | 1108.344 | 1022 | 1180 | 41164 | 31.841 |
| tile2,1 | 3 | perf | 32 | 1108.312 | 633 | 1169 | 41162 | 31.843 |
| tile2,1 | 4 | perf | 32 | 1115.625 | 1022 | 1169 | 41383 | 31.673 |
| tile2,1 | 5 | perf | 32 | 1104.812 | 1024 | 1182 | 41051 | 31.929 |

[single_core] M=64 K=64 N=128 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 16 | 1059.562 | 1022 | 1135 | 20381 | 32.155 |
| tile2,1 | 1 | perf | 16 | 1058.188 | 1022 | 1086 | 20360 | 32.189 |
| tile2,1 | 2 | perf | 16 | 1063.188 | 1022 | 1173 | 20446 | 32.053 |
| tile2,1 | 3 | perf | 16 | 1058.312 | 1022 | 1086 | 20359 | 32.190 |
| tile2,1 | 4 | perf | 16 | 1040.562 | 1022 | 1082 | 20073 | 32.649 |
| tile2,1 | 5 | perf | 16 | 1060.688 | 1022 | 1120 | 20427 | 32.083 |

[single_core] M=64 K=64 N=256 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 32 | 1060.469 | 1022 | 1135 | 41607 | 31.502 |
| tile2,1 | 1 | perf | 30 | 1069.867 | 1022 | 1444 | 39519 | 33.167 |
| tile2,1 | 2 | perf | 32 | 1064.906 | 1022 | 1173 | 41717 | 31.419 |
| tile2,1 | 3 | perf | 32 | 1057.438 | 1022 | 1086 | 41528 | 31.562 |
| tile2,1 | 4 | perf | 32 | 1041.406 | 1022 | 1085 | 40997 | 31.971 |
| tile2,1 | 5 | perf | 30 | 1041.533 | 479 | 1137 | 38688 | 33.879 |

[single_core] M=64 K=64 N=64 R=1 C=1

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile2,1 | 0 | warmup | 8 | 1059.625 | 1022 | 1119 | 9799 | 33.440 |
| tile2,1 | 1 | perf | 8 | 1057.000 | 1022 | 1086 | 9786 | 33.485 |
| tile2,1 | 2 | perf | 8 | 1063.500 | 1022 | 1172 | 9825 | 33.352 |
| tile2,1 | 3 | perf | 8 | 1057.375 | 1022 | 1086 | 9781 | 33.502 |
| tile2,1 | 4 | perf | 8 | 1042.500 | 1022 | 1085 | 9669 | 33.890 |
| tile2,1 | 5 | perf | 8 | 1058.000 | 1022 | 1120 | 9780 | 33.505 |

[whole_array] M=128 K=128 N=256 R=2 C=4

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile3,4 | 0 | warmup | 7 | 1045.857 | 567 | 1639 | 11839 | 55.356 |
| tile3,4 | 1 | perf | 7 | 1314.000 | 1086 | 2057 | 13771 | 47.590 |
| tile3,4 | 2 | perf | 6 | 1206.000 | 541 | 2092 | 12041 | 54.427 |
| tile3,4 | 3 | perf | 6 | 1423.000 | 911 | 2344 | 9561 | 68.545 |
| tile3,4 | 4 | perf | 6 | 1452.167 | 834 | 2036 | 9123 | 71.836 |
| tile3,4 | 5 | perf | 7 | 1341.857 | 864 | 2401 | 13229 | 49.540 |
| tile3,2 | 0 | warmup | 15 | 962.333 | 495 | 1276 | 18945 | 34.593 |
| tile3,2 | 1 | perf | 13 | 1164.000 | 775 | 1625 | 20267 | 32.336 |
| tile3,2 | 2 | perf | 15 | 1006.400 | 682 | 1623 | 20345 | 32.212 |
| tile3,2 | 3 | perf | 15 | 945.533 | 476 | 1296 | 15792 | 41.499 |
| tile3,2 | 4 | perf | 15 | 978.000 | 518 | 1174 | 16194 | 40.469 |
| tile3,2 | 5 | perf | 14 | 1043.214 | 513 | 1635 | 19505 | 33.600 |
| tile3,3 | 0 | warmup | 7 | 975.429 | 647 | 1328 | 12028 | 54.486 |
| tile3,3 | 1 | perf | 7 | 1076.286 | 521 | 1333 | 13380 | 48.981 |
| tile3,3 | 2 | perf | 7 | 1144.857 | 491 | 1651 | 13068 | 50.150 |
| tile3,3 | 3 | perf | 5 | 1222.400 | 963 | 1608 | 8373 | 78.271 |
| tile3,3 | 4 | perf | 8 | 941.875 | 527 | 1767 | 8855 | 74.010 |
| tile3,3 | 5 | perf | 8 | 1030.000 | 403 | 1891 | 12911 | 50.760 |
| tile3,1 | 0 | warmup | 12 | 953.667 | 713 | 1181 | 13565 | 48.313 |
| tile3,1 | 1 | perf | 14 | 874.571 | 461 | 1181 | 14941 | 43.863 |
| tile3,1 | 2 | perf | 14 | 933.071 | 514 | 1413 | 14812 | 44.245 |
| tile3,1 | 3 | perf | 11 | 1049.909 | 722 | 1254 | 13929 | 47.050 |
| tile3,1 | 4 | perf | 9 | 1052.111 | 860 | 1469 | 13300 | 49.275 |
| tile3,1 | 5 | perf | 10 | 1063.100 | 882 | 1458 | 13775 | 47.576 |
| tile2,4 | 0 | warmup | 5 | 1169.200 | 887 | 1694 | 10102 | 64.874 |
| tile2,4 | 1 | perf | 6 | 981.167 | 365 | 1403 | 10892 | 60.169 |
| tile2,4 | 2 | perf | 6 | 1161.167 | 386 | 2531 | 10898 | 60.136 |
| tile2,4 | 3 | perf | 3 | 1273.000 | 946 | 1920 | 4178 | 156.860 |
| tile2,4 | 4 | perf | 7 | 922.714 | 343 | 1637 | 7106 | 92.226 |
| tile2,4 | 5 | perf | 5 | 1158.800 | 440 | 2255 | 10459 | 62.660 |
| tile2,1 | 0 | warmup | 12 | 779.500 | 397 | 1211 | 10535 | 62.208 |
| tile2,1 | 1 | perf | 13 | 761.769 | 445 | 1206 | 11363 | 57.675 |
| tile2,1 | 2 | perf | 10 | 800.500 | 470 | 1201 | 10768 | 60.862 |
| tile2,1 | 3 | perf | 8 | 773.000 | 529 | 1086 | 9530 | 68.768 |
| tile2,1 | 4 | perf | 8 | 779.375 | 538 | 1086 | 10076 | 65.042 |
| tile2,1 | 5 | perf | 8 | 882.375 | 612 | 1210 | 10153 | 64.548 |
| tile2,3 | 0 | warmup | 1 | 1215.000 | 1215 | 1215 | 1215 | 539.391 |
| tile2,3 | 1 | perf | 1 | 1213.000 | 1213 | 1213 | 1213 | 540.280 |
| tile2,3 | 2 | perf | 1 | 1206.000 | 1206 | 1206 | 1206 | 543.416 |
| tile2,3 | 3 | perf | 1 | 1198.000 | 1198 | 1198 | 1198 | 547.045 |
| tile2,3 | 4 | perf | 1 | 1208.000 | 1208 | 1208 | 1208 | 542.517 |
| tile2,3 | 5 | perf | 1 | 1214.000 | 1214 | 1214 | 1214 | 539.835 |
| tile2,2 | 0 | warmup | 6 | 976.667 | 415 | 1526 | 10651 | 61.530 |
| tile2,2 | 1 | perf | 9 | 855.000 | 346 | 1215 | 12125 | 54.050 |
| tile2,2 | 2 | perf | 7 | 1037.429 | 509 | 1532 | 12067 | 54.310 |
| tile2,2 | 3 | perf | 8 | 772.625 | 359 | 1258 | 7841 | 83.581 |
| tile2,2 | 4 | perf | 7 | 882.571 | 421 | 1633 | 7709 | 85.012 |
| tile2,2 | 5 | perf | 7 | 1053.714 | 364 | 2603 | 11466 | 57.157 |

[whole_array] M=128 K=256 N=256 R=2 C=4

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile3,1 | 0 | warmup | 27 | 927.704 | 451 | 1139 | 28062 | 46.708 |
| tile3,1 | 1 | perf | 25 | 972.040 | 418 | 1237 | 30134 | 43.496 |
| tile3,1 | 2 | perf | 26 | 1014.462 | 862 | 1354 | 29515 | 44.409 |
| tile3,1 | 3 | perf | 25 | 1007.400 | 842 | 1141 | 29699 | 44.133 |
| tile3,1 | 4 | perf | 27 | 982.222 | 698 | 1389 | 29108 | 45.030 |
| tile3,1 | 5 | perf | 27 | 983.815 | 764 | 1205 | 29640 | 44.221 |
| tile3,4 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile3,4 | 1 | perf | 25 | 826.640 | 495 | 1086 | 23154 | 56.609 |
| tile3,4 | 2 | perf | 25 | 835.400 | 696 | 1138 | 24765 | 52.926 |
| tile3,4 | 3 | perf | 24 | 831.500 | 728 | 1086 | 23906 | 54.828 |
| tile3,4 | 4 | perf | 24 | 824.542 | 684 | 1086 | 23816 | 55.035 |
| tile3,4 | 5 | perf | 24 | 854.292 | 736 | 1086 | 24569 | 53.349 |
| tile3,4 | 6 | extra | 26 | 804.538 | 330 | 1086 | 24177 | 54.214 |
| tile3,2 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile3,2 | 1 | perf | 28 | 789.107 | 555 | 1148 | 23486 | 55.809 |
| tile3,2 | 2 | perf | 28 | 775.786 | 529 | 1086 | 24534 | 53.425 |
| tile3,2 | 3 | perf | 26 | 779.731 | 505 | 1475 | 23742 | 55.207 |
| tile3,2 | 4 | perf | 29 | 789.862 | 516 | 1423 | 24326 | 53.881 |
| tile3,2 | 5 | perf | 26 | 776.654 | 643 | 1461 | 23575 | 55.598 |
| tile3,2 | 6 | extra | 25 | 794.880 | 526 | 1359 | 24351 | 53.826 |
| tile3,3 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile3,3 | 1 | perf | 24 | 815.875 | 643 | 1086 | 22298 | 58.782 |
| tile3,3 | 2 | perf | 25 | 800.880 | 566 | 1086 | 23502 | 55.771 |
| tile3,3 | 3 | perf | 26 | 813.115 | 673 | 1086 | 23568 | 55.614 |
| tile3,3 | 4 | perf | 25 | 797.640 | 527 | 1086 | 23027 | 56.921 |
| tile3,3 | 5 | perf | 25 | 821.200 | 674 | 1086 | 23763 | 55.158 |
| tile3,3 | 6 | extra | 25 | 797.880 | 567 | 1086 | 23226 | 56.433 |
| tile2,3 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,3 | 1 | perf | 21 | 763.714 | 500 | 1691 | 18353 | 71.417 |
| tile2,3 | 2 | perf | 21 | 680.762 | 339 | 1551 | 16578 | 79.064 |
| tile2,3 | 3 | perf | 19 | 651.263 | 294 | 1495 | 14861 | 88.199 |
| tile2,3 | 4 | perf | 24 | 564.292 | 290 | 1182 | 15954 | 82.156 |
| tile2,3 | 5 | perf | 22 | 588.091 | 293 | 1176 | 15680 | 83.592 |
| tile2,3 | 6 | extra | 18 | 726.056 | 345 | 1622 | 15403 | 85.095 |
| tile2,2 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,2 | 1 | perf | 24 | 716.708 | 552 | 1086 | 19680 | 66.602 |
| tile2,2 | 2 | perf | 25 | 704.880 | 344 | 1172 | 19838 | 66.071 |
| tile2,2 | 3 | perf | 24 | 696.417 | 301 | 1247 | 19094 | 68.646 |
| tile2,2 | 4 | perf | 26 | 654.769 | 304 | 1134 | 19661 | 66.666 |
| tile2,2 | 5 | perf | 23 | 723.043 | 508 | 1106 | 19067 | 68.743 |
| tile2,2 | 6 | extra | 21 | 685.048 | 340 | 1228 | 17137 | 76.485 |
| tile2,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 1 | perf | 25 | 735.000 | 385 | 1179 | 20930 | 62.624 |
| tile2,1 | 2 | perf | 26 | 708.500 | 313 | 1841 | 20256 | 64.708 |
| tile2,1 | 3 | perf | 26 | 646.038 | 266 | 1288 | 19043 | 68.829 |
| tile2,1 | 4 | perf | 26 | 679.923 | 408 | 1769 | 19703 | 66.524 |
| tile2,1 | 5 | perf | 26 | 655.462 | 381 | 1171 | 19186 | 68.316 |
| tile2,1 | 6 | extra | 24 | 654.000 | 270 | 1643 | 18935 | 69.222 |
| tile2,4 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,4 | 1 | perf | 21 | 723.000 | 395 | 1755 | 17298 | 75.773 |
| tile2,4 | 2 | perf | 20 | 706.650 | 291 | 1676 | 15616 | 83.934 |
| tile2,4 | 3 | perf | 21 | 600.524 | 280 | 977 | 15729 | 83.331 |
| tile2,4 | 4 | perf | 20 | 640.900 | 248 | 1148 | 15185 | 86.317 |
| tile2,4 | 5 | perf | 20 | 650.500 | 250 | 1212 | 15073 | 86.958 |
| tile2,4 | 6 | extra | 20 | 631.500 | 223 | 1129 | 15788 | 83.020 |

[whole_array] M=128 K=64 N=256 R=2 C=4

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile3,3 | 0 | warmup | 1 | 1185.000 | 1185 | 1185 | 1185 | 276.523 |
| tile3,3 | 1 | perf | 1 | 1204.000 | 1204 | 1204 | 1204 | 272.159 |
| tile3,3 | 2 | perf | 1 | 1214.000 | 1214 | 1214 | 1214 | 269.918 |
| tile3,3 | 3 | perf | 1 | 1196.000 | 1196 | 1196 | 1196 | 273.980 |
| tile3,3 | 4 | perf | 1 | 1196.000 | 1196 | 1196 | 1196 | 273.980 |
| tile3,3 | 5 | perf | 1 | 1192.000 | 1192 | 1192 | 1192 | 274.899 |
| tile3,4 | 0 | warmup | 3 | 1310.333 | 1135 | 1615 | 4756 | 68.898 |
| tile3,4 | 1 | perf | 3 | 1020.333 | 870 | 1183 | 3532 | 92.775 |
| tile3,4 | 2 | perf | 2 | 1369.000 | 969 | 1769 | 3312 | 98.937 |
| tile3,4 | 3 | perf | 4 | 1243.750 | 1015 | 1671 | 5697 | 57.518 |
| tile3,4 | 4 | perf | 4 | 1197.500 | 891 | 1722 | 5548 | 59.063 |
| tile3,4 | 5 | perf | 4 | 1262.500 | 1067 | 1668 | 5926 | 55.295 |
| tile3,2 | 0 | warmup | 5 | 1182.000 | 922 | 1697 | 8014 | 40.888 |
| tile3,2 | 1 | perf | 6 | 877.167 | 566 | 1174 | 6953 | 47.128 |
| tile3,2 | 2 | perf | 3 | 1074.000 | 919 | 1178 | 4800 | 68.267 |
| tile3,2 | 3 | perf | 4 | 1296.750 | 1086 | 1697 | 7231 | 45.316 |
| tile3,2 | 4 | perf | 5 | 1307.800 | 1086 | 1778 | 8413 | 38.949 |
| tile3,2 | 5 | perf | 4 | 1601.000 | 1086 | 3002 | 11515 | 28.457 |
| tile3,1 | 0 | warmup | 4 | 1062.250 | 864 | 1181 | 4532 | 72.304 |
| tile3,1 | 1 | perf | 8 | 854.750 | 616 | 1174 | 7502 | 43.679 |
| tile3,1 | 2 | perf | 4 | 1032.250 | 808 | 1178 | 4435 | 73.885 |
| tile3,1 | 3 | perf | 3 | 1064.000 | 1008 | 1113 | 4265 | 76.830 |
| tile3,1 | 4 | perf | 4 | 1020.750 | 921 | 1140 | 4334 | 75.607 |
| tile3,1 | 5 | perf | 4 | 1367.750 | 1035 | 2080 | 6653 | 49.253 |
| tile2,2 | 0 | warmup | 2 | 1397.500 | 1210 | 1585 | 5930 | 55.258 |
| tile2,2 | 1 | perf | 2 | 812.000 | 429 | 1195 | 2914 | 112.450 |
| tile2,2 | 2 | perf | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,2 | 3 | perf | 1 | 844.000 | 844 | 844 | 844 | 388.246 |
| tile2,2 | 4 | perf | 2 | 1008.000 | 814 | 1202 | 4861 | 67.410 |
| tile2,2 | 5 | perf | 1 | 1551.000 | 1551 | 1551 | 1551 | 211.270 |
| tile2,3 | 0 | warmup | 3 | 1298.333 | 1138 | 1546 | 10002 | 32.761 |
| tile2,3 | 1 | perf | 3 | 1261.667 | 1143 | 1446 | 12907 | 25.388 |
| tile2,3 | 2 | perf | 3 | 1252.000 | 1137 | 1415 | 9337 | 35.095 |
| tile2,3 | 3 | perf | 4 | 1060.750 | 509 | 1383 | 10441 | 31.384 |
| tile2,3 | 4 | perf | 2 | 1250.500 | 1135 | 1366 | 8582 | 38.182 |
| tile2,3 | 5 | perf | 2 | 1331.500 | 1138 | 1525 | 8627 | 37.983 |
| tile2,4 | 0 | warmup | 3 | 1167.333 | 950 | 1411 | 4313 | 75.975 |
| tile2,4 | 1 | perf | 3 | 851.000 | 571 | 1196 | 3203 | 102.304 |
| tile2,4 | 2 | perf | 3 | 1094.667 | 906 | 1203 | 3848 | 85.156 |
| tile2,4 | 3 | perf | 4 | 811.500 | 546 | 1053 | 3909 | 83.827 |
| tile2,4 | 4 | perf | 3 | 1082.333 | 892 | 1252 | 3873 | 84.606 |
| tile2,4 | 5 | perf | 2 | 1288.500 | 1141 | 1436 | 3297 | 99.387 |
| tile2,1 | 0 | warmup | 3 | 924.667 | 617 | 1168 | 3426 | 95.645 |
| tile2,1 | 1 | perf | 4 | 639.000 | 535 | 772 | 3928 | 83.422 |
| tile2,1 | 2 | perf | 3 | 819.000 | 743 | 881 | 3052 | 107.366 |
| tile2,1 | 3 | perf | 4 | 740.250 | 442 | 981 | 3999 | 81.940 |
| tile2,1 | 4 | perf | 3 | 770.667 | 628 | 933 | 2927 | 111.951 |
| tile2,1 | 5 | perf | 4 | 905.000 | 722 | 1163 | 4425 | 74.052 |

[whole_array] M=256 K=128 N=256 R=2 C=4

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile3,3 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile3,3 | 1 | perf | 15 | 681.000 | 347 | 1086 | 15862 | 82.633 |
| tile3,3 | 2 | perf | 10 | 804.100 | 641 | 1309 | 14623 | 89.634 |
| tile3,3 | 3 | perf | 13 | 753.000 | 664 | 1086 | 16499 | 79.442 |
| tile3,3 | 4 | perf | 14 | 741.714 | 597 | 1086 | 16149 | 81.164 |
| tile3,3 | 5 | perf | 14 | 763.500 | 620 | 1086 | 16082 | 81.502 |
| tile3,3 | 6 | extra | 14 | 739.786 | 546 | 1086 | 15530 | 84.399 |
| tile3,4 | 0 | warmup | 15 | 792.933 | 600 | 1086 | 18287 | 71.675 |
| tile3,4 | 1 | perf | 12 | 812.167 | 301 | 1086 | 17311 | 75.716 |
| tile3,4 | 2 | perf | 13 | 852.000 | 745 | 1086 | 18540 | 70.697 |
| tile3,4 | 3 | perf | 17 | 804.294 | 496 | 1111 | 18901 | 69.347 |
| tile3,4 | 4 | perf | 14 | 812.786 | 672 | 1086 | 17988 | 72.866 |
| tile3,4 | 5 | perf | 20 | 719.900 | 279 | 1086 | 19018 | 68.920 |
| tile3,1 | 0 | warmup | 19 | 858.737 | 476 | 1193 | 24061 | 54.475 |
| tile3,1 | 1 | perf | 17 | 962.882 | 402 | 1230 | 22126 | 59.239 |
| tile3,1 | 2 | perf | 16 | 912.812 | 711 | 1086 | 22372 | 58.588 |
| tile3,1 | 3 | perf | 20 | 942.700 | 364 | 1367 | 23974 | 54.673 |
| tile3,1 | 4 | perf | 17 | 932.588 | 453 | 1361 | 24051 | 54.498 |
| tile3,1 | 5 | perf | 22 | 766.864 | 328 | 1086 | 24701 | 53.063 |
| tile3,2 | 0 | warmup | 27 | 927.630 | 488 | 1236 | 28672 | 45.714 |
| tile3,2 | 1 | perf | 25 | 967.600 | 628 | 2015 | 27795 | 47.157 |
| tile3,2 | 2 | perf | 22 | 1073.455 | 769 | 1551 | 26606 | 49.264 |
| tile3,2 | 3 | perf | 27 | 963.148 | 495 | 1979 | 28210 | 46.463 |
| tile3,2 | 4 | perf | 26 | 962.654 | 459 | 1984 | 28110 | 46.628 |
| tile3,2 | 5 | perf | 31 | 844.419 | 357 | 1136 | 29506 | 44.422 |
| tile2,4 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,4 | 1 | perf | 9 | 668.444 | 377 | 1140 | 10903 | 120.216 |
| tile2,4 | 2 | perf | 10 | 528.100 | 193 | 1086 | 9103 | 143.988 |
| tile2,4 | 3 | perf | 11 | 535.091 | 192 | 1086 | 10005 | 131.006 |
| tile2,4 | 4 | perf | 10 | 568.800 | 232 | 1086 | 8733 | 150.088 |
| tile2,4 | 5 | perf | 11 | 805.636 | 391 | 1458 | 10623 | 123.385 |
| tile2,4 | 6 | extra | 11 | 688.000 | 362 | 1114 | 11390 | 115.076 |
| tile2,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 1 | perf | 20 | 555.900 | 200 | 1086 | 15078 | 86.929 |
| tile2,1 | 2 | perf | 15 | 775.000 | 352 | 1676 | 14885 | 88.056 |
| tile2,1 | 3 | perf | 16 | 717.062 | 257 | 1257 | 15606 | 83.988 |
| tile2,1 | 4 | perf | 21 | 625.190 | 219 | 1626 | 16107 | 81.376 |
| tile2,1 | 5 | perf | 19 | 679.000 | 283 | 1256 | 15806 | 82.925 |
| tile2,1 | 6 | extra | 16 | 604.312 | 223 | 1086 | 15959 | 82.130 |
| tile2,3 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,3 | 1 | perf | 11 | 587.727 | 260 | 1129 | 9544 | 137.334 |
| tile2,3 | 2 | perf | 9 | 587.889 | 258 | 1086 | 9670 | 135.545 |
| tile2,3 | 3 | perf | 12 | 492.667 | 182 | 1060 | 11210 | 116.924 |
| tile2,3 | 4 | perf | 15 | 561.467 | 197 | 1086 | 12272 | 106.806 |
| tile2,3 | 5 | perf | 12 | 607.417 | 234 | 1446 | 12376 | 105.908 |
| tile2,3 | 6 | extra | 12 | 567.583 | 325 | 1086 | 11788 | 111.191 |
| tile2,2 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,2 | 1 | perf | 15 | 552.133 | 286 | 1086 | 12506 | 104.807 |
| tile2,2 | 2 | perf | 17 | 527.647 | 198 | 1178 | 11927 | 109.895 |
| tile2,2 | 3 | perf | 11 | 531.091 | 175 | 977 | 9092 | 144.162 |
| tile2,2 | 4 | perf | 16 | 661.062 | 386 | 1472 | 13842 | 94.692 |
| tile2,2 | 5 | perf | 14 | 641.857 | 314 | 1086 | 12742 | 102.866 |
| tile2,2 | 6 | extra | 18 | 578.778 | 187 | 1086 | 14095 | 92.992 |

[whole_array] M=256 K=256 N=256 R=2 C=4

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile3,4 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile3,4 | 1 | perf | 48 | 806.021 | 361 | 1086 | 46186 | 56.758 |
| tile3,4 | 2 | perf | 45 | 815.089 | 477 | 1086 | 44800 | 58.514 |
| tile3,4 | 3 | perf | 48 | 794.604 | 498 | 1086 | 45630 | 57.450 |
| tile3,4 | 4 | perf | 40 | 792.775 | 422 | 901 | 38214 | 68.599 |
| tile3,3 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile3,3 | 1 | perf | 48 | 801.167 | 351 | 1086 | 45314 | 57.851 |
| tile3,3 | 2 | perf | 44 | 803.795 | 596 | 1086 | 42895 | 61.113 |
| tile3,3 | 3 | perf | 47 | 804.447 | 474 | 1086 | 44426 | 59.007 |
| tile3,3 | 4 | perf | 39 | 787.385 | 475 | 910 | 36537 | 71.748 |
| tile3,2 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile3,2 | 1 | perf | 51 | 783.451 | 360 | 1556 | 47014 | 55.759 |
| tile3,2 | 2 | perf | 52 | 765.538 | 378 | 1469 | 46452 | 56.433 |
| tile3,2 | 3 | perf | 57 | 744.684 | 315 | 1665 | 48336 | 54.234 |
| tile3,2 | 4 | perf | 46 | 746.761 | 334 | 983 | 40344 | 64.977 |
| tile3,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile3,1 | 1 | perf | 54 | 961.278 | 543 | 1367 | 57113 | 45.899 |
| tile3,1 | 2 | perf | 50 | 955.240 | 347 | 1233 | 55421 | 47.300 |
| tile3,1 | 3 | perf | 52 | 938.538 | 406 | 1086 | 56732 | 46.207 |
| tile3,1 | 4 | perf | 44 | 941.591 | 400 | 1204 | 47789 | 54.854 |
| tile2,3 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,3 | 1 | perf | 41 | 685.195 | 237 | 1233 | 33165 | 79.042 |
| tile2,3 | 2 | perf | 37 | 683.919 | 300 | 1226 | 31564 | 83.052 |
| tile2,3 | 3 | perf | 41 | 711.390 | 289 | 1634 | 35359 | 74.138 |
| tile2,3 | 4 | perf | 34 | 712.647 | 244 | 1438 | 28164 | 93.078 |
| tile2,2 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,2 | 1 | perf | 48 | 678.833 | 393 | 1086 | 38340 | 68.374 |
| tile2,2 | 2 | perf | 43 | 744.581 | 402 | 1795 | 37012 | 70.827 |
| tile2,2 | 3 | perf | 46 | 701.087 | 302 | 1149 | 38334 | 68.384 |
| tile2,2 | 4 | perf | 41 | 650.098 | 250 | 939 | 32179 | 81.464 |
| tile2,4 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,4 | 1 | perf | 40 | 640.800 | 301 | 1394 | 32332 | 81.079 |
| tile2,4 | 2 | perf | 37 | 723.784 | 334 | 1664 | 31202 | 84.015 |
| tile2,4 | 3 | perf | 41 | 681.024 | 382 | 1319 | 33868 | 77.402 |
| tile2,4 | 4 | perf | 35 | 653.171 | 358 | 1402 | 26852 | 97.626 |
| tile2,1 | 0 | warmup | 0 | 0.000 | 0 | 0 | 0 | 0.000 |
| tile2,1 | 1 | perf | 46 | 683.500 | 231 | 1561 | 39014 | 67.192 |
| tile2,1 | 2 | perf | 46 | 679.543 | 337 | 1274 | 36848 | 71.142 |
| tile2,1 | 3 | perf | 50 | 681.880 | 250 | 1086 | 40188 | 65.229 |
| tile2,1 | 4 | perf | 42 | 629.905 | 233 | 1018 | 31403 | 83.477 |

[whole_array] M=256 K=64 N=256 R=2 C=4

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile3,1 | 0 | warmup | 8 | 934.250 | 565 | 1400 | 11405 | 57.463 |
| tile3,1 | 1 | perf | 6 | 1162.000 | 788 | 1504 | 8315 | 78.817 |
| tile3,1 | 2 | perf | 6 | 1151.500 | 873 | 1542 | 12344 | 53.091 |
| tile3,1 | 3 | perf | 6 | 1052.333 | 807 | 1841 | 13305 | 49.257 |
| tile3,1 | 4 | perf | 7 | 1243.286 | 953 | 2309 | 13958 | 46.952 |
| tile3,1 | 5 | perf | 5 | 1096.000 | 682 | 1662 | 12160 | 53.895 |
| tile3,2 | 0 | warmup | 6 | 984.500 | 697 | 1428 | 7785 | 84.182 |
| tile3,2 | 1 | perf | 6 | 1183.000 | 887 | 1861 | 17287 | 37.911 |
| tile3,2 | 2 | perf | 3 | 1690.000 | 612 | 2611 | 7197 | 91.060 |
| tile3,2 | 3 | perf | 5 | 1594.400 | 626 | 3037 | 10638 | 61.606 |
| tile3,2 | 4 | perf | 4 | 1335.750 | 761 | 1910 | 11551 | 56.736 |
| tile3,2 | 5 | perf | 1 | 791.000 | 791 | 791 | 791 | 828.521 |
| tile3,3 | 0 | warmup | 6 | 691.500 | 323 | 934 | 6466 | 101.355 |
| tile3,3 | 1 | perf | 7 | 946.286 | 511 | 1550 | 15949 | 41.091 |
| tile3,3 | 2 | perf | 4 | 1116.750 | 794 | 1662 | 6245 | 104.942 |
| tile3,3 | 3 | perf | 5 | 1069.400 | 487 | 2393 | 8672 | 75.572 |
| tile3,3 | 4 | perf | 5 | 1468.800 | 732 | 2703 | 8794 | 74.524 |
| tile3,3 | 5 | perf | 2 | 837.500 | 745 | 930 | 1707 | 383.925 |
| tile3,4 | 0 | warmup | 5 | 909.000 | 753 | 1225 | 6803 | 96.334 |
| tile3,4 | 1 | perf | 7 | 1000.429 | 530 | 1620 | 16800 | 39.010 |
| tile3,4 | 2 | perf | 3 | 1429.000 | 764 | 2416 | 6598 | 99.327 |
| tile3,4 | 3 | perf | 5 | 903.000 | 539 | 1601 | 8993 | 72.874 |
| tile3,4 | 4 | perf | 4 | 1092.000 | 572 | 1925 | 9461 | 69.270 |
| tile3,4 | 5 | perf | 3 | 1209.000 | 824 | 1914 | 8011 | 81.808 |
| tile2,3 | 0 | warmup | 3 | 908.667 | 416 | 1180 | 4787 | 136.904 |
| tile2,3 | 1 | perf | 5 | 1380.400 | 1059 | 2440 | 16973 | 38.612 |
| tile2,3 | 2 | perf | 3 | 1875.667 | 1173 | 3039 | 6346 | 103.271 |
| tile2,3 | 3 | perf | 3 | 1663.667 | 1172 | 2531 | 6312 | 103.828 |
| tile2,3 | 4 | perf | 3 | 1579.000 | 1179 | 1845 | 5678 | 115.421 |
| tile2,3 | 5 | perf | 4 | 1351.250 | 1078 | 1849 | 5501 | 119.135 |
| tile2,2 | 0 | warmup | 3 | 1506.333 | 1180 | 1777 | 6006 | 109.118 |
| tile2,2 | 1 | perf | 5 | 1241.400 | 1049 | 1581 | 16618 | 39.437 |
| tile2,2 | 2 | perf | 3 | 1857.667 | 1357 | 2679 | 5637 | 116.260 |
| tile2,2 | 3 | perf | 4 | 1728.250 | 1172 | 2957 | 7877 | 83.199 |
| tile2,2 | 4 | perf | 3 | 1705.667 | 1179 | 2724 | 5181 | 126.493 |
| tile2,2 | 5 | perf | 2 | 2223.500 | 1179 | 3268 | 4479 | 146.318 |
| tile2,1 | 0 | warmup | 4 | 619.500 | 375 | 872 | 3068 | 213.611 |
| tile2,1 | 1 | perf | 6 | 669.833 | 269 | 1204 | 7284 | 89.973 |
| tile2,1 | 2 | perf | 4 | 1131.750 | 684 | 1508 | 6857 | 95.575 |
| tile2,1 | 3 | perf | 3 | 1210.667 | 620 | 2135 | 7467 | 87.768 |
| tile2,1 | 4 | perf | 3 | 828.667 | 730 | 1007 | 7815 | 83.859 |
| tile2,1 | 5 | perf | 5 | 873.800 | 446 | 1834 | 7098 | 92.330 |
| tile2,4 | 0 | warmup | 1 | 1147.000 | 1147 | 1147 | 1147 | 571.369 |
| tile2,4 | 1 | perf | 1 | 940.000 | 940 | 940 | 940 | 697.191 |
| tile2,4 | 2 | perf | 1 | 1181.000 | 1181 | 1181 | 1181 | 554.920 |
| tile2,4 | 3 | perf | 1 | 1209.000 | 1209 | 1209 | 1209 | 542.068 |
| tile2,4 | 4 | perf | 1 | 1150.000 | 1150 | 1150 | 1150 | 569.878 |
| tile2,4 | 5 | perf | 1 | 1127.000 | 1127 | 1127 | 1127 | 581.508 |

[whole_array] M=64 K=128 N=256 R=2 C=4

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile3,1 | 0 | warmup | 6 | 1081.167 | 961 | 1180 | 7665 | 42.750 |
| tile3,1 | 1 | perf | 7 | 962.429 | 504 | 1172 | 7939 | 41.275 |
| tile3,1 | 2 | perf | 5 | 1022.000 | 810 | 1180 | 5449 | 60.136 |
| tile3,3 | 0 | warmup | 4 | 1298.000 | 1133 | 1588 | 11409 | 28.721 |
| tile3,3 | 1 | perf | 5 | 1277.400 | 938 | 2058 | 17075 | 19.191 |
| tile3,3 | 2 | perf | 5 | 893.600 | 530 | 1180 | 10140 | 32.316 |
| tile3,4 | 0 | warmup | 4 | 1528.000 | 1150 | 2460 | 6521 | 50.250 |
| tile3,4 | 1 | perf | 3 | 1172.333 | 598 | 1748 | 8149 | 40.211 |
| tile3,4 | 2 | perf | 4 | 1195.000 | 977 | 1490 | 5207 | 62.931 |
| tile3,2 | 0 | warmup | 6 | 949.667 | 430 | 1508 | 7450 | 43.984 |
| tile3,2 | 1 | perf | 2 | 1470.000 | 1172 | 1768 | 8785 | 37.300 |
| tile3,2 | 2 | perf | 3 | 1416.333 | 1123 | 1945 | 5822 | 56.283 |
| tile2,3 | 0 | warmup | 4 | 1261.000 | 868 | 1828 | 10604 | 30.902 |
| tile2,3 | 1 | perf | 5 | 1043.600 | 560 | 1234 | 16223 | 20.198 |
| tile2,3 | 2 | perf | 2 | 1172.500 | 1139 | 1206 | 2377 | 137.854 |
| tile2,2 | 0 | warmup | 4 | 1247.750 | 1138 | 1385 | 7190 | 45.574 |
| tile2,2 | 1 | perf | 5 | 977.600 | 581 | 1198 | 12625 | 25.955 |
| tile2,2 | 2 | perf | 4 | 1081.250 | 819 | 1206 | 5913 | 55.417 |
| tile2,1 | 0 | warmup | 6 | 958.833 | 593 | 1298 | 5968 | 54.906 |
| tile2,1 | 1 | perf | 6 | 923.500 | 644 | 1198 | 6174 | 53.074 |
| tile2,1 | 2 | perf | 3 | 1095.667 | 1016 | 1185 | 3351 | 97.786 |
| tile2,4 | 0 | warmup | 3 | 491635.000 | 319998 | 833937 | 8013468 | 0.041 |

[whole_array] M=64 K=256 N=256 R=2 C=4

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile3,3 | 0 | warmup | 6 | 1384.500 | 1181 | 2077 | 9606 | 68.224 |
| tile3,3 | 1 | perf | 6 | 848.167 | 634 | 1125 | 12850 | 51.001 |
| tile3,3 | 2 | perf | 6 | 1584.333 | 608 | 3314 | 10261 | 63.869 |
| tile3,3 | 3 | perf | 6 | 1450.667 | 618 | 2611 | 11445 | 57.262 |
| tile3,3 | 4 | perf | 7 | 1592.857 | 1104 | 3355 | 11430 | 57.337 |
| tile3,2 | 0 | warmup | 12 | 869.000 | 624 | 1796 | 11528 | 56.849 |
| tile3,2 | 1 | perf | 6 | 985.500 | 625 | 1129 | 12088 | 54.216 |
| tile3,2 | 2 | perf | 13 | 849.846 | 358 | 1274 | 12071 | 54.292 |
| tile3,2 | 3 | perf | 12 | 792.583 | 488 | 1529 | 10048 | 65.223 |
| tile3,2 | 4 | perf | 13 | 703.923 | 445 | 1086 | 9739 | 67.292 |
| tile3,1 | 0 | warmup | 15 | 937.533 | 359 | 1086 | 15066 | 43.499 |
| tile3,1 | 1 | perf | 11 | 1002.000 | 767 | 1177 | 12143 | 53.970 |
| tile3,1 | 2 | perf | 15 | 1077.000 | 810 | 1382 | 17799 | 36.820 |
| tile3,1 | 3 | perf | 15 | 957.800 | 559 | 1086 | 15299 | 42.837 |
| tile3,1 | 4 | perf | 15 | 996.200 | 799 | 1178 | 15595 | 42.024 |
| tile3,4 | 0 | warmup | 5 | 1508.400 | 640 | 3925 | 11046 | 59.330 |
| tile3,4 | 1 | perf | 4 | 1440.000 | 813 | 2426 | 11785 | 55.610 |
| tile3,4 | 2 | perf | 4 | 1476.750 | 587 | 3362 | 9773 | 67.058 |
| tile3,4 | 3 | perf | 5 | 1900.200 | 1175 | 3715 | 10887 | 60.197 |
| tile3,4 | 4 | perf | 2 | 1508.500 | 1472 | 1545 | 3049 | 214.943 |
| tile2,2 | 0 | warmup | 12 | 728.250 | 525 | 1581 | 9736 | 67.313 |
| tile2,2 | 1 | perf | 11 | 704.818 | 410 | 1135 | 11650 | 56.254 |
| tile2,2 | 2 | perf | 14 | 793.786 | 522 | 1138 | 12143 | 53.970 |
| tile2,2 | 3 | perf | 11 | 789.818 | 492 | 1086 | 9592 | 68.324 |
| tile2,2 | 4 | perf | 12 | 683.167 | 405 | 1086 | 9241 | 70.919 |
| tile2,3 | 0 | warmup | 10 | 675.700 | 317 | 1126 | 8259 | 79.351 |
| tile2,3 | 1 | perf | 6 | 1043.500 | 602 | 1404 | 11667 | 56.172 |
| tile2,3 | 2 | perf | 7 | 744.714 | 342 | 1200 | 9328 | 70.257 |
| tile2,3 | 3 | perf | 8 | 957.750 | 363 | 2009 | 8682 | 75.485 |
| tile2,3 | 4 | perf | 9 | 852.667 | 290 | 1864 | 9349 | 70.099 |
| tile2,1 | 0 | warmup | 12 | 758.500 | 588 | 1086 | 10847 | 60.419 |
| tile2,1 | 1 | perf | 11 | 786.818 | 430 | 1212 | 9558 | 68.567 |
| tile2,1 | 2 | perf | 8 | 660.250 | 266 | 1200 | 11475 | 57.112 |
| tile2,1 | 3 | perf | 13 | 748.000 | 574 | 1086 | 11008 | 59.535 |
| tile2,1 | 4 | perf | 12 | 772.667 | 552 | 1413 | 10349 | 63.326 |
| tile2,4 | 0 | warmup | 5 | 971.600 | 515 | 1443 | 7480 | 87.615 |
| tile2,4 | 1 | perf | 5 | 974.800 | 424 | 1313 | 9675 | 67.737 |
| tile2,4 | 2 | perf | 5 | 1006.600 | 728 | 1200 | 10577 | 61.961 |
| tile2,4 | 3 | perf | 8 | 873.000 | 459 | 1198 | 7842 | 83.571 |
| tile2,4 | 4 | perf | 68 | 557.897 | 227 | 1688 | 4145645 | 0.158 |

[whole_array] M=64 K=64 N=256 R=2 C=4

| Tile | Iter | Phase | Kernels | Avg cycles | Min | Max | Total span | GMAC/s |
|------|------|-------|---------|------------|-----|-----|------------|--------|
| tile3,4 | 0 | warmup | 4 | 867.500 | 517 | 1181 | 3831 | 42.767 |
| tile3,4 | 1 | perf | 4 | 990.500 | 690 | 1176 | 4427 | 37.009 |
| tile3,4 | 2 | perf | 2 | 1155.500 | 1137 | 1174 | 2343 | 69.927 |
| tile3,4 | 3 | perf | 3 | 1169.667 | 1086 | 1275 | 3808 | 43.025 |
| tile3,4 | 4 | perf | 4 | 1163.750 | 1086 | 1256 | 5115 | 32.031 |
| tile3,4 | 5 | perf | 2 | 1172.000 | 1163 | 1181 | 2376 | 68.956 |
| tile3,3 | 0 | warmup | 3 | 1255.667 | 1133 | 1454 | 5820 | 28.151 |
| tile3,3 | 1 | perf | 3 | 1411.667 | 1135 | 1924 | 6683 | 24.516 |
| tile3,2 | 0 | warmup | 3 | 1278.000 | 1135 | 1518 | 4907 | 33.389 |
| tile3,2 | 1 | perf | 3 | 1384.667 | 978 | 2001 | 5720 | 28.643 |
| tile3,1 | 0 | warmup | 2 | 1072.000 | 1059 | 1085 | 2176 | 75.294 |
| tile3,1 | 1 | perf | 4 | 1039.500 | 848 | 1176 | 4436 | 36.934 |
| tile3,1 | 2 | perf | 1 | 1174.000 | 1174 | 1174 | 1174 | 139.557 |
| tile3,1 | 3 | perf | 3 | 1156.000 | 1086 | 1242 | 3767 | 43.493 |
| tile3,1 | 4 | perf | 4 | 1164.250 | 1086 | 1237 | 4971 | 32.959 |
| tile3,1 | 5 | perf | 1 | 1181.000 | 1181 | 1181 | 1181 | 138.730 |
| tile2,3 | 0 | warmup | 3 | 1173.333 | 1138 | 1211 | 5542 | 29.563 |
| tile2,3 | 1 | perf | 3 | 1332.333 | 1142 | 1646 | 6222 | 26.332 |
| tile2,2 | 0 | warmup | 2 | 1001.500 | 962 | 1041 | 2035 | 80.511 |
| tile2,2 | 1 | perf | 3 | 1267.667 | 856 | 1738 | 5324 | 30.774 |
| tile2,4 | 0 | warmup | 2 | 1186.500 | 1162 | 1211 | 2405 | 68.125 |
| tile2,4 | 1 | perf | 3 | 962.667 | 868 | 1026 | 3266 | 50.165 |
| tile2,1 | 0 | warmup | 2 | 862.000 | 804 | 920 | 1756 | 93.303 |
| tile2,1 | 1 | perf | 3 | 929.000 | 733 | 1086 | 3438 | 47.656 |
| tile2,1 | 2 | perf | 1 | 1196.000 | 1196 | 1196 | 1196 | 136.990 |
| tile2,1 | 3 | perf | 3 | 1087.333 | 1083 | 1093 | 3463 | 47.312 |
| tile2,1 | 4 | perf | 3 | 1100.000 | 1010 | 1204 | 4636 | 35.341 |
| tile2,1 | 5 | perf | 1 | 1211.000 | 1211 | 1211 | 1211 | 135.293 |

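
The report does not state how the GMAC/s column is computed. The values in these tables appear consistent with dividing each tile's share of the M·K·N multiply-accumulates by the Total span and scaling by the clock. The sketch below reproduces two listed rows under two assumptions that are not confirmed by the report: an even split of the work across the R×C tiles, and a 1.25 GHz clock. The function name `gmacs_per_tile` is hypothetical.

```python
def gmacs_per_tile(m, k, n, rows, cols, span_cycles, clock_ghz=1.25):
    """Estimate one tile's GMAC/s from a table row.

    Assumptions (not stated in the report): the M*K*N multiply-accumulates
    are split evenly over rows*cols tiles, 'Total span' is in clock cycles,
    and the clock runs at 1.25 GHz.
    """
    if span_cycles <= 0:
        return 0.0  # rows with no recorded kernels list 0.000
    macs_per_tile = (m * k * n) / (rows * cols)
    # MACs per cycle times cycles per nanosecond gives GMAC/s.
    return macs_per_tile / span_cycles * clock_ghz


# Rows taken from the tables in this report.
# M=64 K=64 N=256, tile3,1 iter 2, Total span 1174 -> listed as 139.557
print(round(gmacs_per_tile(64, 64, 256, 2, 4, 1174), 3))
# M=256 K=64 N=256, tile2,4 iter 0, Total span 1147 -> listed as 571.369
print(round(gmacs_per_tile(256, 64, 256, 2, 4, 1147), 3))
```

If this reading is right, rows whose Total span equals a single kernel's cycle count (for example, Kernels = 1) would report high GMAC/s because the full per-tile workload is divided by a very short span, so those rows may not reflect a complete iteration.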