|
| 1 | +== "Zvzip" Extension for Reordering Structured Data, Version 0.1 |
| 2 | + |
| 3 | +This chapter describes the Zvzip standard extension for reordering structured |
| 4 | +data in vector registers. These instructions address use cases such as packing |
| 5 | +and unpacking data structures (for example, the color components of a pixel or |
| 6 | +the real and imaginary components of complex numbers) and transposing small |
| 7 | +matrices. |
| 8 | + |
| 9 | +[%autowidth] |
| 10 | +[%header,cols="2,4"] |
| 11 | +|=== |
| 12 | +|Mnemonic |Instruction |
| 13 | +| vzip.vv | <<insns-vzip>> |
| 14 | +| vunzipe.v | <<insns-vunzipe>> |
| 15 | +| vunzipo.v | <<insns-vunzipo>> |
| 16 | +| vpaire.vv | <<insns-vpaire>> |
| 17 | +| vpairo.vv | <<insns-vpairo>> |
| 18 | +|=== |
| 19 | + |
| 20 | +The Zvzip extension depends on the Zve32x extension. |
| 21 | + |
| 22 | +<<< |
| 23 | + |
| 24 | +[[insns-vzip, Vector Zip]] |
| 25 | +=== Vector Zip Instruction |
| 26 | + |
| 27 | +Synopsis:: |
| 28 | + |
| 29 | +Interleave elements from two source vector register groups into the |
| 30 | +destination vector register group. |
| 31 | + |
| 32 | +Mnemonic:: |
| 33 | + |
| 34 | +vzip.vv vd, vs2, vs1, vm |
| 35 | + |
| 36 | +Encoding:: |
| 37 | + |
| 38 | +[wavedrom, , svg] |
| 39 | +.... |
| 40 | +{reg:[ |
| 41 | + {bits: 7, name: 'OP-V'}, |
| 42 | + {bits: 5, name: 'vd'}, |
| 43 | + {bits: 3, name: 'OPMVV'}, |
| 44 | + {bits: 5, name: 'vs1'}, |
| 45 | + {bits: 5, name: 'vs2'}, |
| 46 | + {bits: 1, name: 'vm'}, |
| 47 | + {bits: 6, name: '111110'}, |
| 48 | +], config:{lanes: 1, hspace:1024}} |
| 49 | +.... |
| 50 | + |
| 51 | +Description:: |
| 52 | + |
| 53 | +The vector zip instruction (VZIP) interleaves elements from two source vector |
| 54 | +register groups (`vs2` and `vs1`) into one destination vector register group |
| 55 | +(`vd`) by alternating elements from the two sources. + |
| 56 | + + |
| 57 | +For destination element index `i`, if `i` is even then `vd[i] = vs2[i/2]`, and |
| 58 | +if `i` is odd then `vd[i] = vs1[(i-1)/2]`. + |
| 59 | + + |
| 60 | +Equivalently, the result order is: |
| 61 | +`vd = [vs2[0], vs1[0], vs2[1], vs1[1], ... ]` + |
| 62 | + + |
| 63 | +This instruction operates with an effective vector length (EVL) of 2*VL. The |
| 64 | +destination EMUL is 2xLMUL. The instruction is reserved when LMUL is 8. |
| 65 | +Prestart, inactive, and tail element handling follow the standard vector |
| 66 | +rules, applied over the EVL. + |
| 67 | + + |
| 68 | +The destination vector register group may overlap a source vector register |
| 69 | +group only if the overlap is in the highest-numbered part of the destination |
| 70 | +register group and the source EMUL is at least 1. If the overlap violates these |
| 71 | +constraints, the instruction encoding is reserved. |
| 72 | + |
| 73 | +Operation:: |
| 74 | + |
| 75 | +[source,sail] |
| 76 | +-- |
| 77 | +function clause execute (VZIP(vs2, vs1, vd, vm)) = { |
| 78 | + EVL = 2 * VL; |
| 79 | + foreach (i from vstart to EVL-1) { |
| 80 | + let j = i / 2; |
| 81 | + let op1 = get_velem(vs1, SEW, j); |
| 82 | + let op2 = get_velem(vs2, SEW, j); |
| 83 | + let res = if (i % 2 == 0) then op2 else op1; |
| 84 | + if (vm == 0b1) | (v0[i] == 0b1) then |
| 85 | + set_velem(vd, EEW=SEW, i, res); |
| 86 | + // inactive element handling follows VMA |
| 87 | + } |
| 88 | + // tail element handling follows VTA |
| 89 | + RETIRE_SUCCESS |
| 90 | +} |
| 91 | +-- |
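The interleaving rule can be checked with a small scalar model. The following is a minimal sketch in Python, not part of the specification: the function name `vzip` and the use of plain lists for register groups are illustrative assumptions, and masking, `vstart`, and tail handling are deliberately left out.

```python
def vzip(vs2, vs1, vl):
    """Hypothetical scalar model of an unmasked vzip.vv with vstart = 0.

    vs2 and vs1 are lists standing in for the source register groups.
    Returns the 2*vl body elements of the destination.
    """
    evl = 2 * vl  # effective vector length, as in the Operation section
    vd = []
    for i in range(evl):
        j = i // 2  # source element index
        # even destination indices come from vs2, odd ones from vs1
        vd.append(vs2[j] if i % 2 == 0 else vs1[j])
    return vd


# Packing separate real and imaginary arrays into interleaved complex data.
real = [1, 2, 3, 4]
imag = [10, 20, 30, 40]
interleaved = vzip(real, imag, vl=4)
print(interleaved)  # [1, 10, 2, 20, 3, 30, 4, 40]
```

With `vl=4` the model produces eight elements, which is why the destination EMUL is twice LMUL.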
| 92 | + |
| 93 | +<<< |
| 94 | + |
| 95 | +[[insns-vunzipe, Vector Unzip Even]] |
| 96 | +=== Vector Unzip Even Instruction |
| 97 | + |
| 98 | +Synopsis:: |
| 99 | + |
| 100 | +Extract even-indexed elements from the source vector register group into the |
| 101 | +destination vector register group. |
| 102 | + |
| 103 | +Mnemonic:: |
| 104 | + |
| 105 | +vunzipe.v vd, vs2, vm |
| 106 | + |
| 107 | +Encoding:: |
| 108 | + |
| 109 | +[wavedrom, , svg] |
| 110 | +.... |
| 111 | +{reg:[ |
| 112 | + {bits: 7, name: 'OP-V'}, |
| 113 | + {bits: 5, name: 'vd'}, |
| 114 | + {bits: 3, name: 'OPMVV'}, |
| 115 | + {bits: 5, name: '01011'}, |
| 116 | + {bits: 5, name: 'vs2'}, |
| 117 | + {bits: 1, name: 'vm'}, |
| 118 | + {bits: 6, name: '010010'}, |
| 119 | +], config:{lanes: 1, hspace:1024}} |
| 120 | +.... |
| 121 | + |
| 122 | +Description:: |
| 123 | + |
| 124 | +The vector unzip-even instruction (VUNZIPE) extracts VL even-indexed elements |
| 125 | +from the source vector register group into the destination vector register |
| 126 | +group. + |
| 127 | + + |
| 128 | +This instruction accesses 2*VL elements in the source vector register group and |
| 129 | +the source EMUL is 2xLMUL. The instruction is reserved when LMUL is 8. + |
| 130 | + + |
| 131 | +Prestart, inactive, and tail element handling follow the standard vector |
| 132 | +rules and are defined over the destination element indices (`0` to `VL-1`). + |
| 133 | + + |
| 134 | +The destination vector register group may overlap the source vector register |
| 135 | +group only if the overlap is in the lowest-numbered part of the source register |
| 136 | +group. If the overlap violates these constraints, the instruction encoding is |
| 137 | +reserved. |
| 138 | + |
| 139 | +Operation:: |
| 140 | + |
| 141 | +[source,sail] |
| 142 | +-- |
| 143 | +function clause execute (VUNZIPE(vs2, vd, vm)) = { |
| 144 | + foreach (i from vstart to VL-1) { |
| 145 | + let j = i * 2; |
| 146 | + if (vm == 0b1) | (v0[i] == 0b1) then |
| 147 | + set_velem(vd, EEW=SEW, i, get_velem(vs2, SEW, j)); |
| 148 | + // inactive element handling follows VMA |
| 149 | + } |
| 150 | + // tail element handling follows VTA |
| 151 | + RETIRE_SUCCESS |
| 152 | +} |
| 153 | +-- |
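Because masking is defined over destination indices, mask bit `i` controls destination element `i`, not source element `2*i`. The sketch below illustrates this in Python. It is an illustration only: the name `vunzipe_masked` is hypothetical, and it assumes the mask-undisturbed and tail-undisturbed policies (an agnostic policy would also permit other values in those positions).

```python
def vunzipe_masked(vd_old, vs2, mask, vl, vstart=0):
    """Hypothetical model of a masked vunzipe.v.

    Assumes undisturbed handling of inactive and tail elements.
    mask[i] stands in for v0[i]; vs2 must hold at least 2*vl elements.
    """
    vd = list(vd_old)  # prestart, inactive, and tail elements keep old values
    for i in range(vstart, vl):
        if mask[i]:
            vd[i] = vs2[2 * i]  # even-indexed source element
    return vd


old = [-1, -1, -1, -1]
src = [0, 1, 2, 3, 4, 5, 6, 7]
result = vunzipe_masked(old, src, mask=[1, 0, 1, 1], vl=3)
print(result)  # [0, -1, 4, -1]
```

Element 1 is inactive and element 3 is in the tail, so both keep their previous value in this model.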
| 154 | + |
| 155 | +<<< |
| 156 | + |
| 157 | +[[insns-vunzipo, Vector Unzip Odd]] |
| 158 | +=== Vector Unzip Odd Instruction |
| 159 | + |
| 160 | +Synopsis:: |
| 161 | + |
| 162 | +Extract odd-indexed elements from the source vector register group into the |
| 163 | +destination vector register group. |
| 164 | + |
| 165 | +Mnemonic:: |
| 166 | + |
| 167 | +vunzipo.v vd, vs2, vm |
| 168 | + |
| 169 | +Encoding:: |
| 170 | + |
| 171 | +[wavedrom, , svg] |
| 172 | +.... |
| 173 | +{reg:[ |
| 174 | + {bits: 7, name: 'OP-V'}, |
| 175 | + {bits: 5, name: 'vd'}, |
| 176 | + {bits: 3, name: 'OPMVV'}, |
| 177 | + {bits: 5, name: '01111'}, |
| 178 | + {bits: 5, name: 'vs2'}, |
| 179 | + {bits: 1, name: 'vm'}, |
| 180 | + {bits: 6, name: '010010'}, |
| 181 | +], config:{lanes: 1, hspace:1024}} |
| 182 | +.... |
| 183 | + |
| 184 | +Description:: |
| 185 | + |
| 186 | +The vector unzip-odd instruction (VUNZIPO) extracts VL odd-indexed elements |
| 187 | +from the source vector register group into the destination vector register |
| 188 | +group. + |
| 189 | + + |
| 190 | +This instruction accesses 2*VL elements in the source vector register group and |
| 191 | +the source EMUL is 2xLMUL. The instruction is reserved when LMUL is 8. + |
| 192 | + + |
| 193 | +Prestart, inactive, and tail element handling follow the standard vector |
| 194 | +rules and are defined over the destination element indices (`0` to `VL-1`). + |
| 195 | + + |
| 196 | +The destination vector register group may overlap the source vector register |
| 197 | +group only if the overlap is in the lowest-numbered part of the source register |
| 198 | +group. If the overlap violates these constraints, the instruction encoding is |
| 199 | +reserved. |
| 200 | + |
| 201 | +Operation:: |
| 202 | + |
| 203 | +[source,sail] |
| 204 | +-- |
| 205 | +function clause execute (VUNZIPO(vs2, vd, vm)) = { |
| 206 | + foreach (i from vstart to VL-1) { |
| 207 | + let j = (i * 2) + 1; |
| 208 | + if (vm == 0b1) | (v0[i] == 0b1) then |
| 209 | + set_velem(vd, EEW=SEW, i, get_velem(vs2, SEW, j)); |
| 210 | + // inactive element handling follows VMA |
| 211 | + } |
| 212 | + // tail element handling follows VTA |
| 213 | + RETIRE_SUCCESS |
| 214 | +} |
| 215 | +-- |
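Taken together, the two unzip instructions split interleaved data back into its components. The following Python sketch models the unmasked case with `vstart = 0`; the function names are illustrative assumptions rather than anything defined by the extension.

```python
def vunzipe(vs2, vl):
    """Hypothetical model of unmasked vunzipe.v: even-indexed elements."""
    return [vs2[2 * i] for i in range(vl)]


def vunzipo(vs2, vl):
    """Hypothetical model of unmasked vunzipo.v: odd-indexed elements."""
    return [vs2[2 * i + 1] for i in range(vl)]


# Unpacking interleaved complex data into real and imaginary arrays.
packed = [1, 10, 2, 20, 3, 30]
real = vunzipe(packed, vl=3)
imag = vunzipo(packed, vl=3)
print(real, imag)  # [1, 2, 3] [10, 20, 30]
```

Both models read `2*vl` source elements, matching the source EMUL of twice LMUL.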
| 216 | + |
| 217 | +<<< |
| 218 | + |
| 219 | +[[insns-vpaire, Vector Pair Even]] |
| 220 | +=== Vector Pair Even Instruction |
| 221 | + |
| 222 | +Synopsis:: |
| 223 | + |
| 224 | +Interleave the even-indexed elements of the source vector register groups into |
| 225 | +the destination vector register group. |
| 226 | + |
| 227 | +Mnemonic:: |
| 228 | + |
| 229 | +vpaire.vv vd, vs2, vs1, vm |
| 230 | + |
| 231 | + |
| 232 | +Encoding:: |
| 233 | + |
| 234 | +[wavedrom, , svg] |
| 235 | +.... |
| 236 | +{reg:[ |
| 237 | + {bits: 7, name: 'OP-V'}, |
| 238 | + {bits: 5, name: 'vd'}, |
| 239 | + {bits: 3, name: 'OPIVV'}, |
| 240 | + {bits: 5, name: 'vs1'}, |
| 241 | + {bits: 5, name: 'vs2'}, |
| 242 | + {bits: 1, name: 'vm'}, |
| 243 | + {bits: 6, name: '001111'}, |
| 244 | +], config:{lanes: 1, hspace:1024}} |
| 245 | +.... |
| 246 | + |
| 247 | +Description:: |
| 248 | + |
| 249 | +The vector pair-even instruction (VPAIRE) interleaves the even-indexed |
| 250 | +elements of the source vector register groups into the destination vector |
| 251 | +register group. + |
| 252 | + + |
| 253 | +For destination element index `i`, if `i` is even then `vd[i] = vs2[i]`, and if |
| 254 | +`i` is odd then `vd[i] = vs1[i - 1]`. + |
| 255 | + + |
| 256 | +Equivalently, the result order is: |
| 257 | +`vd = [vs2[0], vs1[0], vs2[2], vs1[2], ... ]` + |
| 258 | + + |
| 259 | +The destination vector register group cannot overlap the source vector register |
| 260 | +groups and, if masked, cannot overlap the mask register. If the overlap |
| 261 | +violates these constraints, the instruction encoding is reserved. + |
| 262 | + + |
| 263 | +Prestart, inactive, and tail element handling follow the standard vector rules. |
| 264 | + |
| 265 | +Operation:: |
| 266 | + |
| 267 | +[source,sail] |
| 268 | +-- |
| 269 | +function clause execute (VPAIRE(vs2, vs1, vd, vm)) = { |
| 270 | + foreach (i from vstart to VL-1) { |
| 271 | + let j = if (i % 2) == 0 then i else (i - 1); |
| 272 | + let res = if (i % 2) == 0 |
| 273 | + then get_velem(vs2, SEW, j) |
| 274 | + else get_velem(vs1, SEW, j); |
| 275 | + if (vm == 0b1) | (v0[i] == 0b1) then |
| 276 | + set_velem(vd, EEW=SEW, i, res); |
| 277 | + // inactive element handling follows VMA |
| 278 | + } |
| 279 | + // tail element handling follows VTA |
| 280 | + RETIRE_SUCCESS |
| 281 | +} |
| 282 | +-- |
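As a rough illustration of the index mapping, the pair-even operation can be modelled in Python as follows. This is a sketch under simplifying assumptions (unmasked, `vstart = 0`, lists in place of register groups), and the function name `vpaire` is chosen here for illustration.

```python
def vpaire(vs2, vs1, vl):
    """Hypothetical scalar model of an unmasked vpaire.vv with vstart = 0."""
    vd = []
    for i in range(vl):
        # both elements of a destination pair read the even source index
        j = i if i % 2 == 0 else i - 1
        vd.append(vs2[j] if i % 2 == 0 else vs1[j])
    return vd


paired = vpaire(list("abcd"), list("efgh"), vl=4)
print(paired)  # ['a', 'e', 'c', 'g']
```

Unlike the zip model, the result has the same length as each source, so no EMUL widening is involved.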
| 283 | + |
| 284 | +<<< |
| 285 | + |
| 286 | +[[insns-vpairo, Vector Pair Odd]] |
| 287 | +=== Vector Pair Odd Instruction |
| 288 | + |
| 289 | +Synopsis:: |
| 290 | + |
| 291 | +Interleave the odd-indexed elements of the source vector register groups into |
| 292 | +the destination vector register group. |
| 293 | + |
| 294 | +Mnemonic:: |
| 295 | + |
| 296 | +vpairo.vv vd, vs2, vs1, vm |
| 297 | + |
| 298 | + |
| 299 | +Encoding:: |
| 300 | + |
| 301 | +[wavedrom, , svg] |
| 302 | +.... |
| 303 | +{reg:[ |
| 304 | + {bits: 7, name: 'OP-V'}, |
| 305 | + {bits: 5, name: 'vd'}, |
| 306 | + {bits: 3, name: 'OPMVV'}, |
| 307 | + {bits: 5, name: 'vs1'}, |
| 308 | + {bits: 5, name: 'vs2'}, |
| 309 | + {bits: 1, name: 'vm'}, |
| 310 | + {bits: 6, name: '001111'}, |
| 311 | +], config:{lanes: 1, hspace:1024}} |
| 312 | +.... |
| 313 | + |
| 314 | +Description:: |
| 315 | + |
| 316 | +The vector pair-odd instruction (VPAIRO) interleaves the odd-indexed |
| 317 | +elements of the source vector register groups into the destination vector |
| 318 | +register group. + |
| 319 | + + |
| 320 | +For destination element index `i`, if `i` is even then `vd[i] = vs2[i + 1]`, and |
| 321 | +if `i` is odd then `vd[i] = vs1[i]`. + |
| 322 | + + |
| 323 | +Equivalently, the result order is: |
| 324 | +`vd = [vs2[1], vs1[1], vs2[3], vs1[3], ... ]` + |
| 325 | + + |
| 326 | +The destination vector register group cannot overlap the source vector register |
| 327 | +groups and, if masked, cannot overlap the mask register. If the overlap |
| 328 | +violates these constraints, the instruction encoding is reserved. + |
| 329 | + + |
| 330 | +Prestart, inactive, and tail element handling follow the standard vector rules. + |
| 331 | + + |
| 332 | +VPAIRO may read one element past `VL` in `vs2` when `VL` is odd. If the element |
| 333 | +index is greater than or equal to VLMAX in the source vector register group, |
| 334 | +the value 0 is returned for that element. |
| 335 | + |
| 336 | +Operation:: |
| 337 | + |
| 338 | +[source,sail] |
| 339 | +-- |
| 340 | +function clause execute (VPAIRO(vs2, vs1, vd, vm)) = { |
| 341 | + foreach (i from vstart to VL-1) { |
| 342 | + let j = if (i % 2) == 0 then (i + 1) else i; |
| 343 | + let res = |
| 344 | + if (j >= VLMAX) then zeros() |
| 345 | + else if (i % 2) == 0 then get_velem(vs2, SEW, j) |
| 346 | + else get_velem(vs1, SEW, j); |
| 347 | + if (vm == 0b1) | (v0[i] == 0b1) then |
| 348 | + set_velem(vd, EEW=SEW, i, res); |
| 349 | + // inactive element handling follows VMA |
| 350 | + } |
| 351 | + // tail element handling follows VTA |
| 352 | + RETIRE_SUCCESS |
| 353 | +} |
| 354 | +-- |
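The out-of-range case is easiest to see with concrete values. The Python sketch below models the unmasked pair-odd operation, including the zero substitution at or beyond VLMAX. The function name `vpairo` and the explicit `vlmax` parameter are assumptions made for illustration.

```python
def vpairo(vs2, vs1, vl, vlmax):
    """Hypothetical scalar model of an unmasked vpairo.vv with vstart = 0."""
    vd = []
    for i in range(vl):
        # both elements of a destination pair read the odd source index
        j = i + 1 if i % 2 == 0 else i
        if j >= vlmax:
            vd.append(0)  # source index out of range reads as zero
        elif i % 2 == 0:
            vd.append(vs2[j])
        else:
            vd.append(vs1[j])
    return vd


full = vpairo([1, 2, 3, 4], [5, 6, 7, 8], vl=4, vlmax=4)
past_vl = vpairo([1, 2, 3, 4], [5, 6, 7, 8], vl=3, vlmax=4)
past_vlmax = vpairo([1, 2, 3], [5, 6, 7], vl=3, vlmax=3)
print(full)        # [2, 6, 4, 8]
print(past_vl)     # [2, 6, 4]
print(past_vlmax)  # [2, 6, 0]
```

In the second call the last element comes from `vs2[3]`, one element past `VL`. In the third call that index equals VLMAX, so the model returns zero.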
| 355 | + |
| 356 | +<<< |
| 357 | + |
| 358 | +[NOTE] |
| 359 | +==== |
| 360 | + |
| 361 | +The following example uses the vector pair-even and pair-odd instructions to |
| 362 | +transpose VL/4 independent 4x4 matrices packed across vector registers. |
| 363 | + |
| 364 | +The first stage operates on 32-bit elements. The second stage treats each |
| 365 | +adjacent pair as one 64-bit element to complete the transpose. |
| 366 | + |
| 367 | +---- |
| 368 | +vsetvli t0, zero, e32, m1, ta, ma |
| 369 | +vpaire.vv v5, v1, v2 #|a|b|c|d|A|B|C|D|.. |a|e|c|g|A|E|C|G|.. |
| 370 | +vpairo.vv v6, v1, v2 #|e|f|g|h|E|F|G|H|.. -> |b|f|d|h|B|F|D|H|.. |
| 371 | +vpaire.vv v7, v3, v4 #|i|j|k|l|I|J|K|L|.. |i|m|k|o|I|M|K|O|.. |
| 372 | +vpairo.vv v8, v3, v4 #|m|n|o|p|M|N|O|P|.. |j|n|l|p|J|N|L|P|.. |
| 373 | + |
| 374 | +vsetvli t0, zero, e64, m1, ta, ma |
| 375 | +vpaire.vv v1, v5, v7 #|a e|c g|A E|C G|.. |a e|i m|A E|I M|.. |
| 376 | +vpaire.vv v2, v6, v8 #|b f|d h|B F|D H|.. -> |b f|j n|B F|J N|.. |
| 377 | +vpairo.vv v3, v5, v7 #|i m|k o|I M|K O|.. |c g|k o|C G|K O|.. |
| 378 | +vpairo.vv v4, v6, v8 #|j n|l p|J N|L P|.. |d h|l p|D H|L P|.. |
| 379 | +---- |
| 380 | + |
| 381 | +==== |
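The two-stage transpose in the note can be sanity-checked with a scalar model. The Python sketch below handles a single 4x4 matrix. It assumes `VL` covers the whole register, and it models the 64-bit view as tuples of two adjacent 32-bit elements. The helper names `widen` and `narrow` are hypothetical and exist only for this illustration.

```python
def vpaire(vs2, vs1):
    """Unmasked pair-even over all elements (VL assumed equal to len(vs2))."""
    return [vs2[i] if i % 2 == 0 else vs1[i - 1] for i in range(len(vs2))]


def vpairo(vs2, vs1):
    """Unmasked pair-odd over all elements; len(vs2) is assumed to be even."""
    return [vs2[i + 1] if i % 2 == 0 else vs1[i] for i in range(len(vs2))]


def widen(v):
    """View adjacent pairs of 32-bit elements as single 64-bit elements."""
    return [(v[i], v[i + 1]) for i in range(0, len(v), 2)]


def narrow(v):
    """Inverse of widen: flatten 64-bit elements back to 32-bit elements."""
    return [x for pair in v for x in pair]


rows = [list("abcd"), list("efgh"), list("ijkl"), list("mnop")]
v1, v2, v3, v4 = rows

# Stage 1: 32-bit elements.
v5, v6 = vpaire(v1, v2), vpairo(v1, v2)
v7, v8 = vpaire(v3, v4), vpairo(v3, v4)

# Stage 2: the same registers viewed as 64-bit elements.
v5, v6, v7, v8 = (widen(v) for v in (v5, v6, v7, v8))
out = [vpaire(v5, v7), vpaire(v6, v8), vpairo(v5, v7), vpairo(v6, v8)]
out = ["".join(narrow(v)) for v in out]
print(out)  # ['aeim', 'bfjn', 'cgko', 'dhlp']
```

Each output string is a column of the input matrix, which agrees with the right-hand side of the comments in the assembly listing.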