Commit c2cba12

Add Zvzip extension for reordering structured data
1 parent 0d4153c commit c2cba12

3 files changed: 381 additions, 0 deletions

src/colophon.adoc

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ h|Extension h|Version h|Status
 |*Zclsd* |*1.0* |*Ratified*
 |*B* |*1.0* |*Ratified*
 |*V* |*1.0* |*Ratified*
+|*Zvzip* |*0.1* |_Draft_
 |*Zbkb* |*1.0* |*Ratified*
 |*Zbkc* |*1.0* |*Ratified*
 |*Zbkx* |*1.0* |*Ratified*

src/riscv-unprivileged.adoc

Lines changed: 1 addition & 0 deletions
@@ -183,6 +183,7 @@ include::c-st-ext.adoc[]
 include::zc.adoc[]
 include::b-st-ext.adoc[]
 include::v-st-ext.adoc[]
+include::zvzip.adoc[]
 include::scalar-crypto.adoc[]
 include::vector-crypto.adoc[]
 include::unpriv-cfi.adoc[]

src/zvzip.adoc

Lines changed: 379 additions & 0 deletions
@@ -0,0 +1,379 @@
== "Zvzip" Extension for Reordering Structured Data, Version 0.1

This chapter describes the Zvzip standard extension for reordering structured
data in vector registers. These instructions support operations such as packing
and unpacking structured data (for example, the color components of a pixel or
the real and imaginary components of complex numbers) and transposing small
matrices.

[%autowidth]
[%header,cols="2,4"]
|===
|Mnemonic |Instruction
| vzip.vv | <<insns-vzip>>
| vunzipe.v | <<insns-vunzipe>>
| vunzipo.v | <<insns-vunzipo>>
| vpaire.vv | <<insns-vpaire>>
| vpairo.vv | <<insns-vpairo>>
|===
<<<

[[insns-vzip, Vector Zip]]
=== Vector Zip Instruction

Synopsis::

Interleave elements from two source vector register groups into the destination
vector register group.

Mnemonic::

vzip.vv vd, vs2, vs1, vm

Encoding::

[wavedrom, , svg]
....
{reg:[
    {bits: 7, name: 'OP-V'},
    {bits: 5, name: 'vd'},
    {bits: 3, name: 'OPMVV'},
    {bits: 5, name: 'vs1'},
    {bits: 5, name: 'vs2'},
    {bits: 1, name: 'vm'},
    {bits: 6, name: '111110'},
], config:{lanes: 1, hspace:1024}}
....

Description::

The vector zip instruction (VZIP) interleaves elements from two source vector
register groups (`vs2` and `vs1`) into one destination vector register group
(`vd`) by alternating elements from the two sources. +
+
For destination element index `i`, if `i` is even then `vd[i] = vs2[i/2]`, and
if `i` is odd then `vd[i] = vs1[i/2]`, where `i/2` is rounded down. +
+
Equivalently, the result order is:
`vd = [vs2[0], vs1[0], vs2[1], vs1[1], ... ]` +
+
This instruction operates with an effective vector length (EVL) of 2*VL. The
destination EMUL is 2*LMUL. The instruction is reserved when LMUL is 8.
Prestart, inactive, and tail element handling follows the standard vector
rules, applied over the EVL. +
+
The destination vector register group may overlap a source vector register
group only if the overlap is in the highest-numbered part of the destination
register group and the source EMUL is at least 1. If the overlap violates these
constraints, the instruction encoding is reserved.

Operation::

[source,sail]
--
function clause execute (VZIP(vs2, vs1, vd, vm)) = {
  let EVL = 2 * VL;
  foreach (i from vstart to EVL-1) {
    let j = i / 2;
    let op1 = get_velem(vs1, SEW, j);
    let op2 = get_velem(vs2, SEW, j);
    let res = if (i % 2 == 0) then op2 else op1;
    if (vm == 0b1) | (v0[i] == 0b1) then
      set_velem(vd, EEW=SEW, i, res);
    // inactive element handling follows VMA
  }
  // tail element handling follows VTA
  RETIRE_SUCCESS
}
--
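As an informal, non-normative illustration of the interleave described above, the following Python sketch models VZIP over plain lists. The helper name `vzip` and its parameters are hypothetical and not part of the specification; the sketch assumes `vstart` is 0 and leaves inactive elements undisturbed, which is only one of the behaviors the mask policy permits.

```python
def vzip(vs2, vs1, vl, mask=None, vd=None):
    """Informal model of vzip.vv (hypothetical helper, not spec text).

    vs2, vs1: source element lists, each holding at least vl elements.
    mask:     optional list of 2*vl booleans (True = active); None = unmasked.
    vd:       optional previous destination contents for inactive elements.
    """
    evl = 2 * vl  # effective vector length
    out = list(vd[:evl]) if vd is not None else [0] * evl
    for i in range(evl):
        if mask is not None and not mask[i]:
            continue  # inactive: keep the old value (assumed undisturbed)
        src = vs2 if i % 2 == 0 else vs1  # even i reads vs2, odd i reads vs1
        out[i] = src[i // 2]
    return out


print(vzip([0, 2, 4], [1, 3, 5], vl=3))  # [0, 1, 2, 3, 4, 5]
```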
<<<

[[insns-vunzipe, Vector Unzip Even]]
=== Vector Unzip Even Instruction

Synopsis::

Extract even-indexed elements from the source vector register group into the
destination vector register group.

Mnemonic::

vunzipe.v vd, vs2, vm

Encoding::

[wavedrom, , svg]
....
{reg:[
    {bits: 7, name: 'OP-V'},
    {bits: 5, name: 'vd'},
    {bits: 3, name: 'OPMVV'},
    {bits: 5, name: '01011'},
    {bits: 5, name: 'vs2'},
    {bits: 1, name: 'vm'},
    {bits: 6, name: '010010'},
], config:{lanes: 1, hspace:1024}}
....

Description::

The vector unzip-even instruction (VUNZIPE) extracts VL even-indexed elements
from the source vector register group into the destination vector register
group. +
+
This instruction accesses 2*VL elements in the source vector register group and
the source EMUL is 2*LMUL. The instruction is reserved when LMUL is 8. +
+
Prestart, inactive, and tail element handling follow the standard vector
rules and are defined over the destination element indices (`0` to `VL-1`). +
+
The destination vector register group may overlap the source vector register
group only if the overlap is in the lowest-numbered part of the source register
group. If the overlap violates these constraints, the instruction encoding is
reserved.

Operation::

[source,sail]
--
function clause execute (VUNZIPE(vs2, vd, vm)) = {
  foreach (i from vstart to VL-1) {
    let j = i * 2;
    if (vm == 0b1) | (v0[i] == 0b1) then
      set_velem(vd, EEW=SEW, i, get_velem(vs2, SEW, j));
    // inactive element handling follows VMA
  }
  // tail element handling follows VTA
  RETIRE_SUCCESS
}
--
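The index mapping `vd[i] = vs2[2*i]` can be sketched informally in Python. This is a non-normative model of the unmasked case with `vstart` equal to 0; the helper name `vunzipe` and the sample data are illustrative assumptions.

```python
def vunzipe(vs2, vl):
    """Informal model of unmasked vunzipe.v (hypothetical helper).

    Reads 2*vl source elements and keeps those at even indices.
    """
    return [vs2[2 * i] for i in range(vl)]


# Interleaved complex samples: real parts sit at even indices.
samples = ["re0", "im0", "re1", "im1", "re2", "im2"]
print(vunzipe(samples, vl=3))  # ['re0', 're1', 're2']
```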
<<<

[[insns-vunzipo, Vector Unzip Odd]]
=== Vector Unzip Odd Instruction

Synopsis::

Extract odd-indexed elements from the source vector register group into the
destination vector register group.

Mnemonic::

vunzipo.v vd, vs2, vm

Encoding::

[wavedrom, , svg]
....
{reg:[
    {bits: 7, name: 'OP-V'},
    {bits: 5, name: 'vd'},
    {bits: 3, name: 'OPMVV'},
    {bits: 5, name: '01111'},
    {bits: 5, name: 'vs2'},
    {bits: 1, name: 'vm'},
    {bits: 6, name: '010010'},
], config:{lanes: 1, hspace:1024}}
....

Description::

The vector unzip-odd instruction (VUNZIPO) extracts VL odd-indexed elements
from the source vector register group into the destination vector register
group. +
+
This instruction accesses 2*VL elements in the source vector register group and
the source EMUL is 2*LMUL. The instruction is reserved when LMUL is 8. +
+
Prestart, inactive, and tail element handling follow the standard vector
rules and are defined over the destination element indices (`0` to `VL-1`). +
+
The destination vector register group may overlap the source vector register
group only if the overlap is in the lowest-numbered part of the source register
group. If the overlap violates these constraints, the instruction encoding is
reserved.

Operation::

[source,sail]
--
function clause execute (VUNZIPO(vs2, vd, vm)) = {
  foreach (i from vstart to VL-1) {
    let j = (i * 2) + 1;
    if (vm == 0b1) | (v0[i] == 0b1) then
      set_velem(vd, EEW=SEW, i, get_velem(vs2, SEW, j));
    // inactive element handling follows VMA
  }
  // tail element handling follows VTA
  RETIRE_SUCCESS
}
--
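One way to check the index mappings is to confirm that the two unzip operations split an interleaved vector and that a zip restores it. The Python sketch below is an informal, non-normative model of the unmasked instructions with `vstart` equal to 0; all three helper names are hypothetical.

```python
def vunzipe(vs2, vl):
    """Even-indexed elements: vd[i] = vs2[2*i]."""
    return [vs2[2 * i] for i in range(vl)]


def vunzipo(vs2, vl):
    """Odd-indexed elements: vd[i] = vs2[2*i + 1]."""
    return [vs2[2 * i + 1] for i in range(vl)]


def vzip(vs2, vs1, vl):
    """Interleave: even destination indices read vs2, odd ones read vs1."""
    return [(vs2 if i % 2 == 0 else vs1)[i // 2] for i in range(2 * vl)]


packed = ["re0", "im0", "re1", "im1"]
re = vunzipe(packed, vl=2)
im = vunzipo(packed, vl=2)
print(re, im)                        # ['re0', 're1'] ['im0', 'im1']
print(vzip(re, im, vl=2) == packed)  # True
```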
<<<

[[insns-vpaire, Vector Pair Even]]
=== Vector Pair Even Instruction

Synopsis::

Interleave the even-indexed elements of the source vector register groups into
the destination vector register group.

Mnemonic::

vpaire.vv vd, vs2, vs1, vm

Encoding::

[wavedrom, , svg]
....
{reg:[
    {bits: 7, name: 'OP-V'},
    {bits: 5, name: 'vd'},
    {bits: 3, name: 'OPIVV'},
    {bits: 5, name: 'vs1'},
    {bits: 5, name: 'vs2'},
    {bits: 1, name: 'vm'},
    {bits: 6, name: '001111'},
], config:{lanes: 1, hspace:1024}}
....

Description::

The vector pair-even instruction (VPAIRE) interleaves the even-indexed
elements of the source vector register groups into the destination vector
register group. +
+
For destination element index `i`, if `i` is even then `vd[i] = vs2[i]`, and if
`i` is odd then `vd[i] = vs1[i - 1]`. +
+
Equivalently, the result order is:
`vd = [vs2[0], vs1[0], vs2[2], vs1[2], ... ]` +
+
The destination vector register group cannot overlap the source vector register
groups and, if masked, cannot overlap the mask register. If the overlap
violates these constraints, the instruction encoding is reserved. +
+
Prestart, inactive, and tail element handling follow the standard vector rules.

Operation::

[source,sail]
--
function clause execute (VPAIRE(vs2, vs1, vd, vm)) = {
  foreach (i from vstart to VL-1) {
    let j = if (i % 2) == 0 then i else (i - 1);
    let res = if (i % 2) == 0
              then get_velem(vs2, SEW, j)
              else get_velem(vs1, SEW, j);
    if (vm == 0b1) | (v0[i] == 0b1) then
      set_velem(vd, EEW=SEW, i, res);
    // inactive element handling follows VMA
  }
  // tail element handling follows VTA
  RETIRE_SUCCESS
}
--
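The pair-even mapping can be sketched informally in Python. This non-normative model covers only the unmasked case with `vstart` equal to 0; the helper name `vpaire` is hypothetical.

```python
def vpaire(vs2, vs1, vl):
    """Informal model of unmasked vpaire.vv (hypothetical helper).

    Even destination indices take vs2[i]; odd ones take vs1[i - 1],
    so only even-indexed source elements are ever read.
    """
    out = []
    for i in range(vl):
        if i % 2 == 0:
            out.append(vs2[i])
        else:
            out.append(vs1[i - 1])
    return out


print(vpaire(list("abcd"), list("efgh"), vl=4))  # ['a', 'e', 'c', 'g']
```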
<<<

[[insns-vpairo, Vector Pair Odd]]
=== Vector Pair Odd Instruction

Synopsis::

Interleave the odd-indexed elements of the source vector register groups into
the destination vector register group.

Mnemonic::

vpairo.vv vd, vs2, vs1, vm

Encoding::

[wavedrom, , svg]
....
{reg:[
    {bits: 7, name: 'OP-V'},
    {bits: 5, name: 'vd'},
    {bits: 3, name: 'OPMVV'},
    {bits: 5, name: 'vs1'},
    {bits: 5, name: 'vs2'},
    {bits: 1, name: 'vm'},
    {bits: 6, name: '001111'},
], config:{lanes: 1, hspace:1024}}
....

Description::

The vector pair-odd instruction (VPAIRO) interleaves the odd-indexed
elements of the source vector register groups into the destination vector
register group. +
+
For destination element index `i`, if `i` is even then `vd[i] = vs2[i + 1]`, and
if `i` is odd then `vd[i] = vs1[i]`. +
+
Equivalently, the result order is:
`vd = [vs2[1], vs1[1], vs2[3], vs1[3], ... ]` +
+
The destination vector register group cannot overlap the source vector register
groups and, if masked, cannot overlap the mask register. If the overlap
violates these constraints, the instruction encoding is reserved. +
+
Prestart, inactive, and tail element handling follow the standard vector rules. +
+
When `VL` is odd, VPAIRO reads element `VL` of `vs2`, which is one element past
the last body element. If an element index is greater than or equal to VLMAX in
the source vector register group, the value 0 is returned for that element.

Operation::

[source,sail]
--
function clause execute (VPAIRO(vs2, vs1, vd, vm)) = {
  foreach (i from vstart to VL-1) {
    let j = if (i % 2) == 0 then (i + 1) else i;
    let res =
      if (j >= VLMAX) then zeros()
      else if (i % 2) == 0 then get_velem(vs2, SEW, j)
      else get_velem(vs1, SEW, j);
    if (vm == 0b1) | (v0[i] == 0b1) then
      set_velem(vd, EEW=SEW, i, res);
    // inactive element handling follows VMA
  }
  // tail element handling follows VTA
  RETIRE_SUCCESS
}
--
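The out-of-range rule is easiest to see with an odd `VL`. The following Python sketch is an informal, non-normative model of the unmasked instruction with `vstart` equal to 0; the helper name `vpairo` and its `vlmax` parameter are hypothetical stand-ins for the architectural state.

```python
def vpairo(vs2, vs1, vl, vlmax):
    """Informal model of unmasked vpairo.vv (hypothetical helper).

    Even destination indices take vs2[i + 1]; odd ones take vs1[i].
    A source index at or beyond vlmax yields 0.
    """
    out = []
    for i in range(vl):
        j = i + 1 if i % 2 == 0 else i
        if j >= vlmax:
            out.append(0)
        elif i % 2 == 0:
            out.append(vs2[j])
        else:
            out.append(vs1[j])
    return out


a = [10, 11, 12, 13]
b = [20, 21, 22, 23]
print(vpairo(a, b, vl=4, vlmax=4))          # [11, 21, 13, 23]
print(vpairo(a, b, vl=3, vlmax=4))          # [11, 21, 13]  (reads vs2[3], past VL)
print(vpairo(a[:3], b[:3], vl=3, vlmax=3))  # [11, 21, 0]   (index 3 >= VLMAX)
```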
<<<

[NOTE]
====

The following example illustrates the use of the vector pair-even and pair-odd
instructions to transpose VL/4 independent 4x4 matrices packed across vector
registers.

The first stage operates on 32-bit elements. The second stage treats each
adjacent pair of 32-bit elements as one 64-bit element to complete the
transpose.

----
vsetvli t0, zero, e32, m1, ta, ma
vpaire.vv v5, v1, v2   #|a|b|c|d|A|B|C|D|..    |a|e|c|g|A|E|C|G|..
vpairo.vv v6, v1, v2   #|e|f|g|h|E|F|G|H|.. -> |b|f|d|h|B|F|D|H|..
vpaire.vv v7, v3, v4   #|i|j|k|l|I|J|K|L|..    |i|m|k|o|I|M|K|O|..
vpairo.vv v8, v3, v4   #|m|n|o|p|M|N|O|P|..    |j|n|l|p|J|N|L|P|..

vsetvli t0, zero, e64, m1, ta, ma
vpaire.vv v1, v5, v7   #|a e|c g|A E|C G|..    |a e|i m|A E|I M|..
vpaire.vv v2, v6, v8   #|b f|d h|B F|D H|.. -> |b f|j n|B F|J N|..
vpairo.vv v3, v5, v7   #|i m|k o|I M|K O|..    |c g|k o|C G|K O|..
vpairo.vv v4, v6, v8   #|j n|l p|J N|L P|..    |d h|l p|D H|L P|..
----

====
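The two-stage transpose above can be traced with a small Python simulation. This is an informal, non-normative sketch: the helpers `vpaire`, `vpairo`, `widen`, and `narrow` are hypothetical, 64-bit elements are modeled as tuples of two 32-bit elements, and the vector length is assumed to be even so no source index reaches VLMAX.

```python
def vpaire(vs2, vs1):
    """Pair-even: vd[i] = vs2[i] for even i, vs1[i - 1] for odd i."""
    return [vs2[i] if i % 2 == 0 else vs1[i - 1] for i in range(len(vs2))]


def vpairo(vs2, vs1):
    """Pair-odd: vd[i] = vs2[i + 1] for even i, vs1[i] for odd i."""
    return [vs2[i + 1] if i % 2 == 0 else vs1[i] for i in range(len(vs2))]


def widen(v):
    """View each adjacent pair of 32-bit elements as one 64-bit element."""
    return [(v[i], v[i + 1]) for i in range(0, len(v), 2)]


def narrow(v):
    """Flatten 64-bit elements back into 32-bit elements."""
    return [x for pair in v for x in pair]


# Rows 0..3 of two 4x4 matrices (lowercase and uppercase), one row per register.
v1, v2, v3, v4 = (list(s) for s in ("abcdABCD", "efghEFGH", "ijklIJKL", "mnopMNOP"))

# Stage 1: 32-bit elements.
v5, v6 = vpaire(v1, v2), vpairo(v1, v2)
v7, v8 = vpaire(v3, v4), vpairo(v3, v4)

# Stage 2: the same registers viewed as 64-bit elements.
w5, w6, w7, w8 = (widen(v) for v in (v5, v6, v7, v8))
t1 = "".join(narrow(vpaire(w5, w7)))
t2 = "".join(narrow(vpaire(w6, w8)))
t3 = "".join(narrow(vpairo(w5, w7)))
t4 = "".join(narrow(vpairo(w6, w8)))

print(t1, t2, t3, t4)  # aeimAEIM bfjnBFJN cgkoCGKO dhlpDHLP
```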
