Commit 43c05dc

Add Zvzip extension for reordering structured data
1 parent b9f1438 commit 43c05dc

3 files changed

+383
-0
lines changed

src/colophon.adoc

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ h|Extension h|Version h|Status
6262
|*Zclsd* |*1.0* |*Ratified*
6363
|*B* |*1.0* |*Ratified*
6464
|*V* |*1.0* |*Ratified*
65+
|*Zvzip* |*0.1* |_Draft_
6566
|*Zbkb* |*1.0* |*Ratified*
6667
|*Zbkc* |*1.0* |*Ratified*
6768
|*Zbkx* |*1.0* |*Ratified*

src/riscv-unprivileged.adoc

Lines changed: 1 addition & 0 deletions
@@ -183,6 +183,7 @@ include::c-st-ext.adoc[]
183183
include::zc.adoc[]
184184
include::b-st-ext.adoc[]
185185
include::v-st-ext.adoc[]
186+
include::zvzip.adoc[]
186187
include::scalar-crypto.adoc[]
187188
include::vector-crypto.adoc[]
188189
include::unpriv-cfi.adoc[]

src/zvzip.adoc

Lines changed: 381 additions & 0 deletions
@@ -0,0 +1,381 @@
1+
== "Zvzip" Extension for Reordering Structured Data, Version 0.1
2+
3+
This chapter describes the Zvzip standard extension for reordering structured
4+
data in vector registers. These instructions address use cases such as packing
5+
and unpacking structured data (for example, the color components of a pixel or
6+
the real and imaginary components of complex numbers) and transposing small
7+
matrices.
8+
9+
[%autowidth]
10+
[%header,cols="2,4"]
11+
|===
12+
|Mnemonic |Instruction
13+
| vzip.vv | <<insns-vzip>>
14+
| vunzipe.v | <<insns-vunzipe>>
15+
| vunzipo.v | <<insns-vunzipo>>
16+
| vpaire.vv | <<insns-vpaire>>
17+
| vpairo.vv | <<insns-vpairo>>
18+
|===
19+
20+
The Zvzip Extension depends on the Zve32x Extension.
21+
22+
<<<
23+
24+
[[insns-vzip, Vector Zip]]
25+
=== Vector Zip Instruction
26+
27+
Synopsis::
28+
29+
Interleave elements from two source vector register groups into the destination
30+
vector register group.
31+
32+
Mnemonic::
33+
34+
vzip.vv vd, vs2, vs1, vm
35+
36+
Encoding::
37+
38+
[wavedrom, , svg]
39+
....
40+
{reg:[
41+
{bits: 7, name: 'OP-V'},
42+
{bits: 5, name: 'vd'},
43+
{bits: 3, name: 'OPMVV'},
44+
{bits: 5, name: 'vs1'},
45+
{bits: 5, name: 'vs2'},
46+
{bits: 1, name: 'vm'},
47+
{bits: 6, name: '111110'},
48+
], config:{lanes: 1, hspace:1024}}
49+
....
50+
51+
Description::
52+
53+
The vector zip instruction (VZIP) interleaves elements from two source vector
54+
register groups (`vs2` and `vs1`) into one destination vector register group
55+
(`vd`) by alternating elements from the two sources. +
56+
+
57+
For destination element index `i`, if `i` is even then `vd[i] = vs2[i/2]`, and
58+
if `i` is odd then `vd[i] = vs1[i/2]`. +
59+
+
60+
Equivalently, the result order is:
61+
`vd = [vs2[0], vs1[0], vs2[1], vs1[1], ... ]` +
62+
+
63+
This instruction operates with an effective vector length (EVL) of 2*VL. The
64+
destination EMUL is 2xLMUL. The instruction is reserved when LMUL is 8.
65+
Prestart, inactive, and tail element handling follows the standard vector
66+
rules, applied over the EVL. +
67+
+
68+
The destination vector register group may overlap the source vector register
69+
group if the overlap is in the highest-numbered part of the destination
70+
register group and the source EMUL is at least 1. If the overlap violates these
71+
constraints, the instruction encoding is reserved.
72+
73+
Operation::
74+
75+
[source,sail]
76+
--
77+
function clause execute (VZIP(vs2, vs1, vd, vm)) = {
78+
EVL = 2 * VL;
79+
foreach (i from vstart to EVL-1) {
80+
let j = i / 2;
81+
let op1 = get_velem(vs1, SEW, j);
82+
let op2 = get_velem(vs2, SEW, j);
83+
let res = if (i % 2 == 0) then op2 else op1;
84+
if (vm == 0b1) | (v0[i] == 0b1) then
85+
set_velem(vd, EEW=SEW, i, res);
86+
// inactive element handling follows VMA
87+
}
88+
// tail element handling follows VTA
89+
RETIRE_SUCCESS
90+
}
91+
--
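
As an informal illustration of the element mapping above, the following Python sketch models an unmasked `vzip.vv` with `vstart` equal to 0. It is not part of the specification; the function name `vzip` and the use of Python lists to stand in for vector register groups are assumptions made for this example.

```python
def vzip(vs2, vs1, vl):
    """Model unmasked vzip.vv with vstart = 0 (illustrative only).

    Returns the 2*vl body elements written to the destination group.
    """
    evl = 2 * vl  # effective vector length
    # Even destination indices take vs2[i/2], odd ones take vs1[i/2].
    return [vs2[i // 2] if i % 2 == 0 else vs1[i // 2] for i in range(evl)]


# Pack separate real and imaginary arrays into interleaved complex data.
real = [1, 2, 3, 4]
imag = [10, 20, 30, 40]
print(vzip(real, imag, vl=4))  # [1, 10, 2, 20, 3, 30, 4, 40]
```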
92+
93+
<<<
94+
95+
[[insns-vunzipe, Vector Unzip Even]]
96+
=== Vector Unzip Even Instruction
97+
98+
Synopsis::
99+
100+
Extract even-indexed elements from the source vector register group into the
101+
destination vector register group.
102+
103+
Mnemonic::
104+
105+
vunzipe.v vd, vs2, vm
106+
107+
Encoding::
108+
109+
[wavedrom, , svg]
110+
....
111+
{reg:[
112+
{bits: 7, name: 'OP-V'},
113+
{bits: 5, name: 'vd'},
114+
{bits: 3, name: 'OPMVV'},
115+
{bits: 5, name: '01011'},
116+
{bits: 5, name: 'vs2'},
117+
{bits: 1, name: 'vm'},
118+
{bits: 6, name: '010010'},
119+
], config:{lanes: 1, hspace:1024}}
120+
....
121+
122+
Description::
123+
124+
The vector unzip-even instruction (VUNZIPE) extracts VL even-indexed elements
125+
from the source vector register group into the destination vector register
126+
group. +
127+
+
128+
This instruction accesses 2*VL elements in the source vector register group and
129+
the source EMUL is 2xLMUL. The instruction is reserved when LMUL is 8. +
130+
+
131+
Prestart, inactive, and tail element handling follows the standard vector
132+
rules and is defined over the destination element indices (`0` to `VL-1`). +
133+
+
134+
The destination vector register group may overlap the source vector register
135+
group only if the overlap is in the lowest-numbered part of the source register
136+
group. If the overlap violates these constraints, the instruction encoding is
137+
reserved.
138+
139+
Operation::
140+
141+
[source,sail]
142+
--
143+
function clause execute (VUNZIPE(vs2, vd, vm)) = {
144+
foreach (i from vstart to VL-1) {
145+
let j = i * 2;
146+
if (vm == 0b1) | (v0[i] == 0b1) then
147+
set_velem(vd, EEW=SEW, i, get_velem(vs2, SEW, j));
148+
// inactive element handling follows VMA
149+
}
150+
// tail element handling follows VTA
151+
RETIRE_SUCCESS
152+
}
153+
--
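
The following Python sketch is an informal model of an unmasked `vunzipe.v` with `vstart` equal to 0, showing that `VL` destination elements are produced from `2*VL` source elements. The function name `vunzipe` and the list representation of register groups are assumptions for illustration only.

```python
def vunzipe(vs2, vl):
    """Model unmasked vunzipe.v with vstart = 0 (illustrative only)."""
    # Destination element i takes source element 2*i, so the source
    # must hold at least 2*vl elements.
    return [vs2[2 * i] for i in range(vl)]


# Recover the real parts from interleaved (real, imaginary) pairs.
interleaved = [1, 10, 2, 20, 3, 30, 4, 40]
print(vunzipe(interleaved, vl=4))  # [1, 2, 3, 4]
```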
154+
155+
<<<
156+
157+
[[insns-vunzipo, Vector Unzip Odd]]
158+
=== Vector Unzip Odd Instruction
159+
160+
Synopsis::
161+
162+
Extract odd-indexed elements from the source vector register group into the
163+
destination vector register group.
164+
165+
Mnemonic::
166+
167+
vunzipo.v vd, vs2, vm
168+
169+
Encoding::
170+
171+
[wavedrom, , svg]
172+
....
173+
{reg:[
174+
{bits: 7, name: 'OP-V'},
175+
{bits: 5, name: 'vd'},
176+
{bits: 3, name: 'OPMVV'},
177+
{bits: 5, name: '01111'},
178+
{bits: 5, name: 'vs2'},
179+
{bits: 1, name: 'vm'},
180+
{bits: 6, name: '010010'},
181+
], config:{lanes: 1, hspace:1024}}
182+
....
183+
184+
Description::
185+
186+
The vector unzip-odd instruction (VUNZIPO) extracts VL odd-indexed elements
187+
from the source vector register group into the destination vector register
188+
group. +
189+
+
190+
This instruction accesses 2*VL elements in the source vector register group and
191+
the source EMUL is 2xLMUL. The instruction is reserved when LMUL is 8. +
192+
+
193+
Prestart, inactive, and tail element handling follows the standard vector
194+
rules and is defined over the destination element indices (`0` to `VL-1`). +
195+
+
196+
The destination vector register group may overlap the source vector register
197+
group only if the overlap is in the lowest-numbered part of the source register
198+
group. If the overlap violates these constraints, the instruction encoding is
199+
reserved.
200+
201+
Operation::
202+
203+
[source,sail]
204+
--
205+
function clause execute (VUNZIPO(vs2, vd, vm)) = {
206+
foreach (i from vstart to VL-1) {
207+
let j = (i * 2) + 1;
208+
if (vm == 0b1) | (v0[i] == 0b1) then
209+
set_velem(vd, EEW=SEW, i, get_velem(vs2, SEW, j));
210+
// inactive element handling follows VMA
211+
}
212+
// tail element handling follows VTA
213+
RETIRE_SUCCESS
214+
}
215+
--
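
To show that masking applies to destination element indices rather than source indices, the Python sketch below models a masked `vunzipo.v` with `vstart` equal to 0. It assumes the undisturbed policy for both inactive and tail elements; an agnostic policy would permit other values in those positions. The name `vunzipo_masked` and its argument layout are hypothetical.

```python
def vunzipo_masked(vd, vs2, mask, vl):
    """Model masked vunzipo.v with vstart = 0 (illustrative only).

    Assumes mask-undisturbed and tail-undisturbed behavior.
    """
    out = list(vd)  # start from the previous destination contents
    for i in range(vl):
        if mask[i]:  # mask bit i guards destination element i
            out[i] = vs2[2 * i + 1]
    return out


old = [-1, -1, -1, -1]
src = [1, 10, 2, 20, 3, 30, 4, 40]
# Element 1 is inactive and element 3 is a tail element.
print(vunzipo_masked(old, src, mask=[1, 0, 1, 1], vl=3))  # [10, -1, 30, -1]
```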
216+
217+
<<<
218+
219+
[[insns-vpaire, Vector Pair Even]]
220+
=== Vector Pair Even Instruction
221+
222+
Synopsis::
223+
224+
Interleave the even-indexed elements of the source vector register groups into
225+
the destination vector register group.
226+
227+
Mnemonic::
228+
229+
vpaire.vv vd, vs2, vs1, vm
230+
231+
232+
Encoding::
233+
234+
[wavedrom, , svg]
235+
....
236+
{reg:[
237+
{bits: 7, name: 'OP-V'},
238+
{bits: 5, name: 'vd'},
239+
{bits: 3, name: 'OPIVV'},
240+
{bits: 5, name: 'vs1'},
241+
{bits: 5, name: 'vs2'},
242+
{bits: 1, name: 'vm'},
243+
{bits: 6, name: '001111'},
244+
], config:{lanes: 1, hspace:1024}}
245+
....
246+
247+
Description::
248+
249+
The vector pair-even instruction (VPAIRE) interleaves the even-indexed
250+
elements of the source vector register groups into the destination vector
251+
register group. +
252+
+
253+
For destination element index `i`, if `i` is even then `vd[i] = vs2[i]`, and if
254+
`i` is odd then `vd[i] = vs1[i - 1]`. +
255+
+
256+
Equivalently, the result order is:
257+
`vd = [vs2[0], vs1[0], vs2[2], vs1[2], ... ]` +
258+
+
259+
The destination vector register group cannot overlap the source vector register
260+
groups and, if masked, cannot overlap the mask register. If the overlap
261+
violates these constraints, the instruction encoding is reserved. +
262+
+
263+
Prestart, inactive, and tail element handling follows the standard vector rules.
264+
265+
Operation::
266+
267+
[source,sail]
268+
--
269+
function clause execute (VPAIRE(vs2, vs1, vd, vm)) = {
270+
foreach (i from vstart to VL-1) {
271+
let j = if (i % 2) == 0 then i else (i - 1);
272+
let res = if (i % 2) == 0
273+
then get_velem(vs2, SEW, j)
274+
else get_velem(vs1, SEW, j);
275+
if (vm == 0b1) | (v0[i] == 0b1) then
276+
set_velem(vd, EEW=SEW, i, res);
277+
// inactive element handling follows VMA
278+
}
279+
// tail element handling follows VTA
280+
RETIRE_SUCCESS
281+
}
282+
--
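
The pair-even mapping can be sketched informally in Python as follows, for the unmasked case with `vstart` equal to 0. The function name `vpaire` and the list representation are assumptions for illustration; the sketch is not a substitute for the Sail description above.

```python
def vpaire(vs2, vs1, vl):
    """Model unmasked vpaire.vv with vstart = 0 (illustrative only)."""
    # Even i reads vs2[i]; odd i reads vs1[i - 1]. Both source indices
    # are even, so only even-indexed source elements are used.
    return [vs2[i] if i % 2 == 0 else vs1[i - 1] for i in range(vl)]


print(vpaire(list("abcd"), list("efgh"), vl=4))  # ['a', 'e', 'c', 'g']
```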
283+
284+
<<<
285+
286+
[[insns-vpairo, Vector Pair Odd]]
287+
=== Vector Pair Odd Instruction
288+
289+
Synopsis::
290+
291+
Interleave the odd-indexed elements of the source vector register groups into
292+
the destination vector register group.
293+
294+
Mnemonic::
295+
296+
vpairo.vv vd, vs2, vs1, vm
297+
298+
299+
Encoding::
300+
301+
[wavedrom, , svg]
302+
....
303+
{reg:[
304+
{bits: 7, name: 'OP-V'},
305+
{bits: 5, name: 'vd'},
306+
{bits: 3, name: 'OPMVV'},
307+
{bits: 5, name: 'vs1'},
308+
{bits: 5, name: 'vs2'},
309+
{bits: 1, name: 'vm'},
310+
{bits: 6, name: '001111'},
311+
], config:{lanes: 1, hspace:1024}}
312+
....
313+
314+
Description::
315+
316+
The vector pair-odd instruction (VPAIRO) interleaves the odd-indexed
317+
elements of the source vector register groups into the destination vector
318+
register group. +
319+
+
320+
For destination element index `i`, if `i` is even then `vd[i] = vs2[i + 1]`, and
321+
if `i` is odd then `vd[i] = vs1[i]`. +
322+
+
323+
Equivalently, the result order is:
324+
`vd = [vs2[1], vs1[1], vs2[3], vs1[3], ... ]` +
325+
+
326+
The destination vector register group cannot overlap the source vector register
327+
groups and, if masked, cannot overlap the mask register. If the overlap
328+
violates these constraints, the instruction encoding is reserved. +
329+
+
330+
Prestart, inactive, and tail element handling follows the standard vector rules.
331+
+
332+
VPAIRO may read one element past `VL` in `vs2` when `VL` is odd. If the element
333+
index is greater than or equal to VLMAX in the source vector register group,
334+
the value 0 is returned for that element.
335+
336+
Operation::
337+
338+
[source,sail]
339+
--
340+
function clause execute (VPAIRO(vs2, vs1, vd, vm)) = {
341+
foreach (i from vstart to VL-1) {
342+
let j = if (i % 2) == 0 then (i + 1) else i;
343+
let res =
344+
if (j >= VLMAX) then zeros()
345+
else if (i % 2) == 0 then get_velem(vs2, SEW, j)
346+
else get_velem(vs1, SEW, j);
347+
if (vm == 0b1) | (v0[i] == 0b1) then
348+
set_velem(vd, EEW=SEW, i, res);
349+
// inactive element handling follows VMA
350+
}
351+
// tail element handling follows VTA
352+
RETIRE_SUCCESS
353+
}
354+
--
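
The Python sketch below informally models an unmasked `vpairo.vv` with `vstart` equal to 0, including the case where an odd `VL` causes a read at source index `VL` and the case where that index reaches `VLMAX`. The function name `vpairo` and the explicit `vlmax` parameter are assumptions made for this example.

```python
def vpairo(vs2, vs1, vl, vlmax):
    """Model unmasked vpairo.vv with vstart = 0 (illustrative only)."""
    out = []
    for i in range(vl):
        j = i + 1 if i % 2 == 0 else i  # source index is always odd
        if j >= vlmax:
            out.append(0)  # reads at or beyond VLMAX return zero
        elif i % 2 == 0:
            out.append(vs2[j])
        else:
            out.append(vs1[j])
    return out


print(vpairo([1, 2, 3, 4], [5, 6, 7, 8], vl=4, vlmax=4))  # [2, 6, 4, 8]
# Odd VL below VLMAX: the last element reads vs2[3], one past VL - 1.
print(vpairo([1, 2, 3, 4], [5, 6, 7, 8], vl=3, vlmax=4))  # [2, 6, 4]
# Odd VL equal to VLMAX: the out-of-range read yields zero.
print(vpairo([1, 2, 3], [5, 6, 7], vl=3, vlmax=3))        # [2, 6, 0]
```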
355+
356+
<<<
357+
358+
[NOTE]
359+
====
360+
361+
The following example uses the vector pair-even and pair-odd instructions to
362+
transpose VL/4 independent 4x4 matrices packed across vector registers.
363+
364+
The first stage operates on 32-bit elements. The second stage packs adjacent
365+
pairs into 64-bit elements to complete the transpose.
366+
367+
----
368+
vsetvli t0, zero, e32, m1, ta, ma
369+
vpaire.vv v5, v1, v2 #|a|b|c|d|A|B|C|D|.. |a|e|c|g|A|E|C|G|..
370+
vpairo.vv v6, v1, v2 #|e|f|g|h|E|F|G|H|.. -> |b|f|d|h|B|F|D|H|..
371+
vpaire.vv v7, v3, v4 #|i|j|k|l|I|J|K|L|.. |i|m|k|o|I|M|K|O|..
372+
vpairo.vv v8, v3, v4 #|m|n|o|p|M|N|O|P|.. |j|n|l|p|J|N|L|P|..
373+
374+
vsetvli t0, zero, e64, m1, ta, ma
375+
vpaire.vv v1, v5, v7 #|a e|c g|A E|C G|.. |a e|i m|A E|I M|..
376+
vpaire.vv v2, v6, v8 #|b f|d h|B F|D H|.. -> |b f|j n|B F|J N|..
377+
vpairo.vv v3, v5, v7 #|i m|k o|I M|K O|.. |c g|k o|C G|K O|..
378+
vpairo.vv v4, v6, v8 #|j n|l p|J N|L P|.. |d h|l p|D H|L P|..
379+
----
380+
381+
====
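
The two-stage transpose in the note above can be checked with a small Python simulation. This is an informal sketch for a single 4x4 matrix (four 32-bit elements per register), assuming an even `VL` and unmasked execution. The helper names `widen` and `narrow`, which model reinterpreting a register as 64-bit elements and back, are hypothetical.

```python
def vpaire(vs2, vs1):
    # Even i reads vs2[i]; odd i reads vs1[i - 1].
    return [vs2[i] if i % 2 == 0 else vs1[i - 1] for i in range(len(vs2))]


def vpairo(vs2, vs1):
    # Even i reads vs2[i + 1]; odd i reads vs1[i]. Assumes even length.
    return [vs2[i + 1] if i % 2 == 0 else vs1[i] for i in range(len(vs2))]


def widen(v):
    # View each adjacent pair of 32-bit elements as one 64-bit element.
    return [tuple(v[i:i + 2]) for i in range(0, len(v), 2)]


def narrow(v):
    # View each 64-bit element as two 32-bit elements again.
    return [x for pair in v for x in pair]


rows = [list("abcd"), list("efgh"), list("ijkl"), list("mnop")]
v1, v2, v3, v4 = rows

# Stage 1: SEW = 32.
v5, v6 = vpaire(v1, v2), vpairo(v1, v2)
v7, v8 = vpaire(v3, v4), vpairo(v3, v4)

# Stage 2: SEW = 64.
w5, w6, w7, w8 = (widen(v) for v in (v5, v6, v7, v8))
result = [
    narrow(vpaire(w5, w7)),
    narrow(vpaire(w6, w8)),
    narrow(vpairo(w5, w7)),
    narrow(vpairo(w6, w8)),
]

transposed = [list(col) for col in zip(*rows)]
print(["".join(r) for r in result])  # ['aeim', 'bfjn', 'cgko', 'dhlp']
```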
