Using vectors to deal with multi-coded questions in SPSS

There are two common ways that multi-coded questions are recorded in SPSS .sav files. Either there is one binary variable for each code, holding a ‘1’ if the code was chosen and a ‘0’ if it wasn’t; or there is a small number of variables that hold the codes chosen by each respondent (in this case the number of such variables is usually equal to the maximum number of responses chosen by a single respondent).

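E.g., if there were 10 possible multi-codes, the file could contain either ten binary variables (Q_1 to Q_10), one per code, or a small set of variables (rQ_1 to rQ_4) holding the codes chosen by each respondent, with any unused variables left missing.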

There are pros and cons to each format, e.g., the binary export may be more understandable but can lead to excessive file sizes. Either way, it is sometimes useful to be able to move from one format to the other (indeed some packages may only export into one or the other). We now show how to use the SPSS vector command to do this.

In each case, we are operating on the above two datasets, so we know how many codes there are, i.e., 10, and the maximum number of responses over all respondents, i.e., 4. The former will always be known; we will show a simple way of obtaining the latter.
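To have something concrete to run the routines on, a small binary-format file can be typed in directly. This is only a sketch: the three respondents and the id variable are made up for illustration, and the Q_1 to Q_10 names follow the routines in this post.

```spss
* Hypothetical data: 3 respondents, one binary variable per code.
data list free / id Q_1 to Q_10.
begin data
1  1 0 0 1 0 0 0 0 0 0
2  0 0 0 0 0 0 0 0 0 0
3  0 1 1 0 0 1 0 0 0 1
end data.
* Show whole numbers rather than the default two decimals.
formats id Q_1 to Q_10 (f2.0).
list.
```

In the multi-code format, respondent 1 would hold the codes 1 and 4, respondent 2 would be entirely missing, and respondent 3 would hold 2, 3, 6 and 10, which makes 4 the maximum number of responses.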

Note how we use scratch variables, whose names begin with # (e.g., #i and #j), so that SPSS does not save them in the dataset.

From binary to multi-code set

vector rQ_(4).

vector #vec=Q_1 to Q_10.

compute #j=1.

loop #i=1 to 10.

do if #vec(#i)=1.

compute rQ_(#j)=#i.

compute #j=#j+1.

end if.

end loop.


The code defines 4 new variables rQ_1 to rQ_4 using the vector command. It then groups the original 10 variables Q_1 to Q_10 into the vector #vec and loops over them. If an element of the vector is ‘1’, its position (i.e., the code number) is written to the ‘next’ empty variable in rQ_1 to rQ_4. The #j variable holds the position of the next ‘empty’ variable in the set.
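One way to check the result, sketched here under the assumption that both Q_1 to Q_10 and rQ_1 to rQ_4 are now in the file, is to compare the number of 1s in the binary variables with the number of filled variables in the new set. The names n_bin, n_set and mismatch are hypothetical.

```spss
* Sketch only: n_bin, n_set and mismatch are hypothetical names.
* Count the 1s in the binary variables.
count n_bin=Q_1 to Q_10(1).
* Count the filled (non-missing) variables in the new set.
compute n_set=nvalid(rQ_1 to rQ_4).
* Flag respondents where the two counts disagree.
compute mismatch=(n_bin ne n_set).
frequencies variables=mismatch.
```

If the conversion worked, mismatch should be 0 for every respondent. The frequencies command is a procedure, so it also forces SPSS to run the pending transformations.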

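There is no straightforward way to obtain the maximum number of responses over all respondents and feed it into this routine automatically, since the vector command needs the number of new variables written out as a number. You could use the aggregate command, or some Python-based macro. However, it is just as easy to run the following code first and read the maximum from the output: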

count maxcount=Q_1 to Q_10(1).

desc maxcount.
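For comparison, the aggregate route might look something like the sketch below. It assumes a version of SPSS that supports mode=addvariables, and maxresp is a hypothetical name.

```spss
* Sketch only: maxresp is a hypothetical name.
* Number of codes chosen by each respondent.
count maxcount=Q_1 to Q_10(1).
* No break variable, so the maximum is taken over the whole file
* and written to every case.
aggregate
  /outfile=* mode=addvariables
  /maxresp=max(maxcount).
frequencies variables=maxresp.
```

This stores the maximum in the file, but the value still has to be typed into the vector command by hand, which is why reading it from the descriptives output is just as easy.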



From multi-code set to binary

vector Q_(10).

recode Q_1 to Q_10 (sysmis=0).

vector #vec=RQ_1 to RQ_4.

loop #i=1 to 4.

if not sysmis(#vec(#i)) Q_(#vec(#i))=1.

end loop.


The code defines 10 new variables Q_1 to Q_10 using the vector command, then recodes them to be all zero. It then groups the original 4 variables RQ_1 to RQ_4 into the vector #vec and loops over them. If there is a value in one of these variables, it must correspond to a code, so the new variable for that code is set to ‘1’.
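If a file happens to contain both formats, a round trip can be used as a check. The sketch below assumes that the original Q_1 to Q_10 and the set rQ_1 to rQ_4 are both present. It rebuilds the binaries under the hypothetical names B_1 to B_10 and counts, in the hypothetical variable ndiff, how many of them differ from the originals.

```spss
* Sketch only: B_, ndiff, #set and #old are hypothetical names.
* Rebuild the binary variables as B_1 to B_10.
vector B_(10).
recode B_1 to B_10 (sysmis=0).
vector #set=rQ_1 to rQ_4.
loop #i=1 to 4.
if not sysmis(#set(#i)) B_(#set(#i))=1.
end loop.
* Count the codes where the rebuilt value differs from the original.
vector #old=Q_1 to Q_10.
compute ndiff=0.
loop #k=1 to 10.
if #old(#k) ne B_(#k) ndiff=ndiff+1.
end loop.
frequencies variables=ndiff.
```

A value of 0 for every respondent suggests the two formats agree. The vectors are given different names here because a vector name cannot be defined twice before the next procedure.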
