Do you hate these posts, O readers who subscribed to this blog for the tales of life with small children and not for the tips on deriving z-scores? I find these posts very useful personally and I do hope that unhappy googlers trying to learn R will also appreciate them, but they are probably a snore for the rest of you.
Yesterday I figured out part of something that had perplexed me. I would slap tables of variables together on a regular basis, and sometimes they would have column names and sometimes they would not. Part of the answer is that dataframes always have column names; matrices often don't. Many of R's modeling functions also expect their data in a dataframe rather than a matrix. But if you have three vectors of equal length named x, y, and z, you can say this--
--and get yourself a dataframe called "more.data," with columns headed x, y, and z.
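For anyone following along at home, here is a minimal sketch of that step. The numbers are invented purely for illustration; the only thing that matters is that the three vectors are the same length.

```r
# Three made-up vectors of equal length (values invented for illustration)
x <- c(12, 15, 9, 20)
y <- c(101, 98, 110, 95)
z <- c(0.5, 0.7, 0.2, 0.9)

# data.frame() keeps the vector names as the column names
more.data <- data.frame(x, y, z)

names(more.data)  # shows the column headers
```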
Suppose you'd like to compare kids' performance on measures x, y, and z, but their scales are very different. You can derive z-scores as easy as pie:
Now you have a new dataframe called z.data, in which your measures have been converted to z-scores.
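One way to do the conversion, sketched with invented numbers, is the scale() function. I am assuming here that every column is numeric. Note that scale() hands back a matrix, so the sketch wraps it in as.data.frame() to get a dataframe again.

```r
# Invented scores on three measures with very different scales
more.data <- data.frame(
  x = c(12, 15, 9, 20),
  y = c(101, 98, 110, 95),
  z = c(0.5, 0.7, 0.2, 0.9)
)

# scale() subtracts each column's mean and divides by its standard deviation.
# It returns a matrix, so as.data.frame() converts the result back.
z.data <- as.data.frame(scale(more.data))

colMeans(z.data)    # each column mean is zero, give or take rounding error
sapply(z.data, sd)  # each column standard deviation is 1
```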
Suppose, though, that you don't want everything converted to z-scores -- suppose you have a column for participant ID#, and a column for group membership. Obviously, you don't want those to be z-scores. Participant #0.549879 is less than illuminating if you assigned him ID #62. You could convert the variables individually and then stick them into a new dataframe, but you can also do it more easily.
It took me a long time to figure out something about R that's actually pretty straightforward. In web pages about manipulating data, I kept seeing sets of brackets with mysterious commas in them. For a long time I just followed instructions blindly, but I finally saw what was going on. You can tell R what to do with your data in terms of rows and columns inside square brackets -- rows first, columns second. So this--
--tells R to give you a new array of data containing rows 2-5 and the odd-numbered columns of the old dataset. Or you could do this--
> part.of.my.data<-my.data[-c(16), ]
--which means "give me all the rows except 16 (that's the -c(16)), and give me every column (that's what you're saying by leaving nothing between the comma and the closing bracket, which is perhaps a little confusing)."
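Both kinds of bracket request can be tried on a toy dataset. This is only a sketch: my.data below is invented, with 20 rows and 5 columns, and the name some.of.my.data is one I made up for the example.

```r
# A made-up dataset with 20 rows and 5 columns
my.data <- data.frame(
  id    = 1:20,
  group = rep(c("a", "b"), times = 10),
  x     = seq(10, 200, by = 10),
  y     = 20:1,
  z     = (1:20) / 10
)

# Rows first, columns second: rows 2 through 5, odd-numbered columns (1, 3, 5)
some.of.my.data <- my.data[2:5, c(1, 3, 5)]

# Every row except row 16; nothing after the comma means "every column"
part.of.my.data <- my.data[-c(16), ]

dim(some.of.my.data)  # rows, then columns
dim(part.of.my.data)
```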
So if you want z-scores for your experimental measures x, y, and z (in columns 3-5) but not for the first two columns (participant ID and group membership), all you have to do is this:
> z.data<-cbind(more.data[ ,c(1,2)], scale(more.data[ ,c(3:5)]))
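Here is the same command on a small invented dataset, so you can see that the ID and group columns come through untouched. The column layout (ID and group first, measures in columns 3-5) is the one described above; the values themselves are made up.

```r
# Invented data: ID and group in columns 1-2, measures in columns 3-5
more.data <- data.frame(
  id    = c(61, 62, 63, 64),
  group = c("control", "control", "treatment", "treatment"),
  x     = c(12, 15, 9, 20),
  y     = c(101, 98, 110, 95),
  z     = c(0.5, 0.7, 0.2, 0.9)
)

# Leave columns 1-2 alone; convert only columns 3-5 to z-scores
z.data <- cbind(more.data[, c(1, 2)], scale(more.data[, c(3:5)]))

z.data$id  # still the original ID numbers, not z-scores
```

Because the first piece handed to cbind() is a dataframe, the result is a dataframe too, even though scale() on its own returns a matrix.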
If you wanted to add a column with the mean of each participant's three z-scores, you could do it like this:
> z.data.with.means<-cbind(z.data, rowMeans(z.data[ ,c(3:5)]))
Extra points if you get all your brackets and parentheses lined up on the first try. You probably already know that if you say "rowmeans" instead of "rowMeans," R will blink blankly at you, waving its little "Klingon only" placard. You might not have considered that you must restrict the row means to the last three columns or else R will average in the participant ID numbers, leading you to wonder how your last participant had a mean z-score of 73. (That's like getting a 7800 on your SAT Verbal -- yes, the one with a maximum score of 800.)
Or maybe you have considered it and I am just dim.
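If you'd like to watch the ID numbers wreck the averages for yourself, here is a small sketch with invented data. Two assumptions of mine: the group column is coded as numbers (rowMeans() refuses non-numeric columns), and the names mean.z and oops are made up for the example. Naming the new column inside cbind() also spares you an ugly auto-generated header.

```r
# Invented data: ID and numeric group code in columns 1-2, measures in 3-5
more.data <- data.frame(
  id    = c(61, 62, 63, 64),
  group = c(1, 1, 2, 2),
  x     = c(12, 15, 9, 20),
  y     = c(101, 98, 110, 95),
  z     = c(0.5, 0.7, 0.2, 0.9)
)
z.data <- cbind(more.data[, c(1, 2)], scale(more.data[, c(3:5)]))

# Average only the three z-score columns, and give the new column a name
z.data.with.means <- cbind(z.data, mean.z = rowMeans(z.data[, c(3:5)]))

# The pitfall: averaging every column drags the ID numbers into the mean
oops <- rowMeans(z.data)

z.data.with.means$mean.z  # sensible values near zero
oops                      # suspiciously large "mean z-scores"
```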