Objectives:
1) Read external data into SAS with and without column numbers
2) Manipulate existing variables and create new variables
3) Run means
4) Run correlations
Objective 1: Read external data into SAS with and without column numbers
1) The first line of almost any SAS program specifies the dataset that you will be working with.
In this case, you are going to create a new dataset, so the first line will indicate the name of the new dataset. You can create two types of SAS datasets: temporary (deleted when you end your SAS session) and permanent (saved for later sessions).
To create a temporary dataset, write:
DATA NAME; (where NAME indicates the name of the new dataset)
To create a permanent dataset, write:
DATA SASUSER.NAME; (where NAME indicates the name of the new dataset)
2) The second line is the INFILE statement, which indicates the location of the raw (i.e., external) data. For example, if the raw data file is called jamcbcl.dat and is located in the DATA directory on the C: drive, the second line would read as follows:
INFILE 'C:\DATA\JAMCBCL.DAT';
Note that the location of the external data file is enclosed in single quotation marks.
3) The INPUT statement assigns variable names to the data. The way you write the statement will depend on what the raw data looks like.
a) If the data are spaced (i.e., there is a space between each variable), all you have to do is write the variable names in order. For example:
INPUT ID SEX AGE IQ;
b) However, if your data are packed (i.e., no spaces between variables), you will need to specify the column numbers for each variable. The column numbers are specified immediately after each variable name. For example, if ID is in columns 1-5 and SEX is in column 6, then your INPUT statement will read:
INPUT ID 1-5 SEX 6;
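Putting the three statements together, a complete DATA step might look like the sketch below. The dataset name CBCL is made up for illustration (above it is just called NAME), and the variable lists are the ones from the INPUT examples. Use only the version that matches how your raw file is actually laid out. The RUN statement at the end of each step tells SAS to go ahead and execute it.

```sas
/* Sketch only: CBCL is a hypothetical dataset name.                  */
/* Run Version A OR Version B, depending on the layout of your file.  */

/* Version A: spaced data, temporary dataset */
DATA CBCL;                         /* temporary dataset              */
   INFILE 'C:\DATA\JAMCBCL.DAT';   /* location of the raw data       */
   INPUT ID SEX AGE IQ;            /* names assigned in file order   */
RUN;

PROC PRINT DATA=CBCL;              /* print it to check the result   */
RUN;

/* Version B: packed data, permanent dataset */
DATA SASUSER.CBCL;                 /* permanent dataset              */
   INFILE 'C:\DATA\JAMCBCL.DAT';
   INPUT ID 1-5 SEX 6;             /* ID in cols 1-5, SEX in col 6   */
RUN;

PROC PRINT DATA=SASUSER.CBCL;
RUN;
```

Printing the dataset right after you create it is a quick way to catch an INPUT statement that does not match the file.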
Objective 2: Manipulate existing variables and create new variables
Once SAS has read your raw data into a SAS dataset, you may want to manipulate some variables and even create some new ones. The statements that do this go inside the DATA step, after the INPUT statement. There are a couple of ways to do this:
1) Using IF...THEN statements:
Let's say that you have a variable AGE and you have a sample of children ages 6-18. You may want to divide your sample into two age groups: children (ages 6-11) and adolescents (ages 12-18). Here's how to do it, using the IF...THEN statement:
IF AGE<12 THEN AGEGROUP=1;
IF AGE>=12 THEN AGEGROUP=2;
2) Using IF...THEN statements together with the ELSE statement:
Alternatively, you can use an ELSE statement in place of the second IF...THEN statement when only one condition remains. For example, the following lines do exactly the same thing as the two IF...THEN statements above:
IF AGE<12 THEN AGEGROUP=1;
ELSE AGEGROUP=2;
In this case, ELSE doesn't really save us any time, but when you are specifying several conditions at once (e.g., IF AGE<12 and GENDER=1 and SES>3), the ELSE statement can come in pretty handy.
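One thing to watch: SAS treats a missing numeric value as smaller than any number, so AGE<12 is true when AGE is missing, and both versions above would put that child in group 1. Here is a minimal sketch of one way to guard against that. It assumes a hypothetical dataset name (CBCL) and the variables from the earlier INPUT example; adapt the names to your own data.

```sas
/* Sketch only: CBCL is a hypothetical dataset name. */
DATA CBCL;
   INFILE 'C:\DATA\JAMCBCL.DAT';
   INPUT ID SEX AGE IQ;

   /* The IF/ELSE chain goes inside the DATA step, after INPUT */
   IF AGE = . THEN AGEGROUP = .;         /* missing age stays missing */
   ELSE IF AGE < 12 THEN AGEGROUP = 1;   /* children, ages 6-11       */
   ELSE AGEGROUP = 2;                    /* adolescents, ages 12-18   */
RUN;

PROC PRINT DATA=CBCL;                    /* check the new variable    */
   VAR ID AGE AGEGROUP;
RUN;
```

Because each ELSE is only evaluated when the conditions before it were false, the final ELSE catches everyone who is non-missing and 12 or older.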
Objective 3: Run means
Remember the PROC statement from last week when we used PROC PRINT to print the data? Now we will use it to get means and other descriptive statistics. Here's the SAS code along with the line-by-line explanation. Notice the commenting techniques (I'll explain in class).
PROC MEANS; /* Tells SAS you want to get means and SDs*/
VAR HEIGHT WEIGHT; /* for variables HEIGHT and WEIGHT*/
RUN; /* Tells SAS to go ahead and do it*/
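If you have created more than one dataset, it is safer to name the one you want: without DATA=, PROC MEANS uses the most recently created dataset. The sketch below assumes a hypothetical dataset called MYDATA that contains HEIGHT and WEIGHT. The statistic keywords and MAXDEC= are standard PROC MEANS options, listed here explicitly so you can drop the ones you do not need.

```sas
/* Sketch only: MYDATA is a hypothetical dataset name. */
PROC MEANS DATA=MYDATA N MEAN STD MIN MAX MAXDEC=2;
   VAR HEIGHT WEIGHT;   /* variables to summarize */
RUN;
```

N, MEAN, STD, MIN, and MAX are what PROC MEANS prints by default anyway; MAXDEC=2 just limits the output to two decimal places.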
Objective 4: Run correlations
PROC CORR; /* Tells SAS you want to run a correlation*/
VAR HEIGHT WEIGHT; /* for variables HEIGHT and WEIGHT*/
RUN; /* Tells SAS to go ahead and do it*/
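As with PROC MEANS, you can name the dataset with DATA=. By default PROC CORR prints Pearson correlations with their p-values, along with a table of simple descriptive statistics. The sketch below assumes a hypothetical dataset called MYDATA and shows two optional variations: NOSIMPLE drops the descriptive table, and a WITH statement correlates the VAR variables with the WITH variables only, instead of every variable with every other one.

```sas
/* Sketch only: MYDATA is a hypothetical dataset name. */

/* Correlation matrix for HEIGHT and WEIGHT, no descriptive table */
PROC CORR DATA=MYDATA NOSIMPLE;
   VAR HEIGHT WEIGHT;
RUN;

/* Only the correlation of HEIGHT with WEIGHT */
PROC CORR DATA=MYDATA NOSIMPLE;
   VAR HEIGHT;
   WITH WEIGHT;
RUN;
```

With only two variables the two versions tell you the same thing; the WITH form mostly pays off when you have long variable lists.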
OK, that's it for this week. See you guys after Spring break!