SPSS basics 1
Communication and Information Technology
Section 18 SPSS basics 1
Recommended reading
Paul R. Kinnear and Colin D. Gray, SPSS for Windows Made Simple: Release 10, Psychology Press (Taylor & Francis Group), 2000. Chapters 2 thru 5 (bits of these chapters only).
Something to be aware of when reading this document
In the computer lab workshops you will be making use of files on your H: drive. The screenshots for this document were created on my office computer, where the drive letters assigned to various drives are different to the computer lab computers. Bear this in mind when examining the precise syntax used.
What is SPSS?
SPSS is the acronym for the Statistical Package for the Social Sciences. The further you progress in the social sciences, particularly in the likes of economics, geography and psychology, the more that statistics will become an important part of your studies/work: not just actual numbers (statistics themselves) but also statistical methods for analysing data. SPSS is a very good statistical program and, although there are many, many other statistical programs, SPSS is probably the best to start off with.
Excel is perfectly capable of being used for basic statistical purposes; indeed, we have been doing that with this CITASS module. Thus, for example, we have used Excel to calculate means, medians, standard deviations, variances, correlation coefficients and frequency distributions, not only for the whole sample of data but also for sub-samples of it.
However, programs like SPSS have a much greater scope for what they can do, and many basic tasks that can be done in Excel can be done much more easily in SPSS. For this reason, few academics actually use Excel for their research.
The version of SPSS
SPSS is available for use by staff and students on all computer lab PCs. SPSS is updated on a regular basis by the firm that develops it (SPSS Inc.), and the University of Dundee upgrades its systems shortly after a new version becomes available. Given that it is very unlikely that any student will have access to SPSS outwith the computing labs (unlike the situation with Excel), it is likely that all students will have to perform their homework in the computer labs. Student licences for SPSS can be bought, but they are very expensive.
The three SPSS windows
SPSS operates through three main windows: a data editor window (that looks a lot like an Excel spreadsheet), a syntax editor window, and a viewer window. The latter can be either a draft or a non-draft viewer window, the difference being the way in which its information is presented. For the most part, these three windows will be referred to as the data, syntax and output windows respectively.
The data window is where you can directly enter data into SPSS. If you run analyses which affect your data (which SPSS calls transformations), then the results will be seen in this window. The window's appearance is shown in the screenshot below.
As with most Windows programs, at the top there is a title bar; below that, a menu bar; below that, a toolbar; below that is a pane showing the contents of the highlighted cell; and below that, a spreadsheet-like grid showing the data being worked on in the program.
One of the advantages of SPSS over Excel is the ability to type in commands that will affect all, or only a few, observations at the same time (in Excel we always viewed a row as an observation, and in SPSS this is no different). These SPSS commands, also known as SPSS syntax, are typed into the syntax window and then executed (i.e. run). In Excel there was a need to enter a formula for one observation (row) and then copy it to all the other observations (rows) for which that command was required.
The appearance of the syntax window is shown by the screenshot below. Note that at the bottom of this window is the phrase "SPSS Processor is ready". This means that SPSS is not executing any commands and hence is ready to do whatever you ask of it next.
In Excel, the results of any analysis were always contained within one of the cells in the spreadsheet: typically you would type an equation into a cell and, although Excel would remember the equation, it would actually display in the cell the result of that equation. In SPSS there are two ways in which the results are represented. Where the analysis affects the actual data, the data displayed in the data window is updated. However, SPSS can also generate statistical answers (e.g. the mean or the median of a variable), and these answers are displayed in the output window.
We are now going to look more closely at each of these three windows
The data window: introduction
In Excel you had rows and columns, and at each point of intersection between the two you had a cell. In the Excel files I have provided to students, the first row of the spreadsheet always contained a variable name (e.g. MALE, STATUS, etc.), one variable per column, and the data for each observation occupied one row beneath these variable names. Thus, if there are 5,000 observations, row 1 would be the variable names and rows 2 to 5,001 would be the data for the 5,000 observations. It is not compulsory to view all data columns in an Excel spreadsheet as a variable, but in the analyses we have undertaken that's the way it always was.
In the SPSS data window, however, that is how it must be thought of. Every column is one variable and every row is one observation. Always.
In Excel, all the cells were numbered according to the CR convention: column letter(s) followed by row number. Thus cell C4 is the intersection of column C (i.e. the third column) and row 4.
However, in SPSS all variables have a name, and that is how we refer to columns. These variable names can be no longer than eight characters, can contain no spaces, and most (but not all) non-alphabet characters are prohibited.
The following are examples of acceptable variable names
STATUS ETHNIC TENURE MALE
MARRIED QUAL EXP UN WAGERATE
lower UPPER
The following are examples of unacceptable variable names
LONGNAMES WEIRD WITH SPACES
Note that variable names are not case sensitive.
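The naming rules above can be sketched as a small check. This is an illustration in Python rather than SPSS syntax, and the exact rule set is an assumption: the text only says that most non-alphabet characters are prohibited, so the sketch simply allows letters, digits and underscores after an initial letter. The function names are hypothetical.

```python
import re

# Assumed rule set (a simplification of the naming rules described above):
# 1 to 8 characters, starting with a letter, then letters, digits or underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def is_acceptable_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed naming rules."""
    return _NAME_PATTERN.fullmatch(name) is not None


def same_variable(a: str, b: str) -> bool:
    """Names are not case sensitive, so compare them ignoring case."""
    return a.upper() == b.upper()


for candidate in ["STATUS", "WAGERATE", "lower", "LONGNAMES", "WITH SPACES"]:
    print(candidate, is_acceptable_name(candidate))
```

LONGNAMES fails because it has nine characters, and WITH SPACES fails because it contains a space.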
When we first started looking at Excel, we looked at data on housing, e.g. the value of private property, the number of rooms it contained, etc. When loaded into SPSS, the data is represented in the data editor window as shown in the screenshot below.
The next thing to note is that there are actually two tabs in the data editor window. At the bottom of the screen we can see we are looking at the data view tab, whilst there is another tab, the variable view tab, that we can also look at. Clicking on that variable view tab, we see the following.
What we now see is not the actual data but information about each of the variables in the dataset: the name, the type (e.g. numeric, string, etc.), the width, decimal places (if numeric), and so on. In the variable view you can change various characteristics of your variables, e.g. how many decimal places for your numeric variables.
Click on the data view tab again and we can see our data again.
The data editor window: opening and saving files
To open a data file, click on the open-file icon or choose File > Open; it is a fairly standard file-choice dialog. The native data format for SPSS is its own .sav file format.
The data editor window: simple analyses
Simple analyses can be done through the menu system. For example, by choosing Analyze > Descriptive Statistics > Frequencies you can get a simple frequency table for one or more of your variables.
Choosing Frequencies brings up the following screen. On the left are all of the variables in the data editor window. On the right are all of the variables on which a frequencies analysis is to be performed. At present there are none. To select one or more, highlight them and then press the right-arrow button between the two panes. To deselect a variable already chosen, highlight it and click on that same button, which will now have changed to a left arrow. Once you have the choice of variables you want, click on the OK button. The results will appear in an output window.
The output window
If you already have an output window opened, then the results will appear there. If you do not have an output window open, one will be opened automatically to receive the results. Suppose that you had chosen to calculate the frequency distribution of the ROOMS variable. The result would look something like the following.
There was so much output that the output at the top has simply scrolled off the screen. However, the scroll bar at the right permits you to look at what is no longer visible on screen.
Scrolling most of the way up, we get to where the columns are explained.
Each row in the table represents a value occurring at least once in that variable. The number under Frequency is the number of observations that have that value for that variable. Thus there are 92 cases that have a value of 1 for the variable ROOMS, 492 cases that have a value of 2, and so on. The Percent column expresses the frequency value as a percentage of all the cases. Thus 92 cases represents 1.6% of the whole sample, 492 cases represents 8.5%, and so on. Ignore Valid Percent; it will not be used in the CITASS module. Cumulative Percent simply adds up the percentages as we progress down the table. Thus, if 1.6% have 1 room and 8.5% have 2 rooms, then approx. 10.0% have 1 or 2 rooms.
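How the Frequency, Percent and Cumulative Percent columns relate to each other can be sketched in a few lines. This is an illustration in Python rather than SPSS syntax; the ROOMS values are made up for the example and are not the course data, and `frequency_table` is a hypothetical helper.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100.0 * counts[value] / total
        cumulative += percent  # running total of the Percent column
        rows.append((value, counts[value], percent, cumulative))
    return rows


# Hypothetical ROOMS values for ten households (not the course data).
rooms = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]

for value, freq, pct, cum in frequency_table(rooms):
    print(f"{value:>5} {freq:>9} {pct:>7.1f} {cum:>10.1f}")
```

Each value that occurs at least once gets one row, and the cumulative percent in the last row is always 100.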
The basics of SPSS syntax
One of the advantages of SPSS over something like Excel is the fact that you can create a file containing sophisticated commands and save that file for use at a later date. These commands are called syntax, and learning to use this syntax will be one of the main tasks in our work with SPSS.
In SPSS syntax, to open a data file you would type the following command into the syntax window and then execute it (assuming the data file was d:\coursework\citass\materials\data\household0.sav):
get file='d:\coursework\citass\materials\data\household0.sav'.
Note the final period, which indicates the end of every command in SPSS, and also the apostrophes around either end of the filename (including full path). When you have entered the syntax into the syntax window, it should look like the following.
To execute a command, highlight it and click on the run icon; you can highlight, and therefore execute, more than one command at a time if you want. Whilst it is executing the command(s), the phrase "SPSS Processor is ready" is replaced by the command it is currently executing, but once the commands are completed it returns to "SPSS Processor is ready".
If the command changes your data, then the data window will reflect these changes. The syntax window should not change. Depending on how you have modified the setup of your output window (this process is covered later), output or error messages will appear in your output window.
SPSS syntax: subcommands
In addition to commands (e.g. get file), you can apply certain subcommands to your main commands. Thus, whilst the following command would get the whole dataset:
get file='d:\coursework\citass\materials\data\household0.sav'.
the following command has a subcommand that gets only three of the variables in the dataset (HHID, REGION, ROOMS):
get file='d:\coursework\citass\materials\data\household0.sav'
 /keep=HHID REGION ROOMS.
Note that the period comes at the end of the complete command, and that the subcommand is indented (i.e. there is a space at the start of the line) and begins with a slash (/).
Transformations pending
Students may well find that when they issue syntax commands that will alter the data loaded into the computer's memory, the syntax commands are not immediately carried out. At the bottom of the SPSS windows they will see the term "Transformations pending". SPSS knows there is a data transformation to be undertaken, but is effectively asking you to confirm that you want those transformations undertaken. You do this by choosing Transform > Run Pending Transformations.
Comment
SPSS syntax files can grow to be very large and the commands can become very complex. Following what is going on at every stage can be problematic, not only for others but also for yourself if you leave it and come back to it later. SPSS enables users to explain their syntax files with the use of the Comment command.
Basically, everything in a Comment command is ignored by SPSS when it processes a syntax file, so you can write in a Comment command whatever needs to be written to explain what is going on. For example:
comment The following creates a variable FEET from the variable INCHES.
compute FEET = INCHES / 12.
Note that Comment commands can extend over more than one line, but the second, third, etc. lines must not start in the first column, i.e. they start with a space:
comment The following creates a variable FEET from the variable INCHES
 which is all very obvious if you think about it.
compute FEET = INCHES / 12.
The fundamental SPSS file management commands
There are a number of SPSS commands relating to file operations. The most important are:
get file='path\filename'.
As previously indicated, this will get a file already stored on a disk.
add files file=* /file='path\filename'.
Adds additional observations to the dataset already stored in memory. file=* means the file in memory, while file='path\filename' indicates the file containing the additional observations to be added.
match files file=* /file='path\filename' /by ID1 ID2.
Adds additional variables rather than additional observations. The additional variables are in the second file, and /by ID1 ID2 indicates how the variables in the additional file are to be matched with the data already in memory. ID1 and ID2 are variables, and you need sufficient identifying variables to permit exact and unique matches; furthermore, the datasets need to be sorted in the same order.
save outfile='path\filename'.
Saves the data in memory to a file on a disk.
For each of the above four commands you can use the /keep subcommand, which indicates which variables are to be kept, or alternatively the /drop subcommand, which indicates which variables are to be dropped.
These may seem complicated at the moment, but they will make more sense when you actually use them in a computer lab situation.
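The difference between adding observations and adding variables can be sketched as follows. This is an illustration in Python rather than SPSS syntax; the function names, the household records and their values are all hypothetical.

```python
def add_files(in_memory, extra):
    """Mimic 'add files': append the observations (rows) of `extra`."""
    return in_memory + extra


def match_files(in_memory, extra, by):
    """Mimic 'match files': add the variables of `extra` to matching rows.

    Rows are matched on the identifying variables named in `by`. This sketch
    uses a dictionary lookup, so it does not need its input sorted, whereas
    the SPSS command described above does.
    """
    lookup = {tuple(row[k] for k in by): row for row in extra}
    merged = []
    for row in in_memory:
        match = lookup.get(tuple(row[k] for k in by), {})
        merged.append({**row, **match})
    return merged


# Hypothetical households identified by HHID.
base = [{"HHID": 1, "ROOMS": 3}, {"HHID": 2, "ROOMS": 5}]
more_rows = [{"HHID": 3, "ROOMS": 2}]
more_vars = [{"HHID": 1, "REGION": 7}, {"HHID": 2, "REGION": 4}]

print(add_files(base, more_rows))                 # three observations
print(match_files(base, more_vars, by=["HHID"]))  # two observations, one more variable
```

The lookup only works cleanly because HHID identifies each household uniquely, which is the same reason the identifying variables in a match need to permit exact and unique matches.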
The fundamental SPSS statistics commands
In what follows, "variable list" is the list of variables you want the analysis performed on. You can list the individual variables if you wish, but if you want it performed on all variables then simply use all.
The two most commonly used statistical commands are:
freq vars=variable list.
Thus freq vars=all. would bring up the frequencies of all the variables in the dataset. This is a useful but dangerous command: you may have dozens or even hundreds of variables, and some of them may have hundreds of different values for which a frequency needs to be worked out. Consequently, the output file may become huge and take a long time to be generated.
desc vars=variable list.
This would show, for each variable in the list, the minimum, maximum, mean and standard deviation.
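The four statistics reported by the descriptives command can be sketched as below. This is an illustration in Python rather than SPSS syntax; the hours data and the `describe` helper are hypothetical, and the sketch assumes the sample standard deviation (dividing by n - 1) is the one wanted.

```python
import statistics


def describe(values):
    """Return the minimum, maximum, mean and sample standard deviation."""
    return {
        "n": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        # statistics.stdev divides by n - 1 (sample standard deviation).
        "std_dev": statistics.stdev(values),
    }


# Hypothetical weekly hours for four respondents.
hours = [35, 40, 40, 45]
summary = describe(hours)
for name, value in summary.items():
    print(name, value)
```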
The COMPUTE command
The COMPUTE command is the most direct way to create a new variable in SPSS, or modify one that already exists. The syntax is fairly straightforward, for example:
compute NEWVAR = 2 * OLDVAR.
In this case, the value of the variable NEWVAR for an observation will simply be twice the value of the variable OLDVAR for that observation.
You may want to include brackets to indicate the order in which operations are to be undertaken:
compute NEWVAR = (OLDVAR1 + 2) * (OLDVAR2 + 4).
The IF command
Sometimes you want to perform a COMPUTE command conditional on a criterion being met, i.e. perform the COMPUTE command only if a condition is met. This is achieved by use of the IF command. Thus, for example:
if (OLDVAR1 = 1) NEWVAR = OLDVAR2.
The DO IF ... END IF subroutine command
If you want to do a whole series of commands conditional on a criterion being met, then this is best achieved by use of the DO IF ... END IF subroutine command:
do if (OLDVAR1 = 1).
compute NEWVAR1 = OLDVAR2.
compute NEWVAR2 = OLDVAR3.
compute NEWVAR3 = NEWVAR1 + NEWVAR2.
end if.
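What a conditional block of computes does, case by case, can be sketched as follows. This is an illustration in Python rather than SPSS syntax. It assumes the condition is an equality test on OLDVAR1 and that the last compute adds the two new variables (both operators are assumptions), and it represents a missing value as None for cases that do not meet the condition.

```python
def do_if_example(cases):
    """Apply three computes only to cases where OLDVAR1 equals 1."""
    for case in cases:
        # New variables start out missing (None) for every case.
        case.update({"NEWVAR1": None, "NEWVAR2": None, "NEWVAR3": None})
        if case["OLDVAR1"] == 1:  # assumed condition
            case["NEWVAR1"] = case["OLDVAR2"]
            case["NEWVAR2"] = case["OLDVAR3"]
            case["NEWVAR3"] = case["NEWVAR1"] + case["NEWVAR2"]  # assumed operator
    return cases


# Hypothetical observations: only the first one meets the condition.
cases = [
    {"OLDVAR1": 1, "OLDVAR2": 10, "OLDVAR3": 20},
    {"OLDVAR1": 0, "OLDVAR2": 5, "OLDVAR3": 6},
]
result = do_if_example(cases)
for case in result:
    print(case)
```

The second case keeps its old variables untouched and has no values for the new ones.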
The SELECT IF command
One of the things that you may want to do in SPSS is keep in your working dataset only certain observations. This is achieved with the use of the SELECT IF command. Basically, only those observations that meet a criterion will be retained. For example:
select if (STATUS = 2).
Any observations that have a value of 2 for the variable STATUS will be retained, whilst all other observations will be dropped.
This is a particularly important command when some of the variables you are using have missing values (the respondent has not provided the data). Where there is missing data, there is typically a value of -999. Including these cases in the calculation of means etc. will lead to seriously erroneous answers (e.g. people work, on average, a negative number of hours per week). Use a command like:
select if (HOURS1 >= 0).
Alternatively:
select if (STATUS <> -999).
where <> -999 means "not equal to -999".
Note, however, that the cases eliminated from the data are eliminated for the rest of the analysis, not just the next command. They are not eliminated from the data stored on disk unless you save the data in memory to the same filename on the hard disk.
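The effect of a missing-value code on a mean, and of dropping those cases first, can be sketched as below. This is an illustration in Python rather than SPSS syntax; the data are made up, `select_if` is a hypothetical helper, and the missing-value code of -999 is an assumption.

```python
import statistics

MISSING = -999  # assumed missing-value code


def select_if(cases, condition):
    """Keep only the cases for which `condition` is true."""
    return [case for case in cases if condition(case)]


# Hypothetical weekly hours; two respondents did not answer.
cases = [
    {"HOURS1": 40},
    {"HOURS1": 35},
    {"HOURS1": MISSING},
    {"HOURS1": 38},
    {"HOURS1": MISSING},
]

mean_all = statistics.mean([c["HOURS1"] for c in cases])
kept = select_if(cases, lambda c: c["HOURS1"] >= 0)
mean_valid = statistics.mean([c["HOURS1"] for c in kept])

print("mean including missing codes:", mean_all)
print("cases kept:", len(kept))
print("mean of valid cases:", round(mean_valid, 2))
```

With the missing codes left in, the average number of hours comes out negative, which is the kind of seriously erroneous answer the selection is meant to prevent.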
The SORT CASES command
In Excel we frequently needed to sort data, often by the STATUS variable, in order to perform analyses on only certain cases. In SPSS you often need to sort cases as well. This is achieved through the use of the SORT CASES command. For example:
sort cases by STATUS.
This command usually needs to be used on data files that are about to be matched with the match files command, since the two files must be sorted in the same order for the matching to be successful.
The output window: settings
You can modify how things are shown in each of the three windows by choosing Edit > Options and then choosing the tabbed screen that contains the options you want to change. At this point it would be useful to look at the options for the draft output (draft viewer) window. It is purely a matter of opinion, but the following is a useful choice of settings (better than the default ones).
Some final things to mention
saving your syntax file
When you enter correct syntax into the syntax window, make sure you save the work so that it's there for you if you come back to it at a later point in time; you don't have to reinvent the wheel. Including comments will help you remember what you were doing.
all syntax in the same syntax file
Where possible, include all the related syntax in the same syntax file, so that you can run multiple commands one after the other.
error messages
SPSS requires you to type in commands that are 100% correct, or else it will either (a) do something that is not quite what you intended or (b) come up with an error/warning message. When you do get an error/warning message, you should look at your syntax to see what is causing the error. When it comes to SPSS syntax, 99% correct is essentially 100% wrong.
the help system
SPSS has an extensive online help system, available via Help > Syntax Guide > Base. This includes extensive help on ALL the commands in SPSS. It's an Adobe Acrobat file.
For example, suppose that we had data on students. We had loaded one data file into memory and wanted to match that data with data from another file (the second data file had data on the same students, but contained different, additional variables). Matching by first name only would produce errors, because it is likely there would be several students with the same first name. Matching by both first name and second name would probably be fine, but it would be best to match via matriculation number, since that is guaranteed to be unique.
Since you can only have one data file in memory at any one time (viewing two matched data files as one data file), this means that you need to sort one of them and then save it in that sorted state.
Communication and Information Technology in the Arts and Social Sciences
18 SPSS basics 1
Faculty of Arts and Social Sciences, University of Dundee












