Categorical data is data with a limited number of distinct values or categories (for example, gender or marital status). It is also referred to as qualitative data. Categorical variables can be string (alphanumeric) variables or numeric variables that use numeric codes to represent categories (for example, 0 = Unmarried and 1 = Married). There are two basic types of categorical data:
- Nominal Data. Categorical data where there is no inherent order to the categories. For example, a job category of sales is not higher or lower than a job category of marketing or research.
- Ordinal Data. Categorical data where there is a meaningful order of categories, but there is not a measurable distance between categories. For example, there is an order to the values high, medium, and low, but the "distance" between the values cannot be calculated.
Scale data is data measured on an interval or ratio scale, where the data values indicate both the order of values and the distance between values. For example, a salary of $72,195 is higher than a salary of $52,398, and the distance between the two values is $19,797. It is also referred to as quantitative data or continuous data.
Measures for Categorical Data
For categorical data, the most typical summary measure is the number or percentage of cases in each category. The mode is the category with the greatest number of cases. For ordinal data, the median (the value at which half of the cases fall above and half below) may also be a useful summary measure if there is a large number of categories.
In SPSS, the Frequencies procedure produces frequency tables that display both the number and percentage of cases for each observed value of a variable.
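As a rough illustration of what a frequency table contains, the counts, percentages, and mode described above can be sketched in Python. This is not SPSS syntax or SPSS output. The sample data and the helper name `frequency_table` are hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical sample: marital status for ten cases (illustrative data only).
marital_status = [
    "Married", "Unmarried", "Married", "Married", "Unmarried",
    "Married", "Unmarried", "Married", "Married", "Unmarried",
]


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]


table = frequency_table(marital_status)
for category, count, percent in table:
    print(f"{category:<10} {count:>3} {percent:6.1f}%")

# The mode is the category with the greatest number of cases,
# which is the first row because rows are sorted by count.
mode_category = table[0][0]
print("Mode:", mode_category)
```

Sorting the rows by count makes the mode easy to read off the first row. A real frequency table would typically also handle missing values, which this sketch ignores.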
Measures for Scale Variables
There are many summary measures available for scale variables, including:
- Measures of central tendency. The most common measures of central tendency are the mean (arithmetic average) and the median (the value at which half the cases fall above and half below).
- Measures of dispersion. Statistics that measure the amount of variation or spread in the data include the standard deviation, minimum, and maximum.
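The measures listed above can be computed with Python's standard `statistics` module, shown here as a minimal sketch. The salary list is hypothetical apart from the two values quoted earlier ($52,398 and $72,195). The choice of the sample standard deviation (dividing by n - 1) is an assumption, since the text does not say which form is intended.

```python
import statistics

# Hypothetical salaries (illustrative values, not from any real dataset).
salaries = [31_000, 38_500, 42_000, 52_398, 72_195]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(salaries),
    "median": statistics.median(salaries),
    # Measures of dispersion
    "std_dev": statistics.stdev(salaries),  # sample standard deviation (n - 1)
    "minimum": min(salaries),
    "maximum": max(salaries),
}

for name, value in summary.items():
    print(f"{name:<8} {value:>12,.2f}")
```

If the data represent an entire population rather than a sample, `statistics.pstdev` would be the corresponding call.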
Mean vs. Median
The mean is the numerical average; the median is the value at which half the cases fall above and half below. The difference between the mean and the median indicates the shape of the data distribution. For example, in a sample of personal income data, there is usually a large difference between the mean and the median. The mean may be about 25,000 greater than the median, indicating that the values are not normally distributed. In SPSS, this can be checked visually with a histogram. The histogram may show that the majority of cases are clustered at the lower end of the scale, with most falling below 100,000. There are, however, a few cases in the 500,000 range and beyond (too few to even be visible without modifying the histogram). These high values for only a few cases have a significant effect on the mean but little or no effect on the median, making the median a better indicator of central tendency in this example.
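The effect of a few extreme values on the mean can be seen in a small sketch. The income values below are hypothetical and much fewer than a real sample would contain. They are arranged only to mimic the pattern described above: most cases at the lower end and one case in the 500,000 range.

```python
import statistics

# Hypothetical incomes: nine cases clustered at the lower end of the scale
# and a single case in the 500,000 range (illustrative values only).
incomes = [22_000, 28_000, 31_000, 35_000, 40_000,
           45_000, 52_000, 60_000, 75_000, 500_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
print(f"Mean:   {mean_income:,.0f}")
print(f"Median: {median_income:,.0f}")

# Drop the single extreme case to see how much each statistic moves.
trimmed = incomes[:-1]
mean_shift = mean_income - statistics.mean(trimmed)
median_shift = median_income - statistics.median(trimmed)
print(f"Shift in mean when the extreme case is removed:   {mean_shift:,.0f}")
print(f"Shift in median when the extreme case is removed: {median_shift:,.0f}")
```

With these values the mean is 88,800 and the median is 42,500. Removing the one extreme case moves the mean by tens of thousands but moves the median by only 2,500, which is why the median is the more stable indicator for data of this shape.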