
Displaying Distributions with Graphs

    1.1 Displaying Distributions with Graphs
    Graphs for Categorical Variables
    Categorical variable: places an individual into a group or category
    Pie Chart
    Must include all categories
    Use to emphasize each category's relation to the whole
    Bar Graphs
    Easier to make and read
    More flexible: can compare any set of quantities that are measured in the same units
    Graphs for Quantitative Variables
    Quantitative variable: takes numerical values for which arithmetic makes sense
    Stemplots
    Give a quick picture of the shape of a distribution
    Stem
    All but the final digit
    May have as many digits as needed
    Leaf
    The final digit
    Always a single digit
    Back-to-back stemplot
    Compares two related distributions
    Common stem with leaves on each side
    Stemplots do not work well for large data sets
    Split stems (leaves 0-4 on one line, leaves 5-9 on the next) to spread out a crowded plot
    Trim numbers by removing the last digit(s) before plotting
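
The stem-and-leaf rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `stemplot`, the `split` flag, and the sample values are all assumptions, and the sketch handles non-negative integers only.

```python
from collections import defaultdict

def stemplot(values, split=False):
    """Return the lines of a stemplot for non-negative integers.

    The stem is every digit but the last; the leaf is the last digit.
    With split=True each stem appears twice: leaves 0-4, then leaves 5-9.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)

    lo, hi = min(values) // 10, max(values) // 10
    halves = (0, 1) if split else (0,)
    lines = []
    for stem in range(lo, hi + 1):  # keep empty stems so gaps stay visible
        for half in halves:
            leaves = "".join(str(d) for d in rows.get((stem, half), []))
            lines.append(f"{stem:>2} | {leaves}")
    return lines

data = [12, 15, 21, 23, 23, 30, 47]  # made-up values, for illustration only
for line in stemplot(data):
    print(line)
```

Calling `stemplot(data, split=True)` prints each stem on two lines, which can help when many leaves crowd onto a few stems.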
    Histograms
    Break the range of values of the variable into classes
    Display only the count or percentage of observations that fall into each class
    Choose classes of equal width
    With equal widths, the area of each bar is determined by its height
    All classes are then fairly represented
    Use your judgment in choosing classes to display the shape
    Technology Toolbox, pp. 59-60
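
As a rough sketch of how equal-width classes turn data into histogram counts, the following Python counts the observations in each class. The function name `class_counts`, its parameters, and the sample values are hypothetical; the choice of starting point and width is exactly the judgment call the notes mention.

```python
def class_counts(values, start, width, n_classes):
    """Count how many values fall in each equal-width class.

    Class k covers start + k*width <= x < start + (k + 1)*width.
    """
    counts = [0] * n_classes
    for x in values:
        k = int((x - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
        else:
            raise ValueError(f"{x} lies outside the chosen classes")
    return counts

data = [12, 15, 21, 23, 23, 30, 47]  # made-up values, for illustration only
counts = class_counts(data, start=10, width=10, n_classes=4)

# Crude text histogram: one star per observation in the class.
for k, c in enumerate(counts):
    lower = 10 + 10 * k
    bar = "*" * c
    print(f"{lower} to <{lower + 10}: {bar}")
```

Rerunning with a different `width` shows how the apparent shape depends on the classes chosen.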
    Examining Distributions
    Overall pattern and deviations from it
    Shape, center, and spread
    Center: midpoint
    Spread: smallest and largest values
    Shape
    Modes: one peak (unimodal) or several peaks
    Symmetric: larger and smaller values are mirror images around the midpoint
    Skewed
    Right-skewed: the right tail is much longer than the left
    Left-skewed: the left tail is much longer than the right
    Outliers: individual values that fall outside the overall pattern
    Dealing with Outliers
    A matter of judgment
    Look for points that are clearly apart from the body of data
    In general it is not a good idea to just delete or ignore outliers
    Relative Frequency and Cumulative Frequency
    Relative Frequency
    Divide the count in each class interval by the total count
    Multiply by 100 to get the percent
    Cumulative Frequency
    Add the counts in the frequency column that fall in or below the current interval
    Divide the entries by the total count
    Multiply by 100 to get the percent
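
The relative and cumulative frequency steps above can be worked through directly. This is a small sketch with made-up class counts; the variable names are illustrative only.

```python
from itertools import accumulate

counts = [2, 3, 1, 2]  # made-up counts for four class intervals
total = sum(counts)

# Relative frequency: count in each class divided by the total, as a percent.
relative_pct = [100 * c / total for c in counts]

# Cumulative frequency: counts in or below the current interval.
cumulative = list(accumulate(counts))

# Relative cumulative frequency: cumulative counts divided by the total, as a percent.
cumulative_pct = [100 * c / total for c in cumulative]

print(relative_pct)    # [25.0, 37.5, 12.5, 25.0]
print(cumulative)      # [2, 5, 6, 8]
print(cumulative_pct)  # [25.0, 62.5, 75.0, 100.0]
```

The last cumulative percent is always 100, which is a quick check on the arithmetic.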
    Time Plots
    Plots each observation against the time at which it was measured
    Always put time on the horizontal axis
    Put the measured variable on the vertical axis
    Connect the data points with lines to emphasize change over time

    1.2 Describing Distributions with Numbers
    Measuring Center: The Mean
    Mean: the average value
    $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum x_i$
    Sensitive to the influence of a few extreme observations
    Outliers
    Skewed distribution
    Not a resistant measure of center
    Measuring Center: The Median
    Formal version of midpoint
    Median M: the number such that half the observations are smaller and the other half are larger
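    To find the median of a distribution:
    Arrange all the observations from smallest to largest
    If the number of observations n is odd, M is the center observation: count (n + 1)/2 observations up from the bottom of the list
    If n is even, M is the mean of the two center observations: count (n + 1)/2 observations up from the bottom of the list to get the location of the median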
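
The procedure for locating the median can be sketched as follows. The function name `median_by_location` is hypothetical; Python's standard library already provides `statistics.median`, and this version exists only to make the counting rule visible.

```python
def median_by_location(values):
    """Find the median by sorting and counting up to the center of the list."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1)/2 counting from 1 is index n // 2 counting from 0.
        return ordered[mid]
    # Even n: (n + 1)/2 falls halfway between two positions, so average them.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_location([5, 1, 3]))     # 3
print(median_by_location([4, 1, 3, 2]))  # 2.5
```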
    Mean vs. Median
    Mean and median are the most common measures of center
    Mean and median are close together in symmetric distributions
    In skewed distributions, the mean is farther out in the long tail than the median
    The most useful numerical description of a distribution gives both a measure of center and spread
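
A short numerical example shows why the mean is not resistant while the median is. The data values are made up for illustration.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5]      # made-up, roughly symmetric values
with_outlier = data + [40]  # the same values plus one extreme observation

print(mean(data), median(data))                  # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 9.5 3.5
```

One extreme observation pulls the mean from 3.4 to 9.5, while the median moves only from 3 to 3.5.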
    Measuring Spread: Range
    Range = Max - Min
    Shows full spread of data
    Depends only on the smallest and largest observations, which may be outliers
    Measuring Spread: The Quartiles
    Arrange observations in increasing order and locate the median
    First quartile Q1 (25th percentile): median of the lower 50% of observations
    Third quartile Q3 (75th percentile): median of the upper 50% of observations
    The Five-Number Summary and Boxplots
    Five-number summary: the smallest observation (min), the first quartile (Q1), the median (M), the third quartile (Q3), and the largest observation (max)
    Boxplot: visual representation of the five-number summary
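
The quartile rule above can be sketched in Python as follows. The sketch assumes the convention that the overall median is left out of both halves when the number of observations is odd; calculators and software packages may use slightly different quartile rules. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) using the median-of-halves rule.

    Assumes at least two observations. When n is odd, the overall median
    is excluded from both the lower and the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # observations below the median's location
    upper = ordered[half + n % 2:]  # observations above the median's location
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

print(five_number_summary([1, 3, 4, 6, 7, 9, 12]))     # (1, 3, 6, 9, 12)
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))   # (1, 2.5, 4.5, 6.5, 8)
```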
    The 1.5 × IQR Rule for Suspected Outliers
    The distance between the quartiles (IQR) is resistant to outliers
    Interquartile range: IQR = Q3 - Q1
    IQR alone is not very useful for describing skewed distributions
    Suspected outliers: observations that fall
    Above Q3 + 1.5 × IQR
    Below Q1 - 1.5 × IQR
    Technology Toolbox, pp. 81-82
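
The 1.5 × IQR rule can be applied mechanically, as in this sketch. The helper names `quartiles` and `suspected_outliers` and the sample data are assumptions for illustration, and the quartiles use the median-of-halves convention, which may differ slightly from what a given calculator reports.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return median(ordered[:half]), median(ordered[half + n % 2:])

def suspected_outliers(values):
    """Flag observations beyond 1.5 x IQR from the quartiles."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [1, 3, 4, 6, 7, 9, 30]    # made-up values, for illustration only
print(suspected_outliers(data))  # [30]
```

Here Q1 = 3 and Q3 = 9, so IQR = 6 and the fences sit at -6 and 18; only 30 falls outside them. The rule flags a value as suspected, and deciding what to do with it remains a matter of judgment.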
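    Measuring Spread: The Standard Deviation
    Variance: the average of the squares of the deviations from the mean
    Measures how far the observations are from their mean
    For n observations x_1, x_2, ..., x_n:
    $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$
    Standard deviation: the square root of the variance
    $s = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$
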
    Properties of the Standard Deviation
    s measures spread about the mean; use it only when the mean is used as the measure of center
    s = 0 only when there is no spread (all observations have the same value)
    s is not resistant
    A few outliers can make s very large
    Squaring the deviations (in s^2) makes this measure even more sensitive to a few extreme observations
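
The definitions and properties above can be checked numerically. The formulas did not survive in the transcript, so this sketch assumes the usual sample form with divisor n - 1, which matches the use of the symbol s. The function names and data are illustrative only.

```python
from math import sqrt

def variance(values):
    """Sample variance: sum of squared deviations divided by n - 1 (assumed divisor)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def std_dev(values):
    """Sample standard deviation: square root of the variance."""
    return sqrt(variance(values))

print(variance([1, 2, 3, 4, 5]))  # 2.5
print(std_dev([7, 7, 7]))         # 0.0, no spread at all
print(std_dev([1, 2, 3, 4, 5]))   # about 1.58
print(std_dev([1, 2, 3, 4, 50]))  # about 21.3, one outlier inflates s
```

Replacing a single value with an extreme one raises s from about 1.58 to about 21.3, which illustrates why s is not resistant.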
    Choosing Measures of Center and Spread
    For skewed distributions or distributions with strong outliers, use the five-number summary
    For reasonably symmetric distributions that are free of outliers, use the mean and standard deviation
    Always plot your data
    Graphs give the best overall picture of a distribution
    Numerical measures of center and spread give only specific facts; they do not describe the entire shape of a distribution
    Changing the Unit of Measurement
    Linear Transformations
    $x_{new} = a + bx$
    Adding the constant a shifts all values of x up or down
    Multiplying by the constant b changes the size of the unit of measurement
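
A unit change such as Celsius to Fahrenheit is a linear transformation with a = 32 and b = 1.8. The sketch below uses made-up temperatures and a hypothetical helper named `transform`. It also shows a standard consequence of the transformation: the mean becomes a + b times the old mean, while the standard deviation is multiplied by |b| and is unaffected by a.

```python
from statistics import mean, stdev

def transform(values, a, b):
    """Apply the linear transformation x_new = a + b * x to every value."""
    return [a + b * x for x in values]

celsius = [10, 20, 30]  # made-up temperatures, for illustration only
fahrenheit = transform(celsius, a=32, b=1.8)

print(mean(celsius), stdev(celsius))        # center and spread in the old units
print(mean(fahrenheit), stdev(fahrenheit))  # center and spread in the new units
```

For these values the mean goes from 20 to about 68 (32 + 1.8 × 20) and the standard deviation from 10 to about 18 (1.8 × 10).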
    Comparing Distributions
    Data
    Who
    What
    Why
    When, where, how, and by whom
    Graphs
    Numerical Summaries
    Interpretation: answer the question
    Chapter 1: Exploring Data
    Mrs. S. Smith, September 2008