Technical and Statistical Issue papers
Skewness and Correlations
Prorating Errors in Test Scores
Standard Deviation near the Mean Value
Assumptions in Least Squares Regression
Percentiles and Percentile Ranks - confused or what?
Calculating 2 x 2 decision table indices from summary information
Correlation attenuation due to Likert Categorization
Interrater Reliability: Definitions, Formulae, and Worked Examples
The Relationship between variables, cases, and factor stability
Euclidean Distance: raw, normalised, and double-scaled coefficients
KR-20, Cronbach Alpha, and dichotomous-response items
Skewness and Correlations. Ever wondered what happens
with Pearson product moment correlations when the skewness coefficient for one
variable departs from 0.0? Well, this short document provides both a graphical
and computational answer - varying skewness from 0.0 through ± 6.0. Although not
definitive (since only one simulation dataset was run), it is instructive to see
why Kendall and Stuart (1958) originally recommended being very cautious of
correlations in which skewness values > 2.0 were observed. Download the Acrobat pdf
format document here (288k).
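The general effect can be sketched with a small simulation. The Python example below is an illustration only and is not the simulation used in the document: it assumes a bivariate normal pair with a population correlation of 0.7, and induces positive skew in one variable by exponentiating it. The function names, seed, and sample size are all assumptions made for the example.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def skewness(values):
    """Moment-based skewness coefficient (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(12345)   # fixed seed so the sketch is repeatable
rho = 0.7                    # assumed population correlation
n = 20000                    # assumed sample size

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Standard construction of a variable correlated rho with x
y = [rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for a in x]
# Exponentiating y keeps the rank order but makes it strongly right-skewed
y_skewed = [math.exp(b) for b in y]

r_symmetric = pearson_r(x, y)
r_skewed = pearson_r(x, y_skewed)

print("skewness of y:", round(skewness(y), 2), " r:", round(r_symmetric, 3))
print("skewness of exp(y):", round(skewness(y_skewed), 2), " r:", round(r_skewed, 3))
```

The monotone transformation changes nothing about the ordering of cases, yet the Pearson coefficient drops once one variable is heavily skewed.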
Prorating error in Test Scores. This was a fairly hefty examination (33 pages of
results and analysis!) of the kinds of errors incurred when prorating test
scores (instead of using multiple imputation for missing data analysis, we
adjust the test score in proportion to the number of missing item responses). As
part of the treatment-outcome effectiveness analysis of the Anger Management
Project (Prof. Ray Novaco, Mark Ramm, Val Woods), it became clear that we had a
problem with missing data on some of the psychometric and nursing observation
assessment scales. For various reasons (some of which are discussed in the
document), I did not choose to use multiple imputation methods such as EMCOV,
Amelia, or NORM. Instead, we thought to prorate the scale scores - as also
suggested in the PCL-R manual. However, the question arose "How many items
should we prorate?" To help answer this, Charles Marley (an associate
psychologist at the State Hospital) set about examining a dataset that
contained 9 psychological test scales (drawn from some of the Anger Management
project assessments), varying in length from 24 items down to just 4.
Alphas for all scales were good to excellent. All patients in the file answered
all items on all these scales. What we (note the Royal "we" -
actually it was Charles who completed the entire analysis and bulk of the
report!!) did then was to randomly "exclude" items from each scale, in
steps of 5% missing (or the nearest whole item) up to 50% of items missing. For
each step, we computed the raw unadjusted and prorated scores, the differences
and correlations between these and the total "actual" score, and the
distributions of errors for each prorated score. The final result: prorate 15% of items
at maximum, given the properties of the kinds of test scales being used in
the Anger Management project (Novaco Anger Scales, Beck Depression Inventory,
State-Trait Anger Inventory etc.). Download the Acrobat pdf format
document here (946k).
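For readers unfamiliar with prorating, the adjustment described above can be sketched as follows. This is a minimal Python illustration, not the analysis code used for the report: the function names and the 10-item response vector are hypothetical, and the 15% ceiling is simply the recommendation quoted above passed in as a default.

```python
def prorated_score(responses):
    """Scale the sum of answered items up to the full scale length.

    `responses` is a list of item scores in which None marks a missing item.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no answered items to prorate from")
    return sum(answered) * len(responses) / len(answered)

def missing_fraction(responses):
    """Proportion of items on the scale that are missing."""
    return sum(r is None for r in responses) / len(responses)

def within_prorating_limit(responses, limit=0.15):
    """True if the proportion of missing items does not exceed `limit`."""
    return missing_fraction(responses) <= limit

# Hypothetical 10-item scale with two missing responses
example = [3, 2, None, 4, 1, 3, None, 2, 4, 3]

print(prorated_score(example))          # 22 * 10 / 8 = 27.5
print(missing_fraction(example))        # 0.2
print(within_prorating_limit(example))  # False: 20% missing exceeds 15%
```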
Standard Deviation near the Mean for a set of scores. This was examined in response
to a query from a colleague in Kuwait. He had been advised by a statistician
that "according to the normal curve, the SD must be in the limits of the third of Mean
and not exceed it".
I was curious myself, and so took a look at this issue with some simulations
and explorations of the statistician's advice. This was my response. Download
the Acrobat pdf format document here (327k).
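One quick way to probe the quoted advice, offered here as a sketch and not as the analysis in the document, is to note that adding a constant to every score moves the mean but leaves the SD untouched. The scores below are hypothetical.

```python
import statistics

# Hypothetical set of scores
scores = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 17.0]
# Subtracting a constant shifts the mean but cannot change the spread
shifted = [s - 10.0 for s in scores]

mean_original = statistics.mean(scores)
sd_original = statistics.stdev(scores)
mean_shifted = statistics.mean(shifted)
sd_shifted = statistics.stdev(shifted)

print("original: mean", mean_original, "SD", round(sd_original, 3))
print("shifted:  mean", mean_shifted, "SD", round(sd_shifted, 3))
print("SD exceeds a third of the shifted mean:", sd_shifted > mean_shifted / 3)
```

Because the mean and the SD respond differently to a simple shift of origin, any fixed ratio between them can only hold for particular measurement scales, not as a property of the normal curve itself.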
Assumptions in Least Squares Regression. This was in response to a
question asked of me ... "I
am starting to use the some regression analysis and I am a bit confused about
the Assumptions. About
normality and homoscedasticity, what exactly I need to test? The real variables
or just the residuals? or both?" My response began ...
There is one important assumption for the use of least-squares, linear
regression that is generally phrased as
"The population means of the values of the dependent variable Y at each value of the independent variable X are assumed to be on a straight line".
This statement implies that at each value of X, there is a distribution of Y
values for which the mean is used as the value that characterises the average
value of each Y at X. This immediately implies that Y itself is a random
variable, possessing equal-interval, additive concatenation units (the use of
the mean implies additivity of units).
A further set of assumptions that are also made when using linear regression
are (taken from Pedhazur, 1997, pp. 33-34)
1. The mean of the errors (residuals, Yik - Yik') for each observation of the Yi on Xi, over many replications, is zero.
2. Errors associated with one observation of Yi on Xi are independent of errors associated with any other observation Yj on Xi (i.e. no serial autocorrelation).
3. The variance of the errors of Y, at all values of X, is constant (homoscedasticity).
4. The values of the errors of Y are independent of the values of X.
5. The distribution of errors (residuals) over all values of Y is normal.
From the above, there seems to be no a priori requirement for Y itself to be normally distributed. It seems that the assumption quoted above could be met by a variable whose values are, for example, uniformly distributed rather than normally distributed. The normality assumption seems to be confined explicitly to the errors of prediction of Y, not Y itself. In fact, many textbooks only mention the assumptions within this framework. ... I then generated some appropriate data and set about testing each of the assumptions with uniformly and normally generated/distributed data. Plenty of graphics and discussion!
(17th August, 2005) ... The file has been updated with an addendum thanks to Dr S.A. Butler, Corus Research, Development and Technology, Swinden Technology Centre, Rotherham, South Yorkshire ..."Unfortunately, some people will insist on using Excel for statistical work even when much better software is available to them, so I have recently had to look at the regression facilities available in Excel. I discovered that, when regression is carried out via Tools / Data Analysis / Regression, there is an option to produce a Normal Probability Plot, but this is a plot of the Y-values, NOT the residuals. " Download the Acrobat 7 pdf format document here (1.08Mb).
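The distinction between checking Y and checking the residuals can be illustrated with a short simulation. This Python sketch is not the data or code from the document: it assumes a uniformly distributed X, a true line of Y = 2 + 3X, and normal errors with unit SD, and it uses excess kurtosis as a rough stand-in for a normality check.

```python
import random

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def excess_kurtosis(values):
    """Moment-based excess kurtosis: near 0 for normal, near -1.2 for uniform data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(2005)  # fixed seed so the sketch is repeatable
x = [rng.uniform(0.0, 10.0) for _ in range(5000)]       # uniform predictor
y = [2.0 + 3.0 * a + rng.gauss(0.0, 1.0) for a in x]    # normal errors around a line

intercept, slope = ols_fit(x, y)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
mean_residual = sum(residuals) / len(residuals)

kurt_y = excess_kurtosis(y)
kurt_residuals = excess_kurtosis(residuals)

print("fitted line: Y =", round(intercept, 3), "+", round(slope, 3), "* X")
print("mean residual:", round(mean_residual, 6))
print("excess kurtosis of Y:", round(kurt_y, 2))
print("excess kurtosis of residuals:", round(kurt_residuals, 2))
```

In this setup Y inherits the flat shape of X and is clearly non-normal, while the residuals are close to normal, which is why a normal probability plot of the Y-values says little about the regression assumption.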
Percentiles and Percentile Ranks - confused or what?
This document tries to explain the basis for two almost equally occurring
definitions of percentiles as either:
A percentile is the point in a distribution at or below which a
given percentage of scores is found
-or-
The value below which P% of the values fall is called the Pth
percentile
22 annotated definitions of percentiles are scoured from books and the internet
to list those which define them using either of the definitional statements
above. Then some logic and worked examples are used to show how both definitions
are correct, given a perspective of a test score as an integer, or as a
point-estimate of a continuous real-valued number. It is hoped this makes things
much clearer. This document was written as a response to the question above that
arose recently in a discussion with some test publishers. The document can be
downloaded as an Acrobat pdf file here (70k).
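The practical difference between the two definitions shows up as soon as scores are tied. The following Python sketch uses a hypothetical set of ten integer test scores and a hypothetical function name; it only counts cases under each wording and does not reproduce the worked examples in the document.

```python
def percentile_rank(scores, value, inclusive=True):
    """Percentage of scores at or below `value` (inclusive=True)
    or strictly below `value` (inclusive=False)."""
    if inclusive:
        count = sum(s <= value for s in scores)
    else:
        count = sum(s < value for s in scores)
    return 100.0 * count / len(scores)

# Hypothetical integer test scores with ties at 15 and 20
scores = [12, 15, 15, 18, 20, 20, 20, 23, 25, 30]

at_or_below = percentile_rank(scores, 20, inclusive=True)
below = percentile_rank(scores, 20, inclusive=False)

print("score 20, 'at or below' definition:", at_or_below)  # 70.0
print("score 20, 'below' definition:", below)              # 40.0
```

With no tied scores at the value of interest the two counts differ by a single case; with heavy ties, as here, the gap can be large.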
Calculating 2 x 2 decision table indices from summary information.
Given a Base Rate, Sensitivity, and Specificity coefficient, find the
constituent cell proportions within a 2 x 2 table such that any other relevant 2
x 2 table statistics may be computed from them. The one-page document can be
downloaded as a pdf format item, or as a MathCad 11 Spreadsheet for
those who use MathCad. Don't forget to download DICHOT 3 from my
software page if you want a free Windows program
that calculates a host of decision-table statistics from a 2 x 2 data table.
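The calculation follows directly from the definitions of the three coefficients. As a minimal Python sketch (the function name and the example values of 0.10, 0.80, and 0.90 are assumptions for illustration, and this is not a transcription of the MathCad sheet):

```python
def cell_proportions(base_rate, sensitivity, specificity):
    """Recover the four 2 x 2 cell proportions from summary coefficients.

    base_rate   = proportion of cases that truly have the condition
    sensitivity = P(test positive | condition present)
    specificity = P(test negative | condition absent)
    """
    true_pos = base_rate * sensitivity
    false_neg = base_rate * (1.0 - sensitivity)
    true_neg = (1.0 - base_rate) * specificity
    false_pos = (1.0 - base_rate) * (1.0 - specificity)
    return {"TP": true_pos, "FN": false_neg, "FP": false_pos, "TN": true_neg}

# Hypothetical example values
cells = cell_proportions(base_rate=0.10, sensitivity=0.80, specificity=0.90)

# Any other 2 x 2 statistic can now be computed from the cells, for example:
ppv = cells["TP"] / (cells["TP"] + cells["FP"])   # positive predictive value
npv = cells["TN"] / (cells["TN"] + cells["FN"])   # negative predictive value
accuracy = cells["TP"] + cells["TN"]              # overall proportion correct

print(cells)
print("PPV:", round(ppv, 4), "NPV:", round(npv, 4), "accuracy:", round(accuracy, 4))
```

Multiplying each proportion by a sample size gives a frequency table of the kind that a 2 x 2 decision-table program takes as input.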
Correlation Attenuation due to Likert Categorization. Here I asked three questions ...
· What is the likely attenuation on a correlation between two items on a test, if we use a 2-choice rather than say a 3, 4, 5, up to a 9-choice Likert format?
· What happens to the item variances under these conditions?
· Would using more Likert categories automatically increase alpha reliability for a test?
To answer these, I put the issue of psychological meaning to one side (i.e. what
happens to a person’s judgement when it is constrained to a 2, 3, 4, 5, 6, 7, 8,
or 9-category rating scale), and concentrated solely on the
mathematical-statistical issue. Specifically, I was curious myself as to
what happens exactly when say real-valued continuous number responses are
categorised.
This document implements a simulation very much like that of Bollen and Barb
(1992) in order to answer the three questions above. However, I go a few steps
further in exposing what happens to the standard deviations of the categorized
variables, and how this can affect coefficient alpha and the standard error of
measurement (within classical test theory). Of interest may be the presentation
of the formulae for generating correlated bivariate data, and the difference
in derivation between the formula used by David Howell and my own. For interested
readers who may have access to STATISTICA 6, I have also included the heavily
documented STATISTICA 6 program (both as a downloadable file and as an appendix
to the Word/pdf document) that was used to effect all calculations and
bootstrap sampling. Finally, I have included about 30 references to work in this
area for information purposes. The pdf document is 349k, and the STATISTICA 6 svb file is 13k.
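The kind of simulation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a port of the STATISTICA program: it generates standard-normal pairs with a population correlation of 0.7 using the common construction y = rho*x + sqrt(1 - rho^2)*e, and it cuts each variable into k equal-width categories over the range -3 to +3. The cut-point scheme, seed, and sample size are all choices made for the example.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def categorize(value, k, low=-3.0, high=3.0):
    """Map a continuous value onto categories 1..k using equal-width bins
    on [low, high]; values outside the range fall in the end categories."""
    width = (high - low) / k
    index = int((value - low) // width) + 1
    return min(max(index, 1), k)

rng = random.Random(1992)  # fixed seed so the sketch is repeatable
rho = 0.7                  # assumed population correlation
n = 20000                  # assumed sample size

x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for a in x]

r_continuous = pearson_r(x, y)
r_by_k = {}
for k in (2, 3, 5, 7, 9):
    x_cat = [categorize(a, k) for a in x]
    y_cat = [categorize(b, k) for b in y]
    r_by_k[k] = pearson_r(x_cat, y_cat)

print("continuous r:", round(r_continuous, 3))
for k, r in r_by_k.items():
    print(k, "categories: r =", round(r, 3))
```

The 2-category version shows the heaviest attenuation; the coefficient moves back towards the continuous value as categories are added.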
Interrater Reliability: Definitions, Formulae, and
Worked Examples.
A Word 2000 document that goes into both conceptual
and computational detail for interrater reliability analysis. This is a revised version (22nd March, 2001)
that incorporates detailed SPSS analysis examples (as well as STATISTICA
examples) for Intraclass correlations as per Shrout and Fleiss Models 1, 2, and
3. Download a pdf version (248k) Rater.pdf.
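For orientation, the single-rater intraclass correlations for the three Shrout and Fleiss models can be computed from the mean squares of a targets-by-raters table. The Python sketch below uses the standard textbook formulas for ICC(1,1), ICC(2,1), and ICC(3,1); it is not the SPSS or STATISTICA procedure from the document, and the 6 x 4 ratings table is supplied purely for illustration.

```python
def icc_single_rater(ratings):
    """Return (ICC(1,1), ICC(2,1), ICC(3,1)) for a table with one row per
    target and one column per rater, with no missing ratings."""
    n = len(ratings)        # number of targets
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)

    bms = ss_targets / (n - 1)                               # between-targets mean square
    jms = ss_raters / (k - 1)                                # between-raters mean square
    wms = (ss_total - ss_targets) / (n * (k - 1))            # within-targets mean square
    ems = (ss_total - ss_targets - ss_raters) / ((n - 1) * (k - 1))  # residual mean square

    icc1 = (bms - wms) / (bms + (k - 1) * wms)
    icc2 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc3 = (bms - ems) / (bms + (k - 1) * ems)
    return icc1, icc2, icc3

# Illustrative table: 6 targets rated by 4 raters
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

icc1, icc2, icc3 = icc_single_rater(ratings)
print(round(icc1, 2), round(icc2, 2), round(icc3, 2))  # 0.17 0.29 0.71
```

The large gap between the Model 1 and Model 3 coefficients in this table comes from systematic differences between rater means, which Model 3 removes from the error term.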
The Relationship between variables,
cases, and factor stability.
A document which examines some of the evidence and reasoning associated with
"rules of thumb" for determining the numbers of cases for a particular quantity
of variables, so that an "adequate" sample of cases might be established. The
logic for regression and correlation is expanded upon, and the latest evidence
presented from the literature on factor analysis. The conclusion from empirical
analyses within the factor analytic domain is that rules of thumb such as "10
cases to a variable" are fundamentally flawed. The 7-page Acrobat pdf
format document may be downloaded here (374k).
Euclidean Distance: Raw, Normalised, and
Double-Scaled Coefficients. Having been fiddling around with
distance measures for some time – especially with regard to profile comparison
methodologies, I thought it was time I provided a brief and simple overview of
Euclidean Distance – and why so many programs like SPSS, SYSTAT, and PRIMER-5,
give so many completely different estimates/derivations of it. This is not because the
concept itself changes (that of linear distance), but is due to the way
programs/investigators either transform the data prior to computing the
difference, normalise constituent distances via a constant, or re-scale the
coefficient into a unit metric. However, few actually make absolutely explicit
what they do, and the consequences of whatever transformation they undertake.
Given that I always use a double-scaling of distance into a unit metric for the
coefficient, and never transform the raw data, I thought it time I explained the
logic of this, and why I feel some of the coefficients used within some popular
statistical programs are sometimes less than optimal (i.e. using “normal
z-score” transformations). The 17-page document is in pdf format, and can be
downloaded here (298k).
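To make the contrast concrete, here is a Python sketch of a raw Euclidean distance alongside one plausible reading of "double-scaling": each squared difference is first divided by the largest squared difference possible for that variable, and the sum is then divided by the number of variables before taking the square root, giving a coefficient between 0 and 1. That reading is an assumption on my part and the document should be consulted for the exact definition; the profiles and the 1 to 5 response range are hypothetical.

```python
import math

def raw_distance(a, b):
    """Ordinary Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def double_scaled_distance(a, b, ranges):
    """Euclidean distance rescaled to a 0..1 metric (assumed definition).

    `ranges` holds one (minimum, maximum) pair per variable. Scaling 1 divides
    each squared difference by its maximum possible value; scaling 2 divides
    the sum by the number of variables.
    """
    scaled = [((x - y) / (high - low)) ** 2
              for x, y, (low, high) in zip(a, b, ranges)]
    return math.sqrt(sum(scaled) / len(scaled))

# Hypothetical three-variable profiles on a 1..5 response scale
profile_a = [1, 5, 3]
profile_b = [5, 1, 3]
ranges = [(1, 5), (1, 5), (1, 5)]

print("raw:", round(raw_distance(profile_a, profile_b), 3))
print("double-scaled:", round(double_scaled_distance(profile_a, profile_b, ranges), 3))
```

Unlike the raw value, the rescaled coefficient does not grow simply because more variables are added or because a variable has a wider response range, and no transformation of the raw data is involved.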
KR-20, Cronbach Alpha, and
dichotomous-response items. There seems to be a little confusion
among some students and academics about which reliability coefficient is the more
correct for dichotomous response questionnaire item scales (e.g. yes-no,
true-false), where the sum-score is the cumulative sum of item responses. This
confusion has manifested itself sometimes in recommendations that for
binary-response items, the Kuder-Richardson 20 (KR-20) coefficient is the more
"appropriate" reliability coefficient. As I show in this document, the formula
for the KR-20 is mathematically equivalent to that for alpha. A simple worked
example with STATISTICA and SPSS output is provided - so that readers might see
exactly how the calculations are made and reported. The 5-page document is in
pdf format, and can be downloaded
here (134k)
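The equivalence is easy to check numerically. The Python sketch below is not the worked example from the document: it uses a hypothetical matrix of six respondents by four yes/no items, computes KR-20 from the item p and q values, and computes alpha from item and total-score variances.

```python
def kr20(items):
    """Kuder-Richardson 20 for a list of respondents' 0/1 item responses."""
    n = len(items)
    k = len(items[0])
    totals = [sum(row) for row in items]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # divide by N
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in items) / n   # proportion scoring 1 on item j
        sum_pq += p * (1.0 - p)                # p*q is the item variance (divide by N)
    return k / (k - 1) * (1.0 - sum_pq / var_total)

def cronbach_alpha(items):
    """Coefficient alpha using sample (divide by N-1) variances throughout."""
    k = len(items[0])

    def sample_variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    sum_item_vars = sum(sample_variance([row[j] for row in items]) for j in range(k))
    var_total = sample_variance([sum(row) for row in items])
    return k / (k - 1) * (1.0 - sum_item_vars / var_total)

# Hypothetical responses: 6 respondents x 4 dichotomous items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print("KR-20:", round(kr20(responses), 6))
print("alpha:", round(cronbach_alpha(responses), 6))
```

The two values agree because p*q is simply the variance of a 0/1 item, and the choice of N or N-1 in the denominator cancels in the ratio of summed item variances to total-score variance, provided the same choice is used for both.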