Technical and Statistical Issue papers

 

Skewness and Correlations
Prorating Errors in Test Scores
Standard Deviation near the Mean Value
Assumptions in Least Squares Regression
Percentiles and Percentile Ranks - confused or what?
Calculating 2 x 2 decision table indices from summary information
Correlation attenuation due to Likert Categorization
Interrater Reliability: Definitions, Formulae, and Worked Examples
The Relationship between variables, cases, and factor stability
Euclidean Distance: raw, normalised, and double-scaled coefficients
KR-20, Cronbach Alpha, and dichotomous-response items

Skewness and Correlations. Ever wondered what happens with Pearson product moment correlations when the skewness coefficient for one variable departs from 0.0? Well, this short document provides both a graphical and computational answer - varying skewness from 0.0 through ± 6.0. Although not definitive (since only one simulation dataset was run), it is instructive to see why Kendall and Stuart (1958) originally recommended being very cautious of correlations in which skewness values > 2.0 were observed. Download the Acrobat pdf format document here (288k).

Prorating Errors in Test Scores. This was a fairly hefty examination (33 pages of results and analysis!) of the kinds of errors incurred when prorating test scores (instead of using multiple imputation for missing data analysis, we adjust the test score in proportion to the number of missing item responses). As part of the treatment-outcome effectiveness analysis of the Anger Management Project (Prof. Ray Novaco, Mark Ramm, Val Woods), it became clear that we had a problem with missing data on some of the psychometric and nursing observation assessment scales. For various reasons (some of which are discussed in the document), I did not choose to use multiple imputation methods such as EMCOV, Amelia, or NORM. Instead, we thought to prorate the scale scores - as also suggested in the PCL-R manual. However, the question arose "How many items should we prorate?" To help answer this, Charles Marley (an associate psychologist at the State Hospital) set about examining a set of data that contained 9 psychological test scales (drawn from some of the Anger Management project assessments), varying in length from 24 items down to just 4. Alphas for all scales were good to excellent. All patients in the file answered all items on all these scales. What we (note the Royal "we" - actually it was Charles who completed the entire analysis and bulk of the report!!) did then was to randomly "exclude" items from each scale, in steps of 5% missing (or the nearest whole item) up to 50% of items missing. For each level of missingness, we computed the raw unadjusted and prorated scores, the differences and correlations between these and the total "actual" score, and the distributions of errors for each prorated score. The final result - prorate 15% at maximum, given the properties of the kinds of test scales being used in the Anger Management project (Novaco Anger Scales, Beck Depression Inventory, State-Trait Anger Inventory etc.). Download the Acrobat pdf format document here (946k).
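As a rough illustration of what prorating means here (this sketch is not taken from the report), the sum of the answered items is scaled up in proportion to the full scale length. The function name, the use of None to mark a missing response, and the way the 15% ceiling is enforced are all assumptions made for the example.

```python
def prorated_score(responses, max_missing_fraction=0.15):
    """Prorate a sum score; None marks a missing item response.

    Returns None when more items are missing than the allowed
    fraction (0.15 mirrors the 15% maximum reported above).
    """
    answered = [r for r in responses if r is not None]
    n_items = len(responses)
    n_missing = n_items - len(answered)
    if not answered or n_missing / n_items > max_missing_fraction:
        return None
    # Scale the observed sum up to the full scale length.
    return sum(answered) * n_items / len(answered)


# Hypothetical 10-item scale with one missing response (10% missing).
example = [2, 3, None, 1, 2, 3, 2, 1, 2, 2]
print(prorated_score(example))
```

With one of ten items missing, the observed sum of 18 over 9 items is scaled to the 10-item metric. A scale with half of its items missing would be rejected rather than prorated.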

Standard Deviation near the Mean for a set of scores. This was examined in response to a query from a colleague in Kuwait. He had been advised by a statistician that "according to the normal curve, the SD must be in the limits of the third of Mean and not exceed it". I was curious myself - and so took a look at this issue - with some simulations and explorations of the statistician's advice. This was my response ... Download the Acrobat pdf format document here (327k).

Assumptions in Least Squares Regression. This was in response to a question asked of me ... "I am starting to use the some regression analysis and I am a bit confused about the Assumptions. About normality and homoscedasticity, what exactly I need to test? The real variables or just the residuals? or both?" My response began ...
There is one important assumption for the use of least-squares, linear regression that is generally phrased as


"The population means of the values of the dependent variable Y at each value of the independent variable X are assumed to be on a straight line".

This statement implies that at each value of X, there is a distribution of Y values for which the mean is used as the value that characterises the average value of each Y at X. This immediately implies that Y itself is a random variable, possessing equal-interval, additive concatenation units (the use of the mean implies additivity of units).

A further set of assumptions that are also made when using linear regression are (taken from Pedhazur, 1997, pp. 33-34)
1. The mean of the errors (residuals (Yik-Yik')) for each observation of Yi on Xi, over many replications, is zero.
2. Errors associated with one observation of Yi on Xi are independent of errors associated with any other observation Yj on Xi (i.e. there is no serial autocorrelation).
3. The variance of the errors of Y, at all values of X, is constant (homoscedasticity).
4. The values of the errors of Y are independent of the values of X.
5. The distribution of errors (residuals) over all values of Y is normal.

From the above, there seems to be no a priori requirement for Y itself to be normally distributed. It seems that the assumption noted above (the quoted statement about the population means of Y) could be met by a variable whose values are, for example, uniformly distributed rather than normally distributed. The normality assumption seems to be confined explicitly to the errors of prediction of Y, not Y itself. In fact, many textbooks only mention the assumptions within this framework. ... I then generated some appropriate data and set about testing each of the assumptions with uniformly and normally generated/distributed data. Plenty of graphics and discussion!
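The point that normality concerns the residuals rather than Y can be sketched with a small simulation. This is an illustrative example only, not the data or analysis from the document: the seed, sample size, coefficients, and helper names are all assumptions. X is drawn uniformly, so Y is close to uniform (markedly negative excess kurtosis), while the residuals from the fitted line are close to normal (excess kurtosis near zero).

```python
import random


def excess_kurtosis(values):
    """Moment-based excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def fit_line(x, y):
    """Ordinary least squares for one predictor: (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(2005)  # arbitrary seed, for reproducibility only
x = [rng.uniform(0.0, 10.0) for _ in range(5000)]
# Y is linear in X with normal errors; Y itself is far from normal.
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

kurt_y = excess_kurtosis(y)
kurt_res = excess_kurtosis(residuals)
print("excess kurtosis of Y:", round(kurt_y, 2))
print("excess kurtosis of residuals:", round(kurt_res, 2))
```

Kurtosis is used here only as a quick numeric stand-in for a normal probability plot; a plot of the residuals would be the more usual check.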

(17th August, 2005) ... The file has been updated with an addendum thanks to Dr S.A. Butler, Corus Research, Development and Technology, Swinden Technology Centre, Rotherham, South Yorkshire ... "Unfortunately, some people will insist on using Excel for statistical work even when much better software is available to them, so I have recently had to look at the regression facilities available in Excel. I discovered that, when regression is carried out via Tools / Data Analysis / Regression, there is an option to produce a Normal Probability Plot, but this is a plot of the Y-values, NOT the residuals." Download the Acrobat 7 pdf format document here (1.08Mb).

Percentiles and Percentile Ranks - confused or what? This document tries to explain the basis for two almost equally occurring definitions of percentiles as either:
A percentile is the point in a distribution at or below which a given percentage of scores is found
-or-
The value below which P% of the values fall is called the Pth percentile
22 annotated definitions of percentiles are scoured from books and the internet to list those which define them using either of the definitional statements above. Then some logic and worked examples are used to show how both definitions are correct, given a perspective of a test score as an integer, or as a point-estimate of a continuous real-valued number. It is hoped this makes things much clearer. This document was written as a response to the question above that arose recently in a discussion with some test publishers. The document can be downloaded as an Acrobat pdf file here (70k)
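The two definitions can give quite different answers for the same integer score, which a short sketch makes concrete. The function names and the score list below are hypothetical and are not drawn from the document.

```python
def percent_at_or_below(scores, x):
    """Percentage of scores at or below x (first definition)."""
    return 100.0 * sum(1 for s in scores if s <= x) / len(scores)


def percent_below(scores, x):
    """Percentage of scores strictly below x (second definition)."""
    return 100.0 * sum(1 for s in scores if s < x) / len(scores)


# Hypothetical integer test scores with ties.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
print(percent_at_or_below(scores, 3))
print(percent_below(scores, 3))
```

For a score of 3 in this list, six of the ten scores are at or below it but only three are strictly below it, so the two definitions diverge whenever scores are tied at the value in question.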

Calculating 2 x 2 decision table indices from summary information. Given a Base Rate, Sensitivity, and Specificity coefficient, find the constituent cell proportions within a 2 x 2 table such that any other relevant 2 x 2 table statistics may be computed from them. The one-page document can be downloaded as a pdf format item, or as a MathCad 11 Spreadsheet for those who use MathCad. Don't forget to download DICHOT 3 from my software page if you want a free Windows program that calculates a host of decision-table statistics from a 2 x 2 data table.
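The calculation follows directly from the definitions of base rate, sensitivity, and specificity. A minimal sketch is given below; the function and key names are assumptions for illustration and do not come from the one-page document or from DICHOT 3.

```python
def cell_proportions(base_rate, sensitivity, specificity):
    """Return the four 2 x 2 cell proportions (they sum to 1)."""
    tp = base_rate * sensitivity                # condition present, test positive
    fn = base_rate * (1.0 - sensitivity)        # condition present, test negative
    tn = (1.0 - base_rate) * specificity        # condition absent, test negative
    fp = (1.0 - base_rate) * (1.0 - specificity)  # condition absent, test positive
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


# Hypothetical inputs: 10% base rate, sensitivity 0.80, specificity 0.90.
cells = cell_proportions(0.10, 0.80, 0.90)

# Other indices then follow from the cells, for example:
ppv = cells["tp"] / (cells["tp"] + cells["fp"])  # positive predictive value
npv = cells["tn"] / (cells["tn"] + cells["fn"])  # negative predictive value
print(cells, round(ppv, 3), round(npv, 3))
```

Multiplying each proportion by a sample size gives expected cell counts, from which any other count-based 2 x 2 statistic can be computed.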

Correlation Attenuation due to Likert Categorization. Here I asked three questions ...
· What is the likely attenuation on a correlation between two items on a test, if we use a 2-choice rather than say a 3, 4, 5, up to a 9-choice Likert format?
· What happens to the item variances under these conditions?
· Would using more Likert categories automatically increase alpha reliability for a test?
To answer these, I put the issue of psychological meaning to one side (i.e. what happens to a person’s judgement when it is constrained to a 2, 3, 4, 5, 6, 7, 8, or 9-category rating scale), and concentrated solely on the mathematical-statistical issue. Specifically, I was curious myself as to what happens exactly when say real-valued continuous number responses are categorised. This document implements a simulation very much like that of Bollen and Barb (1992) in order to answer the three questions above. However, I go a few steps further in exposing what happens to the standard deviations of the categorized variables, and how this can affect coefficient alpha and the standard error of measurement (within classical test theory). Of interest may be the presentation of the formulae for generating correlated bivariate data, and the difference in derivation between the formula used by David Howell and my own. For interested readers who may have access to STATISTICA 6, I have also included the heavily documented STATISTICA 6 program (both as a downloadable file and as an appendix to the Word/pdf document) that was used to effect all calculations and bootstrap sampling. Finally, I have included about 30 references to work in this area for information purposes. The pdf document is 349k, and the STATISTICA 6 svb file is 13k.
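A much reduced sketch of this kind of simulation is shown below. It is not the STATISTICA program and makes several assumptions of its own: the standard construction y = rho*x + sqrt(1 - rho^2)*e for correlated standard normal data, equal-width category boundaries over the range -3 to +3, a single sample of 20,000 cases, and no bootstrap resampling.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def categorize(value, k, low=-3.0, high=3.0):
    """Map a continuous value onto categories 1..k of equal width."""
    width = (high - low) / k
    index = int((value - low) // width)
    return min(max(index, 0), k - 1) + 1  # clamp values outside the range


def simulate(rho, n=20000, seed=1992):
    """Generate n pairs of standard normal scores correlated at about rho."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rho * xi + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for xi in x]
    return x, y


x, y = simulate(0.6)
r_continuous = pearson(x, y)
r_by_k = {
    k: pearson([categorize(v, k) for v in x], [categorize(v, k) for v in y])
    for k in range(2, 10)
}
print("continuous:", round(r_continuous, 3))
for k, r in r_by_k.items():
    print(k, "categories:", round(r, 3))
```

Replacing pearson with a variance calculation on the categorized values would address the second question in the same framework.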

Interrater Reliability: Definitions, Formulae, and Worked Examples. A Word 2000 document that goes into both conceptual and computational detail for interrater reliability analysis. This is a revised version (22nd March, 2001) that incorporates detailed SPSS analysis examples (as well as STATISTICA examples) for Intraclass correlations as per Shrout and Fleiss Models 1, 2, and 3. Download a pdf version (248k) Rater.pdf.

The Relationship between variables, cases, and factor stability. A document which examines some of the evidence and reasoning associated with "rules of thumb" for determining the numbers of cases for a particular quantity of variables, so that an "adequate" sample of cases might be established. The logic for regression and correlation is expanded upon, and the latest evidence presented from the literature on factor analysis. The conclusion from empirical analyses within the factor analytic domain is that rules of thumb such as "10 cases to a variable" are fundamentally flawed. The 7-page Acrobat pdf format document may be downloaded here (374k).

Euclidean Distance: Raw, Normalised, and Double-Scaled Coefficients. Having been fiddling around with distance measures for some time – especially with regard to profile comparison methodologies, I thought it was time I provided a brief and simple overview of Euclidean Distance – and why so many programs like SPSS, SYSTAT, and PRIMER-5, give so many completely different estimates/derivations of it. This is not because the concept itself changes (that of linear distance), but is due to the way programs/investigators either transform the data prior to computing the difference, normalise constituent distances via a constant, or re-scale the coefficient into a unit metric. However, few actually make absolutely explicit what they do, and the consequences of whatever transformation they undertake. Given that I always use a double-scaling of distance into a unit metric for the coefficient, and never transform the raw data, I thought it time I explained the logic of this, and why I feel some of the coefficients used within some popular statistical programs are sometimes less than optimal (i.e. using “normal z-score” transformations). The 17-page document is in pdf format, and can be downloaded here (298k)
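To make the distinction between the coefficients concrete, here is one way the raw, normalised, and double-scaled versions could be computed. The exact formulae in the 17-page document may differ. This sketch assumes that normalising means averaging the squared differences over the number of variables, and that double-scaling means first dividing each squared difference by the maximum possible squared difference for that variable, then averaging and taking the square root. The function names are hypothetical.

```python
import math


def raw_distance(a, b):
    """Plain Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalised_distance(a, b):
    """Raw squared distance averaged over the number of variables."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def double_scaled_distance(a, b, ranges):
    """Distance rescaled to lie between 0 and 1.

    ranges holds (minimum, maximum) possible values per variable.
    The raw data are not transformed; only the squared differences
    are scaled.
    """
    scaled = [
        (x - y) ** 2 / (hi - lo) ** 2
        for x, y, (lo, hi) in zip(a, b, ranges)
    ]
    return math.sqrt(sum(scaled) / len(scaled))


# Two hypothetical profiles on three 1-to-5 rating scales.
p1, p2 = [1, 5, 3], [4, 5, 1]
limits = [(1, 5)] * 3
print(raw_distance(p1, p2))
print(normalised_distance(p1, p2))
print(double_scaled_distance(p1, p2, limits))
```

Under these assumptions, identical profiles give a double-scaled value of 0 and profiles at opposite extremes on every variable give 1, whatever the number or measurement range of the variables.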

KR-20, Cronbach Alpha, and dichotomous-response items. There seems to be a little confusion among some students and academics about which reliability coefficient is the more correct for dichotomous response questionnaire item scales (e.g. yes-no, true-false), where the sum-score is the cumulative sum of item responses. This confusion has manifested itself sometimes in recommendations that for binary-response items, the Kuder-Richardson 20 (KR-20) coefficient is the more "appropriate" reliability coefficient. As I show in this document, the formula for the KR-20 is mathematically equivalent to that for alpha. A simple worked example with STATISTICA and SPSS output is provided - so that readers might see exactly how the calculations are made and reported. The 5-page document is in pdf format, and can be downloaded here (134k).
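The equivalence rests on the fact that, for an item scored 0/1, the population variance of the item equals p(1 - p), where p is the proportion of 1 responses. A small numerical check is sketched below; the response matrix is made up for illustration and is not the worked example from the document.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha; rows are persons, columns are items."""
    k = len(rows[0])
    item_variances = [pvariance(col) for col in zip(*rows)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


def kr20(rows):
    """Kuder-Richardson 20 for items scored 0/1."""
    k = len(rows[0])
    n = len(rows)
    proportions = [sum(col) / n for col in zip(*rows)]
    sum_pq = sum(p * (1.0 - p) for p in proportions)
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum_pq / total_variance)


# Hypothetical responses: 6 persons by 4 dichotomous items.
data = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(cronbach_alpha(data), kr20(data))
```

Both functions use population variances (dividing by N). Alpha is unchanged if sample variances (dividing by N - 1) are used for both the items and the total score, because the correction factor cancels in the ratio.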