Psychometrics and Measurement
Beyond Psychometrics: the recovery of a standard
unit of length. (November, 1998)
Beyond
Psychometrics: the strategic implications for occupational psychology
(March, 1999)
Applicant
vs Non-Applicant Data - Part 1- (Rosalie Brown and myself,
BPS TUC 99 conference, June 1999)
Applicant
vs Non-Applicant Data - Part 2- (Rosalie Hutton (ex-Brown!)
and myself, BPS OccPsy conference, Jan. 2000)
The
Role of a Concatenation Unit (BPS Maths-Stats-Computing Meeting, London,
December 2001)
Single
Item/Attribute Psychometrics: can it be done? (NZ I/O society - Auckland
Region, Wednesday 20th February, 2002)
Measurement cannot occur in a theoretical
vacuum (AERA-D April 2002 Symposium Paper)
Personality Assessment via Graphical Profiler
(September, 2003)
Quality and Quantity: The logic of
measurement (Postgrad Class Lecture Note - 2006)
Predictive Accuracy as THE criterion for
organizational research (8th Annual NZ Work Research Conference keynote -
May 2007)
2007 New Zealand Psychological Society Annual
Conference - Hamilton (Aug 23rd-26th)
Good Judgment,
Intelligence, and Personality
Two Big Ideas
Brunswik
Symmetry, Complexity, & Non-Quantitative Psychology - Tying it all Together
Business-Commercial Psychometrics
Pre-Employment
Integrity Testing: Current Methods, Problems, and Solutions. British Computer
Society Information Security Specialist Group Annual Conference, March, 2001.
The
POP questionnaire - single item psychometrics and 16PF FormA vs 16PF5 (BPS
Conference, Warwick, 1995)
Evidence-Based HR. Can it be done?
(NZ Psychology Society Conference, 2002)
Single
Item/Attribute Psychometrics: can it be done? (NZ I/O society - Auckland
Region, Wednesday 20th February, 2002)
SIOPSA keynote address "Psychological
Assessment and Data Utility: It's Time to Innovate" (June 2003)
Maximising Business Performance: Using Psychometrics to
Improve Efficiency, Productivity, and Performance (March, 2004)
Validity and Utility in I/O Psychology (May
2005)
The Chinese Challenge to the Big 5 (Tyler et al, 2005)
Individual Differences
Chronometric and Bioelectric Correlates of IQ (December,
1999)
The
String Measure. Evoked Potential Correlate Research, and Psychometric IQ (BPPS,
December 1999)
Quantitative
Science and Individual Differences: candidate models for intelligence and
personality. The ECP10 paper as above, but heavily augmented with about 15
new slides that support my arguments concerning the poor scientific value
of low correlations (September 2000)
Clinical Effectiveness and Outcome
Evaluation
Clinical
Effectiveness: the Rules for Treatment Instantiation and Outcome Evaluation
(Maureen Nicol and myself, April 2000: The State Hospital)
Outcome Measuring Procedures in Secure Settings. Shaftesbury Clinic, Springfield
University Hospital, April 6th, 2001
Forensic Psychology Issues
Decision Table Analysis: Definitions, Methods, and Assessing Risk (November
1999)
Risk
Prediction and Risk Management: obviously not a priority for senior managers in
psychiatry and psychology. Strategy, strategy, strategy! (Psychological
Solutions to Personality Disorder Conference: 14/3/2000)
Factor Score Disparity in the Psychopathy Checklist Revised
(September, 2003)
Methodology
Hypothesis Testing and Power Analysis (September 2000)
Interrater Reliability: Definitions, Formulae, and Worked Examples (March,
2001)
Research Methods for the 21st Century (September,
2005)
Computational Profiling
NZ Psychological Society Conference 2003
- paper and workshop abstracts and downloads (30th August, 2003)
Person-Target Profiling Workshop: Issues in
Matching and Construction (September, 2005)
Beyond
Psychometrics: the recovery of a standard unit of length:
This 50-slide presentation was given at the British Psychological Society's
Division of Occupational Psychology conference: Assessment in the
Millennium: Beyond Psychometrics, November 1998, at Birkbeck
(University of London). The theme of this presentation was Rasch scaling,
and its capacity to construct a standard unit from observational data. This
presentation contained a data simulation that attempted to hide a true
quantitatively structured latent variable of length behind some poor ordinal
observations. All the Rasch scaling did was to construct an equal-interval
latent variable of ordinal lengths! This simulation was heavily criticised by Ben
Wright and others, and I have included these criticisms as an addendum to the
presentation - along with my reply. However, recent papers seem to have
vindicated my conclusions in some respects. I'm now undertaking a massive
simulation to really hammer home the inability of Rasch or IRT scaling to
recover "true" latent variables. The reality is that these methods simply
construct linear latent variables in complete isolation from any empirical
evidence that such variables might indeed be quantitatively structured. In my
opinion, from a scientific perspective, these scaling methods are frankly of
little utility, but they are ingenious from a psychometric perspective and do
have great utility in a more pragmatic sense. It all comes down to the
purpose of such scaling: science or number scaling. The
presentation itself may be downloaded
here,
as a pdf file (625k in size).
Beyond
Psychometrics: the strategic implications for occupational psychology: This 44 slide presentation was given at the British Psychological Society's
Division of Occupational Psychology conference: Assessment in the
Millennium: Beyond Psychometrics. This was a second conference, held on March
5th, 1999, again at Birkbeck (University of London), that repeated the theme from the
first one in November 1998. There are no notes with these slides. Note
the new reference to Salgado (1999) ... a review of personnel selection
methods that augments the work of Schmidt and Hunter (1998) - also referenced in
the presentation. This presentation sets out what may lie beyond the current use of
psychometric measurement and assessment in occupational psychology, in the
millennium. It focuses on both changes in practice and techniques, introduces a
new First Law for future practice, and outlines in some detail, a new kind of
"smart" profiling for candidate choice (whether for promotion, training,
selection, or team-building). ALSO:
the PC Windows 95/98 format program that runs examples of Wolfram's
1-dimensional cellular automata is downloadable here. This was the example
program I ran in the presentation. I have augmented it slightly to add in more
rules on an "autoplay" button (alongside the example rule-sets and
their special one-off buttons). All you need to do is download the zipped
installation fileset CA.ZIP (click here).
Use Winzip to unzip or run SETUP.EXE directly from the Winzip archive.
Alternatively, unzip the files into a temporary directory and run the file
SETUP.EXE in this temporary directory. The installation program is completely
automated and creates its own program listing entry and icon.
An Acrobat 4.0 pdf file of the
presentation, containing both slides and notes (as in Powerpoint) is also
available. This file is 241k in size. Click
here
for the pdf file version.
Applicant
vs Non-Applicant data - Part 1 This 32 slide presentation was given at the British Psychological Society's 1999
Test User conference by Rosalie
Brown. This was essentially a major aspect of her recent MSc thesis
work. I simply helped here and there with a few bits of statistical advice and
one or two analyses. The Acrobat 4.0 pdf version is available
here
(725k).
Applicant
vs Non-Applicant data - Part 2
This 26 slide presentation was given at the British Psychological Society's
January 2000 Occupational Psychology conference by myself and Rosalie
Hutton (formerly Brown). Following on from her
presentation in June 1999 (See Part 1 above), we
extended the research to two new tests, the Psytech International 15FQ and
Saville and Holdsworth's Concept 5.2 OPQ. As the analyses evolved in the second
presentation, it became clear that the issue was confounded by two key problems:
unproven measurement axioms, and a predilection for subjectivity in personality
questionnaire scale score interpretation. A further problem was thrown up when
we attempted to analyse the OPQ - that of questionnaires that have no a
priori psychometric structure. The overall conclusion was that, apart from
using some very, very basic psychometric principles, the kinds of analyses
adopted by us (and others) are simply too powerful given the properties of the
data at hand. Further, we posed the question as to whether any equal-interval
test theory was of any practical or theoretical relevance any more. The Acrobat 4.0 pdf version
is available here
(403k)
Decision Table Analysis: Definitions,
Methods, and Assessing Risk. An
86-slide Powerpoint file - an exposition of 2x2 table analysis for
decision making and ROC analysis that uses the VRAG dataset to show how
researchers use these methods in practice. The file
can be downloaded here as a pdf file (467k)
Decision.pdf. Also for download is:
Roc1.jpg .... a special ROC distribution
graphic - can be opened with your browser, MS Word, or Paintshop Pro etc. It
accompanies the presentation.
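As a rough illustration of the kind of 2x2 table and ROC quantities such a presentation deals with, here is a minimal Python sketch. It is not taken from the slides: the scores, outcomes, cutoff, and all function names are hypothetical and chosen only for illustration, and the data are not the VRAG dataset.

```python
from dataclasses import dataclass


@dataclass
class DecisionTable:
    """Counts from a 2x2 prediction-by-outcome table (hypothetical helper)."""
    tp: int  # predicted positive, outcome positive
    fp: int  # predicted positive, outcome negative
    fn: int  # predicted negative, outcome positive
    tn: int  # predicted negative, outcome negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def table_at_cutoff(scores, outcomes, cutoff):
    """Treat score >= cutoff as 'predicted positive' and tabulate the 2x2 counts."""
    tp = fp = fn = tn = 0
    for score, outcome in zip(scores, outcomes):
        if score >= cutoff:
            if outcome:
                tp += 1
            else:
                fp += 1
        else:
            if outcome:
                fn += 1
            else:
                tn += 1
    return DecisionTable(tp, fp, fn, tn)


def auc(scores, outcomes):
    """Area under the ROC curve, computed as the proportion of
    (positive, negative) pairs in which the positive case scores higher;
    ties count as half."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and observed outcomes (1 = event occurred).
scores = [2, 5, 7, 8, 10, 12, 13, 15, 18, 20]
outcomes = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

table = table_at_cutoff(scores, outcomes, cutoff=10)
print(table, table.sensitivity(), table.specificity(), auc(scores, outcomes))
```

Moving the cutoff changes the table (and so sensitivity and specificity) while the AUC stays fixed, which is one reason a single summary coefficient cannot by itself settle a decision rule.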
Chronometric
and Bioelectric Correlates of Psychometric IQ: these
7 slides (with notes) provide the graphical antidote to those who become
somewhat over-excitable about the measurement/predictive potential of
chronometric and bioelectric indices based upon their correlations with
psychometric IQ. Here, you see exactly what the data look like that underlie the
kinds of -0.5 correlations you sometimes see presented between these measures
and IQ scores. No simple theory of speed or variability can explain these data -
which have been replicated in several experiments. The publications from the
biosignal lab and elsewhere are presented below in the evoked potential
correlate research presentation "Key References" list. The challenge is to
figure out how to make robust, direct, and routine measurement of fundamental
nervous system properties, and basic reaction times. Some good work has taken
place with auditory IT - but precious little in other domains. This is still a
great research area to work in - but the strong theory linking these measures to
a decent theory of intelligence is still sadly lacking - and the biosignal lab
now sits in the Science Museum in the UK! The presentation can be downloaded as
a pdf file (Acrobat 7)
here (177kb)
The String
Measure. Evoked Potential Correlate Research, and Psychometric IQ.
This 21 slide presentation was given at the 1999 British Psychophysiological
Society's Conference, at the Institute of Neurology (13th-15th Dec.), as part of
an excellent symposium on Intelligence and Personality organised and introduced
by Peter Caryl. The presentation is basically a brief exposition of the
rationale, evidence, and my conclusions about this area. Ian Deary and myself
disagree on one fundamental point - the status of theory in this area of
research. Ian made the point about atheoretical genome sequencing à la Craig
Venter's approach. I made the point about Einstein and Theoretical Physics. Yep,
it was that kind of symposium - excellent stuff - and all credit to Peter and
the other speakers (Martha Whiteman, Ian Deary, Andrew MacLullich, and Peter
Caryl) for some really thought-provoking presentations. The Acrobat 4
pdf file can be downloaded
here
(283k). The zipped Powerpoint presentation can be downloaded
here
(314k). There is also an accompanying Key References
document that provides the references to all papers/results that I mentioned in
my presentation. The Acrobat 4 pdf file
of these references can be downloaded
here.
Risk
Prediction and Risk Management: obviously not a priority for senior managers
in psychiatry and psychology. Strategy, strategy, strategy! This 24 slide presentation was given at the UK High Security Psychiatric
Hospitals Conference on Personality Disorder - Leeds, March 14th-15th. Here
I examine the failure of every high security mental health institution in the UK
to implement a coherent, meaningful, organisation-wide strategy for risk since
the VRAG and RAMAS systems were published in 1994/5. I provide an explicit
strategy that should have been (and still might be) implemented immediately -
show how leadership and Goal Directed Management would have achieved results,
and finally dissect the reasons that I see as causing the failure by senior
clinical management to evolve and implement such a strategy. I conclude by
discussing new "hot" areas and the key experts to be consulted by
organisations in this area. The Acrobat 4.0 pdf version is available
here (321k).
Clinical
Effectiveness: The Rules for Treatment Instantiation and Outcome Evaluation. This 102 (yes 102!!) slide presentation was given at the State Hospital to the
Psychology Department during late April, 2000. It takes the 8-step model from
the Effectiveness #2 set (Expertise, Therapy, Audit and Evaluation Methods) of the clinical
effectiveness series (contact Paul
Barrett if you want this), and tests it against two current psychological
therapies within the hospital. From this evaluation, it then examines two
medical treatments for pneumonia and epilepsy, and evaluates how these meet the
model specifications. Then it contrasts the psychological therapy
"fit" to the model vs the medical model fits. Finally, it tries to
explain why the psychology models do not fit the 8-step model, then defines the
suggested rules for any mental health practitioner who might wish to instantiate
treatment and evaluate it, as part of best-practice development. However, it
concludes that many of the aspirations, by management and some senior
clinicians, toward assessing clinical effectiveness within mental health are
commendable but unlikely to ever be achieved. Instead, certain concrete
suggestions are made that are likely to increase information about effectiveness
whilst retaining a degree of reality with regard to the logistics of their
implementation. The Acrobat 4.0 pdf version is available
here (242k).
Quantitative
Science and Individual Differences: candidate
models for intelligence and personality. A heavily augmented version of the Krakow presentation - given to the Dept. of
Psychology, University of Canterbury, NZ. It consists of 48 slides, and is
downloadable in pdf (2.1Mb) format. A summary of the three major test theories along with some
analysis work. The references document (Word 2000 format) that supports this
presentation is
keyrefs2.pdf.
Hypothesis
Testing and Power Analysis. A 48-slide Powerpoint file - a summary review of the fundamentals of
hypothesis testing and power analysis - with graphics! Download the pdf version
(380k) Hyptest.pdf.
Interrater Reliability:
Definitions, Formulae, and Worked Examples.
A Word 2000 document that goes into both conceptual
and computational detail for interrater reliability analysis. This is a revised version (22nd March, 2001)
that incorporates detailed SPSS analysis examples (as well as STATISTICA
examples) for Intraclass correlations as per Shrout and Fleiss Models 1, 2, and
3. Download a pdf version (248k) Rater.pdf -or-
a zipped Word Document (315k)
Rater.zip ...
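For readers who want to see the arithmetic behind the three single-rater intraclass correlations, here is a minimal Python sketch. It is an illustration under stated assumptions, not the SPSS or STATISTICA procedure from the document: the function name is hypothetical, the formulas are the standard Shrout and Fleiss mean-square expressions for ICC(1,1), ICC(2,1), and ICC(3,1), and the ratings matrix is the 6-target by 4-judge example widely reproduced from their 1979 paper.

```python
def icc_single(ratings):
    """Single-rater ICCs from an n-targets x k-judges matrix,
    using the ANOVA mean squares (hypothetical helper name)."""
    n = len(ratings)        # number of targets (rows)
    k = len(ratings[0])     # number of judges (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_judges = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_targets   # within-target variation
    ss_error = ss_within - ss_judges    # residual after removing judge effects

    bms = ss_targets / (n - 1)              # between-targets mean square
    wms = ss_within / (n * (k - 1))         # within-targets mean square
    jms = ss_judges / (k - 1)               # between-judges mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    return {
        # Model 1: each target rated by a different random set of judges.
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        # Model 2: the same random sample of judges rates every target.
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        # Model 3: the judges are the only judges of interest (fixed).
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
    }


# Illustrative ratings: 6 targets (rows) rated by 4 judges (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

iccs = icc_single(ratings)
for name, value in iccs.items():
    print(f"{name} = {value:.2f}")
```

The three coefficients differ sharply on the same data because Models 1 and 2 count systematic differences between judges against agreement, whereas Model 3 sets them aside.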
Pre-Employment
Integrity Testing: Current Methods, Problems, and Solutions. British
Computer Society Information Security Specialist Group Annual Conference, March,
2001. This presentation consists of two parts - the first is a written
paper which is a fairly detailed description of the methodologies used to assess
an individual's integrity and honesty, and the detection of faking. These range
from the polygraph, P300 AEPs, and hypoactive arousal to interviews, biodata, and covert
and overt psychometrics. This paper includes a list of all major advertised
psychometric covert and overt integrity tests (and derived scales), along with
publisher web, email, and address contact details (updated slightly
22-Feb-2005). The second part is the powerpoint presentation. This consists of 38 slides, which discuss in some
detail, the problems with this area (some of which are shared by all
psychometric tests), and a one-page checklist of "solutions" based
upon the results of the analyses and discussions presented earlier in the
presentation (and following the relevant APA guidelines on this issue). Again, I
totally reject the use of validity and correlation coefficients alone as
useful indicators for purchasing decisions. This rejection is based upon
detailed worked quantitative examples in the slides that show that a cost of
$750,000 is "hidden" in one of two validity/correlation coefficients
of 0.33. Further, I show that Ones et al's statements as to "Social
Desirability" being a "red herring" are perhaps more generally
limited to the use of the relatively trivial "social desirability"
scales. Their comments do not apply to purposeful, adaptive faking of critical
psychological attribute tests such as integrity tests. The overall thesis of
these two "offerings" was to enable a potential
purchaser/investigator of such methodologies (and future test developers)
to see just what is absolutely required for the development and use of these
methodologies (in terms of any third party assessing their validity and utility
for each specific application). The theme that I was stressing throughout is
that in this particular domain, cost-benefit analysis of any potential
decision-making strategy is the critical feature of the purchasing decision.
This is why all the corrected validity coefficients in the world are quite
useless except as initial indicators of "potential". A concrete
example was drawn from the forensic area of how to implement integrity testing
using ordered-categorical "dose-response" analysis, ROC analysis,
and dichotomous outcome analysis - using the VRAG (Violence Risk Assessment
Guide of Webster, Rice, Harris, Cormier, and Quinsey). Finally, I concluded
that it was my opinion that this was to be the next major "growth
area" in psychometric testing. [22-Feb-2005
... Remember, this presentation is now 4 years old. Most
of it remains relevant - and I've updated a couple of test publisher address and
contact details - but it does not include reference to "brain
fingerprinting" via fMRI or the latest work on facial muscle
perception/detection. Further, my opinion about this being the next major growth
area in the US/UK markets was wrong. It has in fact, like a lot of psychometrics
these days, just stalled. This is to do with the changing market conditions in
which psychometric tests are now being sold; especially concerning issues
with "talent" shortages, human capital interventions, and the role of
psychometrics as "development" rather than simple screening/assessment
instruments. Further, unlike the lower levels of technical capability required
to use tests of personality and ability within I/O psychology domains, integrity
tests require a high level of technical competence in cut-score/ROI
maximisation/risk-assessment and decision-support statistics].
The Downloads...pdf format files
Integrity_doc.pdf ... 51k
Integrity_tables.pdf ... 139k (22-Feb-2005)
Integrity_ppt.pdf ... 1.9Mb
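The point that a correlation coefficient alone can hide large cost differences can be illustrated with a small Python sketch. This is not the worked example from the slides: the two tables, the unit costs, and the function names are all hypothetical. It relies only on the fact that swapping the false-positive and false-negative counts in a 2x2 table leaves the phi coefficient unchanged.

```python
from math import sqrt


def phi(tp, fp, fn, tn):
    """Phi coefficient (Pearson correlation for a 2x2 table)."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    return numerator / denominator


def total_cost(fp, fn, cost_fp, cost_fn):
    """Total cost of decision errors, given a unit cost for each error type."""
    return fp * cost_fp + fn * cost_fn


# Two hypothetical integrity-test outcomes for 100 applicants each.
# Table B is Table A with the two error cells swapped.
table_a = {"tp": 30, "fp": 10, "fn": 30, "tn": 30}
table_b = {"tp": 30, "fp": 30, "fn": 10, "tn": 30}

# Hypothetical unit costs: wrongly rejecting an honest applicant is assumed
# to be far cheaper than hiring someone who goes on to cause a loss.
COST_FP = 1_000
COST_FN = 20_000

phi_a = phi(**table_a)
phi_b = phi(**table_b)
cost_a = total_cost(table_a["fp"], table_a["fn"], COST_FP, COST_FN)
cost_b = total_cost(table_b["fp"], table_b["fn"], COST_FP, COST_FN)

print(f"Table A: phi = {phi_a:.2f}, cost = {cost_a}")
print(f"Table B: phi = {phi_b:.2f}, cost = {cost_b}")
```

Both tables yield the same coefficient, yet under these assumed costs one decision strategy is several hundred thousand units more expensive than the other, which is why a cost-benefit analysis is needed on top of any validity figure.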
Outcome
Measuring Procedures in Secure Settings. Shaftesbury Clinic,
Springfield University Hospital, April 6th, 2001. This is a 24-slide
presentation that details the current impossibility of implementing organisation-wide
treatment-outcome evaluations as part of a coherent clinical effectiveness
strategy within prisons and secure mental health settings. A desired model for
setting up outcome measuring procedures is first outlined, along with the
Barrett 8-step model for general treatment-outcome specifications. Then, the
very real difficulties and the implicit assumptions sometimes made by those from
"management" and "effectiveness departments" who speak on
these issues are exposed. Finally, the reasons for the failure of (any?) major UK
"secure" organisation to tackle systematic treatment-outcome measuring
procedures at a system-wide level are discussed. It is concluded that the
fault with the lack of implementation lies not with the clinicians and
practitioners who work in these institutions, but with senior (board-level)
management and the relevant government ministries. What is required is strategic thinking, planning, adequate resource
input, and leadership. The presentation can be downloaded as either a
zipped powerpoint file (outcome.zip) 343k, or as an
Acrobat 4.0 file (outcome.pdf) - 364k. For those
unable to use WinZip or Acrobat in their institutions (due to the new NHS IT
Guidelines being implemented in certain institutions), I've included the MS
Powerpoint 2000 file (Office 97 compliant - click
here)
for download (404k).
The POP questionnaire - single item psychometrics and 16PF FormA
vs 16PF5. This 60-slide dual presentation was given by myself and
Laurence Paltiel at the British Psychological Society (BPS) Occupational
Psychology conference in January 1995. It should be read with several health
warnings - not least by new staff at Saville and Holdsworth (SHL) and ASE-NFER!
Behind all the noise and drama that took place during and after these
presentations - (and those in the audience will know just how much there was!!),
there are several serious issues that are worthy of consideration. Although I
would modify several slides now to better reflect my own growing experience and
reflection after 6 years, the bulk of the presentations still stand as valid
(for me at least), not least the notion, first explored here and
fully reported in a subsequent paper, of direct construct measurement using
a single item to reflect a narrow construct. There was a "lively" debate on this
issue with Helen Baron, Peter Saville, and George Sik of SHL in the British
Journal of Occupational and Organisational Psychology, and in the BPS I/O
practitioner journal Selection and Development Review. As to the 16PF section -
well, again, there are some very interesting observations made here that do
cause one to pause a moment and consider what is meant by "evolution" of a test -
and how far one might evolve a test before it loses continuity with an
older version. In a sense, the papers by myself and Rosalie Hutton (née Brown)
Applicant
vs Non-Applicant Data address this issue from a tangential perspective -
but I do find these issues interesting from a "what is really going on here"
perspective. Since I now publish with ASE as a test author, am on friendly terms
with IPAT, and have shared a chat and joke or two since with SHL(!) directors in
recent years ... I hope they appreciate that my intention with such a
presentation was never to insult or denigrate - but to open up as far as
possible, all the material necessary to enable independent-minded individuals to
critically evaluate my propositions and conclusions. This was a view (and still
is) shared by Laurence Paltiel, the co-author. The reason for posting this
presentation is because it was mentioned in a presentation on the Mariner7
Talent Engine that I gave recently. The presentation is available as a .pdf file only (about 750k) -
and is downloadable here. The published papers relevant to the OPQ/16PF presentations
are:
Barrett, P.T. and Paltiel, L. (1995). Reductio ad Absurdum? A reply to Saville
and Sik (1995). Selection and Development Review, 11, 6, 3-5
Barrett, P.T., Kline, P., Paltiel, L., and Eysenck, H.J. (1996). An evaluation
of the psychometric properties of the Concept 5.2 OPQ Questionnaire. Journal of
Occupational and Organisational Psychology (JOOP), 69, 1, 1-19
Barrett, P.T. and Paltiel, L. (1996).
Can a single item replace an entire scale?
POP vs the OPQ 5.2. Selection and Development Review, 12, 6, 1-4
Barrett, P.T. and Hutton, R. (2000) Personality and Psychometrics. Selection and
Development Review, 16, 2, 5-9
The
Role of a Concatenation Unit. The 41-slide presentation comprised the
guest lecture at the 2001 British Psychological Society's Mathematics,
Statistics, and Computing Section meeting in London. It is concerned with the
role such a unit plays in the kinds of statements one might wish to make about
the causal relations between phenomena and variables. To this end, it explores
the investigative process within science, the nature of "phenomena detection"
(Brian Haig), and the approaches that can be adopted for unit creation. It then
questions whether adopting an axiomatic approach to variable/scale construction
is of any practical value to both applied and fundamental scientific psychology.
Views are noted from Roderick McDonald, Peter Schonemann, Joel Michell, Michael Maraun,
and William Fisher Jnr. It is concluded that what is crucial for advances within
individual differences psychology [at least] is normatively specified technical
constructs that form the basis for single empirically established metric scales,
constructed using the axioms of additive conjoint measurement. An example of
where this has been successfully achieved is the Lexile Scale for reading
proficiency. There is no guarantee that any constructs within Individual
Differences can be scaled into a single metric - however, the hypothesis that a
variable might possess quantitative structure is testable. The zipped
powerpoint presentation is available
here
(387kb). The Acrobat version (333kb) is available
here. The "take home" message is clear: "if
we are dealing with variables that possess an additive-unit quantitative
structure, yet do not take advantage of this, then no causal explanation for
phenomena can ever attain a level of explanation beyond ordinal-level
relations". It is now the aim of the author to attempt to systematically create
a metric scale for 'g', in conjunction with the setting forth of a technical
definition of the constituent properties of 'g' (which means no more than
"tidying up" Spearman's and Jensen's statements into a clear, normative,
technical set of meaning statements).
Evidence-Based HR.
Can it be done? This 45-slide presentation was the last in a 2002 workshop
I presented on business psychometrics and HR. It doesn't come much more brutal than this. However, the aim of this
"brutality" is to try and expose the bare essentials of my position - and also
expose what may be my own loose thinking to others. This is not a presentation
that in any way tries to provide simple answers to what is essentially (for me)
a very complex problem. Yes, I don't think HR "gurus" have actually done much at
all for this area except line their own pockets with money, and leave their
profession with a continuing headache. Neither do I feel that psychologists
(including Schmidt, Hunter, and Cascio et al) who have espoused utility theory,
meta analysis, or simple cost-benefit models as a form of
"take-it-from-me-evidence-base" have done much to help the profession solve this
problem either. So - here, I try to provide an initial framework for at least a
clear and solid target for debate. I am starting work on the computational
modelling issue directly - as my contribution to finding what I think is the
only possible solution to the complexity of the problem. I don't know if this is
going to work, or even if it is possible - but I am at least having to deal
directly with reality rather than some idealized simplistic nonsense such as
"average standard deviation performance in dollar value". The argument here is
not so much that utility analysis is a bad thing - but rather that it is
irrelevant to an evidence-based HR - as it does not constitute evidence in and
of itself. Ok - you get the picture! Download the
pdf version (847k) or a
zipped powerpoint (445k) version
Single
Item/Attribute Psychometrics: can it be done? This 20-slide
presentation with notes pages was presented to the NZ I/O
society Auckland Region group, on Wednesday 20th February, 2002. Mariner7's new
product "Talent Engine" was featured, with especial regard to how it works, and
the many issues concerned with psychometric test theory and measurement within
such a system, not least of which are the key issues of reliability and
validity. Talent Engine is the first in a line of products to be built around
Mariner7's on-line assessment technology for acquiring measurements of
psychological attributes. The next big step for Mariner7 is developing online
(or computer-based) tests for personality and values, based around its profiler
assessment technique. A further development is also being contemplated within
the clinical and medical domains of interest, where patient response to
questionnaire measures is constrained by particular cognitive deficits. The
presentation was really concerned with the many implications for psychological
assessment in using the preference profiler technology. There are some handy
references (maybe!) that support some of my reasoning concerning the use of
work-preferences as a primary selection information source, along with the role
of social desirability, personality and job performance, and meta analysis in
general. Hopefully, the presentation will succeed in stimulating the thinking of
the reader - academic and practitioner alike - as to what is the limit of this
technology, and whether it confuses or enhances psychological measurement. The
presentation is available as a
pdf file (608k)
download, or as a zipped powerpoint (XP/2000
format) presentation (761k download), so that you can get at the original
slides and text notes pages if you wish.
Measurement
Cannot Occur in a Theoretical Vacuum This 32-slide
presentation is being presented on my behalf (by Trevor Bond) to the
AERA-D Rasch
Measurement SIG, New Orleans, USA, 3rd April,
2002 at this year's Educational Measurement conference. I was meant to have been
attending personally, but I was unable to obtain funding for the
trip from NZ. The symposium is a humdinger though - I really wanted to make this
one! Anyway, my
presentation is available as a
pdf
file (404k)
download, or as a zipped
powerpoint (XP/2000
format) presentation (266k download). It also contains the Rozeboom
paradox - which concerns the failure of additive concatenation of a standard
unit - which serves notice on the role of meaning in measurement!
Symposium Title:
Educational Measurement: Is it
really possible?
Symposium Abstract:
This symposium addresses a key conceptual flaw in
quantitative educational research by asserting that educational researchers
generally do not comprehend the requirements of scientific measurement, or,
where they do, think that such measurement is not actually possible in the human
sciences. While the historical precedents of our
misunderstanding might be better understood in hindsight, this symposium
suggests that the principles of measurement are not only relatively
straightforward but in fact essential if edumetrics is to develop beyond its
current methods of actuarial and statistical score-mark classifications. The
benefits of establishing fundamental measurement procedures for educational
variables extend far beyond the idiosyncrasies of the current statistical
approach to educational outcome indicators. While Rasch analysts in particular,
often claim to address the issue of genuine scientific measurement, the
inadequacies of current goodness of fit techniques and the importance of
substantive theory are seen as key issues for resolution. The symposium
presenters aim to participate in genuine dialogue with the audience and to
canvass these issues that are considered critical to our discipline.
Chair and Discussant:
Trevor Bond
(Trevor.bond@jcu.edu.au)
with Joel Michell, William Fisher, George Karabatsos, and Paul Barrett as
paper presenters.
Factor Score Disparity in the Psychopathy Checklist Revised. (Carlin, Barrett, and Gudjonsson).
A paper given at the NZ Annual Psychology Conference at Massey University,
Palmerston North - (August 30th - September 3rd). The Hare Psychopathy Checklist-Revised (PCL-R) is a 20-item
interview/records-based evaluation tool used by many clinical and corrections
practitioners to assess propensity for Psychopathic attributes within an
individual. The PCL-R is now the primary instrument for such clinical-forensic
evaluations. Three scores may be generated using this instrument: a total score,
a Factor-1 (F1) score which assesses Interpersonal characteristics, and a
Factor-2 (F2) score, which assesses Social Deviance. Recent legal cases in the
UK have highlighted an interesting phenomenon, where a defence and prosecution
expert-witness psychologist have assessed the same individual quite differently,
and where the score disparity between Factors 1 and 2 appears extremely large.
The primary purpose of this study is to compute the probability of observing a
score between 0 and 16 on F1 conditional upon each score level (0-20)
of F2 (Social Deviance: 2003 version scoring),
and vice versa. Further, the conditional distributions of Factor score disparity
relative to the Total PCL-R score will also be calculated. The method for
achieving this will depend on how the data are distributed for each factor.
Following the PCL-R test-manual conventions, using a Pearson correlation between
the two score sets, and assuming normally distributed scores for Factors 1 and
2, it is possible to generate sufficient artificial bivariate data to create a
population-size distribution of expected frequencies for a given relationship
between Factors 1 and 2. If, however, empirical evidence indicates that the two
score distributions are not normally distributed it will be necessary to use
resampling analysis using the empirical joint bivariate distributions.
The datasets are from 1358 offenders at
three UK prisons, and 217 patients from within
two of the four UK high security forensic psychiatric
hospitals and a high-security clinic. The presentation
can be downloaded
here in pdf format.
It consists of 30 slides, and is 414kb in size.
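The simulation route described above (normally distributed factor scores linked by a Pearson correlation) can be illustrated with a minimal sketch. The means, standard deviations, and correlation used here are hypothetical placeholders, not values from the PCL-R manual or from the study data, and the function name is invented for illustration. Rounding and clipping the simulated values to the permitted score ranges is also an assumption about how continuous draws would be mapped onto integer scores.

```python
import numpy as np

def conditional_f1_given_f2(mean, sd, r, n=200_000, f1_max=16, f2_max=20, seed=0):
    """Simulate bivariate-normal factor scores and tabulate P(F1 = i | F2 = j)."""
    rng = np.random.default_rng(seed)
    cov = [[sd[0] ** 2, r * sd[0] * sd[1]],
           [r * sd[0] * sd[1], sd[1] ** 2]]
    draws = rng.multivariate_normal(mean, cov, size=n)
    # Round to integer scores and clip to the permitted ranges (an assumption).
    f1 = np.clip(np.rint(draws[:, 0]), 0, f1_max).astype(int)
    f2 = np.clip(np.rint(draws[:, 1]), 0, f2_max).astype(int)
    # Joint frequency table: rows are F1 scores, columns are F2 scores.
    joint = np.zeros((f1_max + 1, f2_max + 1))
    np.add.at(joint, (f1, f2), 1)
    col_totals = joint.sum(axis=0, keepdims=True)
    # Divide each column by its total; columns with no cases stay at zero.
    return np.divide(joint, col_totals,
                     out=np.zeros_like(joint), where=col_totals > 0)

# Hypothetical parameters, for illustration only.
table = conditional_f1_given_f2(mean=(8.0, 10.0), sd=(4.0, 5.0), r=0.5)
```

Each column of `table` is then the simulated distribution of F1 scores at one F2 score level. If the empirical score distributions were not normal, the same tabulation could be applied to resampled pairs drawn from the observed data instead of simulated draws.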
Personality Assessment via Graphical Profiler (Barrett and Ebbeling)
One hundred university students were administered a 106-item
questionnaire that assessed 10 of the 45 facets of Goldberg’s AB5C Five Factor
Model Personality Questionnaire. Each student also completed a new prototype of
a computer-administered personality assessment that utilised a one-dimensional
graphical profiler methodology pioneered by the first author. The 10 facets and
106 questionnaire items were reduced to just 10 single rating statements, with
responses made using positioning by a computer mouse of each facet name onto a
non-quantitative rating scale bounded by two phrases “Most Like Me” and “Least
Like Me”. Each participant was also asked which method of assessment they
preferred, and which one seemed to allow them to best represent their
personality via self-report. Scores acquired from both methods of assessment
were compared to one another for direct equivalence, along with analyses that
examined the participants’ use of the non-quantitative rating scale. Results
indicated non-equivalence of assessment method scores, but the majority of
participants rated the profiler as the optimal method by which they felt they
could describe their personality. An unusual research question for personality
psychometrics has now been raised …“which is the most accurate method of
assessment of an individual’s personality characteristics?”. The presentation can
be downloaded here in pdf format. It
consists of 33 slides, and is 374kb in size.
The South African SIOPSA keynote address supports the
Personality Assessment paper and provides references and more detail concerning
the measurement issues, as well as introducing more assessment innovations
currently "in hand". It was entitled:
"Psychological Assessment and Data Utility: It's Time to Innovate". The presentation can be
downloaded here in pdf format. It
consists of 69 slides, and is 842kb in size. The presentation used some
AVI, ancillary powerpoint, and computer programs - which are available on the NZ
workshop CD. Due to the size of the files (almost 200Mb), it is not
feasible to distribute them on the web-page. If you would like a workshop CD,
please email me.
Abstract
With Joel Michell’s (1997) publication of the
paper in the British Journal of Psychology entitled “Quantitative Science and
the Definition of Measurement in Psychology”, the methods and procedures of
classical and modern psychometrics were shown to be somewhat at odds with the
axioms and logic of quantitative scientific measurement. Kline (1998) published
the first serious response to this challenge that faced “quantitative
psychologists”, concluding that Michell was indeed correct in his arguments.
Barrett (2003) has since presented the key arguments, measurement axioms, and what appear to be
the obvious logical outcomes of both Michell’s and Kline’s arguments. However, I
do not wish to dwell unduly on these matters of the philosophy of science and
the theory of quantitative measurement. Instead, I would like to show what can
happen to psychological assessment and the use of “test scores” when the strict
adherence to test construction using psychometric test theory is replaced by a
problem-focussed approach that is guided both by more straightforward
measurement concerns and a greater regard for the meaning of what it is that an
investigator might be trying to assess.
I wish to present four applications which form the basis of my current research program, as exemplars of the kinds of new approaches to applied psychological assessment now being developed. These are: The Mariner7 Graphical Profiler™, the Psytech International Programmer Potential Test™, the Smart Profiling logic and algorithms for employee profiling and selection, and Psytech’s Intelligent Psychometrics module embedded within its GeneSys™ system. The first two of these applications embody an entirely new approach to assessment of individual attributes and already exist as commercial products; the first uses computer-based single item attribute measurement in one and two dimensions in the domains of Work Preferences and Personality. The second assesses the potential of low-technical/educational literacy applicants for “trainability” as computer programmers and systems analysts by measuring the applicant’s on-line behavioural interactions with a computer-based “tool-world” that allows them to construct and run programs to solve several kinds of problems. The third, which is still being developed for commercial application, is an attempt to move beyond current self-imposed statistical and logical constraints on “the profiling” of applicants against targets, in favour of providing solutions and algorithms that use all information considered relevant to the construction of a “target profile”. The notion of a conventional homogeneous group target profile is also redefined. The fourth of these developments is the GeneSys™ Intelligent Psychometrics™ module; an expert-system module that allows users to examine the psychometric adequacy of their own questionnaire data, as well as examine compliance with adverse impact legislation for any selection decisions based largely upon these data.
The user only need ask questions such as “is my test reliable?” or “is my test biased?”, and the system responds by computing all necessary procedures required to answer such questions, with full narrative explanation of exceptions and analysis findings.
At first glance, these four applications seem somewhat disparate and hardly the constituents of a systematic program of research and development. However, I would propose that they are indicative of just such a program, albeit more broad than most. Two strands compose this program: the first is an attempt to produce more accurate assessments of individual attributes by adopting a direct and somewhat different approach to self-report measures of stylistic attributes (personality, values, interests, and preferences), as well as measuring abilities and “potential” via direct behavioural observation instead of questionnaire items. The second is concerned with making much more productive and intelligent use of person-related data, whether from conventional psychometric tests, behavioural observation, work-history, or biodata. In some respects, one only has to look at what is happening to integrity measurement (moving from simple questionnaires to online non-invasive facial muscle feature detection) to see that my own research program is just another example of how the entire field of psychological assessment is beginning to change. This is also becoming apparent in high-stakes clinical assessment for recidivism-risk, psychopathy, and sex-offenders.
Perhaps the quote from Robert Sternberg and Wendy Williams (1998) can be said to sum up the current position to which my own research program is responding … “No technology of which we are aware- computers, telecommunications, televisions, and so on- has shown the kind of ideational stagnation that has characterized the testing industry. Why? Because in other industries, those who do not innovate do not survive. In the testing industry, the opposite appears to be the case. Like Rocky I, Rocky II, Rocky III, and so on, the testing industry provides minor cosmetic successive variants of the same product where only the numbers after the names substantially change. These variants survive because psychologists buy the tests and then loyally defend them (see preceding nine commentaries, this issue). The existing tests and use of tests have value, but they are not the best they can be. When a commentator says that it will never be possible to improve much on the current admissions policies of Yale and its direct competitors (Darlington, 1998, p. 572, this issue), that is analogous to what some said about the Model T automobile and the UNIVAC computer”.
That there is
some utility in the use of current conventional psychometric questionnaires is
not in doubt. What is in doubt is whether an order-of-magnitude improvement in
accuracy and utility is to be made by using more sophisticated psychometric
test-theory models to analyse item-response data, or whether such improvement
will be more likely attained by rethinking and constructing entirely new methods
of psychological assessment and data analysis. My research and thinking favours the latter.
Maximising
Business Performance: Using Psychometrics to Improve Efficiency, Productivity,
and Performance. A 112-slide presentation - with notes for
many of the slides containing argument, references, and web-links where
relevant.
The
purpose of this presentation is to see where psychometrics might have some
really substantive financial utility for a business. Rather than focus on the
small-scale implementation issues – everything I’m trying to discuss here is set
around making a big impact on a company’s workforce composition, with a
corresponding big impact on the financial bottom-line. If you attract and hire
the best people for a chosen job position – then it is more likely than not that
these individuals will collectively have a huge impact on the profitability of a
company. However, some serious problems remain, not least:
1. What constitutes a “best person”?
2. How do we know the person is “best”? Best for what exactly?
3. How will a prime “employee” actually be more “productive”?
4. How will we measure the impact of the hiring strategy for its success or failure?
The presentation was sponsored by OPRA and Psytech International - and was given during the 1st week of March 2004 in Christchurch, Wellington, and Auckland. The NetLogo software used to demonstrate the evolved systems dynamic modelling of altruism and selfishness (which was shown in Auckland only) is available free of charge from: http://ccl.northwestern.edu/netlogo/ The presentation is available here as a zipped powerpoint (XP/2000 format) presentation (16Mb download, yes Megabytes!!).
Validity
and Utility in I/O Psychology. A 64-slide presentation -
with notes on almost every slide. These "notes" comprise commentary, paper
abstracts, and quotes from papers which further elaborate on the slide
statement. From the first slide: "This presentation seeks to offer a sample of
several fairly recent major publications and abstracts from a variety of authors
which seem to indicate that the current paradigm of psychology and psychometrics
is beginning to “dissolve”. It is not for me to exhort the audience to be
convinced or accept my own judgement as veridical on these sets of evidence,
argument, logic, and informed opinion. However, they do seem to ask major
questions of academic psychologists as well as applied psychologists as to
whether some or all of the below is valid or even partially valid. Individuals
like myself have responded to many of these issues with a critical acceptance
and complete change of approach to investigating and applying psychological
knowledge and methods in research and applied practice. This does not mean we
reject what has gone before, but rather we better understand some of its
limitations and the limitations of the methods we have been relying upon to get
us this far. The problem we face is that some of the now apparent limitations
(as with inferential concepts of statistical data models, measurement, and
validity) seem sufficiently substantive as to make us wonder as a profession
whether we are indeed at the beginning of a paradigmatic rather than
evolutionary change. The implications for the future of both academic and
applied psychology are huge. But, let’s be clear, if change is coming – the
history of science teaches us that it will at first be slow, grudging, and
divisive". The presentation simply reports on those papers which have caused me
to consider that the science of psychology, let alone individual differences,
I/O psychology, and methodological areas, is about to undergo a paradigm change.
Ian Deary once accused me of "shouting" to the profession, and argued that the
profession is simply evolving to do the "next generation work" already. Well, as
I say, I leave the reader to decide upon these matters. That there are
fundamentally different approaches to psychological investigations already
happening is not in doubt - but this is not being done by many, and certainly
not by many "established" or "famous" researchers in the area. However, as I
say, this presentation is not to exhort people to some course of action, but
rather to just expose them to some important papers and individuals whose work
and logic seem to point to the necessity to completely rethink some of what we
do, and how we approach our research. Members of the audience were given many of
the original papers to take home - so that they could read first hand some of
the arguments and peruse the evidence for themselves. The presentation is available
here as an Acrobat
format file (version 7.0), two slides per page in note form. The program
"Correlation Visualizer" used within the presentation will be available
soon. The chapter: Lykken, D.T. (1991) What's Wrong with Psychology
Anyway? In D. Cicchetti and W.M. Grove (Eds.). Thinking Clearly about
Psychology. Volume 1: Matters of Public Interest. University of Minnesota Press.
ISBN: 0-8166-19182. is available
here.
The Chinese Challenge to the Big 5: A 20-slide
presentation given at the 2005 British Psychological Society Test User
Conference in May 2005.
Graham Tyler wrote
an article for Selection & Development Review reviewing the 15FQ+* (Tyler,
2003). The article introduced the 15FQ+ (Psychometrics Limited, 2002) as a
psychometrically-sound personality assessment tool that was beginning to
accumulate cross-cultural evidence of its utility in workplace psychological
assessment. In the interim two years, our research team has been active in the
translation, adaptation and validation of the 15FQ+ in Asia as part of a project
which aimed to assess the utility of Western and Chinese models and measures of
personality in Asia and globally. The following provides data and analysis from
one stage of this program and provides a rationale for the acceptance of the
Five Factor Model (FFM) of personality and related assessment tools in China.
The presentation may be downloaded
here
(159k). A paper published in December 2005 on the same topic can be downloaded
here
(Tyler, G., Newcombe, P., & Barrett, P. (2005). The Chinese challenge to the
Big-5. Selection and Development Review, 21, 6, 10-14.)
Research Methods for the 21st Century.
An 11-slide presentation and a Word document (both in pdf format) which were
part of a symposium organized by Brian Haig (University of Canterbury) at the
New Zealand Psychological Society Annual Conference in Dunedin, 1st-4th
September, 2005. My presentation was concerned with the questions "what actually
might constitute a thoroughly modern and up-to-date research methods training in
psychology?", and "what actually happens in practice?". I proposed a Model
1 and Model 2 psychology training - where Model 1 is near to what happens
currently in New Zealand, and Model 2 as the kind of methods training that would
be "world-class" for the 21st Century. The costs and benefits of each
(financial, time, and academic) were briefly outlined in the presentation
document, along with what I thought were some interesting questions which arise
when one sees these two kinds of curricula. I did not specifically seek to
advocate one vs the other, but rather to attempt to consider the long-term
consequences for students who become Model 1 psychologists vs those who might be
trained nearer to the depth of knowledge within Model 2. Probably the most
important feature of my talk was the issue of training students in "methodology"
first, then methods. In my own presentation it was left as implicit within Year
1 - but Kerry Chamberlain from Massey University, and especially Brian Haig's
talk really brought this out as a fundamental theme that in fact all speakers
shared (Neville Blampied from Canterbury included - who by the way gave an
excellent talk on a novel graphical methodology for analyzing single-case
research) in their various talks. If anything, I would venture this was the most
fundamental "message" to be delivered from this symposium - that an
understanding and critical appreciation of "methodology" as an area of study in
and of itself is indeed the precursor to specific methods-skills training.
Brian's message about theory evaluation as a key feature of training is also
worth noting. Of note as well was that the division between "qualitative and
quantitative" was seen as largely pointless and irrelevant to much of the debate
- a point made strongly by Kerry. However, when you approach investigation
within the subject area of psychology as a kind of "literary-discursive"
activity versus as a kind of "cognitive physics", then this seems to be a
defining perspective which dictates the kind of methodology and methods training
that is needed to answer the kinds of questions which might arise from such
diverse views of psychological investigation/understanding. Anyway, it was
interesting on the day! The curricula (models 1 and 2) can be downloaded
here (45kb), with the small
presentation and "interesting questions"
here (191kb).
Person-Target Profiling: Issues in Matching and
Construction.
A 66-slide presentation that summarises my
ongoing work in vector profile construction and congruence matching, along with
the developments in 1 and 2-D graphical profiling. This was given as a keynote
address at the Consumer Personality and Research Conference in Dubrovnik, on
September 21st, 2005. Recommendations are made regarding the potential use of
any matching coefficient and the kinds of tests which need to be made within
the particular application context. Empirical analyses of data are used to
elaborate how each test is undertaken and the sometimes surprising results
provided by each test. Further, given the perceived sub-optimal performance of
all conventional Euclidean distance-based and discrepancy coefficients within
person-target profiling, a new
designer-mediated matching coefficient is introduced (Kernel Smoothed Distance)
and briefly evaluated. The presentation concludes with an examination of
respondent-constructed personality and work-preference profiles using graphical
profiling technology in one and two dimensions, along with a designer-mediated
approach to computing matches in 2-dimensional profiling. The presentation ran
three data analysis/simulation programs and a 30Mb AVI file. These are not
included here. However, screenshots of some of the program outputs are
provided in a separate pdf file
here (599k). The presentation itself may be
downloaded as a pdf file
here
(2.08Mb).
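For readers unfamiliar with the conventional coefficients referred to above, a Euclidean distance-based person-target match can be sketched as follows. This is only an illustration of the conventional approach; it is not the Kernel Smoothed Distance coefficient, whose definition is not given on this page. The function names and the rescaling of distance onto a 0 to 1 similarity range are assumptions made for the example.

```python
import math

def euclidean_distance(person, target):
    """Straight-line distance between a person profile and a target profile."""
    if len(person) != len(target):
        raise ValueError("profiles must have the same number of attributes")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(person, target)))

def scaled_match(person, target, low, high):
    """Rescale distance to a similarity: 1.0 is identical, 0.0 is maximally distant.

    Assumes every attribute is scored on the same low-to-high range.
    """
    max_distance = math.sqrt(len(target)) * (high - low)
    return 1.0 - euclidean_distance(person, target) / max_distance

# Hypothetical five-attribute profiles on a 1 to 10 scale.
target_profile = [7, 5, 8, 3, 6]
person_profile = [6, 5, 9, 4, 6]
match = scaled_match(person_profile, target_profile, low=1, high=10)
```

A coefficient of this kind treats every attribute as equally weighted and every unit of discrepancy as equally important, which is one reason a designer-mediated alternative might be preferred in a particular application.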
Quality
and Quantity: a 77-slide presentation on the logic of measurement. The 3-hour
session is arranged around the following topics: Qualities, Symbols, and Numbers; Quantity and Measurement; Theories of
Measurement; Meaning, Theory, and Measurement; and the Implications of the above
for Organizational Research. Mainly a definitional overview - but highly
challenging to both staff and students! The original powerpoint file is in a
zipped archive
here
(106k), and a convenient 4-slides-per-landscape page Acrobat pdf file
here
(229k).
Predictive Accuracy: a 36-slide
presentation which may be downloaded as a pdf file from
here (788kb).
Paper 1: Breiman, L. (2001)
Statistical Modeling: the two cultures. Statistical Science, 16, 3,
199-231.
Paper 2: Bickel, P.J., Ritov, Y., & Stoker, T.M. (2006)
Tailor-made tests for goodness of fit to semiparametric hypotheses.
Annals of Statistics, 34, 2, 721-741.
Paper 3: Haig, B. (2005)
An
abductive theory of scientific method. Psychological Methods, 10, 4,
371-388.
Paper 4: Barrett, P.T. (2003)
Beyond Psychometrics:
measurement, non-quantitative structure, and applied numerics. Journal of
Managerial Psychology, 18, 5, 421-439.