Psychometrics and Measurement
Beyond Psychometrics: the recovery of a standard unit of length. (November, 1998)
Beyond Psychometrics: the strategic implications for occupational psychology (March, 1999)
Applicant vs Non-Applicant Data - Part 1- (Rosalie Brown and myself, BPS TUC 99 conference, June 1999)
Applicant vs Non-Applicant Data - Part 2- (Rosalie Hutton (ex-Brown!) and myself, BPS OccPsy conference, Jan. 2000)
The Role of a Concatenation Unit (BPS Maths-Stats-Computing Meeting, London, December 2001)
Single Item/Attribute Psychometrics: can it be done? (NZ I/O society - Auckland Region, Wednesday 20th February, 2002)
 Graphical Profiler Assessment  (NZ Psychology Society Conference, 2002)
Measurement cannot occur in a theoretical vacuum (AERA-D April 2002 Symposium Paper)
Personality Assessment via Graphical Profiler (September, 2003)
Quality and Quantity: The logic of measurement (Postgrad Class Lecture Note - 2006)
Predictive Accuracy as THE criterion for organizational research (8th Annual NZ Work Research Conference keynote - May 2007)
2007 New Zealand Psychological Society Annual Conference - Hamilton (Aug 23rd-26th)
             Good Judgment, Intelligence, and Personality
             Two Big Ideas
             Brunswick Symmetry, Complexity, & Non-Quantitative Psychology - Tying it all Together
SIOP 2009: Measurement Invariance and Latent Variable Theory in Cross-Cultural Psychology (April 2009)
ISSID 2009: Interrater Reliability (IRR): Measuring Agreement and Nothing Else (April 2009)
#35: Taxonomies, traits, dispositions, motivations, and personality dynamics: How now to interpret personality test scores? (April 2010)

Business-Commercial Psychometrics
Pre-Employment Integrity Testing: Current Methods, Problems, and Solutions. British Computer Society Information Security Specialist Group Annual Conference, March, 2001.
The POP questionnaire - single item psychometrics and 16PF Form A vs 16PF5 (BPS Conference, Warwick, 1995)
Evidence-Based HR: Can it be done? (NZ Psychology Society Conference, 2002)
Single Item/Attribute Psychometrics: can it be done? (NZ I/O society - Auckland Region, Wednesday 20th February, 2002)
SIOPSA keynote address "Psychological Assessment and Data Utility: It's Time to Innovate" (June 2003)
Maximising Business Performance: Using Psychometrics to Improve Efficiency, Productivity, and Performance  (March, 2004)
 Validity and Utility in I/O Psychology  (May 2005)
The Chinese Challenge to the Big 5 (Tyler et al, 2005)

Individual Differences
Chronometric and Bioelectric Correlates of IQ (December, 1999)
The String Measure, Evoked Potential Correlate Research, and Psychometric IQ (BPPS, December 1999)
Quantitative Science and Individual Differences: candidate models for intelligence and personality. The ECP10 paper as above, but heavily augmented with about 15 new slides that support my arguments concerning the poor scientific value of low correlations (September 2000)

Clinical Effectiveness and Outcome Evaluation
 Clinical Effectiveness: the Rules for Treatment Instantiation and Outcome Evaluation (Maureen Nicol and myself, April 2000: The State Hospital)
Outcome Measuring Procedures in Secure Settings. Shaftesbury Clinic, Springfield University Hospital, April 6th, 2001

Forensic Psychology Issues
Decision Table Analysis: Definitions, Methods, and Assessing Risk (November 1999)
Risk Prediction and Risk Management: obviously not a priority for senior managers in psychiatry and psychology. Strategy, strategy, strategy! (Psychological Solutions to Personality Disorder Conference: 14/3/2000)
Factor Score Disparity in the Psychopathy Checklist Revised (September, 2003)

Methodology
Hypothesis Testing and Power Analysis (September 2000)
Interrater Reliability: Definitions, Formulae, and Worked Examples (March, 2001)
Research Methods for the 21st Century (September, 2005)

Computational Profiling
NZ Psychological Society Conference 2003 - paper and workshop abstracts and downloads  (30th August, 2003)
Person-Target Profiling Workshop: Issues in Matching and Construction (September, 2005)

Beyond Psychometrics: the recovery of a standard unit of length: This 50-slide presentation was given at the British Psychological Society's Division of Occupational Psychology conference: Assessment in the Millennium: Beyond Psychometrics, November 1998, at Birkbeck (University of London). The theme of this presentation was Rasch scaling and its capacity to construct a standard unit from observational data. The presentation contained a data simulation that attempted to hide a true, quantitatively structured latent variable of length behind some poor ordinal observations. All the Rasch scaling did was construct an equal-interval latent variable of ordinal lengths! This simulation was heavily criticised by Ben Wright and others, and I have included these criticisms as an addendum to the presentation, along with my reply. However, recent papers seem to have vindicated my conclusions in some respects. I'm now undertaking a massive simulation to really hammer home the inability of Rasch or IRT scaling to recover "true" latent variables. The reality is that these methods simply construct linear latent variables in complete isolation from any empirical evidence that such variables might indeed be quantitatively structured. In my opinion, from a scientific perspective, these scaling methods are frankly of little utility, but they are ingenious from a psychometric perspective and do have great utility in a more pragmatic sense. It all comes down to the purpose for which such scaling is used: science, or number scaling. The presentation itself may be downloaded here, as a pdf file (625k in size).
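For readers who want to see the shape of such a simulation, here is a minimal sketch (not my original simulation code; all constants are illustrative assumptions). A truly quantitative latent variable of length drives pass/fail "observations" through its rank order alone, so a Rasch fit to the resulting response matrix can only return an equal-interval logit scale that is, at best, a monotone (ordinal) transform of the true lengths:

```python
# Minimal sketch (NOT the original simulation): a quantitatively structured
# latent variable (length, in cm) is hidden behind dichotomous observations
# that depend only on its rank order. A Rasch fit to `responses` would yield
# equal-interval logits that are merely ordinally related to true length.
import numpy as np

rng = np.random.default_rng(42)
n_persons, n_items = 500, 20

true_length = rng.uniform(10.0, 100.0, n_persons)    # true quantitative values (cm)
theta = true_length.argsort().argsort() / n_persons  # purely ordinal information (ranks)
item_loc = np.linspace(0.05, 0.95, n_items)          # item locations in rank-space

# Dichotomous "measurement": pass an item when rank position exceeds the item
# location, blurred logistically - the cm values themselves never enter.
p_pass = 1.0 / (1.0 + np.exp(-(theta[:, None] - item_loc[None, :]) / 0.08))
responses = (rng.uniform(size=p_pass.shape) < p_pass).astype(int)

# Person order is preserved, but equal logit steps are not equal centimetres.
assert np.array_equal(np.argsort(theta), np.argsort(true_length))
print(responses.sum(axis=1)[:10], true_length[:10].round(1))
```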

Beyond Psychometrics: the strategic implications for occupational psychology: This 44-slide presentation was given at the British Psychological Society's Division of Occupational Psychology conference: Assessment in the Millennium: Beyond Psychometrics. This was a second conference on March 5th, again at Birkbeck (University of London), that repeated the theme from the first one in November 1998. There are no notes with these slides. Note the new reference to Salgado (1999) ... a review of personnel selection methods that augments the work of Schmidt and Hunter (1998) - also referenced in the presentation. The presentation sets out what may lie beyond the current use of psychometric measurement and assessment in occupational psychology in the millennium. It focuses on changes in both practice and techniques, introduces a new First Law for future practice, and outlines in some detail a new kind of "smart" profiling for candidate choice (whether for promotion, training, selection, or team-building). ALSO: the PC Windows 95/98 format program that runs examples of Wolfram's 1-dimensional cellular automata is downloadable here. This was the example program I ran in the presentation. I have augmented it slightly to add more rules on an "autoplay" button (alongside the example rule-sets and their special one-off buttons). All you need to do is download the zipped installation fileset CA.ZIP (click here). Use Winzip to unzip or run SETUP.EXE directly from the Winzip archive. Alternatively, unzip the files into a temporary directory and run the file SETUP.EXE in this temporary directory. The installation program is completely automated and creates its own program listing entry and icon. An Acrobat 4.0 pdf file of the presentation, containing both slides and notes (as in Powerpoint), is also available. This file is 241k in size. Click here for the pdf file version.

Applicant vs Non-Applicant data - Part 1. This 32-slide presentation was given at the British Psychological Society's 1999 Test User conference by Rosalie Brown. It was essentially a major aspect of her recent MSc thesis work. I simply helped here and there with a few bits of statistical advice and one or two analyses. The Acrobat 4.0 pdf version is available here (725k).

Applicant vs Non-Applicant data - Part 2. This 26-slide presentation was given at the British Psychological Society's January 2000 Occupational Psychology conference by myself and Rosalie Hutton (surname formerly Brown). Following on from her presentation in June 1999 (see Part 1 above), we extended the research to two new tests, the Psytech International 15FQ and Saville and Holdsworth's Concept 5.2 OPQ. As the analyses evolved in the second presentation, it became clear that the issue was confounded by two key problems: unproven measurement axioms, and a predilection for subjectivity in personality questionnaire scale score interpretation. A further problem was thrown up when we attempted to analyse the OPQ - that of questionnaires that have no a priori psychometric structure. The overall conclusion was that, apart from using some very, very basic psychometric principles, the kinds of analyses adopted by us (and others) are simply too powerful given the properties of the data at hand. Further, we posed the question as to whether any equal-interval test theory is of any practical or theoretical relevance any more. The Acrobat 4.0 pdf version is available here (403k)

Decision Table Analysis: Definitions, Methods, and Assessing Risk. An 86-slide Powerpoint file - an exposition of 2x2 table analysis for decision making and ROC analysis, which uses the VRAG dataset to show how researchers apply these methods in practice. The file can be downloaded here as a pdf file (467k): Decision.pdf. Also available for download is Roc1.jpg ... a special ROC distribution graphic that accompanies the presentation - it can be opened with your browser, MS Word, or Paintshop Pro etc.
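As a flavour of the quantities the presentation works through, here is a small sketch of the standard 2x2 decision-table statistics and the Mann-Whitney formulation of the ROC area (the counts and score distributions below are hypothetical, not the VRAG data):

```python
# Sketch of 2x2 decision-table statistics and a simple ROC area (AUC).
import numpy as np

def two_by_two_stats(tp, fp, fn, tn):
    """Common decision statistics from an outcome-by-decision 2x2 table."""
    total = tp + fp + fn + tn
    return dict(
        sensitivity=tp / (tp + fn),   # hit rate among true positives
        specificity=tn / (tn + fp),   # correct rejections among true negatives
        ppv=tp / (tp + fp),           # positive predictive value
        base_rate=(tp + fn) / total,  # prevalence of the outcome
    )

def auc(scores_pos, scores_neg):
    """P(random positive case outscores a random negative case):
    the Mann-Whitney formulation of the area under the ROC curve."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

rng = np.random.default_rng(1)
pos = rng.normal(1.0, 1.0, 200)   # risk scores, reoffenders (hypothetical)
neg = rng.normal(0.0, 1.0, 400)   # risk scores, non-reoffenders (hypothetical)
print(two_by_two_stats(tp=120, fp=90, fn=80, tn=310))
print("AUC:", round(auc(pos, neg), 3))
```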

Chronometric and Bioelectric Correlates of Psychometric IQ: these 7 slides (with notes) provide the graphical antidote to those who become somewhat over-excitable about the measurement/predictive potential of chronometric and bioelectric indices based upon their correlations with psychometric IQ. Here, you see exactly what the data look like that underlie the kinds of -0.5 correlations you sometimes see reported between these measures and IQ scores. No simple theory of speed or variability can explain these data - which have been replicated in several experiments. The publications from the biosignal lab and elsewhere are presented below in the evoked potential correlate research presentation "Key References" list. The challenge is to figure out how to make robust, direct, and routine measurements of fundamental nervous system properties and basic reaction times. Some good work has taken place with auditory inspection time (IT) - but precious little in other domains. This is still a great research area to work in - but a strong theory linking these measures to a decent theory of intelligence is still sadly lacking - and the biosignal lab now sits in the Science Museum in the UK! The presentation can be downloaded as a pdf file (Acrobat 7) here (177kb)

The String Measure, Evoked Potential Correlate Research, and Psychometric IQ. This 21-slide presentation was given at the 1999 British Psychophysiological Society Conference, at the Institute of Neurology (13th-15th Dec.), as part of an excellent symposium on Intelligence and Personality organised and introduced by Peter Caryl. The presentation is basically a brief exposition of the rationale, evidence, and my conclusions about this area. Ian Deary and I disagree on one fundamental point - the status of theory in this area of research. Ian made the point about atheoretical genome sequencing à la Craig Venter's approach. I made the point about Einstein and Theoretical Physics. Yep, it was that kind of symposium - excellent stuff - and all credit to Peter and the other speakers (Martha Whiteman, Ian Deary, Andrew MacLullich, and Peter Caryl) for some really thought-provoking presentations. The Acrobat 4 pdf file can be downloaded here (283k). The zipped Powerpoint presentation can be downloaded here (314k). There is also an accompanying Key References document that provides the references to all papers/results that I mentioned in my presentation. The Acrobat 4 pdf file of these references can be downloaded here.

Risk Prediction and Risk Management: obviously not a priority for senior managers in psychiatry and psychology. Strategy, strategy, strategy! This 24-slide presentation was given at the UK High Security Psychiatric Hospitals Conference on Personality Disorder - Leeds, March 14th-15th. Here I examine the failure of every high security mental health institution in the UK to implement a coherent, meaningful, organisation-wide strategy for risk since the VRAG and RAMAS systems were published in 1994/5. I provide an explicit strategy that should have been (and still might be) implemented immediately, show how leadership and Goal Directed Management would have achieved results, and finally dissect the reasons that I see as causing the failure by senior clinical management to evolve and implement such a strategy. I conclude by discussing new "hot" areas and the key experts to be consulted by organisations in this area. The Acrobat 4.0 pdf version is available here (321k).

Clinical Effectiveness: The Rules for Treatment Instantiation and Outcome Evaluation. This 102 (yes 102!!) slide presentation was given to the Psychology Department at the State Hospital during late April, 2000. It takes the 8-step model from the clinical effectiveness series presentation "Effectiveness #2: Expertise, Therapy, Audit and Evaluation Methods" (contact Paul Barrett if you want this), and tests it against two current psychological therapies within the hospital. From this evaluation, it then examines two medical treatments, for pneumonia and epilepsy, and evaluates how these meet the model specifications. It then contrasts the psychological therapies' "fit" to the model vs the medical treatments' fits. Finally, it tries to explain why the psychology models do not fit the 8-step model, then defines suggested rules for any mental health practitioner who might wish to instantiate treatment and evaluate it as part of best-practice development. However, it concludes that many of the aspirations, by management and some senior clinicians, toward assessing clinical effectiveness within mental health are commendable but unlikely to ever be achieved. Instead, certain concrete suggestions are made that are likely to increase information about effectiveness whilst retaining a degree of reality with regard to the logistics of their implementation. The Acrobat 4.0 pdf version is available here (242k).

Quantitative Science and Individual Differences: candidate models for intelligence and personality. A heavily augmented version of the Krakow presentation - given to the Dept. of Psychology, University of Canterbury, NZ. It consists of 48 slides, and is downloadable in pdf (2.1Mb) format. A summary of the three major test theories along with some analysis work. The references document (originally Word 2000 format) that supports this presentation is keyrefs2.pdf.

Hypothesis Testing and Power Analysis. A 48-slide Powerpoint file - a summary review of the fundamentals of hypothesis testing and power analysis - with graphics! Download the pdf version (380k): Hyptest.pdf.
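To make the mechanics concrete, here is a minimal power computation under the normal approximation (a z-test stand-in for the two-sample t-test; the effect size, alpha, and group sizes are arbitrary illustrative choices):

```python
# Sketch: power of a two-tailed, two-sample z test for effect size d
# (Cohen's d) with n per group, using the normal approximation.
from math import sqrt
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)   # non-centrality for two equal groups
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)

for n in (20, 50, 100, 200):
    print(f"n = {n:3d} per group -> power = {power_two_sample(0.5, n):.3f}")
```

For d = 0.5 with 50 per group this gives roughly 0.70 - the familiar illustration that a "medium" effect needs substantially more than 50 cases per group before power reaches 0.80.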

Interrater Reliability: Definitions, Formulae, and Worked Examples. A Word 2000 document that goes into both conceptual and computational detail for interrater reliability analysis. This is a revised version (22nd March, 2001) that incorporates detailed SPSS analysis examples (as well as STATISTICA examples) for intraclass correlations as per Shrout and Fleiss Models 1, 2, and 3. Download a pdf version (248k): Rater.pdf - or a zipped Word document (315k): Rater.zip
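A compact sketch of the three single-rater Shrout and Fleiss coefficients, computed from the two-way ANOVA mean squares the document works through (check any real analysis against the SPSS/STATISTICA output; the ratings matrix is the often-reproduced six-targets-by-four-judges example from Shrout and Fleiss, 1979):

```python
# Sketch: Shrout & Fleiss (1979) single-rater intraclass correlations
# from a targets x raters matrix of ratings.
import numpy as np

def icc_shrout_fleiss(X):
    n, k = X.shape                         # n targets, k raters
    grand = X.mean()
    row_m = X.mean(axis=1)                 # per-target means
    col_m = X.mean(axis=0)                 # per-rater means
    ms_r = k * ((row_m - grand) ** 2).sum() / (n - 1)         # between targets
    ms_c = n * ((col_m - grand) ** 2).sum() / (k - 1)         # between raters
    ms_e = ((X - row_m[:, None] - col_m[None, :] + grand) ** 2).sum() / ((n - 1) * (k - 1))
    ms_w = ((X - row_m[:, None]) ** 2).sum() / (n * (k - 1))  # within targets
    icc_1_1 = (ms_r - ms_w) / (ms_r + (k - 1) * ms_w)                          # Model 1
    icc_2_1 = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)  # Model 2
    icc_3_1 = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)                          # Model 3
    return icc_1_1, icc_2_1, icc_3_1

ratings = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]], dtype=float)
print([round(v, 2) for v in icc_shrout_fleiss(ratings)])
```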

Pre-Employment Integrity Testing: Current Methods, Problems, and Solutions. British Computer Society Information Security Specialist Group Annual Conference, March, 2001. This presentation consists of two parts. The first is a written paper giving a fairly detailed description of the methodologies used to assess an individual's integrity and honesty, and to detect faking. These range from the polygraph, P300 AEPs, and hypoactive arousal, through interviews and biodata, to covert and overt psychometrics. The paper includes a list of all major advertised psychometric covert and overt integrity tests (and derived scales), along with publisher web, email, and address contact details (updated slightly 22-Feb-2005). The second part is the Powerpoint presentation. This consists of 38 slides, which discuss in some detail the problems with this area (some of which are shared by all psychometric tests), and a one-page checklist of "solutions" based upon the results of the analyses and discussions presented earlier in the presentation (and following the relevant APA guidelines on this issue). Again, I totally reject the use of validity and correlation coefficients alone as useful indicators for purchasing decisions. This rejection is based upon detailed worked quantitative examples in the slides which show that a cost of $750,000 is "hidden" in one of two validity/correlation coefficients of 0.33. Further, I show that Ones et al's statements as to "Social Desirability" being a "red herring" are perhaps limited to the use of the relatively trivial "social desirability" scales; their comments do not apply to purposeful, adaptive faking of critical psychological attribute tests such as integrity tests. The overall thesis of these two "offerings" was to enable a potential purchaser/investigator of such methodologies (and future test developers) to see just what is absolutely required for the development and use of these methodologies (in terms of any third party assessing their validity and utility for each specific application). The theme I stressed throughout is that in this particular domain, cost-benefit analysis of any potential decision-making strategy is the critical feature of the purchasing decision. This is why all the corrected validity coefficients in the world are quite useless except as initial indicators of "potential". A concrete example was drawn from the forensic area of how to implement integrity testing using ordered-categorical "dose-response" analysis, ROC analysis, and dichotomous outcome analysis - using the VRAG (Violence Risk Assessment Guide) of Webster, Harris, Rice, Cormier, and Quinsey. Finally, I concluded that it was my opinion that this was to be the next major "growth area" in psychometric testing. [22-Feb-2005 ... Remember, this presentation is now 4 years old. Most of it remains relevant - and I've updated a couple of test publisher address and contact details - but it does not include reference to "brain fingerprinting" via fMRI or the latest work on facial muscle perception/detection. Further, my opinion about this being the next major growth area in the US/UK markets was wrong. It has in fact, like a lot of psychometrics these days, just stalled. This is to do with the changing market conditions in which psychometric tests are now being sold, especially concerning issues with "talent" shortages, human capital interventions, and the role of psychometrics as "development" rather than simple screening/assessment instruments. Further, unlike the lower levels of technical capability required to use tests of personality and ability within I/O psychology domains, integrity tests require a high level of technical competence in cut-score/ROI maximisation/risk-assessment and decision-support statistics.] The downloads (pdf format): Integrity_doc.pdf (51k), Integrity_tables.pdf (139k, 22-Feb-2005), Integrity_ppt.pdf (1.9Mb)
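The flavour of the "hidden cost" argument can be sketched quite simply: two settings share an identical validity coefficient of 0.33, yet produce very different dollar losses once base rates, selection ratios, and per-error costs are brought in. Every figure below (the rates, the ratios, the $50,000 per bad hire) is a hypothetical assumption for illustration, not a number from the talk:

```python
# Sketch: identical validity coefficients can hide very different dollar
# costs. Two simulated hiring contexts, both with predictor-criterion
# r = 0.33, differing only in base rate and selection ratio.
import numpy as np

rng = np.random.default_rng(7)

def bad_hire_cost(r, n_applicants, dishonest_rate, hire_ratio, cost_per_bad_hire):
    """Dollar cost of false accepts under top-down selection on a test
    whose scores correlate r with an honesty criterion (bivariate normal)."""
    x, y = rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=n_applicants).T
    dishonest = y < np.quantile(y, dishonest_rate)   # low criterion = dishonest
    hired = x > np.quantile(x, 1 - hire_ratio)       # hire the top scorers
    return int(np.sum(hired & dishonest)) * cost_per_bad_hire

# Same r = 0.33; the contexts (and hence the costs) differ sharply.
print(bad_hire_cost(0.33, 10_000, dishonest_rate=0.05, hire_ratio=0.10,
                    cost_per_bad_hire=50_000))
print(bad_hire_cost(0.33, 10_000, dishonest_rate=0.30, hire_ratio=0.50,
                    cost_per_bad_hire=50_000))
```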

Outcome Measuring Procedures in Secure Settings. Shaftesbury Clinic, Springfield University Hospital, April 6th, 2001. This is a 24-slide presentation that details the current impossibility of implementing organisation-wide treatment-outcome evaluations as part of a coherent clinical effectiveness strategy within prisons and secure mental health settings. A desired model for setting up outcome measuring procedures is first outlined, along with the Barrett 8-step model for general treatment-outcome specifications. Then, the very real difficulties, and the implicit assumptions sometimes made by those from "management" and "effectiveness departments" who speak on these issues, are exposed. Finally, the reasons for the failure of (any?) major UK "secure" organisation to tackle systematic treatment-outcome measuring procedures at a system-wide level are discussed. It is concluded that the fault for the lack of implementation lies not with the clinicians and practitioners who work in these institutions, but with senior (board-level) management and the relevant government ministries. What is required is strategic thinking, planning, adequate resource input, and leadership. The presentation can be downloaded as either a zipped powerpoint file (outcome.zip, 343k) or an Acrobat 4.0 file (outcome.pdf, 364k). For those unable to use WinZip or Acrobat in their institutions (due to the new NHS IT Guidelines being implemented in certain institutions), I've included the MS Powerpoint 2000 file (Office 97 compliant - click here) for download (404k).

The POP questionnaire - single item psychometrics and 16PF Form A vs 16PF5. This 60-slide dual presentation was given by myself and Laurence Paltiel at the British Psychological Society (BPS) Occupational Psychology conference in January 1995. It should be read with several health warnings - not least by new staff at Saville and Holdsworth (SHL) and ASE-NFER! Behind all the noise and drama that took place during and after these presentations (and those in the audience will know just how much there was!!), there are several serious issues worthy of consideration. Although I would modify several slides now to better reflect my own growing experience and reflection after 6 years, the bulk of the presentations still stands as valid (for me at least). Not least is the notion, first explored here and fully reported in a subsequent paper, of direct construct measurement using a single item to reflect a narrow construct. There was a "lively" debate on this issue with Helen Baron, Peter Saville, and George Sik of SHL in the Journal of Occupational and Organisational Psychology, and in the BPS I/O practitioner journal Selection and Development Review. As to the 16PF section - well, again, there are some very interesting observations made here that do cause one to pause a moment and consider what is meant by the "evolution" of a test - and how far one might evolve a test before it loses continuity with an older version. In a sense, the Applicant vs Non-Applicant Data papers by myself and Rosalie Hutton (née Brown) address this issue from a tangential perspective - but I do find these issues interesting from a "what is really going on here" perspective. Since I now publish with ASE as a test author, am on friendly terms with IPAT, and have shared a chat and joke or two with SHL(!) directors in recent years ... I hope they appreciate that my intention with such a presentation was never to insult or denigrate, but to open up as far as possible all the material necessary to enable independent-minded individuals to critically evaluate my propositions and conclusions. This was a view (and still is) shared by Laurence Paltiel, the co-author. The reason for posting this presentation is that it was mentioned in a presentation on the Mariner7 Talent Engine that I gave recently. The presentation is available as a .pdf file only (about 750k) - and is downloadable here. The published papers relevant to the OPQ/16PF presentations are:

Barrett, P.T. and Paltiel, L. (1995). Reductio ad Absurdum? A reply to Saville and Sik (1995). Selection and Development Review, 11, 6, 3-5

Barrett, P.T., Kline, P., Paltiel, L., and Eysenck, H.J. (1996). An evaluation of the psychometric properties of the Concept 5.2 OPQ Questionnaire. Journal of Occupational and Organisational Psychology (JOOP), 69, 1, 1-19

Barrett, P.T. and Paltiel, L. (1996). Can a single item replace an entire scale? POP vs the OPQ 5.2. Selection and Development Review, 12, 6, 1-4

Barrett, P.T. and Hutton, R. (2000) Personality and Psychometrics. Selection and Development Review, 16, 2, 5-9


The Role of a Concatenation Unit. This 41-slide presentation comprised the guest lecture at the 2001 British Psychological Society's Mathematics, Statistics, and Computing Section meeting in London. It is concerned with the role such a unit plays in the kinds of statements one might wish to make about the causal relations between phenomena and variables. To this end, it explores the investigative process within science, the nature of "phenomena detection" (Brian Haig), and the approaches that can be adopted for unit creation. It then questions whether adopting an axiomatic approach to variable/scale construction is of any practical value to both applied and fundamental scientific psychology. Views are noted from Roderick McDonald, Peter Schonemann, Joel Michell, Michael Maraun, and William Fisher Jnr. It is concluded that what is crucial for advances within individual differences psychology [at least] is normatively specified technical constructs that form the basis for single, empirically established metric scales, constructed using the axioms of additive conjoint measurement. An example of where this has been successfully achieved is the Lexile Scale for reading proficiency. There is no guarantee that any constructs within Individual Differences can be scaled into a single metric - however, the hypothesis that a variable might possess quantitative structure is testable. The zipped powerpoint presentation is available here (387kb). The Acrobat version (333kb) is available here. The "take home" message is clear: "if we are dealing with variables that possess an additive-unit quantitative structure, yet do not take advantage of this, then no causal explanation for phenomena can ever attain a level of explanation beyond ordinal-level relations". It is now the aim of the author to attempt to systematically create a metric scale for 'g', in conjunction with setting forth a technical definition of the constituent properties of 'g' (which means no more than "tidying up" Spearman's and Jensen's statements into a clear, normative, technical set of meaning statements).
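For reference, the heart of the additive conjoint representation appealed to here (following Luce and Tukey, 1964) can be stated compactly; the double cancellation condition is one of the empirically testable axioms that makes the hypothesis of quantitative structure checkable against data:

```latex
% Additive conjoint measurement: an ordering \succsim on A \times X is
% representable by additive real-valued scales f and g,
\[
(a,x) \succsim (b,y) \iff f(a) + g(x) \ge f(b) + g(y),
\]
% provided (among other axioms) that double cancellation holds:
\[
(a,y) \succsim (b,x) \ \text{and} \ (b,z) \succsim (c,y)
\;\Longrightarrow\; (a,z) \succsim (c,x).
\]
```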

Evidence-Based HR. Can it be done? This was the last 45-slide presentation of a 2002 workshop I presented on business psychometrics and HR. It doesn't come much more brutal than this. However, the aim of this "brutality" is to try and expose the bare essentials of my position - and also expose what may be my own loose thinking to others. This is not a presentation that in any way tries to provide simple answers to what is essentially (for me) a very complex problem. Yes, I don't think HR "gurus" have actually done much at all for this area except line their own pockets with money, and leave their profession with a continuing headache. Neither do I feel that psychologists (including Schmidt, Hunter, and Cascio et al) who have espoused utility theory, meta analysis, or simple cost-benefit models as a form of "take-it-from-me-evidence-base" have done much to help the profession solve this problem either. So - here, I try to provide an initial framework for at least a clear and solid target for debate. I am starting work on the computational modelling issue directly - as my contribution to finding what I think is the only possible solution to the complexity of the problem. I don't know if this is going to work, or even if it is possible - but I am at least having to deal directly with reality rather than some idealized simplistic nonsense such as "average standard deviation performance in dollar value". The argument here is not so much that utility analysis is a bad thing - but rather that it is irrelevant to an evidence-based HR - as it does not constitute evidence in and of itself. Ok - you get the picture! Download the pdf version (847k) or a zipped powerpoint (445k) version

Graphical Profiler Assessment. A 39-slide presentation on the POP questionnaire and the 2-dimensional Mariner7 Preference profiler, with some retest information. Download the pdf version (2 slides per page, 3.4Mb)

Single Item/Attribute Psychometrics: can it be done? This 20-slide presentation with notes pages was presented to the NZ I/O society Auckland Region group, on Wednesday 20th February, 2002. Mariner7's new product "Talent Engine" was featured, with especial regard to how it works, and the many issues concerned with psychometric test theory and measurement within such a system, not least of which are the key issues of reliability and validity. Talent Engine is the first in a line of products to be built around Mariner7's on-line assessment technology for acquiring measurements of psychological attributes. The next big step for Mariner7 is developing online (or computer-based) tests for personality and values, based around its profiler assessment technique. A further development is also being contemplated within the clinical and medical domains of interest, where patient response to questionnaire measures is constrained by particular cognitive deficits. The presentation was really concerned with the many implications for psychological assessment in using the preference profiler technology. There are some handy references (maybe!) that support some of my reasoning concerning the use of work-preferences as a primary selection information source, along with the role of social desirability, personality and job performance, and meta analysis in general. Hopefully, the presentation will succeed in stimulating the thinking of the reader - academic and practitioner alike - as to what is the limit of this technology, and whether it confuses or enhances psychological measurement. The presentation is available as a pdf file (608k) download, or as a zipped powerpoint (XP/2000 format) presentation  (761k download), so that you can get at the original slides and text notes pages if you wish.

Measurement Cannot Occur in a Theoretical Vacuum. This 32-slide presentation is being presented on my behalf (by Trevor Bond) to the AERA-D Rasch Measurement SIG, New Orleans, USA, 3rd April, 2002, at this year's Educational Measurement conference. I was meant to have been attending personally, but I was unable to obtain funding for the trip from NZ. The symposium is a humdinger though - I really wanted to make this one! Anyway, my presentation is available as a pdf file (404k) download, or as a zipped powerpoint (XP/2000 format) presentation (266k download). It also contains the Rozeboom paradox - which concerns the failure of additive concatenation of a standard unit - and serves notice on the role of meaning in measurement!
Symposium Title: Educational Measurement: Is it really possible?
Symposium Abstract: This symposium addresses a key conceptual flaw in quantitative educational research by asserting that educational researchers generally do not comprehend the requirements of scientific measurement, or, where they do, think that such measurement is not actually possible in the human sciences. While the historical precedents of our misunderstanding might be better understood in hindsight, this symposium suggests that the principles of measurement are not only relatively straightforward but in fact essential if edumetrics is to develop beyond its current methods of actuarial and statistical score-mark classifications. The benefits of establishing fundamental measurement procedures for educational variables extend far beyond the idiosyncrasies of the current statistical approach to educational outcome indicators. While Rasch analysts in particular often claim to address the issue of genuine scientific measurement, the inadequacies of current goodness-of-fit techniques and the importance of substantive theory are seen as key issues for resolution. The symposium presenters aim to participate in genuine dialogue with the audience and to canvass these issues that are considered critical to our discipline.
Chair and Discussant: Trevor Bond ( Trevor.bond@jcu.edu.au) with Joel Michell, William Fisher, George Karabatsos, and Paul Barrett as paper presenters.

Factor Score Disparity in the Psychopathy Checklist Revised. (Carlin, Barrett, and Gudjonsson). A paper given at the NZ Annual Psychology Conference at Massey University, Palmerston North (August 30th - September 3rd). The Hare Psychopathy Checklist-Revised (PCL-R) is a 20-item interview/records-based evaluation tool used by many clinical and corrections practitioners to assess propensity for psychopathic attributes within an individual. The PCL-R is now the primary instrument for such clinical-forensic evaluations. Three scores may be generated using this instrument: a total score, a Factor-1 (F1) score which assesses Interpersonal characteristics, and a Factor-2 (F2) score, which assesses Social Deviance. Recent legal cases in the UK have highlighted an interesting phenomenon, where defence and prosecution expert-witness psychologists have assessed the same individual quite differently, and where the score disparity between Factors 1 and 2 appears extremely large. The primary purpose of this study is to compute the probability of observing a score between 0 and 16 on F1 conditional upon each score level (0-20) of F2 (Social Deviance: 2003 version scoring), and vice versa. Further, the conditional distributions of Factor score disparity relative to the Total PCL-R score will also be calculated. The method for achieving this will depend on how the data are distributed for each factor. Following the PCL-R test-manual conventions, using a Pearson correlation between the two score sets, and assuming normally distributed scores for Factors 1 and 2, it is possible to generate sufficient artificial bivariate data to create a population-size distribution of expected frequencies for a given relationship between Factors 1 and 2. If, however, empirical evidence indicates that the two score distributions are not normally distributed, it will be necessary to use resampling analysis based on the empirical joint bivariate distributions. The datasets are from 1358 offenders at three UK prisons, and 217 patients from within two of the four UK high security forensic psychiatric hospitals and a high-security clinic. The presentation can be downloaded here in pdf format. It consists of 30 slides, and is 414kb in size.
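A minimal sketch of the simulation route described in the abstract: given an assumed Pearson correlation between the factors and (provisionally) normal score distributions, generate a large artificial bivariate sample and tabulate the conditional distribution of F1 at each F2 level. The correlation and the score means/SDs below are placeholders, not the PCL-R manual values:

```python
# Sketch: conditional distribution of F1 (0-16) given F2 (0-20) from a
# large simulated bivariate-normal sample. All parameters are placeholders.
import numpy as np

rng = np.random.default_rng(0)

r = 0.5                          # placeholder F1-F2 correlation
m1, s1, max1 = 8.0, 3.5, 16      # placeholder F1 mean, SD, maximum
m2, s2, max2 = 10.0, 4.5, 20     # placeholder F2 mean, SD, maximum

z = rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=1_000_000)
f1 = np.clip(np.rint(m1 + s1 * z[:, 0]), 0, max1).astype(int)
f2 = np.clip(np.rint(m2 + s2 * z[:, 1]), 0, max2).astype(int)

# P(F1 = i | F2 = j): joint frequency table, normalised within each F2 level.
joint = np.zeros((max1 + 1, max2 + 1))
np.add.at(joint, (f1, f2), 1)
cond = joint / joint.sum(axis=0, keepdims=True)
print(cond[:, 10].round(3))      # distribution of F1 given F2 = 10
```

If the empirical distributions turn out non-normal, the same conditional table would instead be built by resampling the observed joint distribution, as the abstract notes.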

Personality Assessment via Graphical Profiler (Barrett and Ebbeling)
One hundred university students were administered a 106-item questionnaire that assessed 10 of the 45 facets of Goldberg’s AB5C Five Factor Model Personality Questionnaire. Each student also completed a new prototype of a computer-administered personality assessment that utilised a one-dimensional graphical profiler methodology pioneered by the first author. The 10 facets and 106 questionnaire items were reduced to just 10 single rating statements, with responses made by positioning each facet name, using a computer mouse, onto a non-quantitative rating scale bounded by the two phrases “Most Like Me” and “Least Like Me”. Each participant was also asked which method of assessment they preferred, and which one seemed to allow them to best represent their personality via self-report. Scores acquired from both methods of assessment were compared to one another for direct equivalence, along with analyses that examined the participants’ use of the non-quantitative rating scale. Results indicated non-equivalence of assessment method scores, but the majority of participants rated the profiler as the optimal method by which they felt they could describe their personality. An unusual research question for personality psychometrics has now been raised …“which is the most accurate method of assessment of an individual’s personality characteristics?”. The presentation can be downloaded here in pdf format. It consists of 33 slides, and is 374kb in size.

The South African SIOPSA keynote address supports the Personality Assessment paper and provides references and more detail concerning the measurement issues, as well as introducing more assessment innovations currently "in hand". It was entitled "Psychological Assessment and Data Utility: It's Time to Innovate". The presentation can be downloaded here in pdf format. It consists of 69 slides, and is 842kb in size. The presentation used some AVI, ancillary powerpoint, and computer programs - which are available on the NZ workshop CD. Due to the size of the files (almost 200Mb), it is not feasible to distribute them on this web page. If you would like a workshop CD, please email me.

Abstract
With Joel Michell’s (1997) publication of the paper in the British Journal of Psychology entitled “Quantitative Science and the Definition of Measurement in Psychology”, the methods and procedures of classical and modern psychometrics were shown to be somewhat at odds with the axioms and logic of quantitative scientific measurement. Kline (1998) published the first serious response to this challenge facing “quantitative psychologists”, concluding that Michell was indeed correct in his arguments. Barrett (2003) has since presented the key arguments, measurement axioms, and what appear to be the obvious logical outcomes of both Michell’s and Kline’s arguments. However, I do not wish to dwell unduly on these matters of the philosophy of science and the theory of quantitative measurement. Instead, I would like to show what can happen to psychological assessment and the use of “test scores” when strict adherence to test construction using psychometric test theory is replaced by a problem-focussed approach that is guided both by more straightforward measurement concerns and a greater regard for the meaning of what it is that an investigator might be trying to assess.

I wish to present four applications which form the basis of my current research program, as exemplars of the kinds of new approaches to applied psychological assessment now being developed. These are: the Mariner7 Graphical Profiler™, the Psytech International Programmer Potential Test™, the Smart Profiling logic and algorithms for employee profiling and selection, and Psytech’s Intelligent Psychometrics™ module embedded within its GeneSys™ system. The first two of these applications embody an entirely new approach to the assessment of individual attributes and already exist as commercial products; the first uses computer-based single-item attribute measurement in one and two dimensions in the domains of Work Preferences and Personality. The second assesses the potential of low-technical/educational-literacy applicants for “trainability” as computer programmers and systems analysts by measuring the applicant’s on-line behavioural interactions with a computer-based “tool-world” that allows them to construct and run programs to solve several kinds of problems. The third, which is still being developed for commercial application, is an attempt to move beyond current self-imposed statistical and logical constraints on “the profiling” of applicants against targets, in favour of providing solutions and algorithms that use all information considered relevant to the construction of a “target profile”. The notion of a conventional homogeneous group target profile is also redefined. The fourth of these developments is the GeneSys™ Intelligent Psychometrics™ module; an expert-system module that allows users to examine the psychometric adequacy of their own questionnaire data, as well as examine compliance with adverse impact legislation for any selection decisions based largely upon these data. The user need only ask questions such as “is my test reliable?” or “is my test biased?”, and the system responds by computing all necessary procedures required to answer such questions, with full narrative explanation of exceptions and analysis findings.

At first glance, these four applications seem somewhat disparate and hardly the constituents of a systematic program of research and development. However, I would propose that they are indicative of just such a program, albeit more broad than most. Two strands compose this program: the first is an attempt to produce more accurate assessments of individual attributes by adopting a direct and somewhat different approach to self-report measures of stylistic attributes (personality, values, interests, and preferences), as well as measuring abilities and “potential” via direct behavioural observation instead of questionnaire items. The second is concerned with making much more productive and intelligent use of person-related data, whether from conventional psychometric tests, behavioural observation, work-history, or biodata. In some respects, one only has to look at what is happening to integrity measurement (moving from simple questionnaires to online non-invasive facial muscle feature detection) to see that my own research program is just another example of how the entire field of psychological assessment is beginning to change. This is also becoming apparent in high-stakes clinical assessment for recidivism-risk, psychopathy, and sex-offenders.

Perhaps the quote from Robert Sternberg and Wendy Williams (1998) can be said to sum up the current position to which my own research program is responding: “No technology of which we are aware - computers, telecommunications, televisions, and so on - has shown the kind of ideational stagnation that has characterized the testing industry. Why? Because in other industries, those who do not innovate do not survive. In the testing industry, the opposite appears to be the case. Like Rocky I, Rocky II, Rocky III, and so on, the testing industry provides minor cosmetic successive variants of the same product where only the numbers after the names substantially change. These variants survive because psychologists buy the tests and then loyally defend them (see preceding nine commentaries, this issue). The existing tests and use of tests have value, but they are not the best they can be. When a commentator says that it will never be possible to improve much on the current admissions policies of Yale and its direct competitors (Darlington, 1998, p. 572, this issue), that is analogous to what some said about the Model T automobile and the UNIVAC computer”.

That there is some utility in the use of current conventional psychometric questionnaires is not in doubt. What is in doubt is whether an order-of-magnitude improvement in accuracy and utility is to be made by using more sophisticated psychometric test-theory models to analyse item-response data, or whether such improvement will be more likely attained by rethinking and constructing entirely new methods of psychological assessment and data analysis. My research and thinking favours the latter.

Maximising Business Performance: Using Psychometrics to Improve Efficiency, Productivity, and Performance. A 112-slide presentation - with notes for many of the slides containing argument, references, and web-links where relevant. The purpose of this presentation is to see where psychometrics might have some really substantive financial utility for a business. Rather than focus on small-scale implementation issues, everything I’m trying to discuss here is set around making a big impact on a company’s workforce composition, with a corresponding big impact on the financial bottom line. If you attract and hire the best people for a chosen job position, then it is more likely than not that these individuals will collectively have a huge impact on the profitability of a company. However, some serious problems remain, not least:

1. What constitutes a “best person”?
2. How do we know the person is “best”? Best for what exactly?
3. How will a prime “employee” actually be more “productive”?
4. How will we measure the impact of the hiring strategy for its success or failure?

The presentation was sponsored by OPRA and Psytech International - and was given during the 1st week of March 2004 in Christchurch, Wellington, and Auckland. The netlogo software used to demonstrate the evolved systems-dynamic modelling of altruism and selfishness (which was shown in Auckland only) is available free of charge from: http://ccl.northwestern.edu/netlogo/ The presentation is available here as a zipped powerpoint (XP/2000 format) presentation (16Mb download, yes Megabytes!!).

Validity and Utility in I/O Psychology. A 64-slide presentation - with notes on almost every slide. These "notes" comprise commentary, paper abstracts, and quotes from papers which further elaborate on the slide statement. From the first slide: "This presentation seeks to offer a sample of several fairly recent major publications and abstracts from a variety of authors which seem to indicate that the current paradigm of psychology and psychometrics is beginning to “dissolve”. It is not for me to exhort the audience to be convinced or accept my own judgement as veridical on these sets of evidence, argument, logic, and informed opinion. However, they do seem to ask major questions of academic psychologists as well as applied psychologists as to whether some or all of the below is valid or even partially valid. Individuals like myself have responded to many of these issues with a critical acceptance and complete change of approach to investigating and applying psychological knowledge and methods in research and applied practice. This does not mean we reject what has gone before, but rather we better understand some of its limitations and the limitations of the methods we have been relying upon to get us this far. The problem we face is that some of the now apparent limitations (as with inferential concepts of statistical data models, measurement, and validity) seem sufficiently substantive as to make us wonder as a profession whether we are indeed at the beginning of a paradigmatic rather than evolutionary change. The implications for the future of both academic and applied psychology are huge. But, let’s be clear, if change is coming – the history of science teaches us that it will at first be slow, grudging, and divisive". The presentation simply reports on those papers which have caused me to consider that the science of psychology, let alone individual differences, I/O psychology, and methodological areas, is about to undergo a paradigm change. Ian Deary once accused me of "shouting" at the profession, suggesting that the profession is simply evolving to do the "next generation" work already. Well, as I say, I leave the reader to decide upon these matters. That there are fundamentally different approaches to psychological investigations already happening is not in doubt - but this is not being done by many, and certainly not by many "established" or "famous" researchers in the area. However, as I say, this presentation is not to exhort people to some course of action, but rather to just expose them to some important papers and individuals whose work and logic seem to point to the necessity to completely rethink some of what we do, and how we approach our research. Members of the audience were given many of the original papers to take home - so that they could read first hand some of the arguments and peruse the evidence for themselves. The presentation is available here as an Acrobat format file (version 7.0), two slides per page in note form. The program "Correlation Visualizer" used within the presentation will be available soon. The chapter: Lykken, D.T. (1991) What's Wrong with Psychology Anyway? In D. Cicchetti and W.M. Grove (Eds.), Thinking Clearly about Psychology. Volume 1: Matters of Public Interest. University of Minnesota Press. ISBN: 0-8166-1918-2, is available here

The Chinese Challenge to the Big 5: A 20-slide presentation given at the British Psychological Society Test User Conference in May 2005. In 2003, Graham Tyler wrote an article for Selection & Development Review reviewing the 15FQ+ (Tyler, 2003). The article introduced the 15FQ+ (Psychometrics Limited, 2002) as a psychometrically-sound personality assessment tool that was beginning to accumulate cross-cultural evidence of its utility in workplace psychological assessment. In the interim two years, our research team has been active in the translation, adaptation, and validation of the 15FQ+ in Asia, as part of a project which aimed to assess the utility of Western and Chinese models and measures of personality in Asia and globally. The presentation provides data and analysis from one stage of this program, and provides a rationale for the acceptance of the Five Factor Model (FFM) of personality and related assessment tools in China. The presentation may be downloaded here (159k). A paper published in December 2005 on the same topic can be downloaded here (Tyler, G., Newcombe, P., & Barrett, P. (2005). The Chinese challenge to the Big-5. Selection and Development Review, 21, 6, 10-14.)

Research Methods for the 21st Century. An 11-slide presentation and a Word document (both in pdf format) which were part of a symposium organized by Brian Haig (University of Canterbury) at the New Zealand Psychological Society Annual Conference in Dunedin, 1st-4th September, 2005. My presentation was concerned with the questions "what actually might constitute a thoroughly modern and up-to-date research methods training in psychology?" and "what actually happens in practice?". I proposed a Model 1 and a Model 2 psychology training - where Model 1 is near to what happens currently in New Zealand, and Model 2 is the kind of methods training that would be "world-class" for the 21st Century. The costs and benefits of each (financial, time, and academic) were briefly outlined in the presentation document, along with what I thought were some interesting questions which arise when one sees these two kinds of curricula. I did not specifically seek to advocate one vs the other, but rather to consider the long-term consequences for students who become Model 1 psychologists vs those who might be trained nearer to the depth of knowledge within Model 2. Probably the most important feature of my talk was the issue of training students in "methodology" first, then methods. In my own presentation this was left implicit within Year 1 - but Kerry Chamberlain from Massey University, and especially Brian Haig's talk, really brought this out as a fundamental theme that in fact all speakers shared (Neville Blampied from Canterbury included - who, by the way, gave an excellent talk on a novel graphical methodology for analyzing single-case research). If anything, I would venture this was the most fundamental "message" to be delivered from this symposium - that an understanding and critical appreciation of "methodology" as an area of study in and of itself is indeed the precursor to specific methods-skills training. Brian's message about theory evaluation as a key feature of training is also worth noting. Of note as well was that the division between "qualitative" and "quantitative" was seen as largely pointless and irrelevant to much of the debate - a point made strongly by Kerry. However, when you approach investigation within the subject area of psychology as a kind of "literary-discursive" activity vs as a kind of "cognitive physics", this seems to be a defining perspective which dictates the kind of methodology and methods training needed to answer the kinds of questions which might arise from such diverse views of psychological investigation/understanding. Anyway, it was interesting on the day! The curricula (Models 1 and 2) can be downloaded here (45kb), with the small presentation and "interesting questions" here (191kb).

Person-Target Profiling: Issues in Matching and Construction. A 66-slide presentation that summarises my ongoing work in vector profile construction and congruence matching, along with developments in 1- and 2-D graphical profiling. This was given as a keynote address at the Consumer Personality and Research Conference in Dubrovnik, on September 21st, 2005. Recommendations are made regarding the potential use of any matching coefficient and the kinds of tests which need to be made within the particular application context. Empirical analyses of data are used to elaborate how each test is undertaken and the sometimes surprising results provided by each test. Further, given the perceived sub-optimal performance of all conventional euclidean distance-based and discrepancy coefficients within person-target profiling, a new designer-mediated matching coefficient is introduced (Kernel Smoothed Distance) and briefly evaluated. The presentation concludes with an examination of respondent-constructed personality and work-preference profiles using graphical profiling technology in one and two dimensions, along with a designer-mediated approach to computing matches in 2-dimensional profiling. The presentation ran three data analysis/simulation programs and a 30Mb AVI file. These are not included here. However, screenshots of some of the program outputs are provided in a separate pdf file here (599k). The presentation itself may be downloaded as a pdf file here (2.08Mb).
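For orientation, this is what the simplest kind of distance-based person-target congruence coefficient looks like: a squared-Euclidean discrepancy rescaled to a 0-1 match index. It is exactly the class of conventional coefficient the presentation finds sub-optimal, and is not the Kernel Smoothed Distance coefficient introduced in the talk; the profiles below are invented:

```python
# Sketch: a generic distance-based person-target profile match index
# (1 = identical profiles, 0 = maximally discrepant on every element).
import numpy as np

def distance_match(person, target, scale_min, scale_max):
    person = np.asarray(person, dtype=float)
    target = np.asarray(target, dtype=float)
    d2 = ((person - target) ** 2).sum()                   # squared-Euclidean discrepancy
    worst = len(person) * (scale_max - scale_min) ** 2    # largest possible discrepancy
    return 1.0 - d2 / worst

target = [7, 4, 8, 5, 6]       # hypothetical target profile (1-10 scales)
candidate = [6, 5, 8, 3, 7]    # hypothetical candidate profile
print(round(distance_match(candidate, target, 1, 10), 3))
```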

Quality and Quantity: a 77-slide presentation on the logic of measurement. The structure of the 3-hour session is arranged around the following topics: Qualities, Symbols, and Numbers; Quantity and Measurement; Theories of Measurement; Meaning, Theory, and Measurement; and the Implications of the above for Organizational Research. Mainly a definitional overview - but highly challenging to both staff and students! The original powerpoint file is in a zipped archive here (106k), and a convenient 4-slides-per-landscape-page Acrobat pdf file here (229k).

Predictive Accuracy: a 36-slide presentation which may be downloaded as a pdf file from here (788kb).
Paper 1: Breiman, L. (2001) Statistical Modeling: the two cultures. Statistical Science, 16, 3, 199-231.
Paper 2: Bickel, P.J., Ritov, Y., & Stoker, T.M. (2006) Tailor-made tests for goodness of fit to semiparametric hypotheses. Annals of Statistics, 34, 2, 721-741.
Paper 3: Haig, B. (2005) An abductive theory of scientific method. Psychological Methods, 10, 4, 371-388.
Paper 4: Barrett, P.T. (2003) Beyond Psychometrics: measurement, non-quantitative structure, and applied numerics. Journal of Managerial Psychology, 18, 5, 421-439.