Practitioner Guidelines for Psychometric Testing

Validation Studies

The following Guidelines have been adapted from the Uniform Guidelines on Employee Selection Procedures, US Equal Employment Opportunity Commission.

Validation is the demonstration of the job relatedness of a selection procedure. The three principal validity strategies are as follows:

  1. Criterion-related validity - a statistical demonstration of a relationship between scores on a selection procedure and job performance of a sample of workers.
  2. Content validity - a demonstration that the content of a selection procedure is representative of important aspects of performance on the job.
  3. Construct validity - a demonstration that:
    (a) a selection procedure measures a construct (something believed to be an underlying human trait or characteristic, such as honesty) and
    (b) the construct is important for successful job performance.

Different methods for validation can be used for different parts of a selection process. For example, where a selection process includes both a physical performance test and an interview, the physical test might be supported on the basis of content validity, and the interview on the basis of a criterion-related study.

In some situations only one kind of validation study is likely to be appropriate. More than one strategy may be possible in other circumstances, in which case administrative considerations such as time and expense may be decisive. A combination of approaches may be feasible and desirable.

Key conditions for a criterion-related study

This is the traditional idea of validation: people who score highly on the test are also found to be more productive. A study of this type requires a substantial number of individuals to be available for inclusion. In addition, reliable and valid measures of job performance should be available or capable of being developed. Further, there should be a considerable range of performance on both the selection and criterion measures. The results of such a study would show that scores on the selection procedure are related to scores on the criterion measures of job performance*.
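As an illustration of the rank-ordering approach described in the first footnote, the sketch below ranks employees on the test and on the criterion and computes a coefficient of agreement between the two rankings. The Guidelines do not name a specific coefficient; Spearman's rank correlation is assumed here as one common choice. The function names and the six-employee data set are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(test_scores, performance_ratings):
    """Spearman's coefficient: Pearson correlation of the two rank lists."""
    rx, ry = ranks(test_scores), ranks(performance_ratings)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data for six employees (not taken from any real study).
test_scores = [12, 15, 9, 20, 17, 11]
performance_ratings = [3, 4, 2, 5, 4, 2]

rho = spearman(test_scores, performance_ratings)
print(f"Rank correlation between test and criterion: {rho:.2f}")
```

A coefficient near 1 indicates that the two rank orders largely agree; a coefficient near 0 indicates no relationship. With samples this small the estimate is very imprecise, so it should be read as indicative only.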

Key conditions for a content-related study

Content validity is appropriate where it is technically and administratively feasible to develop work samples or measures of operationally defined skills, knowledge or abilities which are a necessary prerequisite to observable work behaviors. There needs to be evidence demonstrating that the content of the selection procedure is a representative sample of the content of the job.

Where a test is intended to replicate a work behavior, content validity is established by a demonstration of the similarities between the test and the job with respect to behaviors, products, and the surrounding environmental conditions. Paper-and-pencil tests which are intended to replicate a work behavior are most likely to be appropriate where work behaviors are performed in paper-and-pencil form (e.g. editing and bookkeeping). Paper-and-pencil tests of effectiveness in interpersonal relations (e.g. sales or supervision), or of ability to function properly under danger (e.g. firefighting), generally are not close enough approximations of work behaviors to show content validity.

Key conditions for a construct-related study

In order to demonstrate construct validity there must be evidence showing that the measure of the construct is related to work behaviors which involve the construct**. The Job Requirement Exercise based upon SHL’s OPQ and MQ in the accompanying Appendix provides another example of this.

When is there a requirement to collect validity evidence?

Although validation of selection procedures is desirable, Ciba requires users to produce evidence of validity only when the assessment procedure adversely affects the opportunities of a race, sex, or ethnic group for hire, transfer, promotion, retention or other employment decision. If there is no adverse impact, there is no validation requirement.

It is inappropriate to require validity evidence where the number of persons and the difference in selection rates are so small that the selection of one different person for one job would shift the result from adverse impact against one group to a situation in which that group has a higher selection rate than the other group***.
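The arithmetic behind this point can be sketched with the figures from the third footnote (three males selected from 20 applicants, one female selected from 10). The sketch below computes the two selection rates, compares their ratio with the 4/5ths threshold, and then checks whether moving a single selection from one group to the other would reverse which group has the higher rate. The function names are hypothetical, and the code is an illustration of the reasoning rather than a prescribed procedure.

```python
from fractions import Fraction

FOUR_FIFTHS = Fraction(4, 5)


def impact_ratio(rate_a, rate_b):
    """Lower selection rate divided by the higher selection rate."""
    low, high = sorted([rate_a, rate_b])
    if high == 0:
        raise ValueError("no one was selected from either group")
    return low / high


def one_swap_reverses(sel_low, n_low, sel_high, n_high):
    """True if moving one selection from the higher-rate group to the
    lower-rate group would give the formerly lower-rate group the
    higher selection rate."""
    if sel_high == 0 or sel_low >= n_low:
        return False
    new_low = Fraction(sel_low + 1, n_low)
    new_high = Fraction(sel_high - 1, n_high)
    return new_low > new_high


# Figures from the footnote example.
male_rate = Fraction(3, 20)    # 15%
female_rate = Fraction(1, 10)  # 10%

ratio = impact_ratio(male_rate, female_rate)
below_threshold = ratio < FOUR_FIFTHS
too_small_to_judge = one_swap_reverses(1, 10, 3, 20)

print(f"Impact ratio: {float(ratio):.1%}")
print(f"Below 4/5ths threshold: {below_threshold}")
print(f"One different selection would reverse the result: {too_small_to_judge}")
```

Here the ratio falls below 80%, but selecting two males and two females instead would give women a 20% rate against 10% for men, which is why numbers this small do not warrant a determination of adverse impact.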

If the test is administered and used in the same fashion for a variety of jobs, the impact of that test can be assessed in the aggregate. The records showing the results of the test, and the total number of persons selected, generally would be sufficient to show the impact of the test. Again, if the test has no adverse impact, it need not be validated.

Adverse impact

This topic is dealt with more thoroughly in a separate Appendix. In order to establish whether there is a requirement to collect validity evidence, the adverse impact of the whole assessment process should first be established. If there is no adverse impact for the overall assessment process, in most circumstances there is no obligation to investigate adverse impact of the component psychometric tests.

If the overall assessment process does have an adverse impact, the adverse impact of the individual components of the assessment procedure (i.e. the individual psychometric test being employed) should be analyzed. If the test user continues to use a psychometric test in a situation where there is adverse impact, the user is expected to have evidence of validity, as described above.
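The two-stage logic described above can be sketched as follows: the overall process is checked first, and individual components are examined only if the overall process shows adverse impact. The sketch assumes the 4/5ths rule of thumb as the test for adverse impact, which is a simplification; the function names, group labels and selection rates are all hypothetical.

```python
FOUR_FIFTHS = 0.8


def has_adverse_impact(rates):
    """Apply the 4/5ths rule of thumb to a dict of group -> selection rate."""
    highest = max(rates.values())
    if highest == 0:
        return False  # nobody was selected, so there is no rate to compare
    return any(rate / highest < FOUR_FIFTHS for rate in rates.values())


def components_needing_validity_evidence(overall_rates, component_rates):
    """Return the components that show adverse impact, but only when the
    overall assessment process itself shows adverse impact."""
    if not has_adverse_impact(overall_rates):
        return []
    return [
        name
        for name, rates in component_rates.items()
        if has_adverse_impact(rates)
    ]


# Hypothetical selection (pass) rates for two groups.
overall_rates = {"group_a": 0.30, "group_b": 0.18}
component_rates = {
    "numerical_test": {"group_a": 0.50, "group_b": 0.25},
    "interview": {"group_a": 0.40, "group_b": 0.38},
}

flagged = components_needing_validity_evidence(overall_rates, component_rates)
print("Components requiring validity evidence:", flagged)
```

In this illustration only the numerical test is flagged; if the overall rates had been within four-fifths of each other, no component would have been examined at all.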

Who should conduct validation procedures?

Many industrial/organisational/business psychologists validate selection procedures, review published evidence of validity and make recommendations with respect to the use of selection procedures. In the US many of these individuals are members or fellows of Division 14 (Industrial and Organizational Psychology) or Division 5 (Evaluation and Measurement) of the American Psychological Association. They can be identified in the membership directory of that organization. A high level of qualification is represented by a diploma in Industrial Psychology awarded by the American Board of Professional Psychology. Similarly, in the UK they will be members of the British Psychological Society’s Division of Occupational Psychology.

Individuals with the necessary competence may come from a variety of backgrounds. The primary qualification is pertinent training and experience in the conduct of validation research.

Industrial psychologists and other persons competent in the field may be found as faculty members in colleges and universities (normally in the departments of psychology or business administration) or working as individual consultants or as members of a consulting organization.

* A rank-ordering matching technique has been suggested as a way of dealing with small sample sizes. Under this approach, criterion validity is established by rank ordering employees by their work performance on the criterion, then rank ordering the same employees by their scores on the psychometric test. A coefficient is then calculated to determine the degree of match. For example, we might expect the coefficient to be high between the performance of management accountants of varying ability and their scores on a high-level test of numerical analysis, but zero if they were assessed by graphology instead.
** An example of this, where the OPQ is being considered for use, would be to compare and contrast the profiles of three groups: those who, on an objective measure of job performance, are deemed to be high performers; those who are deemed to be poor performers; and those whose performance is merely average. This can help establish the characteristics that discriminate between high and average performers, as well as highlight those characteristics on which high and low performers can be confused. Whatever High Performance Profile emerges can then be used as a template for training needs analysis.
*** For example, if the employer selected three males and one female from an applicant pool of 20 males and 10 females, the 4/5ths rule would indicate adverse impact (selection rate for women is 10%; for men 15%; 10/15 or 66 2/3% is less than 80%), yet the number of selections is too small to warrant a determination of adverse impact. In