Category Archives: Reliability

Article by Straub, Detmar; Boudreau, Marie-Claude; Gefen, David (2004) in Communications of the Association for Information Systems, 13.


Prior to reading this article, I had read two papers by the same authors, i.e. Straub (1989)[i] and Boudreau, Gefen & Straub (2001)[ii]. I think this paper is the conclusion of those two earlier papers. The main contribution of this paper is its guideline on which aspects of validation should be included in IS positivist research. The authors rate the requirement (of performing the validation procedures) at three different levels of importance, i.e. mandatory, highly recommended, and optional.



All IS positivist research is required (compulsory) to provide evidence of the following aspects of validity:

  1. Construct Validity – whether the measures chosen by the researcher “fit” together in such a way as to capture the essence of the construct. [Note: you may refer to my previous entry related to construct validity here, or refer to wikipedia (here) for the complete definition]. Note that construct validity consists of five different but inter-related elements, i.e. (i) Discriminant Validity; (ii) Convergent Validity; (iii) Nomological Validity; (iv) Factorial Validity[iii]; and (v) Testing of Common Method Bias[iv]. What is mandatory, according to the authors, is Discriminant Validity and Convergent Validity. Because Factorial Validity can assess both, it is sufficient in this context.
  2. Reliability – to show that the measures for one construct are, indeed, related to each other. It is worth noting that reliability works only for reflective constructs (never perform a reliability test on a formative construct, as its measures are not correlated with each other). [Note: Please refer to my earlier entry on the article by Petter, Straub and Rai (2007) for further details].
  3. Manipulation Validity – mandatory for certain types of (lab) experimental studies only. An experiment in which participants are treated with a physical substance (such as a drug) is not required to demonstrate manipulation validity.
  4. Statistical Conclusion Validity – researchers need to provide sound arguments on the quality of the statistical evidence of covariation, such as sources of error, the use of appropriate statistical tools, and bias.
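
To make the idea of convergent and discriminant validity more concrete, here is a minimal sketch in Python. It is not the factor-analytic procedure the authors have in mind; it is only a rough screening step that compares each item's correlation with the rest of its own construct against its correlation with the other construct. The data are simulated, and the construct names `A` and `B`, the sample size, and the noise level are all assumptions made for illustration.

```python
import numpy as np

# Simulated (hypothetical) data: two unrelated latent constructs, three items each.
rng = np.random.default_rng(0)
n = 200
latent_a = rng.normal(size=n)
latent_b = rng.normal(size=n)
items = {
    "A": np.column_stack([latent_a + 0.5 * rng.normal(size=n) for _ in range(3)]),
    "B": np.column_stack([latent_b + 0.5 * rng.normal(size=n) for _ in range(3)]),
}

def corr(u, v):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(u, v)[0, 1])

def item_composite_table(items):
    """For every item, return (name, own_r, cross_r).

    own_r   : correlation with the sum of the OTHER items of its own construct
              (a high value is a sign of convergence)
    cross_r : largest absolute correlation with the summed score of any other
              construct (a low value is a sign of discrimination)
    """
    rows = []
    for own, block in items.items():
        for j in range(block.shape[1]):
            item = block[:, j]
            rest = np.delete(block, j, axis=1).sum(axis=1)
            own_r = corr(item, rest)
            cross_r = max(
                abs(corr(item, other_block.sum(axis=1)))
                for other, other_block in items.items()
                if other != own
            )
            rows.append((f"{own}{j + 1}", own_r, cross_r))
    return rows

table = item_composite_table(items)
for name, own_r, cross_r in table:
    print(f"{name}: own={own_r:.2f} cross={cross_r:.2f}")
```

An item whose `cross` value is close to (or larger than) its `own` value would be a candidate for closer inspection in a proper factor analysis.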


Highly Recommended

It is highly recommended that positivist research perform the following aspects of validation:

  1. Testing for Common Method Bias – Common Method Bias can be avoided by gathering data for the independent variables and dependent variables using different methods or, if a single method is used, by testing for it through SEM.
  2. Nomological Validity – evidence that the structural relationships among variables/constructs are consistent with other studies that have been measured with validated instruments and tested against a variety of persons, settings, times, and methods.
  3. Manipulation Validity – for the quasi-experimental or non-experimental studies in social (and design) science that characterize a great deal of management research, researchers have to show that participants truly received the treatment.



It is optional for positivist research to perform the following aspects of validation:

  1. Predictive Validity – “Also known as “practical,” “criterion-related,” “postdiction,” or “concurrent validity,” predictive validity establishes the relationship between measures and constructs by demonstrating that a given set of measures posited for a particular construct correlate with or predict a given outcome variable.”
  2. Unidimensional Validity – evidence that each measurement item reflects one and only one latent variable (construct). The terms frequently used to discuss this validity are: “first order factors,” “second order factors,” etc. According to the authors, this type of validity is relatively new and the understanding of its capabilities is still very limited.


The authors also made the following recommendations pertaining to the innovation of research instruments: 

  1. Researchers are highly recommended to use previously validated instruments wherever possible. If researchers make significant alterations in validated instruments, they are required to revalidate the instrument’s content, constructs, and reliability.
  2. Researchers who are able to create their own instrument are highly recommended to do so, provided that they validate it thoroughly.


[i] Straub, D. W. (1989) “Validating Instruments in MIS Research,” MIS Quarterly, 13:2, pp. 147-169.

[ii] Boudreau, M., D. Gefen, and D. Straub (2001) “Validation in IS Research: A State-of-the-Art Assessment,” MIS Quarterly, 25:1, pp. 1-23.

[iii] Factorial validity can be assessed using factor analytic techniques such as common factor analysis, PCA, as well as confirmatory factor analysis in SEM. It can assess both convergent and discriminant validity, but does not provide evidence to rule out common methods bias when the researcher uses only one method in collecting the data.

[iv] Common Method Bias is also known as “method halo” or “methods effects”. It may occur when data are collected via only one method or via the same method but only at one point in time. Data collected in these ways likely share part of the variance that the items have in common with each other due to the data collection method rather than to: (i) the hypothesized relationships between the measurement items and their respective latent variables, or; (ii) the hypothesized relationships among the latent variables.




Article by Henson, Robin K. (2001) in Measurement and Evaluation in Counseling and Development, 34.


 [Note: This entry has been amended on September 8, 2008 – 12:25am.]


Last week I presented a paper at the International Accounting and Business Conference (IABC 2008), which was held on 18-19 August at the Puteri Pacific Johor Bahru, Johor, Malaysia. During the Q&A session, a member of the audience asked my opinion on the issue of validity. He asked whether or not we should estimate the validity of an instrument (he meant a questionnaire) if we simply take it from previous studies where it has been validated many times. He also asked whether or not the ‘Cronbach Alpha’ (what he meant here is actually the internal consistency reliability test) is sufficient to support the validity of such an instrument. I put the summary of my answers below:

  1. When we adopt a questionnaire from other studies and we assume that its validity has been proven, what we assume here is actually the CONTENT VALIDITY. It means that the items in the instrument are very well supported by the (related) theory.
  2. Besides content validity, there are a few other aspects of validity that we have to prove. For example, we have to prove that the respondents perceive the questions (in the questionnaire) in the way we want, which means that when we ask them about ‘A’, we have to ensure that the respondents understand that the question is exactly about ‘A’, not about anything else. What we try to prove here is called ‘CONSTRUCT VALIDITY’ or ‘MEASUREMENT VALIDITY’.
  3. CONSTRUCT validity consists of two components, i.e. CONVERGENT validity and DISCRIMINANT validity. Convergent validity proves that all the items (a.k.a. measurements or measured variables) correctly measure the designed construct (a.k.a. latent variable or unobserved variable), while discriminant validity proves that none of the items measures another construct. (I prepare a diagram to distinguish the two components of construct validity in Figure A at the end of this entry). One way to estimate construct validity is through the so-called Factorial Validity, such as Confirmatory Factor Analysis (CFA). By conducting CFA, we’ll get the structure of constructs with their measures (items) that fulfill the requirements of discriminant and convergent validity.
  4. Cronbach Alpha is an internal consistency test which measures the degree to which the items (measurements) consistently measure the underlying latent construct. It is an indicator of RELIABILITY. The difference between reliability and convergent validity is that reliability looks into one individual construct at a time, while convergent validity looks at an individual construct in comparison to the other constructs in the proposed nomological network. It means that Cronbach Alpha is estimating the convergent validity. Having this (the Cronbach Alpha) in hand, we have proven the reliability but not the construct validity. We have to prove the other one, the discriminant validity. So Cronbach Alpha alone is not sufficient!
  5. I remind myself that what we should estimate here[i] is actually the validity of the ‘score’ or ‘measurement’, and NOT the ‘questionnaire’ or the ‘instrument’[ii]. That explains the reason why we should examine the validity although the instrument has been validated many times before.
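
For reference, the usual textbook formula for Cronbach Alpha is given below. The notation is my own choice and not taken from the paper: k is the number of items in one construct, \sigma_i^2 is the variance of item i, and \sigma_X^2 is the variance of the respondents' summed scores over those k items.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^{2}}{\sigma_X^{2}}\right)
```

Every term in the formula comes from the items of a single construct. Nothing in it refers to the items of any other construct, which is one way to see why Cronbach Alpha on its own cannot say anything about discriminant validity.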


Another member of the audience asked me about the stage at which the internal consistency test should be done. She is currently at the data analysis stage of her PhD work. She asked my opinion on whether or not she should perform the Cronbach Alpha twice, once before the Factor Analysis and once after it. I put the summary of my answers below:

  1. We perform the (first) internal consistency test (the Cronbach Alpha) to detect whether or not all the items point in a single conceptual direction. If the result of Cronbach Alpha indicates that they do not, then recoding has to be done accordingly.
  2. After we are done with the (first) Cronbach Alpha, we perform Factor Analysis to estimate the construct validity. At the end of this step, we’ll have the structure of constructs in our study (the structure depicts which items indicate which construct).
  3. We need to perform Cronbach Alpha again to support the reliability aspect of the new structure obtained in the Factor Analysis result. If the structure obtained is identical to the one prior to Factor Analysis, then the (second) Cronbach Alpha is optional.
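
The first step above (checking item direction, recoding, and re-running Cronbach Alpha) can be sketched in Python as follows. This is only an illustration under my own assumptions: the responses are made up, the scale is assumed to run from 1 to 5, and I flag a reverse-worded item by a negative correlation between the item and the sum of the remaining items, which is one common way of spotting it rather than the only one.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach Alpha for one construct (rows = respondents, columns = items)."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)        # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the summed score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def item_rest_correlations(scores):
    """Correlation of each item with the sum of the remaining items."""
    x = np.asarray(scores, dtype=float)
    total = x.sum(axis=1)
    return np.array(
        [np.corrcoef(x[:, j], total - x[:, j])[0, 1] for j in range(x.shape[1])]
    )

# Hypothetical answers of six respondents to four items on a 1-5 scale.
# The third item is worded in the opposite direction to the others.
LOW, HIGH = 1, 5  # assumed scale end points
responses = np.array([
    [1, 2, 5, 1],
    [2, 2, 4, 2],
    [3, 3, 3, 3],
    [4, 4, 2, 4],
    [5, 4, 1, 5],
    [4, 5, 2, 4],
])

alpha_before = cronbach_alpha(responses)

# Flag items that run against the rest of the scale, then reverse-code them.
flagged = np.where(item_rest_correlations(responses) < 0)[0]
recoded = responses.copy()
recoded[:, flagged] = (LOW + HIGH) - recoded[:, flagged]

alpha_after = cronbach_alpha(recoded)
print(f"flagged item columns: {flagged.tolist()}")
print(f"alpha before recoding: {alpha_before:.2f}")
print(f"alpha after recoding:  {alpha_after:.2f}")
```

In this made-up data set the alpha is negative before recoding and high afterwards, which is the pattern that signals a direction problem rather than a genuinely unreliable set of items.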


[Note: What I have written above is not from the paper; it is all from me.]


The important points that I got from this paper are listed below (Note: there are many other points discussed in this paper, but I do not include them here simply because I didn’t find them interesting):

  1. Many researchers, according to the author, have a misconception about reliability tests. They perceive reliability as a property of the test rather than of the scores (data). That is indeed a wrong concept! The author wrote “…it is more appropriate to speak of the reliability of ‘test scores’ of the ‘measurement’ than of the ‘test’ or the ‘instrument…’”.
  2. As the author wrote, “Different samples, testing conditions, and any other factor that may affect observed scores can in turn affect reliability estimates…”, so a reliability test should be done even when the instrument we use has already been validated many times before.
  3. Three sources of measurement error (within the classical framework) are: (i) content sampling of items (“..the theoretical idea that the test is made up of a random sampling of all possible items that could be on the test….”); (ii) stability across time; (iii) inter-rater error.
  4. Poor score reliability reduces the power of statistical significance tests, so that a real effect becomes harder to detect.
  5. The author suggested the so-called “reliability generalization (RG) studies”. The author described RG as “…the cumulative information they may yield in describing study characteristics that affect reliability estimates for scores from a given test and perhaps, study characteristics that consistently affect score reliability across different tests..”
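
Point 4 above can be illustrated with the attenuation relation from classical test theory, in which the correlation we can expect to observe equals the true correlation multiplied by the square root of the product of the two score reliabilities. The short sketch below is my own illustration and is not taken from the paper; the true correlation of 0.50 and the reliability values are arbitrary assumptions.

```python
import math

def attenuated_correlation(true_r, rel_x, rel_y):
    """Expected observed correlation given the true correlation and the
    score reliabilities of the two measures (classical attenuation relation)."""
    return true_r * math.sqrt(rel_x * rel_y)

true_r = 0.50  # assumed true correlation between the two constructs
for rel in (0.9, 0.7, 0.5):
    observed = attenuated_correlation(true_r, rel, rel)
    print(f"reliability={rel:.1f} -> expected observed r={observed:.2f}")
```

The lower the score reliability, the smaller the observed correlation, and a smaller observed effect needs a larger sample to reach statistical significance.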

[i] By ‘here’ I mean ‘construct validity’.

[ii] This is one of the core arguments of this paper.