
Category Archives: Data Collection

Article by Stanton, Jeffrey M. and Rogelberg, Steven G. (2001) in Organizational Research Methods, 4:3.


This paper addresses three common challenges in collecting data through Internet or intranet web pages:




(1.1) Constructing and posting materials


(1.2) Controlling access, authentication, and multiple responses

The main challenge in controlling access is preventing responses from unwanted participants, that is, people outside the intended sample. One solution is to host the study on the organization’s intranet, which reduces unwanted responses from outside the organization where the research is conducted. Another is authentication, where each respondent is assigned an identifier and a password. With the latter approach, however, potential respondents must be fully convinced that identifiers and passwords are allocated randomly by machine; otherwise, concerns about anonymity may reduce the response rate.
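As an illustration of machine-allocated credentials, here is a minimal sketch using Python's standard `secrets` module. The function name `issue_credentials` and the identifier and password lengths are assumptions made for this example; the paper does not prescribe any particular scheme.

```python
import secrets


def issue_credentials(n_respondents: int) -> list[tuple[str, str]]:
    """Return n (identifier, password) pairs drawn from a secure random source.

    Hypothetical helper: nothing in a pair is derived from a person's name,
    e-mail address, or position in the sampling frame.
    """
    credentials = []
    seen = set()
    while len(credentials) < n_respondents:
        identifier = secrets.token_hex(4)  # 8 random hexadecimal characters
        if identifier in seen:
            continue  # extremely unlikely collision: draw again
        seen.add(identifier)
        password = secrets.token_urlsafe(9)  # 12 random URL-safe characters
        credentials.append((identifier, password))
    return credentials


# Example: credentials for a sample of five respondents.
for identifier, password in issue_credentials(5):
    print(identifier, password)
```

How the pairs are handed out still matters: if researchers keep a list linking identifiers to named individuals, the random allocation alone does not guarantee anonymity.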


The next challenge is multiple responding: researchers have to prevent multiple responses from the same respondent. Multiple responding may be inadvertent (for example, a respondent accidentally clicks the submit button twice, or submits data prematurely and then resubmits a more complete response) or purposeful (for example, a respondent wants to skew the findings in a certain direction or to sabotage the research effort). Either way, some opinions may end up over-represented. To overcome the problem, the authors recommended the following actions:

  1. Avoid motivating individuals to respond maliciously; for example, extensive cross-posting of research announcements and frequent e-mails may be perceived by potential respondents as intrusive.
  2. Remind potential respondents that the researchers need one and only one response to the research request. In the reminder, emphasize the importance and meaningfulness of the research.
  3. Design the web site so that respondents receive a confirmation query (such as “Are you sure you want to submit your responses?”) at the moment they submit their data, followed by an immediate acknowledgment that the data have been received.
  4. Employ authentication strategies so that multiple responses can be screened out and deleted prior to data analysis.
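The screening step in the last recommendation can be sketched as follows. This is only an illustration: the field names `respondent_id` and `submitted_at` are hypothetical, and keeping the most recent submission per identifier is an assumption (it suits the premature-submission case described above); the authors do not specify which duplicate to keep.

```python
from datetime import datetime


def screen_multiple_responses(submissions):
    """Keep only the latest submission for each respondent identifier."""
    latest = {}
    for sub in submissions:
        rid = sub["respondent_id"]
        if rid not in latest or sub["submitted_at"] > latest[rid]["submitted_at"]:
            latest[rid] = sub
    # Return the surviving submissions in chronological order.
    return sorted(latest.values(), key=lambda s: s["submitted_at"])


# Made-up example data: respondent "a1f3" submitted twice.
submissions = [
    {"respondent_id": "a1f3", "submitted_at": datetime(2001, 5, 1, 9, 0), "q1": 3},
    {"respondent_id": "b7c2", "submitted_at": datetime(2001, 5, 1, 9, 5), "q1": 4},
    {"respondent_id": "a1f3", "submitted_at": datetime(2001, 5, 1, 9, 7), "q1": 5},
]

screened = screen_multiple_responses(submissions)
print(len(screened), "responses kept out of", len(submissions))
```

Deduplicating on the identifier only catches repeats made under the same credentials; it does nothing about a respondent who obtains a second identifier.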


(1.3) Encouraging participation in networked research

To obtain a high response rate, the authors recommended the following actions:

  1. Provide “advance notice of an e-mail-based study, combined with the opportunity to decline participation in the research”.
  2. Use a “mail-merge” strategy to personalize each recruiting message.
  3. Employ incentive strategies to encourage early responses to the Web-based questionnaire.
  4. Provide respondents with immediate general or personalized feedback concerning their research participation.
  5. Facilitate response by maintaining the novelty of online data collection through diverse, intriguing, and interactive Web pages.
  6. Convince potential respondents with a credible track record (“data from past research efforts have been acknowledged and used responsibly and appropriately”).
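The “mail-merge” recommendation can be illustrated with a short sketch based on Python's standard `string.Template`. The message wording, the field names, and the `example.org` address are all invented for this example, and the sketch only builds the message texts; sending them is left out.

```python
from string import Template

# Hypothetical invitation text; the opt-out sentence reflects the advice to
# combine advance notice with an opportunity to decline.
INVITATION = Template(
    "Dear $name,\n\n"
    "You are invited to take part in a short survey on $topic.\n"
    "Your personal link: $link\n\n"
    "If you prefer not to participate, reply to this message and you will "
    "not be contacted again.\n"
)


def merge_invitations(recipients, topic, base_url):
    """Return a dict mapping each e-mail address to its personalized message."""
    messages = {}
    for person in recipients:
        messages[person["email"]] = INVITATION.substitute(
            name=person["name"],
            topic=topic,
            link=f"{base_url}?id={person['respondent_id']}",
        )
    return messages


recipients = [
    {"name": "Alex", "email": "alex@example.org", "respondent_id": "a1f3"},
    {"name": "Sam", "email": "sam@example.org", "respondent_id": "b7c2"},
]
messages = merge_invitations(recipients, "workplace communication",
                             "https://example.org/survey")
print(messages["alex@example.org"])
```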





(2.1) Bridging the Digital Divide

Researchers have to be aware that not all members of the population have the same access to Internet or intranet technology. Measures must therefore be taken to prevent over-representation of certain subgroups of the population. Otherwise, the generalizability of the findings will be questionable, because the study is exposed to nonresponse bias.


(2.2) Method Effects

Internet- or intranet-based research is strongly affected by respondents’ beliefs about a few things: (i) the anonymity and confidentiality of their responses; (ii) the purpose of the study; (iii) the trustworthiness and reliability of computers and software. If respondents have doubts about any of these, the response rate falls and the study is exposed to nonresponse error. Consequently, the generalizability of the findings becomes questionable.


(2.3) Uncontrolled Response Environments

As with mailed questionnaires, researchers who collect data over the Internet or an intranet know nothing about the state of the respondents (e.g., whether they are sleepy, intoxicated, or distressed) or about the environment in which they respond (e.g., whether someone else is present and influencing the respondent, or whether the respondent’s hardware and software are compatible with the data collection system). Researchers have to accept that the uncontrolled response environment limits the generalizability of the findings.


(2.4) Perceived Generalizability

According to the authors, previous studies show that “information from the Internet is perceived as considerably less credible than the traditional sources”. This perception is a serious challenge for Internet- or intranet-based research. One way to counter it is to employ the so-called “cross-validation” technique. [Note: I explained the cross-validation technique in a previous entry.]




In this paper, the authors also discussed the ethical implications of Internet- or intranet-based research.


Article by Allen, Gove N.; Burk, Dan L.; and Davis, Gordon B. (2006) in MIS Quarterly, 30:3.


This paper argues that academic researchers SHOULD NOT collect data for their research from a commercial website without first seeking written consent from the website’s administrator. Such websites are maintained for business purposes, so the information and facts they display are appropriate in that context only. They may not be appropriate for academic research unless the administrator and the researchers have agreed otherwise.


The authors argued that “…[it] is unacceptable. Such actions show that the researcher is acting in bad faith and will reduce the credibility that other academic researchers hold with commercial web sites. Perhaps more importantly, the circumvention of technological measures to gain access to data clearly violates the Digital Millennium Copyright Act (DMCA) in the United States, as well as provisions set forth in the European Union Copyright Directive (2001/29/EC). Accordingly, such behavior may have criminal consequences…”


This paper reports two main ways to collect data from websites. I summarize them in a diagram below.


The authors warned academic researchers not to disregard two documents, usually maintained by websites, that govern access to and use of the facts and information provided: the Terms of Service (TOS) document and the robots.txt file. Any data collection that does not adhere to these two documents exposes researchers to legal action, which according to the authors can be brought on three bases:

  1. Trespass: in common law, a trespass claim requires (i) contact with the property; (ii) notification that the contact was unwelcome; (iii) interference with or dispossession of the property, resulting in some harm, damage, or pecuniary loss to the owner. Collecting data from a website without prior consent is considered trespass because it fulfills all three requirements: (i) the electrical impulses that carry a request for information to the web server satisfy the common law requirement of physical contact; (ii) the terms posted in the TOS document or robots.txt satisfy the requirement of notice; (iii) the increased load on the networked system satisfies the requirement of interference or dispossession, where pecuniary loss is defined in terms of lost processing cycles, use of network transmission capacity, etc.
  2. Copyright: collecting facts alone does not infringe copyright, but copying the facts together with their original selection and arrangement does. The authors argued that researchers normally capture the whole web page for their records and for future analysis, which is prohibited by law in some countries such as the US (“…under United States law, copies that are fixed long enough to be perceived … are subject to copyright law…”).
  3. Contract breach: “The terms posted in TOS documents purport to form a binding contract with those using the site where the TOS document is posted”, so researchers have to ensure that they do not violate the TOS document during data collection.
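For the robots.txt part of these warnings, a collection script can at least check a site's rules before requesting a page. Below is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `research-bot` user-agent name, and the `example.org` URLs are made up for illustration. Passing this check says nothing about the TOS document, which still has to be read by a person, and it is not a substitute for the written consent the authors call for.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real script would load the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule checker without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether the hypothetical crawler "research-bot" may fetch each URL.
allowed = parser.can_fetch("research-bot", "https://example.org/products/1")
blocked = parser.can_fetch("research-bot", "https://example.org/private/data")

# Delay between requests (in seconds) requested by the site, if it states one.
delay = parser.crawl_delay("research-bot")

print("products page allowed:", allowed)
print("private page allowed:", blocked)
print("requested crawl delay:", delay)
```

Honoring the crawl delay also bears on the trespass argument above, since it limits the load that the collection places on the server.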

Website Data Collection
