Bias (statistics)

In the field of statistics, bias is a systematic tendency in which the methods used to gather data and estimate a sample statistic present an inaccurate, skewed or distorted (biased) depiction of reality. Statistical bias exists in numerous stages of the data collection and analysis process, including: the source of the data, the methods used to collect the data, the estimator chosen, and the methods used to analyze the data.

Data analysts can take various measures at each stage of the process to reduce the impact of statistical bias in their work. Understanding the source of statistical bias can help to assess whether the observed results are close to actuality. Issues of statistical bias have been argued to be closely linked to issues of statistical validity.

Statistical bias can have significant real-world implications, as data is used to inform decision making across a wide variety of processes in society. Data is used to inform lawmaking, industry regulation, corporate marketing and distribution tactics, and institutional policies in organizations and workplaces. Therefore, there can be significant implications if statistical bias is not accounted for and controlled. For example, if a pharmaceutical company wishes to explore the effect of a medication on the common cold but the data sample only includes men, any conclusions made from that data will be biased towards how the medication affects men rather than people in general. That means the information would be incomplete and not useful for deciding whether the medication is ready for release to the general public. In this scenario, the bias can be addressed by broadening the sample. This sampling bias is only one of the ways in which data can be biased.
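The medication example can be made concrete with a small simulation. The sketch below is illustrative only: the population, the group labels, and the effect sizes (an average of 2.0 days of symptom reduction for men and 1.0 for women) are invented assumptions, not data from any study. It compares the average effect estimated from a men-only sample with the estimate from a sample drawn from the whole population.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of (sex, days of symptom reduction).
# The effect sizes are made-up numbers chosen only to illustrate the idea.
population = (
    [("M", rng.gauss(2.0, 0.5)) for _ in range(10_000)]
    + [("F", rng.gauss(1.0, 0.5)) for _ in range(10_000)]
)

def mean_effect(records):
    """Average symptom reduction over a list of (sex, effect) records."""
    return sum(effect for _, effect in records) / len(records)

true_mean = mean_effect(population)

# Sample restricted to men: systematically misses half the population.
men_only = rng.sample([r for r in population if r[0] == "M"], 500)

# Sample drawn from everyone: each person is equally likely to be chosen.
representative = rng.sample(population, 500)

print(f"population mean:       {true_mean:.2f}")
print(f"men-only estimate:     {mean_effect(men_only):.2f}")
print(f"representative sample: {mean_effect(representative):.2f}")
```

Under these assumptions the men-only estimate stays far from the population mean no matter how many men are sampled, whereas the estimate from the broadened sample differs from it only by random sampling variation.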

Bias can be differentiated from other statistical mistakes such as inaccuracy (instrument failure or inadequacy), lack of data, or mistakes in transcription (typos). Bias implies that the data selection may have been skewed by the collection criteria. Other forms of human-based bias also emerge in data collection, such as response bias, in which participants give inaccurate responses to a question. Bias does not preclude the existence of any other mistakes. One may simultaneously have a poorly designed sample, an inaccurate measurement device, and typos in the recorded data. Ideally, all factors are controlled and accounted for.

It is also useful to recognize that the term "error" refers specifically to an outcome rather than to the process (errors of rejection or acceptance of the hypothesis being tested), or to the phenomenon of random error. The terms "flaw" or "mistake" are recommended to differentiate procedural errors from these specifically defined outcome-based terms.

Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. The bias is defined as follows: let $T$ be a statistic used to estimate a parameter $\theta$, and let $\operatorname{E}(T)$ denote the expected value of $T$. Then,

$$\operatorname{bias}(T, \theta) = \operatorname{E}(T) - \theta$$

is called the bias of the statistic $T$ (with respect to $\theta$). If $\operatorname{bias}(T, \theta) = 0$, then $T$ is said to be an unbiased estimator of $\theta$; otherwise, it is said to be a biased estimator of $\theta$.

The bias of a statistic is always relative to the parameter it is used to estimate, but the parameter is often omitted when it is clear from the context what is being estimated.
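The definition of bias as the expected value of a statistic minus the true parameter can be checked numerically. The following is a minimal sketch, assuming normally distributed data with a known variance; the function names, sample size, and number of trials are hypothetical choices made for illustration. It approximates the expected value of two variance estimators by averaging over many simulated samples: one dividing the sum of squared deviations by n, the other by n - 1.

```python
import random

def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) variance estimates."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    return ss / n, ss / (n - 1)

def estimate_bias(n=5, trials=100_000, sigma=2.0, seed=0):
    """Approximate the bias of both estimators by Monte Carlo simulation.

    The average of each estimator over many samples stands in for its
    expected value; subtracting the true variance gives the bias.
    """
    rng = random.Random(seed)
    true_var = sigma ** 2
    total_n = total_n1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        v_n, v_n1 = variance_estimates(sample)
        total_n += v_n
        total_n1 += v_n1
    return total_n / trials - true_var, total_n1 / trials - true_var

bias_n, bias_n1 = estimate_bias()
print(f"bias, divisor n:     {bias_n:+.3f}")
print(f"bias, divisor n - 1: {bias_n1:+.3f}")
```

With these settings the divide-by-n estimator should show a bias near the theoretical value of -sigma^2 / n = -0.8, while the divide-by-(n - 1) estimator should show a bias near zero, up to simulation noise. Both estimators have sampling variability, which is a matter of precision rather than bias.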
