To gain knowledge about seemingly haphazard situations, statisticians collect information for variables that describe the situation. Before differentiating between descriptive and inferential statistics, let's define some common terms in statistics.
Common Terms in Statistics
A variable is a characteristic or attribute that can assume different values.
Data are the values (measurements or observations) that the variables can assume. Variables whose values are determined by chance are called random variables.
Suppose that an insurance company studies its records over the past several years and determines that, on average, 3 out of every 100 automobiles the company insured were involved in accidents during a 1-year period. Although there is no way to predict the specific automobiles that will be involved in an accident (random occurrence), the company can adjust its rates accordingly, since the company knows the general pattern over the long run. (That is, on average, 3% of the insured automobiles will be involved in an accident each year.)
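The long-run reasoning in the insurance example can be sketched in a few lines of Python. The 3-in-100 rate comes from the text; the portfolio size of 20,000 automobiles is a hypothetical figure assumed only for illustration.

```python
# Accident rate observed in the company's records (from the example above).
accidents_observed = 3
automobiles_observed = 100
accident_rate = accidents_observed / automobiles_observed  # 3 out of 100

# Hypothetical portfolio size, assumed here purely for illustration.
insured_automobiles = 20_000

# Expected number of accidents per year if the long-run pattern holds.
expected_accidents = accident_rate * insured_automobiles

print(f"Accident rate: {accident_rate:.0%}")
print(f"Expected accidents per year: {expected_accidents:.0f}")
```

The calculation says nothing about which automobiles will be involved in an accident; it only describes the overall pattern the company can plan around.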
A collection of data values forms a data set. Each value in the data set is called a data value or a datum.
In statistics it is important to distinguish between a sample and a population.
A population consists of all subjects (human or otherwise) that are being studied.
When data are collected from every subject in the population, it is called a census.
For example, every 10 years the United States conducts a census. The primary purpose of this census is to determine the apportionment of the seats in the House of Representatives.
The first census was conducted in 1790 and was mandated by Article 1, Section 2 of the Constitution. As the United States grew, the scope of the census also grew. Today the Census limits questions to populations, housing, manufacturing, agriculture, and mortality. The U.S. Census is conducted by the Bureau of the Census, which is part of the Department of Commerce.
Most of the time, due to the expense, time, size of population, medical concerns, etc., it is not possible to use the entire population for a statistical study; therefore, researchers use samples.
A sample is a group of subjects selected from a population.
If the subjects of a sample are properly selected, most of the time they should possess the same or similar characteristics as the subjects in the population. See Figure 1–1.
Figure 1–1: Population and Sample
However, the information obtained from a statistical sample is said to be biased if the results from the sample of a population are radically different from the results of a census of the population. Also, a sample is said to be biased if it does not represent the population from which it has been selected. The techniques used to properly select a sample are explained in Section 1–3.
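A small simulation may help make the idea of a biased sample concrete. This is only a sketch: the population of the values 1 through 10,000, the sample size of 500, and the function name `compare_samples` are all assumptions made for illustration. It compares a randomly selected sample with one taken from only the smallest values in the population.

```python
import random
import statistics

def compare_samples(seed=0):
    """Return (population mean, random-sample mean, biased-sample mean)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    # Hypothetical population: the values 1 through 10,000.
    population = list(range(1, 10_001))
    population_mean = statistics.mean(population)

    # Properly selected sample: every subject has the same chance of selection.
    random_sample = rng.sample(population, 500)

    # Poorly selected sample: only the 500 smallest values are used.
    biased_sample = population[:500]

    return (population_mean,
            statistics.mean(random_sample),
            statistics.mean(biased_sample))

population_mean, random_mean, biased_mean = compare_samples()
print(f"Population mean:    {population_mean}")
print(f"Random-sample mean: {random_mean}")
print(f"Biased-sample mean: {biased_mean}")
```

The random sample's mean should land close to the population mean, while the biased sample's mean is far from it because that sample does not represent the population.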
Difference Between the Two Branches of Statistics
The body of knowledge called statistics is sometimes divided into two main areas, depending on how data are used. The two areas are:
- Descriptive statistics
- Inferential statistics
Descriptive statistics consists of the collection, organization, summarization, and presentation of data.
In descriptive statistics the statistician tries to describe a situation. Consider the national census conducted by the U.S. government every 10 years. Results of this census give you the average age, income, and other characteristics of the U.S. population. To obtain this information, the Census Bureau must have some means to collect relevant data. Once data are collected, the bureau must organize and summarize them. Finally, the bureau needs a means of presenting the data in some meaningful form, such as charts, graphs, or tables.
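The organize-and-summarize step can be sketched with Python's standard `statistics` module. The list of ages below is hypothetical and far smaller than any real census file; it only shows what a descriptive summary looks like.

```python
import statistics

# Hypothetical ages collected from seven respondents (illustrative data only).
ages = [22, 35, 35, 41, 52, 60, 63]

# Summarize the collected data with a few descriptive measures.
summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "min": min(ages),
    "max": max(ages),
}

# Present the summary as a simple table.
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Every number printed describes the data that were actually collected; no claim is made about anyone outside the data set.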
The second area of statistics is called inferential statistics.
Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
Here, the statistician tries to make inferences from samples to populations. Inferential statistics uses probability, i.e., the chance of an event occurring. You may be familiar with the concepts of probability through various forms of gambling. If you play cards, dice, bingo, or lotteries, you win or lose according to the laws of probability. Probability theory is also used in the insurance industry and other areas.
The area of inferential statistics called hypothesis testing is a decision-making process for evaluating claims about a population, based on information obtained from samples. For example, a researcher may wish to know if a new drug will reduce the number of heart attacks in men over 70 years of age. For this study, two groups of men over age 70 would be selected. One group would be given the drug, and the other would be given a placebo (a substance with no medical benefits or harm). Later, the number of heart attacks occurring in each group of men would be counted, a statistical test would be run, and a decision would be made about the effectiveness of the drug.
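The text does not say which statistical test would be run. As one possibility, the sketch below applies a two-proportion z-test to the drug study. The group sizes, heart attack counts, and 0.05 significance level are hypothetical values assumed for illustration, and `two_proportion_z_test` is a helper written for this example.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Return (z, one-sided p-value) for the claim: rate in A < rate in B."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    # Probability of a z value this low or lower if the rates were equal.
    p_value = NormalDist().cdf(z)
    return z, p_value

# Hypothetical results: 30 heart attacks among 1000 men given the drug,
# 50 heart attacks among 1000 men given the placebo.
z, p_value = two_proportion_z_test(30, 1000, 50, 1000)

alpha = 0.05  # assumed significance level
decision = "reject" if p_value < alpha else "fail to reject"
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
print(f"Decision: {decision} the claim that the drug has no effect")
```

The decision is about the population of men over age 70, even though only the two sample groups were observed; that step from sample to population is what makes the procedure inferential.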
Statisticians also use statistics to determine relationships among variables. For example, relationships were the focus of the most noted study in the 20th century, “Smoking and Health,” published by the Surgeon General of the United States in 1964. He stated that after reviewing and evaluating the data, his group found a definite relationship between smoking and lung cancer. He did not say that cigarette smoking actually causes lung cancer, but that there is a relationship between smoking and lung cancer. This conclusion was based on a study done in 1958 by Hammond and Horn. In this study, 187,783 men were observed over a period of 45 months. The death rate from lung cancer in this group of volunteers was 10 times as great for smokers as for nonsmokers.
Finally, by studying past and present data and conditions, statisticians try to make predictions based on this information. For example, a car dealer may look at past sales records for a specific month to decide what types of automobiles and how many of each type to order for that month next year.
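One simple way the car dealer might turn past records into a prediction is to fit a straight-line trend and extend it by one year. This is a sketch under stated assumptions: the sales figures are hypothetical, `linear_forecast` is a helper written for this example, and a linear trend is only one of many possible forecasting choices.

```python
def linear_forecast(values):
    """Fit a least-squares line to values at x = 1..n and predict x = n + 1."""
    n = len(values)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n + 1)

# Hypothetical sales for the same month in five consecutive years.
march_sales = [40, 44, 47, 52, 55]

forecast = linear_forecast(march_sales)
print(f"Forecast for the same month next year: {forecast:.1f} automobiles")
```

The forecast is an inference about a month that has not happened yet, so it carries uncertainty that the past sales figures themselves do not.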
Example: Differentiating Between Descriptive and Inferential Statistics
Determine whether descriptive or inferential statistics were used in each of the following statements.
- The average price of a 30-second ad for the Academy Awards show in a recent year was 1.90 million dollars.
- The Department of Economic and Social Affairs predicts that the population of Mexico City, Mexico, in 2030 will be 238,647,000 people.
- A medical report stated that taking statins is shown to lower heart attacks, but some people are at a slightly higher risk of developing diabetes when taking statins.
- A survey of 2234 people conducted by the Harris Poll found that 55% of the respondents said that excessive complaining by adults was the most annoying social media habit.
Solution (the answers are listed in the same order as the statements above):
- A descriptive statistic (average) was used since this statement was based on data obtained in a recent year.
- Inferential statistics were used since this is a prediction for a future year.
- Inferential statistics were used since this conclusion was drawn from data obtained from samples and used to conclude that the results apply to a population.
- Descriptive statistics were used since this is a result obtained from a sample of 2234 survey respondents.
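The survey statement can also show where the line between the two branches falls. In the sketch below, the first part only describes the 2234 respondents. The second part goes further and estimates a 95% confidence interval for the whole population, which would be inferential. The interval assumes the respondents form a simple random sample, which the original statement does not claim, and it uses the common normal-approximation formula.

```python
import math
from statistics import NormalDist

n = 2234                   # survey respondents (from the statement)
sample_proportion = 0.55   # share naming excessive complaining (from the statement)

# Descriptive: restate what was observed in the sample itself.
respondents = round(n * sample_proportion)
print(f"About {respondents} of {n} respondents gave this answer.")

# Inferential (assumes a simple random sample): 95% confidence interval
# for the population proportion, using the normal approximation.
z_star = NormalDist().inv_cdf(0.975)
margin = z_star * math.sqrt(sample_proportion * (1 - sample_proportion) / n)
lower = sample_proportion - margin
upper = sample_proportion + margin
print(f"95% confidence interval: {lower:.3f} to {upper:.3f}")
```

Reporting the 55% figure alone is descriptive; attaching an interval meant to cover the population value is a generalization from the sample and therefore inferential.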