There are two “large bucket” categories of data that most researchers work with on a regular basis: quantitative and qualitative data.
In the biomedical sciences, quantitative data is used to provide measurements, to calculate change over time, and more generally in raw data gathering. This raw data can then serve as the basis of statistical analyses.
Qualitative data is often thought of as social sciences data because many researchers in the social sciences use surveys and oral responses—in other words, natural language—as the basis of analyses. However, researchers in the sciences often use these same techniques when describing a particular set of data or when mapping data geographically.
Both types of data are used in the sciences, and both can be used as the basis for primary data and secondary data.
A variable that can be counted, measured, and given a numerical value is considered quantitative data. Quantitative variables can answer the “how” questions: “how many,” “how much,” or “how often.”
Many researchers also call quantitative data “numerical” because measurement bridges empirical observation and mathematical expression. Because of this relationship, a researcher can use statistical analyses in experiments to find significant differences that can be replicated using similar methods.
There are two main types of quantitative or numerical data: discrete and continuous.
Discrete data is usually defined as data that can be counted. These data cannot be made more precise, so they take integer values, that is, whole numbers that cannot be subdivided into fractions. A classic example of discrete data is the number of children in a family: a family cannot have 1.3 or 4.2 children. Another example is the number of doctor visits a person has in a year.
Continuous data can be divided into smaller parts using decimal points. When graphed, continuous data create a distribution of values on a continuum. A classic example of continuous data is a person’s height.
Both discrete and continuous quantitative data are summarized using measures of central tendency (mean, median, mode) and dispersion (standard deviation, standard error, interquartile range). Which measure a researcher chooses depends on the type of data on which a hypothesis is tested.
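As a rough illustration of these summary measures, the following Python sketch computes them with the standard library. The sample values are hypothetical (yearly doctor visits for nine patients), and the choice of the sample standard deviation and of the default quartile method is an assumption made for the example, not a recommendation.

```python
import math
import statistics

# Hypothetical discrete data: doctor visits in one year for nine patients.
visits = [2, 3, 3, 4, 5, 5, 5, 7, 11]

# Measures of central tendency.
mean = statistics.mean(visits)
median = statistics.median(visits)
mode = statistics.mode(visits)

# Measures of dispersion.
std_dev = statistics.stdev(visits)            # sample standard deviation
std_err = std_dev / math.sqrt(len(visits))    # standard error of the mean
q1, _, q3 = statistics.quantiles(visits, n=4) # quartiles (default "exclusive" method)
iqr = q3 - q1                                 # interquartile range

print(f"mean={mean}, median={median}, mode={mode}")
print(f"SD={std_dev:.2f}, SE={std_err:.2f}, IQR={iqr}")
```

The same calls work for continuous data such as heights. The single large value (11) pulls the standard deviation up more than it affects the interquartile range, which is one reason the choice of measure depends on the data.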
Qualitative data describes variables whose values are verbal groupings, or categories, rather than numbers. Many people confuse qualitative research with qualitative data. Qualitative research is a method of studying society by collecting data from first-hand observations, interviews, or questionnaires, using unstructured or semi-structured techniques. Data is qualitative when the variables in a data set are verbal rather than numerical.
Qualitative data is also called “categorical” data, or data that can be placed into organized categories.
There are two main types of qualitative or categorical data: nominal and ordinal.
Nominal variables have two or more categories that have “names” but no inherent order. For example, gender is a nominal category (female, nonbinary, male). When a variable has only two possible categories, it is called binary or dichotomous data; an example is whether someone has a driver’s license (yes/no).
Ordinal data can be placed in categories with a clear order or hierarchy. For example, education level has a clear hierarchy (“high school,” “Bachelor’s,” “Master’s,” “PhD”).
When analyzing qualitative data, a researcher will typically present a frequency distribution as a pie chart (nominal data) or as a column or bar chart (nominal or ordinal data).
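To make the difference between nominal and ordinal data concrete, here is a minimal Python sketch. It assumes the education ordering given in the example above; the responses, the `LEVELS` list, and the `RANK` mapping are hypothetical names introduced only for illustration.

```python
import statistics

# Assumed ordering of the categories, lowest to highest.
LEVELS = ["high school", "Bachelor's", "Master's", "PhD"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Hypothetical survey responses.
education = ["Master's", "high school", "PhD", "Bachelor's", "Bachelor's"]

# Because the categories are ordered, sorting by rank is meaningful.
ordered = sorted(education, key=RANK.__getitem__)

# A median category is also meaningful for ordinal data.
median_level = LEVELS[statistics.median_low(RANK[e] for e in education)]

print(ordered)
print(median_level)
```

A nominal variable has no equivalent of `RANK`: its categories could be sorted alphabetically, but that order carries no meaning, so a median category would not be interpretable.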
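A frequency distribution of this kind can be sketched in a few lines of Python. The responses below are hypothetical answers to the driver’s license question, and the text-based bars stand in for a bar chart; a plotting library would normally be used for a real figure.

```python
from collections import Counter

# Hypothetical binary (nominal) data: "Do you have a driver's license?"
responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no"]

counts = Counter(responses)
total = len(responses)

# Frequency table: (category, count, share of all responses),
# listed from most to least frequent.
frequency_table = [
    (category, count, count / total)
    for category, count in counts.most_common()
]

# Print a simple text bar chart, one "#" per response.
for category, count, share in frequency_table:
    print(f"{category:<4} {count:>2}  {share:6.1%}  {'#' * count}")
```

For ordinal data, the rows would usually be listed in category order rather than by frequency so that the hierarchy stays visible in the chart.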
Primary and secondary data have less to do with the variables used in data analyses and more to do with who generates the data that a researcher uses for analyses.
Primary data is data generated by the researcher for the researcher’s own use. It is collected in the moment and used in current experiments. Because it is up to the researcher or the researcher’s team to collect the data, the process takes time and is very involved. At a future time, primary data may become secondary data when it is uploaded to a repository for use by others.
Primary data is largely available in its raw form; that is, it has not been processed or refined. For that reason, it is often regarded as more accurate and reliable.
Secondary data is usually defined as data that someone else has collected. It can come from large healthcare organizations, the government, or other large organizations, and it is used after the fact of collection. Thus, it is often data that has already been used in earlier experiments.
Researchers can find such data in internal healthcare systems, in data repositories (either specific to one’s field of research or more generalist), or as part of a publication.