Using imputation to enrich your data

Why understanding the business objective and your data is key

The author(s)
  • Chris Moore Advanced Analytics, UK

Imputation is used for data enrichment. Its goal is to enhance a data source by ensuring that each observation, e.g., a respondent, has a complete set of data for each variable of interest, whilst preserving the original structure of the data.

The area has gained much traction in recent years thanks to the popularity of programming languages such as R and Python. That said, imputation has been in common use for many years, typically to make methods such as regression analysis possible when the data source is incomplete.

The use of data imputation has become more prevalent in the research industry as a way of providing additional insight. While the objective, i.e., having a full and complete data source, is seemingly straightforward, there are many nuances to be wary of.

Whether to use item or unit imputation, single or multiple imputation, and the type of missing data are just a few of the questions that need to be considered. 
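To make the single versus multiple distinction concrete, the sketch below fills the same gaps both ways. It is an illustration under stated assumptions rather than a recommended method: the `ages` data, the function names `single_impute` and `multiple_impute`, the choice of mean imputation, and the bootstrap-style donor draw are all hypothetical choices made for this example, and a real project would normally condition the imputed values on other variables.

```python
import random
import statistics

# Hypothetical survey variable: None marks a missing response.
ages = [34, None, 51, 28, None, 45]


def single_impute(values):
    """Single imputation: every gap gets the same value, the observed mean."""
    mean = statistics.mean(v for v in values if v is not None)
    return [mean if v is None else v for v in values]


def multiple_impute(values, m=5, seed=0):
    """Multiple imputation (simplified): build m completed datasets.

    For each dataset, resample the observed values with replacement to form
    a donor pool, then fill each gap with a random draw from that pool.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    datasets = []
    for _ in range(m):
        donor_pool = rng.choices(observed, k=len(observed))
        datasets.append(
            [rng.choice(donor_pool) if v is None else v for v in values]
        )
    return datasets


single = single_impute(ages)
completed_sets = multiple_impute(ages)

# Pool the estimate of interest (here, the mean) across the m datasets.
pooled_mean = statistics.mean(statistics.mean(d) for d in completed_sets)
```

Single imputation returns one completed dataset and treats the filled values as if they had been observed. Multiple imputation returns several plausible datasets, so the spread of estimates across them reflects the uncertainty introduced by the missing data.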

Data imputation offers great opportunities to enrich data, but it requires skilful navigation through a maelstrom of risks and technical challenges, as there is no one-size-fits-all approach.

The area of data imputation is part of a wider field known as data integration. This term is used to describe a multitude of methodologies that enable us to enrich data.

The exact method to use ultimately depends on the business question, the purpose of the integration (descriptive analysis or modelling), and the data source(s) available.

This paper discusses data imputation and how it is used to solve the issue of missing observation-level data, the different use cases for imputation, the fundamentals of conducting a successful imputation, and common pitfalls to be aware of.
 
