Essays in the Application of Machine Learning in Development Economics

Date of Award


Degree Name

Doctor of Philosophy



First Advisor

Dr. Debasri Mukherjee

Second Advisor

Dr. Eskander Alvi

Third Advisor

Dr. Kevin Lee

Fourth Advisor

Dr. Yuekai Sun


This dissertation comprises four essays that apply Machine Learning (ML) techniques to examine India’s progress towards meeting several child-health targets under the United Nations’ “Sustainable Development Goals (SDG) 2030”. The application of novel ML techniques reveals detailed empirical aspects of the road to meeting specific SDG indicators: child mortality, malnutrition, immunization coverage and health expenses.

The first essay employs multiple parametric and non-parametric ML techniques (LASSO, Classification Random Forest, Boosted Logistic Regression, Boosted Classification Trees) to build predictive models for the incidence of neonatal and infant mortality. A large national-level household survey dataset is used. All the ML techniques display higher prediction accuracy than a standard logistic regression. The consensus from the ML techniques is used to identify a ‘high-mortality risk’ group of mothers and infants who can be the potential beneficiaries of ‘targeted’ public health policies in the future.

The second essay investigates the multifaceted nature of infant malnutrition in India. A large, comprehensive set of covariates from a survey is considered, leading to a near high-dimensional setting (with the number of regressors approaching the sample size), which necessitates the use of a sparsity-based ML technique. LASSO, a variable selection technique, is used to select predictors with a strong association with malnutrition, and a post-selection inference (PoSI) technique is then applied to conduct hypothesis testing on the selected predictors. The results indicate that while safe drinking water is important in curbing infant malnutrition, many existing government policies are ineffective.
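
As a rough illustration of why a sparsity-based method suits a near high-dimensional setting, the sketch below fits LASSO by coordinate descent on synthetic data. Everything in it is an assumption made for illustration: the data are simulated (40 observations, 30 regressors, two of which truly matter), the penalty value is arbitrary, and the function names `soft_threshold` and `lasso_cd` are hypothetical. It does not reproduce the essay's data or implementation, and it omits the post-selection inference step.

```python
import random


def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X @ beta, kept up to date after each update
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (excluding j itself).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Synthetic example (assumed, not survey data): n = 40 rows, p = 30 regressors,
# with only the first two regressors driving the outcome.
rng = random.Random(0)
n, p = 40, 30
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]

beta = lasso_cd(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected regressors:", selected)
```

The L1 penalty sets many coefficients to exactly zero, which is what makes the method usable for variable selection when the number of regressors is close to the sample size. In practice the penalty would typically be chosen by cross-validation rather than fixed by hand.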

Using state-level data, the third essay compares the performance of Indian states in achieving coverage of five essential child vaccines (BCG, DPT, Measles, Polio, Tetanus) under the ‘Universal Immunization Program (UIP)’. The roles of the two policy pillars, (a) funds disbursed by the Central Government to the State Governments under UIP and (b) the required health infrastructure in each state, are evaluated through the lens of both inference and prediction. Traditional panel regression techniques identify a complementarity between funds and infrastructure. To examine this complementarity across states more closely and to identify under-performing states, a comprehensive set of interactions is considered, leading to a near high-dimensional setting. A sparsity-based ML technique (LASSO) is therefore used for variable selection. The results identify certain under-performing states where the policy pillars need to be strengthened.

The fourth essay focuses on three leading causes of morbidity in infants, namely prematurity, jaundice and cardio-respiratory ailments. The role of government-sponsored health insurance schemes in mitigating out-of-pocket and overall medical expenses is evaluated using a nationwide survey. A newly developed ML method (Double/de-biased LASSO) is used, which enables us to (a) select important predictors of health expenditure from a vast set of covariates and (b) estimate the ‘treatment effect’ of health insurance on health expenses. While health insurance schemes are found to be effective in mitigating health expenses due to premature birth, no such evidence is found for jaundice and cardio-respiratory diseases in infants.
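
The abstract does not state which variant of double/de-biased LASSO the essay uses. One common formulation, given here only as an illustrative sketch, is the partialling-out estimator in a partially linear model. The notation is an assumption: Y_i denotes health expenses, D_i is an indicator for health insurance coverage, X_i is the covariate vector, and \theta is the treatment effect of interest.

```latex
\begin{aligned}
Y_i &= \theta D_i + X_i^\top \beta + \varepsilon_i, \\
D_i &= X_i^\top \gamma + \nu_i, \\
\hat{\theta} &= \frac{\sum_{i=1}^{n} \hat{\nu}_i \, \hat{u}_i}{\sum_{i=1}^{n} \hat{\nu}_i^{2}},
\qquad \hat{u}_i = Y_i - X_i^\top \hat{\delta},
\quad \hat{\nu}_i = D_i - X_i^\top \hat{\gamma}.
\end{aligned}
```

Here \hat{\delta} and \hat{\gamma} are LASSO (or post-LASSO) estimates from regressing Y on X and D on X, respectively. Running variable selection in both the outcome and the treatment equations, and then regressing one set of residuals on the other, is what guards the estimate of \theta against the bias that a single selection step can introduce.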

Access Setting

Dissertation-Abstract Only

Restricted to Campus until


This document is currently not available here.