Date of Award


Degree Name

Doctor of Philosophy



First Advisor

Dr. Wei-Chiao Huang

Second Advisor

Dr. Donald Meyer

Third Advisor

Dr. Kevin Lee


Ph.D. Education, machine learning, earnings, graduation rates, OLS regression


This study examines U.S. PhD education and applications of machine learning methods. PhD education, as a way to cultivate future researchers, attracts considerable attention, and machine learning methods are increasingly used in academic research. This study aims to contribute to both fields. The first essay discusses whether education can reduce or eliminate discrimination. Using data on the earnings of PhD recipients in the U.S., we investigate whether PhD education can shrink the earnings gaps for three pairs of groups: female vs. male, disabled vs. non-disabled, and foreign-born vs. native-born. Comparing the PhD earnings gaps with the average earnings gaps, we find that the gender gap among PhDs is larger, while the PhD disability and foreign-born gaps are smaller than their corresponding average earnings gaps. This suggests that education can reduce discrimination based on disability and foreign-born status, but not discrimination based on gender. We then examine the sources of discrimination using the Blinder-Oaxaca decomposition. Our results show that the gender and disability earnings gaps among PhDs come mostly from the unexplained part, whereas the foreign-born gap comes mostly from the explained part. This indicates that education can eliminate discrimination against the foreign-born, but not discrimination based on gender or disability.
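
For readers unfamiliar with the Blinder-Oaxaca decomposition, the following is a minimal sketch of the twofold version. It assumes a single covariate and uses group A's coefficients as the reference. The data values, the variable names, and the choice of covariate (years of experience) are invented for illustration and are not taken from the dissertation, whose specification is not given in this abstract.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def oaxaca_twofold(x_a, y_a, x_b, y_b):
    """Split the mean outcome gap (A minus B) into explained and unexplained parts.

    Group A's slope is the reference, which is one common convention
    and an assumption of this sketch.
    """
    a_a, b_a = fit_simple_ols(x_a, y_a)
    a_b, b_b = fit_simple_ols(x_b, y_b)
    xbar_a, xbar_b = mean(x_a), mean(x_b)
    gap = mean(y_a) - mean(y_b)
    # Explained part: difference in average characteristics, priced at A's return.
    explained = (xbar_a - xbar_b) * b_a
    # Unexplained part: difference in intercepts and in returns to the covariate.
    unexplained = (a_a - a_b) + xbar_b * (b_a - b_b)
    return gap, explained, unexplained


# Invented toy data: x = years of experience, y = log annual earnings.
x_a = [3, 6, 9, 12, 15]
y_a = [11.0, 11.2, 11.3, 11.5, 11.7]
x_b = [2, 4, 7, 10, 12]
y_b = [10.8, 10.9, 11.1, 11.2, 11.3]

gap, explained, unexplained = oaxaca_twofold(x_a, y_a, x_b, y_b)
print(f"gap={gap:.4f} explained={explained:.4f} unexplained={unexplained:.4f}")
```

Because each fitted line passes through its group's means, the explained and unexplained parts sum exactly to the raw gap. A large unexplained share is what the abstract reads as evidence of discrimination.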

The second essay examines which factors affect the graduation rates of U.S. PhD programs. Using the machine learning method Lasso, we select independent variables with a pre-designed algorithm rather than by traditional manual selection. Our OLS regression results show that support from faculty, especially from their grants, is very important for doctoral students’ graduation. In addition, diversity, such as a higher female student rate and international student rate, is positively associated with degree completion. Somewhat surprisingly, higher GRE scores among admitted students are associated with lower graduation rates, which suggests that program administrators should put less weight on the GRE in future admissions.
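
The two-step idea of Lasso selection followed by an OLS fit can be sketched as below. This is an illustrative sketch on synthetic data, not the dissertation's procedure. The coordinate-descent solver, the penalty value `lam`, and the six generic predictors are all assumptions made for the example.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0 when |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes each column of X has mean 0 and (1/n) * sum of squares equal to 1,
    and that y is centred.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]      # take feature j out of the fit
            rho = X[:, j] @ residual / n       # its correlation with what is left
            beta[j] = soft_threshold(rho, lam)
            residual -= X[:, j] * beta[j]      # put the updated feature back
    return beta


# Synthetic data: only predictors 0 and 2 truly affect the outcome.
rng = np.random.default_rng(0)
n, p = 300, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Step 1: Lasso on standardised predictors picks the variables.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
beta_lasso = lasso_coordinate_descent(X_std, y - y.mean(), lam=0.2)
selected = np.flatnonzero(beta_lasso)

# Step 2: ordinary least squares on the selected variables only (with intercept).
design = np.column_stack([np.ones(n), X[:, selected]])
ols_coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print("selected predictors:", selected.tolist())
print("OLS coefficients (intercept first):", np.round(ols_coef, 3))
```

Refitting by OLS after selection removes the shrinkage that Lasso applies to the retained coefficients. In practice the penalty would usually be chosen by cross-validation rather than fixed by hand.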

The third essay makes machine learning predictions based on the data from the previous two essays. Our study has two goals: 1) to help prospective doctoral students better understand their post-graduation earnings, so that they can decide whether to enter PhD programs, and 2) to help program administrators improve their programs’ graduation rates. By comparing the results of linear regression with those of a machine learning method, random forest, we can judge which method predicts more accurately. Through simulated examples, we show how the models can make predictions for doctoral students and programs, helping them make better decisions and improvements.
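
A comparison of this kind can be sketched with scikit-learn as follows. The data are synthetic and deliberately nonlinear, so the ranking of the two models here says nothing about the dissertation's data or findings. The sample size, the hyperparameters, and the use of test-set RMSE as the accuracy measure are assumptions of the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data with a nonlinear signal in two of three predictors.
rng = np.random.default_rng(0)
n = 600
X = rng.uniform(0.0, 1.0, size=(n, 3))
y = (
    5.0 * np.sin(2 * np.pi * X[:, 0])
    + 3.0 * (X[:, 1] > 0.5)
    + rng.normal(0.0, 0.1, size=n)
)

# Hold out a test set so that accuracy is judged on unseen observations.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))
    print(f"{name}: test RMSE = {rmse[name]:.3f}")
```

The model with the lower held-out RMSE predicts more accurately on this data set. With real data, cross-validation would give a more stable comparison than a single split.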

Access Setting

Dissertation-Open Access