Date of Defense
12-3-2025
Date of Graduation
12-2025
Department
Statistics
First Advisor
Kevin Lee
Second Advisor
Geumchan Hwang
Abstract
The Northwoods League (NWL) is a collegiate summer wooden-bat baseball league that provides college athletes with an opportunity to maintain and enhance their skills during the offseason while competing in game-like conditions. Although prior research has explored player performance in other summer leagues, such as the Cape Cod Baseball League, academic investigation into predictors of draft outcomes in the NWL remains limited. This study addresses this gap by evaluating whether pitch-tracking metrics collected via Trackman systems can predict a pitcher’s likelihood of selection in the Major League Baseball (MLB) draft and by identifying which performance characteristics are most strongly associated with draft probability.
Raw pitch-level data from the NWL spanning 2020-2025, comprising 964,931 observations and 100 variables, were aggregated into pitcher-level career statistics, resulting in a refined dataset of 1,725 pitchers, 60 of whom were drafted. Predictive features captured velocity, spin, movement, release mechanics, usage, and batted ball outcomes. Three complementary machine learning models were developed: logistic regression, Random Forest, and XGBoost. Model performance was assessed using area under the receiver operating characteristic curve (AUC-ROC), sensitivity, specificity, and balanced accuracy. Severe class imbalance (60 drafted pitchers out of 1,725, roughly 3.5%) was addressed through stratified train-test splitting and class-specific weighting.
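To make the imbalance handling concrete, the following is a minimal sketch of stratified splitting and inverse-frequency class weighting, using only the pitcher counts reported above. The 20% test fraction, the random seed, and the specific weighting formula (the common "balanced" heuristic, n_samples / (n_classes * n_in_class)) are assumptions for illustration; the abstract does not state which split ratio or weighting scheme the thesis used.

```python
import random

# Counts taken from the abstract: 1,725 pitchers, 60 of them drafted.
N_TOTAL = 1725
N_DRAFTED = 60
N_UNDRAFTED = N_TOTAL - N_DRAFTED  # 1,665

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) so each class keeps its share in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * n_in_class)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}

# 1 = drafted, 0 = undrafted; only the label counts matter for this sketch.
labels = [1] * N_DRAFTED + [0] * N_UNDRAFTED

# Assumed 80/20 split (hypothetical; not stated in the abstract).
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
drafted_in_test = sum(labels[i] for i in test_idx)

weights = balanced_class_weights({0: N_UNDRAFTED, 1: N_DRAFTED})

print(f"test set size: {len(test_idx)}, drafted in test set: {drafted_in_test}")
print(f"drafted weight: {weights[1]:.3f}, undrafted weight: {weights[0]:.3f}")
```

Under these assumptions, each drafted pitcher counts roughly 28 times as much as an undrafted pitcher during training, and the held-out set would contain only about a dozen drafted pitchers, which is why the small positive class is a real constraint on evaluation.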
Results indicated that logistic regression achieved the highest AUC (0.893), demonstrating strong overall discriminative ability. XGBoost had a slightly lower AUC (0.837) but still detected drafted pitchers at a practically useful rate when the classification threshold was lowered appropriately; even so, logistic regression offered a better trade-off between true and false positives at most selection thresholds. Key predictors consistently identified across models included average fastball velocity, steeper approach angles, consistent spin orientations, control within an at-bat, and changeup contact rate. These findings align with conventional scouting priorities and confirm that objective pitch-tracking metrics can complement traditional evaluations.
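The distinction between AUC and threshold-specific detection can be illustrated with a small sketch. The labels and predicted probabilities below are made up for illustration and are not taken from the thesis; the helper names are likewise hypothetical. The sketch computes AUC as the share of drafted/undrafted pairs ranked correctly, then shows how sensitivity, specificity, and balanced accuracy shift as the classification threshold is lowered.

```python
def confusion_rates(y_true, scores, threshold):
    """Return (sensitivity, specificity, balanced accuracy) at one threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2

def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example data (1 = drafted, 0 = undrafted); NOT results from the thesis.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.62, 0.35, 0.22, 0.08, 0.41, 0.18, 0.12, 0.09, 0.06, 0.05, 0.03, 0.02]

print(f"AUC: {pairwise_auc(y_true, scores):.4f}")
for threshold in (0.5, 0.3, 0.1):
    sens, spec, bal = confusion_rates(y_true, scores, threshold)
    print(f"threshold {threshold}: sensitivity={sens:.3f}, "
          f"specificity={spec:.3f}, balanced accuracy={bal:.4f}")
```

In this toy data, the default 0.5 threshold finds only one of four drafted pitchers, while lowering it recovers more of them at the cost of more false positives. AUC stays fixed throughout because it depends only on how the scores rank pitchers, not on where the cutoff is placed.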
The study has practical applications for MLB scouting departments and Northwoods League teams, providing a data-driven framework for prioritizing prospects and identifying overlooked talent. Limitations include a small sample of drafted pitchers, lack of differentiation by draft round, and potential changes in scouting emphasis over time. Future work could expand the approach to other collegiate or developmental leagues, incorporate temporal validation, or integrate additional contextual factors such as player age, eligibility, and injury history. Overall, the study demonstrates the utility of Trackman data in predicting draft outcomes and highlights the potential for analytics to augment subjective scouting assessments in professional baseball.
Recommended Citation
Tyrpak, Stephen, "Identifying Draft-Worthy Pitchers: A Statistical Analysis of Northwoods League Performance Metrics" (2025). Honors Theses. 4001.
https://scholarworks.wmich.edu/honors_theses/4001
Access Setting
Honors Thesis-Open Access