Jongmin Mun is a PhD Candidate in the Data Sciences and Operations Department at the University of Southern California. His research focuses on using bandit algorithms to address problems at the intersection of statistics and optimization, including high-dimensional clustering and dynamic pricing. He is advised by Professor Yingying Fan and Professor Paromita Dubey.
Before starting his PhD, he studied:
RESEARCH + PUBLICATIONS
We study pricing policy learning from batched contextual bandit data under market shift and privacy protection. Market shift is modeled as covariate shift, where the relationship among treatments, features, and rewards remains invariant, while privacy is enforced through local differential privacy (LDP), which perturbs each data point before use. Viewing the off-policy setting, covariate shift, and LDP collectively as forms of distributional shift, we develop a policy learning algorithm based on a unified pessimism principle that addresses all three shifts. Without privacy, we estimate the conditional reward via nonparametric regression and quantify its variance to construct a pessimistic estimator, yielding a policy with minimax-optimal decision error. Under LDP, we apply the Laplace mechanism and adjust the pessimistic estimator to account for additional uncertainty from privacy noise. The resulting doubly pessimistic objective is then optimized to determine the final pricing policy.
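As a rough illustration of the two ingredients named above, the following Python sketch privatizes bounded rewards with the Laplace mechanism and then picks the price with the largest lower confidence bound. It is a toy simplification, not the paper's algorithm: it assumes a finite set of candidate prices, no contexts or covariate shift, rewards bounded in a known interval, and a worst-case variance bound in place of nonparametric regression. The names `privatize`, `pessimistic_price`, and `beta` are hypothetical.

```python
import math
import random


def privatize(reward, epsilon, lo, hi, rng):
    """Release one bounded reward under epsilon-LDP via the Laplace mechanism."""
    clipped = min(max(reward, lo), hi)
    scale = (hi - lo) / epsilon  # sensitivity of a value in [lo, hi], divided by epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return clipped + noise


def pessimistic_price(private_rewards, epsilon, lo, hi, beta=1.0):
    """Pick the price that maximizes a lower confidence bound on the mean reward.

    private_rewards maps each candidate price to its list of privatized rewards.
    The penalty combines sampling uncertainty and privacy-noise uncertainty.
    """
    scale = (hi - lo) / epsilon
    sampling_var = (hi - lo) ** 2 / 4  # worst-case variance of a reward in [lo, hi]
    privacy_var = 2 * scale ** 2       # variance of Laplace(0, scale) noise

    def lower_bound(price):
        obs = private_rewards[price]
        mean = sum(obs) / len(obs)
        return mean - beta * math.sqrt((sampling_var + privacy_var) / len(obs))

    return max(private_rewards, key=lower_bound)


# Toy logged data: the higher price looks slightly better but is barely explored.
rng = random.Random(0)
logged = {9.99: [1.0] * 1000, 14.99: [1.1] * 2}
private = {
    price: [privatize(r, 1.0, 0.0, 2.0, rng) for r in rewards]
    for price, rewards in logged.items()
}
chosen = pessimistic_price(private, epsilon=1.0, lo=0.0, hi=2.0)
```

Setting `beta=0` recovers the greedy choice based on the noisy means alone; larger values of `beta` penalize prices with few observations more heavily.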
We explore the trade-off between privacy and statistical utility in private two-sample testing under local differential privacy (LDP) for both multinomial and continuous data. We begin by addressing the multinomial case, where we introduce private permutation tests using practical privacy mechanisms such as Laplace, discrete Laplace, and Google's RAPPOR. We then extend our multinomial approach to continuous data via binning and study its uniform separation rates under LDP over Hölder and Besov smoothness classes. The proposed tests for both discrete and continuous cases rigorously control the type I error for any finite sample size, strictly adhere to LDP constraints, and achieve minimax separation rates under LDP. The attained minimax rates reveal inherent privacy-utility trade-offs that are unavoidable in private testing. To address scenarios with unknown smoothness parameters in density testing, we propose an adaptive test based on a Bonferroni-type approach that ensures robust performance without prior knowledge of the smoothness parameters. We validate our theoretical findings with extensive numerical experiments and demonstrate the practical relevance and effectiveness of our proposed methods.
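A minimal sketch of a private permutation test for multinomial data, under stated assumptions: each observation is one-hot encoded and perturbed with Laplace noise of scale 2/epsilon (two one-hot vectors differ by at most 2 in L1 norm), and the test statistic is a simple plug-in squared distance between sample means, which is not necessarily the statistic used in the paper. The function names are hypothetical.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as a difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def privatize_category(category, k, epsilon, rng):
    """One-hot encode a category in {0, ..., k-1} and add Laplace noise (epsilon-LDP)."""
    return [(1.0 if j == category else 0.0) + laplace(2 / epsilon, rng) for j in range(k)]


def statistic(xs, ys):
    """Squared Euclidean distance between the two sample mean vectors."""
    mean_x = [sum(col) / len(xs) for col in zip(*xs)]
    mean_y = [sum(col) / len(ys) for col in zip(*ys)]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))


def permutation_pvalue(xs, ys, n_perm, rng):
    """Permutation p-value computed on the already privatized samples."""
    observed = statistic(xs, ys)
    pooled = xs + ys
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if statistic(pooled[:len(xs)], pooled[len(xs):]) >= observed:
            exceed += 1
    # The +1 terms keep the test valid at any finite sample size.
    return (1 + exceed) / (1 + n_perm)


# Toy example: two samples concentrated on different categories.
rng = random.Random(1)
xs = [privatize_category(0, 3, 8.0, rng) for _ in range(50)]
ys = [privatize_category(2, 3, 8.0, rng) for _ in range(50)]
p_value = permutation_pvalue(xs, ys, n_perm=199, rng=rng)
```

Because the permutation step only touches privatized vectors, it is post-processing and does not weaken the LDP guarantee.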
We propose an iterative algorithm for clustering high-dimensional data, where the true signal lies in a much lower-dimensional space. Our method alternates between feature selection and clustering, without requiring precise estimation of sparse model parameters. Feature selection is performed by thresholding a rough estimate of the discriminative direction, while clustering is carried out via a semidefinite programming (SDP) relaxation of K-means. In the isotropic case, the algorithm is motivated by the minimax separation bound for exact recovery of cluster labels using varying sparse subsets of features. This bound highlights the critical role of variable selection in achieving exact recovery. We further extend the algorithm to settings with unknown sparse precision matrices, avoiding full model parameter estimation by computing only the minimally required quantities. Across a range of simulation settings, we find that the proposed iterative approach outperforms several state-of-the-art methods, especially in higher dimensions.
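The alternation between feature selection and clustering can be sketched as follows. This is only an illustration under simplifying assumptions: two clusters, isotropic noise, a user-supplied `threshold`, and plain Lloyd-style 2-means swapped in for the SDP relaxation of K-means described above. The difference of cluster means serves as the rough estimate of the discriminative direction. All function names are hypothetical.

```python
import random


def two_means(points, features, n_iter=20):
    """Lloyd's 2-means using only the given feature indices."""
    def dist2(p, c):
        return sum((p[j] - c[j]) ** 2 for j in features)

    # Deterministic start: the first point and the point farthest from it.
    first = points[0]
    far = max(points, key=lambda p: dist2(p, first))
    centers = [list(first), list(far)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1 for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_difference(points, labels):
    """Per-feature difference of cluster means (zeros if a cluster is empty)."""
    dim = len(points[0])
    means = []
    for k in (0, 1):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            return [0.0] * dim
        means.append([sum(col) / len(members) for col in zip(*members)])
    return [means[0][j] - means[1][j] for j in range(dim)]


def iterative_sparse_clustering(points, threshold, n_rounds=5):
    """Alternate between thresholding the mean difference and re-clustering."""
    features = list(range(len(points[0])))
    labels = two_means(points, features)
    for _ in range(n_rounds):
        diff = mean_difference(points, labels)
        selected = [j for j, d in enumerate(diff) if abs(d) > threshold]
        if not selected or selected == features:
            break  # nothing survives the threshold, or the selection is stable
        features = selected
        labels = two_means(points, features)
    return features, labels


# Toy data: 20 features, only the first two carry the cluster signal.
rng = random.Random(0)
truth = [i % 2 for i in range(40)]
points = [
    [(3.0 if t == 0 else -3.0) * (j < 2) + rng.gauss(0, 0.5) for j in range(20)]
    for t in truth
]
features, labels = iterative_sparse_clustering(points, threshold=1.0)
```

On this well-separated toy data the loop is expected to keep only the two signal features; the cluster labels are identified only up to a swap of 0 and 1.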
Based on an asymptotically optimal weighted support vector machine (SVM) that introduces label shift, a systematic procedure is derived for applying oversampling and weighted SVM to extremely imbalanced datasets with a cluster-structured positive class. This method formalizes three intuitions: (i) oversampling should reflect the structure of the positive class; (ii) weights should account for both the imbalance and oversampling ratios; (iii) synthetic samples should carry less weight than the original samples. The proposed method generates synthetic samples from the estimated positive class distribution using a Gaussian mixture model. To prevent overfitting to excessive synthetic samples, different misclassification penalties are assigned to the original positive class, synthetic positive class, and negative class. The proposed method is numerically validated through simulations and an analysis of Republic of Korea Army artillery training data.
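The three intuitions can be illustrated with a small sketch. The weighting rule below is a hypothetical choice made for illustration, not the formula derived in the paper: the negative class gets penalty 1, synthetic positives get a fixed fraction `discount` of the original-positive penalty, and the total positive penalty mass is matched to the negative mass. The mixture parameters passed to `sample_gmm` are assumed to have been estimated already, and the sampler is one-dimensional for brevity.

```python
import random


def sample_gmm(weights, means, sds, n, rng):
    """Draw n synthetic points from a 1-D Gaussian mixture with given parameters."""
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[c], sds[c]) for c in components]


def penalty_weights(n_neg, n_pos, n_syn, discount):
    """Return misclassification penalties (w_neg, w_pos, w_syn).

    Illustrative rule (an assumption):
      n_pos * w_pos + n_syn * w_syn == n_neg * w_neg,
      with w_neg = 1 and w_syn = discount * w_pos, 0 < discount < 1.
    """
    w_pos = n_neg / (n_pos + discount * n_syn)
    return 1.0, w_pos, discount * w_pos


# Toy setting: imbalance ratio 1:24, positives oversampled with 40 synthetic points.
rng = random.Random(0)
synthetic = sample_gmm([0.6, 0.4], [-2.0, 3.0], [0.5, 0.8], 40, rng)
w_neg, w_pos, w_syn = penalty_weights(n_neg=240, n_pos=10, n_syn=40, discount=0.5)
```

With these toy numbers the original positives receive penalty 8 and the synthetic positives penalty 4, so the weights reflect both the imbalance ratio and the oversampling ratio while keeping synthetic samples less influential than real ones.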
Autism spectrum disorder (ASD) is an atypical neurodevelopmental condition with a diagnostic ratio largely differing between male and female participants. Due to the sex imbalance in participants with ASD, we lack an understanding of the differences in connectome organization of the brain between male and female participants with ASD. In this study, we matched the sex ratio using a Gaussian mixture model-based oversampling technique and investigated the differences in functional connectivity between male and female participants with ASD using low-dimensional principal gradients. Between-group comparisons of the gradient values revealed significant interaction effects of sex in the sensorimotor, attention, and default mode networks. The sex-related differences in the gradients were highly associated with higher-order cognitive control processes. Transcriptomic association analysis provided potential biological underpinnings, specifying gene enrichment in the cortex, thalamus, and striatum during development. Finally, the principal gradients were differentially associated with symptom severity of ASD between sexes, highlighting significant effects in female participants with ASD. Our work proposed an oversampling method to mitigate sex imbalance in ASD and observed significant sex-related differences in functional connectome organization. The findings may advance our knowledge about the sex heterogeneity in large-scale brain networks in ASD.
Since the 1953 truce, the Republic of Korea Army (ROKA) has regularly conducted artillery training, posing a risk of wildfires, a threat to both the environment and the public perception of national defense. To assess this risk and aid decision-making within the ROKA, we built a predictive model of wildfires triggered by artillery training. To this end, we combined the ROKA dataset with a meteorological database. Given the infrequent occurrence of wildfires (imbalance ratio 1:24 in our dataset), achieving balanced detection of wildfire occurrences and non-occurrences is challenging. Our approach combines a weighted support vector machine with Gaussian mixture-based oversampling, effectively penalizing misclassification of wildfires. Applied to our dataset, our method outperforms traditional algorithms (G-mean=0.864, sensitivity=0.956, specificity=0.781), indicating balanced detection. This study not only helps reduce wildfires during artillery training but also provides a practical wildfire prediction method for similar climates worldwide.
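The G-mean is the geometric mean of sensitivity and specificity, so the three reported figures can be checked against each other with a short calculation:

```python
import math

sensitivity = 0.956
specificity = 0.781

# G-mean = sqrt(sensitivity * specificity)
g_mean = math.sqrt(sensitivity * specificity)
print(round(g_mean, 3))  # prints 0.864
```

A G-mean close to both of its inputs indicates that neither class is detected at the expense of the other.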
Current soft neural probes are still operated by bulky, rigid electronics mounted on the body, which compromise the integration of the device with biological systems and restrict the free behavior of the subject. We report a soft, conformable neural interface system that can monitor the single-unit activities of neurons with long-term stability. The system implements soft neural probes in the brain, with their subsidiary electronics printed directly on the cranial surface. The high-resolution printing of liquid metals forms soft neural probes with a cellular-scale diameter and adaptable lengths. In addition, the printing of liquid metal-based circuits and interconnections along the curvature of the cranium enables the conformal integration of electronics with the body, and the cranial circuit delivers neural signals to a smartphone wirelessly. In in vivo studies using mice, the system demonstrates long-term recording (33 weeks) of neural activities in arbitrary brain regions. In T-maze behavioral tests, the system shows the behavior-induced activation of neurons in multiple brain regions.