Xin Tong's current research interests focus on asymmetric statistical learning, addressing challenges in the Neyman-Pearson classification paradigm, data distortion, sampling bias, asymmetric groups in classification and clustering, and partial knowledge in clustering and community detection. He has published papers in journals including Science Advances, the Journal of the American Statistical Association, the Journal of the Royal Statistical Society: Series B and the Journal of Machine Learning Research. Professor Tong's research has been partially supported by the US NSF and NIH.
Areas of Expertise
RESEARCH + PUBLICATIONS
In prediction-based feature ranking, a group of features would be ordered based on relative importance under a specified prediction objective. This is a universal problem in data-driven decision-making but remains under-explored in the statistics community. Unlike feature selection or marginal feature screening, prediction-based feature ranking does not assume the existence of true features or aim to discover them. In this paper, we propose a flexible model-free prediction-based framework for ranking features in a binary classification setting under two paradigms: the classical paradigm and the Neyman-Pearson paradigm. Accordingly, we propose two optimal ranking criteria for the two paradigms: the classical criterion (CC) and the Neyman-Pearson criterion (NPC). Theoretically, sample-level CC and NPC, two easy-to-implement model-free criteria for ranking features in practice, achieve ranking agreements with their population-level counterparts under regularity conditions with high probability. Other prediction objectives, such as the cost-sensitive learning paradigm, are also compatible with the new framework. We illustrate the use of our new framework with a real data study of breast cancer diagnosis, where practical issues such as budget constraints and diagnosis precisions are also considered after we provide a feature rank list. The CC and NPC can be implemented via an R package MarginalFeatureRanking.
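As a rough illustration of what ranking features by a prediction objective can look like, the sketch below scores each feature on its own by the held-out classification error of a simple one-feature threshold rule, then orders the features from lowest to highest error. This is not the paper's sample-level CC or NPC, which are model-free, and it is not the MarginalFeatureRanking package; the threshold rule is swapped in purely for illustration, and the names `best_stump_error` and `rank_features` are hypothetical.

```python
import numpy as np

def best_stump_error(x_train, y_train, x_test, y_test):
    """Fit a one-feature threshold rule on training data; return its test error."""
    best_err, best_t, best_sign = np.inf, 0.0, 1
    for t in np.unique(x_train):
        for sign in (1, -1):
            # Predict class 1 when sign * (x - t) > 0.
            pred = (sign * (x_train - t) > 0).astype(int)
            err = np.mean(pred != y_train)
            if err < best_err:
                best_err, best_t, best_sign = err, t, sign
    pred_test = (best_sign * (x_test - best_t) > 0).astype(int)
    return float(np.mean(pred_test != y_test))

def rank_features(X, y, seed=0):
    """Rank features by marginal held-out error (illustrative stand-in only)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, test = idx[:half], idx[half:]
    errors = [
        best_stump_error(X[train, j], y[train], X[test, j], y[test])
        for j in range(X.shape[1])
    ]
    # Stable sort so ties keep the original feature order.
    order = np.argsort(errors, kind="stable")
    return order.tolist(), errors

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.array([0, 1] * 100)
    # Feature 0 is fully informative; feature 1 is pure noise.
    X = np.column_stack([y.astype(float), rng.normal(size=200)])
    order, errors = rank_features(X, y)
    print("ranking:", order, "errors:", errors)
```

A Neyman-Pearson analogue would, under the same assumptions, rank features by type II error after constraining type I error, rather than by overall error.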
A common issue for classification in scientific research and industry is the existence of imbalanced classes. When sample sizes of different classes are imbalanced in training data, naively implementing a classification method often leads to unsatisfactory prediction results on test data. Multiple resampling techniques have been proposed to address the class imbalance issues. Yet, there is no general guidance on when to use each technique. In this article, we provide an objective-oriented review of the common resampling techniques for binary classification under imbalanced class sizes. The learning objectives we consider include the classical paradigm that minimizes the overall classification error, the cost-sensitive learning paradigm that minimizes a cost-adjusted weighted sum of type I and type II errors, and the Neyman-Pearson paradigm that minimizes the type II error subject to a type I error constraint. Under each paradigm, we investigate the combination of the resampling techniques and a few state-of-the-art classification methods. For each pair of resampling techniques and classification methods, we use simulation studies to study the performance under different evaluation metrics. From these extensive simulation experiments, we demonstrate, under each classification paradigm, the complex dynamics among resampling techniques, base classification methods, evaluation metrics, and imbalance ratios. For practitioners, the take-away message is that with imbalanced data, one usually should consider all the combinations of resampling techniques and the base classification methods.
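The three learning objectives named above can be made concrete with a small sketch. It assumes class 0 is the class whose misclassification counts as a type I error, and it uses a naive empirical rule to pick a Neyman-Pearson threshold. That rule only controls the type I error on the sample it is given and is not the high-probability procedure from the Neyman-Pearson classification literature. All function names and the cost parameters `c0` and `c1` are hypothetical.

```python
import numpy as np

def error_rates(y_true, y_pred):
    """Return (type I error, type II error), treating class 0 as the null class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred).astype(int)
    type1 = np.mean(y_pred[y_true == 0] == 1)  # class 0 predicted as class 1
    type2 = np.mean(y_pred[y_true == 1] == 0)  # class 1 predicted as class 0
    return float(type1), float(type2)

def classical_risk(y_true, y_pred):
    """Overall misclassification rate (classical paradigm)."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred).astype(int)))

def cost_sensitive_risk(y_true, y_pred, c0=1.0, c1=1.0):
    """Cost-weighted sum of the two error types, weighted by class proportions."""
    y_true = np.asarray(y_true)
    type1, type2 = error_rates(y_true, y_pred)
    p0 = np.mean(y_true == 0)
    return float(c0 * p0 * type1 + c1 * (1.0 - p0) * type2)

def naive_np_threshold(scores, y_true, alpha):
    """Smallest class-0 score t such that the rule 'score > t' has
    empirical type I error <= alpha on this sample (naive sketch)."""
    s0 = np.sort(np.asarray(scores)[np.asarray(y_true) == 0])
    n0 = len(s0)
    allowed = int(np.floor(alpha * n0 + 1e-12))  # false positives we may keep
    if allowed >= n0:
        return -np.inf
    return float(s0[n0 - allowed - 1])

if __name__ == "__main__":
    y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
    scores = np.array([0.1, 0.2, 0.3, 0.4, 0.9, 0.35, 0.6, 0.7, 0.8, 0.95])
    t = naive_np_threshold(scores, y, alpha=0.2)
    pred = (scores > t).astype(int)
    print("threshold:", t)
    print("type I, type II:", error_rates(y, pred))
    print("classical risk:", classical_risk(y, pred))
    print("cost-sensitive risk (c0=2, c1=1):", cost_sensitive_risk(y, pred, 2.0, 1.0))
```

With equal costs and balanced classes the cost-sensitive risk coincides with the classical risk. The paradigms diverge once costs or class proportions are asymmetric, which is the setting the review is concerned with.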