Harvineet Singh

University of California, San Francisco

I am a postdoctoral researcher at the University of California, San Francisco in Prof. Jean Feng’s group. I work on machine learning models for healthcare applications, in particular developing methods to explain and audit model performance across varying populations. Before this, I received a PhD in Data Science from New York University, advised by Prof. Rumi Chunara.

My research aims to understand the challenges in responsible deployment and evaluation of machine learning systems. With that goal, I develop methods in causal inference, algorithmic fairness, and interactive learning, motivated by applications in personalized and population health.

Previously, I was a Summer Fellow at Harvard University’s Center for Research on Computation and Society, and an intern at Amazon Science and Microsoft Research. Before graduate school, I worked as a research engineer at Adobe Research, where I built recommendation tools for data analysts. I completed my Integrated Master's in Mathematics and Computing at the Indian Institute of Technology Delhi, where I was fortunate to be mentored by Prof. Amitabha Bagchi and Prof. Parag Singla.

news

Nov 21, 2023	Talk at the Future of AI in Medicine seminar at UCSF on fair ML in health.
Oct 28, 2023	Guest lecture in Prof. Shalmali Joshi’s class on Advanced Machine Learning for Health and Medicine at Columbia University.
Jul 6, 2023	Joined Prof. Jean Feng’s lab at UCSF as a postdoc!
Apr 17, 2023	Defended PhD thesis at NYU Center for Data Science.
Apr 7, 2022	Talk at Future Leaders Summit at UMich on Responsible data science and AI. Blogpost on the talk.

selected publications

ICML
When do Minimax-fair Learning and Empirical Risk Minimization Coincide?

Harvineet Singh, Matthäus Kleindessner, Volkan Cevher, and 2 more authors

In Proceedings of the 40th International Conference on Machine Learning, 23–29 Jul 2023

Minimax-fair machine learning minimizes the error for the worst-off group. However, empirical evidence suggests that when sophisticated models are trained with standard empirical risk minimization (ERM), they often have the same performance on the worst-off group as a minimax-trained model. Our work makes this counter-intuitive observation concrete. We prove that if the hypothesis class is sufficiently expressive and the group information is recoverable from the features, ERM and minimax-fairness learning formulations indeed have the same performance on the worst-off group. We provide additional empirical evidence of how this observation holds on a wide range of datasets and hypothesis classes. Since ERM is fundamentally easier than minimax optimization, our findings have implications on the practice of fair machine learning.
@inproceedings{singh2023minimaxfairness,
  title = {When do Minimax-fair Learning and Empirical Risk Minimization Coincide?},
  author = {Singh, Harvineet and Kleindessner, Matth\"{a}us and Cevher, Volkan and Chunara, Rumi and Russell, Chris},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {31969--31989},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
}
ICML
"Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts

Haoran Zhang, Harvineet Singh, Marzyeh Ghassemi, and 1 more author

In Proceedings of the 40th International Conference on Machine Learning, 23–29 Jul 2023

Machine learning models frequently experience performance drops under distribution shifts. The underlying cause of such shifts may be multiple simultaneous factors such as changes in data quality, differences in specific covariate distributions, or changes in the relationship between label and features. When a model does fail during deployment, attributing performance change to these factors is critical for the model developer to identify the root cause and take mitigating actions. In this work, we introduce the problem of attributing performance differences between environments to distribution shifts in the underlying data generating mechanisms. We formulate the problem as a cooperative game where the players are distributions. We define the value of a set of distributions to be the change in model performance when only this set of distributions has changed between environments, and derive an importance weighting method for computing the value of an arbitrary set of distributions. The contribution of each distribution to the total performance change is then quantified as its Shapley value. We demonstrate the correctness and utility of our method on synthetic, semi-synthetic, and real-world case studies, showing its effectiveness in attributing performance changes to a wide range of distribution shifts.
@inproceedings{zhang2023attributingperformance,
  title = {"Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts},
  author = {Zhang, Haoran and Singh, Harvineet and Ghassemi, Marzyeh and Joshi, Shalmali},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages = {41550--41578},
  year = {2023},
  editor = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = {202},
  series = {Proceedings of Machine Learning Research},
  month = {23--29 Jul},
  publisher = {PMLR},
}
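
The attribution idea in the "Why did the Model Fail?" abstract (distributions as players in a cooperative game, each credited with its Shapley value) can be illustrated with a minimal sketch. This is not the paper's implementation: the paper estimates the value of each set of shifted distributions with importance weighting, whereas here the values are hard-coded, hypothetical toy numbers, and the names `shapley_values` and `TOY_VALUE` are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of hashable player names (here, candidate distribution shifts).
    value:   function mapping a frozenset of players to a float, read as the
             change in model performance when only those distributions shift.
    """
    n = len(players)
    contributions = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k that exclude p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        contributions[p] = total
    return contributions


# Hypothetical toy numbers (assumption): accuracy change when only the
# listed distributions differ between the two environments.
TOY_VALUE = {
    frozenset(): 0.0,
    frozenset({"P(X)"}): -0.02,
    frozenset({"P(Y|X)"}): -0.10,
    frozenset({"P(X)", "P(Y|X)"}): -0.15,
}

attribution = shapley_values(["P(X)", "P(Y|X)"], TOY_VALUE.__getitem__)
for name, contribution in attribution.items():
    print(f"{name}: {contribution:+.3f}")
```

By construction the contributions sum to the total performance change when all distributions shift (here -0.15), which is what makes the Shapley value a natural way to split that change among the candidate shifts.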
PLOS Journal
Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database

Harvineet Singh, Vishwali Mhasawade, and Rumi Chunara

PLOS Digital Health, 2022

Modern predictive models require large amounts of data for training and evaluation, absence of which may result in models that are specific to certain locations, populations in them and clinical practices. Yet, best practices for clinical risk prediction models have not yet considered such challenges to generalizability. Here we ask whether population- and group-level performance of mortality prediction models vary significantly when applied to hospitals or geographies different from the ones in which they are developed. Further, what characteristics of the datasets explain the performance variation? In this multi-center cross-sectional study, we analyzed electronic health records from 179 hospitals across the US with 70,126 hospitalizations from 2014 to 2015. Generalization gap, defined as difference between model performance metrics across hospitals, is computed for area under the receiver operating characteristic curve (AUC) and calibration slope. To assess model performance by the race variable, we report differences in false negative rates across groups. Data were also analyzed using a causal discovery algorithm “Fast Causal Inference” that infers paths of causal influence while identifying potential influences associated with unmeasured variables. When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (1st-3rd quartile or IQR; median 0.801); calibration slope from 0.725 to 0.983 (IQR; median 0.853); and disparity in false negative rates from 0.046 to 0.168 (IQR; median 0.092). Distribution of all variable types (demography, vitals, and labs) differed significantly across hospitals and regions. The race variable also mediated differences in the relationship between clinical variables and mortality, by hospital/region. In conclusion, group-level performance should be assessed during generalizability checks to identify potential harms to the groups. 
Moreover, for developing methods to improve model performance in new environments, a better understanding and documentation of provenance of data and health processes are needed to identify and mitigate sources of variation.
@article{singh2022eicu,
  doi = {10.1371/journal.pdig.0000023},
  author = {Singh, Harvineet and Mhasawade, Vishwali and Chunara, Rumi},
  journal = {PLOS Digital Health},
  publisher = {Public Library of Science},
  title = {Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database},
  year = {2022},
}
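
The generalization gap mentioned in the abstract above (a difference in a performance metric such as AUC across hospitals) can be sketched as follows. This is only an illustration under stated assumptions: the labels and scores are made-up toy values, the sign convention (development hospital minus test hospital) is an assumption, and the helper names `auc` and `generalization_gap` are hypothetical, not taken from the paper's code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Counts the fraction of (positive, negative) pairs where the positive
    example receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def generalization_gap(dev_labels, dev_scores, test_labels, test_scores):
    """AUC at the development hospital minus AUC at the test hospital (assumed sign)."""
    return auc(dev_labels, dev_scores) - auc(test_labels, test_scores)


# Toy data (assumption): the same model scored on patients from two hospitals.
dev_labels, dev_scores = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
test_labels, test_scores = [0, 1, 0, 1], [0.2, 0.3, 0.6, 0.5]

gap = generalization_gap(dev_labels, dev_scores, test_labels, test_scores)
print(f"generalization gap in AUC: {gap:.2f}")
```

A positive gap under this convention means the model discriminates worse at the test hospital than where it was developed; the same pattern applies to other metrics such as calibration slope.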

AIES
Towards Robust Off-Policy Evaluation via Human Inputs

Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, and 1 more author

AAAI/ACM Conference on AI, Ethics, and Society, 2022

@article{singh2021policy,
  author = {Singh, Harvineet and Joshi, Shalmali and Doshi{-}Velez, Finale and Lakkaraju, Himabindu},
  title = {Towards Robust Off-Policy Evaluation via Human Inputs},
  journal = {AAAI/ACM Conference on AI, Ethics, and Society},
  year = {2022},
}

FAccT
Fairness Violations and Mitigation under Covariate Shift

Harvineet Singh, Rina Singh, Vishwali Mhasawade, and 1 more author

In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 2021

We study the problem of learning fair prediction models for unseen test sets distributed differently from the train set. Stability against changes in data distribution is an important mandate for responsible deployment of models. The domain adaptation literature addresses this concern, albeit with the notion of stability limited to that of prediction accuracy. We identify sufficient conditions under which stable models, both in terms of prediction accuracy and fairness, can be learned. Using the causal graph describing the data and the anticipated shifts, we specify an approach based on feature selection that exploits conditional independencies in the data to estimate accuracy and fairness metrics for the test set. We show that for specific fairness definitions, the resulting model satisfies a form of worst-case optimality. In context of a healthcare task, we illustrate the advantages of the approach in making more equitable decisions.
@inproceedings{singh2021covariate,
  author = {Singh, Harvineet and Singh, Rina and Mhasawade, Vishwali and Chunara, Rumi},
  title = {Fairness Violations and Mitigation under Covariate Shift},
  year = {2021},
  isbn = {9781450383097},
  doi = {10.1145/3442188.3445865},
  booktitle = {Proceedings of the ACM Conference on Fairness, Accountability, and Transparency},
  keywords = {algorithmic fairness, causal inference, domain adaptation, covariate shift},
  location = {Virtual Event, Canada},
  series = {FAccT},
}