Quantifying disparities, i.e., differences in outcomes among population groups, is an important task in public health, economics, and increasingly in machine learning. In this work, we study the question of how to collect data to measure disparities. The field of survey statistics provides extensive guidance on the sample sizes needed to accurately estimate quantities such as averages; there is, however, little guidance on estimating disparities. We consider a broad class of disparity metrics, including those used in machine learning to measure the fairness of model outputs. For each metric, we derive the per-group sample allocation that maximizes the precision of the disparity estimate under a fixed data collection budget. We also provide sample size calculations for hypothesis tests that check for significant disparities. Our methods can be used to determine sample sizes for fairness evaluations. We validate them on two nationwide surveys, used for understanding population-level attributes such as employment and health, and on a prediction model. Absent a priori information on the groups, we find that sampling the groups equally typically performs well.
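As a rough illustration of the allocation question this abstract poses (a hypothetical sketch, not the paper's actual derivation), a Neyman-style rule splits a fixed budget across groups in proportion to each group's estimated standard deviation; with no prior information about the groups, equal standard deviations reduce it to the equal split the abstract recommends:

```python
import numpy as np

def allocate_budget(sigmas, budget):
    """Neyman-style allocation: sample each group in proportion to its
    (estimated) standard deviation. With no a priori information, equal
    sigmas yield an equal split of the budget across groups."""
    sigmas = np.asarray(sigmas, dtype=float)
    shares = sigmas / sigmas.sum()
    return np.round(shares * budget).astype(int)

# No prior group information (equal sigmas): equal sampling.
print(allocate_budget([1.0, 1.0], 1000))  # [500 500]
# A noisier group receives proportionally more samples.
print(allocate_budget([1.0, 3.0], 1000))  # [250 750]
```

The proportional-to-sigma rule is the classical answer for minimizing the variance of a stratified mean estimate; the paper's contribution is deriving analogous allocations for disparity metrics rather than means.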
@inproceedings{singh2023disparitymeasures,author={Singh, Harvineet and Chunara, Rumi},title={Measures of Disparity and Their Efficient Estimation},year={2023},isbn={9798400702310},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3600211.3604697},doi={10.1145/3600211.3604697},booktitle={Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society},pages={927--938},numpages={12},keywords={disparity estimation, fairness metrics, optimal data collection, health and well-being, AI, Social Sciences},location={Montr\'{e}al, QC, Canada},series={AIES '23}}
ICML
When do Minimax-fair Learning and Empirical Risk Minimization Coincide?
Harvineet Singh, Matthäus Kleindessner, Volkan Cevher, and 2 more authors
In Proceedings of the 40th International Conference on Machine Learning, 23–29 Jul 2023
Minimax-fair machine learning minimizes the error for the worst-off group. However, empirical evidence suggests that when sophisticated models are trained with standard empirical risk minimization (ERM), they often have the same performance on the worst-off group as a minimax-trained model. Our work makes this counter-intuitive observation concrete. We prove that if the hypothesis class is sufficiently expressive and the group information is recoverable from the features, ERM and minimax-fairness learning formulations indeed have the same performance on the worst-off group. We provide additional empirical evidence of how this observation holds on a wide range of datasets and hypothesis classes. Since ERM is fundamentally easier than minimax optimization, our findings have implications on the practice of fair machine learning.
@inproceedings{singh2023minimaxfairness,title={When do Minimax-fair Learning and Empirical Risk Minimization Coincide?},author={Singh, Harvineet and Kleindessner, Matth\"{a}us and Cevher, Volkan and Chunara, Rumi and Russell, Chris},booktitle={Proceedings of the 40th International Conference on Machine Learning},pages={31969--31989},year={2023},editor={Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},volume={202},series={Proceedings of Machine Learning Research},month={23--29 Jul},publisher={PMLR},}
ICML
"Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts
Haoran Zhang, Harvineet Singh, Marzyeh Ghassemi, and 1 more author
In Proceedings of the 40th International Conference on Machine Learning, 23–29 Jul 2023
Machine learning models frequently experience performance drops under distribution shifts. The underlying cause of such shifts may be multiple simultaneous factors such as changes in data quality, differences in specific covariate distributions, or changes in the relationship between label and features. When a model does fail during deployment, attributing performance change to these factors is critical for the model developer to identify the root cause and take mitigating actions. In this work, we introduce the problem of attributing performance differences between environments to distribution shifts in the underlying data generating mechanisms. We formulate the problem as a cooperative game where the players are distributions. We define the value of a set of distributions to be the change in model performance when only this set of distributions has changed between environments, and derive an importance weighting method for computing the value of an arbitrary set of distributions. The contribution of each distribution to the total performance change is then quantified as its Shapley value. We demonstrate the correctness and utility of our method on synthetic, semi-synthetic, and real-world case studies, showing its effectiveness in attributing performance changes to a wide range of distribution shifts.
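The attribution scheme this abstract describes is an instance of exact Shapley value computation over a cooperative game whose players are distributions. As a minimal sketch (toy numbers and player names are hypothetical, and the paper's importance-weighting step for estimating each coalition's value is omitted):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a cooperative game. `value` maps a
    frozenset of players to a payoff (here: the model performance change
    when only those distributions shift). Exponential in the number of
    players, which is fine for a handful of candidate distributions."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Toy game: performance drop when only the listed distributions shift.
drop = {frozenset(): 0.0, frozenset({"cov"}): 0.03,
        frozenset({"label"}): 0.05, frozenset({"cov", "label"}): 0.10}
attributions = shapley_values(["cov", "label"], lambda s: drop[s])
# By the efficiency property, attributions sum to the total change, 0.10.
```

The efficiency property of Shapley values is what makes this a well-defined attribution: the per-distribution contributions always sum to the total performance change between environments.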
@inproceedings{zhang2023attributingperformance,title={"Why did the Model Fail?": Attributing Model Performance Changes to Distribution Shifts},author={Zhang, Haoran and Singh, Harvineet and Ghassemi, Marzyeh and Joshi, Shalmali},booktitle={Proceedings of the 40th International Conference on Machine Learning},pages={41550--41578},year={2023},editor={Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},volume={202},series={Proceedings of Machine Learning Research},month={23--29 Jul},publisher={PMLR},}
2022
preprint
Active Linear Regression in the Online Setting via Leverage Score Sampling
Harvineet Singh, Christopher Musco, and Rumi Chunara
UAI
Data Poisoning Attacks on Off-Policy Policy Evaluation Methods
Elita Lobo, Harvineet Singh, Marek Petrik, and 2 more authors
In The 38th Conference on Uncertainty in Artificial Intelligence 2022
@inproceedings{lobo2022data,title={Data Poisoning Attacks on Off-Policy Policy Evaluation Methods},author={Lobo, Elita and Singh, Harvineet and Petrik, Marek and Rudin, Cynthia and Lakkaraju, Himabindu},booktitle={The 38th Conference on Uncertainty in Artificial Intelligence},year={2022},}
PLOS Digital Health
Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database
Harvineet Singh, Vishwali Mhasawade, and Rumi Chunara
Modern predictive models require large amounts of data for training and evaluation, the absence of which may result in models that are specific to certain locations, the populations in them, and local clinical practices. Yet, best practices for clinical risk prediction models have not yet considered such challenges to generalizability. Here we ask whether population- and group-level performance of mortality prediction models varies significantly when the models are applied to hospitals or geographies different from the ones in which they were developed. Further, what characteristics of the datasets explain the performance variation? In this multi-center cross-sectional study, we analyzed electronic health records from 179 hospitals across the US with 70,126 hospitalizations from 2014 to 2015. The generalization gap, defined as the difference in model performance metrics across hospitals, is computed for the area under the receiver operating characteristic curve (AUC) and the calibration slope. To assess model performance by the race variable, we report differences in false negative rates across groups. Data were also analyzed using a causal discovery algorithm, “Fast Causal Inference”, which infers paths of causal influence while identifying potential influences associated with unmeasured variables. When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (1st-3rd quartile or IQR; median 0.801); calibration slope from 0.725 to 0.983 (IQR; median 0.853); and disparity in false negative rates from 0.046 to 0.168 (IQR; median 0.092). Distributions of all variable types (demography, vitals, and labs) differed significantly across hospitals and regions. The race variable also mediated differences in the relationship between clinical variables and mortality, by hospital/region. In conclusion, group-level performance should be assessed during generalizability checks to identify potential harms to the groups. Moreover, to develop methods that improve model performance in new environments, a better understanding and documentation of the provenance of data and health processes are needed to identify and mitigate sources of variation.
@article{singh2022eicu,doi={10.1371/journal.pdig.0000023},author={Singh, Harvineet and Mhasawade, Vishwali and Chunara, Rumi},journal={PLOS Digital Health},publisher={Public Library of Science},title={Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database},year={2022},}
AIES
Towards Robust Off-Policy Evaluation via Human Inputs
Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, and 1 more author
In AAAI/ACM Conference on AI, Ethics, and Society 2022
@inproceedings{singh2021policy,author={Singh, Harvineet and Joshi, Shalmali and Doshi{-}Velez, Finale and Lakkaraju, Himabindu},title={Towards Robust Off-Policy Evaluation via Human Inputs},booktitle={AAAI/ACM Conference on AI, Ethics, and Society},year={2022},}
2021
FAccT
Fairness Violations and Mitigation under Covariate Shift
Harvineet Singh, Rina Singh, Vishwali Mhasawade, and 1 more author
In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency 2021
We study the problem of learning fair prediction models for unseen test sets distributed differently from the training set. Stability against changes in data distribution is an important mandate for responsible deployment of models. The domain adaptation literature addresses this concern, albeit with the notion of stability limited to that of prediction accuracy. We identify sufficient conditions under which stable models, both in terms of prediction accuracy and fairness, can be learned. Using the causal graph describing the data and the anticipated shifts, we specify an approach based on feature selection that exploits conditional independencies in the data to estimate accuracy and fairness metrics for the test set. We show that for specific fairness definitions, the resulting model satisfies a form of worst-case optimality. In the context of a healthcare task, we illustrate the advantages of the approach in making more equitable decisions.
@inproceedings{singh2021covariate,author={Singh, Harvineet and Singh, Rina and Mhasawade, Vishwali and Chunara, Rumi},title={Fairness Violations and Mitigation under Covariate Shift},year={2021},isbn={9781450383097},doi={10.1145/3442188.3445865},booktitle={Proceedings of the ACM Conference on Fairness, Accountability, and Transparency},keywords={algorithmic fairness, causal inference, domain adaptation, covariate shift},location={Virtual Event, Canada},series={FAccT},}
2019
UAI
Cascading Linear Submodular Bandits: Accounting for Position Bias and Diversity in Online Learning to Rank
Gaurush Hiranandani, Harvineet Singh, Prakhar Gupta, and 3 more authors
@inproceedings{hiranandani2019cascading,title={Cascading Linear Submodular Bandits: Accounting for Position Bias and Diversity in Online Learning to Rank},author={Hiranandani, Gaurush and Singh, Harvineet and Gupta, Prakhar and Kveton, Branislav and Wen, Zheng and Burhanuddin, Iftikhar Ahamath},booktitle={Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence},year={2019},}
UMAP
Stuck? No worries!: Task-aware Command Recommendation and Proactive Help for Analysts
Aadhavan M Nambhi, Bhanu Prakash Reddy, Aarsh Prakash Agarwal, and 3 more authors
@inproceedings{nambhi2019stuck,title={Stuck? No worries!: Task-aware Command Recommendation and Proactive Help for Analysts},author={Nambhi, Aadhavan M and Reddy, Bhanu Prakash and Agarwal, Aarsh Prakash and Verma, Gaurav and Singh, Harvineet and Burhanuddin, Iftikhar Ahamath},booktitle={ACM Conference on User Modeling, Adaptation and Personalization},year={2019}}
2018
WSDM
Modeling Time to Open of Emails with a Latent State for User Engagement Level
Moumita Sinha, Vishwa Vinay, and Harvineet Singh
In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining 2018
@inproceedings{sinha2018time,author={Sinha, Moumita and Vinay, Vishwa and Singh, Harvineet},title={Modeling Time to Open of Emails with a Latent State for User Engagement Level},booktitle={Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining},series={WSDM '18},year={2018},isbn={978-1-4503-5581-0},location={Marina Del Rey, CA, USA},pages={531--539},numpages={9},doi={10.1145/3159652.3159683},acmid={3159683},publisher={ACM},address={New York, NY, USA},keywords={cox-proportional hazards model, email interaction data, enterprise email marketing, survival analysis, time-to-event prediction}}
EDM
Modeling Hint-Taking Behavior and Knowledge State of Students with Multi-Task Learning.
Ritwick Chaudhry, Harvineet Singh, Pradeep Dogga, and 1 more author
International Educational Data Mining Society 2018
@article{chaudhry2018modeling,title={Modeling Hint-Taking Behavior and Knowledge State of Students with Multi-Task Learning.},author={Chaudhry, Ritwick and Singh, Harvineet and Dogga, Pradeep and Saini, Shiv Kumar},journal={International Educational Data Mining Society},year={2018},publisher={ERIC}}
2015
Springer Journal
On the role of conductance, geography and topology in predicting hashtag virality
Siddharth Bora, Harvineet Singh, Anirban Sen, and 2 more authors
@article{bora2015role,title={On the role of conductance, geography and topology in predicting hashtag virality},author={Bora, Siddharth and Singh, Harvineet and Sen, Anirban and Bagchi, Amitabha and Singla, Parag},journal={Social Network Analysis and Mining},volume={5},number={1},pages={1--15},year={2015}}