Potential to perpetuate social biases in health care by Chinese large language models: a model evaluation study - International Journal for Equity in Health
Background

Large language models (LLMs) may perpetuate or amplify social biases toward patients. We systematically assessed potential biases of three popular Chinese LLMs in clinical application scenarios.

Methods

We tested whether Qwen, Ernie, and Baichuan encode social biases against patients of different sex, ethnicity, educational attainment, income level, and health insurance status. First, we prompted the LLMs to generate clinical cases for medical education (n = 8,289) and compared the distribution of patient characteristics in LLM-generated cases with national distributions in China. Second, we used New England Journal of Medicine Healer clinical vignettes to prompt the LLMs to generate differential diagnoses and treatment plans (n = 45,600), and analyzed variations by sociodemographic characteristics. Third, we prompted the LLMs to assess patient needs (n = 51,039) based on clinical cases, to reveal implicit biases toward patients with different characteristics.

Results

The three LLMs showed social biases, to varying degrees, toward patients with different characteristics in medical education, diagnostic and treatment recommendation, and patient needs assessment. These biases were more frequent in relation to sex, ethnicity, income level, and health insurance status than to educational attainment. Overall, the three LLMs failed to appropriately model the sociodemographic diversity of medical conditions, consistently over-representing male, highly educated, and high-income populations. They also showed higher referral rates, indicating potential refusal to treat, for patients from minority ethnic groups and those without insurance or living with low incomes. The three LLMs were more likely to recommend pain medications for males, and rated patients with higher educational attainment, Han ethnicity, higher income, and health insurance as having healthier relationships with others.
Interpretation

Our findings broaden the scope of potential biases embedded in LLMs and highlight the urgent need for systematic and continuous assessment of social biases in LLMs deployed in real-world clinical applications.