Sahil Narula, Sanaa Karkera, Rushil Challa, Sarina Virmani, Nithya Chilukuri, Mason Elkas,
Nidhi Thammineni, Ankita Kamath, Parth Jaiswal and Abhishek Krishnan
Duke University, United States
MLA 8: Narula, Sahil, et al. "TESTING THE ACCURACY OF MODERN LLMS IN ANSWERING GENERAL MEDICAL PROMPTS." Int. j. of Social Science and Economic Research, vol. 8, no. 9, Sept. 2023, pp. 2793-2802, doi.org/10.46609/IJSSER.2023.v08i09.021. Accessed Sept. 2023.
APA 6: Narula, S., Karkera, S., Challa, R., Virmani, S., Chilukuri, N., Elkas, M., & Thammineni, N. (2023, September). TESTING THE ACCURACY OF MODERN LLMS IN ANSWERING GENERAL MEDICAL PROMPTS. Int. j. of Social Science and Economic Research, 8(9), 2793-2802. Retrieved from https://doi.org/10.46609/IJSSER.2023.v08i09.021
Chicago: Narula, Sahil, Sanaa Karkera, Rushil Challa, Sarina Virmani, Nithya Chilukuri, Mason Elkas, Nidhi Thammineni, Ankita Kamath, Parth Jaiswal, and Abhishek Krishnan. "TESTING THE ACCURACY OF MODERN LLMS IN ANSWERING GENERAL MEDICAL PROMPTS." Int. j. of Social Science and Economic Research 8, no. 9 (September 2023), 2793-2802. Accessed September, 2023. https://doi.org/10.46609/IJSSER.2023.v08i09.021.
ABSTRACT: The rising use of large language models (LLMs) for answering medical questions necessitates an
evaluation of their accuracy, especially given the implications for public health. This study
employed a comprehensive test suite of 500 medical prompts, evaluated by a panel of medical
experts for factual accuracy, contextual relevance, and potential risk. The responses from
state-of-the-art LLMs were also compared with answers from a control group of medical students.
Results indicated a high level of accuracy among LLMs, with a median score of 88%. While
LLMs performed well on general wellness questions (92% accuracy), they were less reliable for
specialized medical queries (80% accuracy). The control group of medical students outperformed
LLMs in answering specialized medical questions. In conclusion, while LLMs demonstrate a
high degree of factual accuracy for general medical information, they are less reliable for
specialized or complex health-related queries. Given their widespread use, LLMs could be a
preliminary source for general medical advice, but their limitations underscore the need for
consulting experts for specialized medical conditions. Future work should focus on enhancing
the models' capabilities in specialized domains and evaluating the ethical implications of using
LLMs for medical information dissemination. This study serves as a baseline for the responsible
use of AI in healthcare.