Reflections on evaluating the utility of an LLM for keywording health research

Claire Stansfield
Ailbhe Finnerty Mutlu

Abstract

Generative artificial intelligence shows promise for rapidly annotating collections of research at scale. We reflected on our experiences of evaluating the ability of a large language model (LLM) to apply predefined keywords to records of health research in the context of an evidence repository on vaccine research and research registers of health promotion effectiveness. Five aspects of evaluation helped us articulate key considerations and challenges across the use cases: 1) cyclical prompt development, 2) data availability and quality, 3) performance benchmarks and expectations, 4) task complexity and perspective, and 5) workflows and tools.

Article Details

Section: Feature Articles

How to Cite

1. Reflections on evaluating the utility of an LLM for keywording health research. J Eur Assoc Health Info Libr [Internet]. 2026 Jun. 15 [cited 2026 Jun. 15];22(2):18-23. Available from: https://ojs.eahil.eu/JEAHIL/article/view/724