Reflections on evaluating the utility of an LLM for keywording health research
Abstract
Generative artificial intelligence shows promise for rapidly annotating collections of research at scale. We reflected on our experience of evaluating how well a large language model (LLM) applies predefined keywords to records of health research, drawing on use cases involving an evidence repository on vaccine research and research registers of health promotion effectiveness. Five aspects of evaluation helped us articulate key considerations and challenges across the use cases: 1) cyclical prompt development, 2) data availability and quality, 3) performance benchmarks and expectations, 4) task complexity and perspective, and 5) workflows and tools.
Article Details
Section: Feature Articles

This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence, unless otherwise stated.
How to Cite
1. Reflections on evaluating the utility of an LLM for keywording health research. J Eur Assoc Health Info Libr [Internet]. 2026 Jun. 15 [cited 2026 Jun. 15];22(2):18-23. Available from: https://ojs.eahil.eu/JEAHIL/article/view/724