Jessica Witte
Digital Research Analyst
University of Edinburgh
RESEARCH INTERESTS
PROJECT
THISTLE: Technology for Handling and Improving Stubborn Text-Level Errors
THISTLE explores how large language models (LLMs) can be used to improve the quality and accessibility of digitised historical newspapers. Focusing on a subset of digitised nineteenth-century issues of The Scotsman, the project will develop and test a pilot pipeline that uses fine-tuned LLMs to correct complex optical character recognition (OCR) errors in historical texts. The aim is to produce cleaner, more reliable machine-readable text that is suitable for computational analysis.


Digitised image of the front page of the first issue of The Scotsman newspaper; Public domain CC0