Wangzhi Xi led a workshop titled "Leveraging Large Language Models for Classical Chinese Language Processing" at the 15th International Conference of Digital Archives and Digital Humanities (DADH 2024) on November 29, 2024, at National Taiwan Normal University.

This workshop explores practical applications of Large Language Models (LLMs), covering both decoder (generative) and encoder (representation) models, in processing classical Chinese, with a focus on historical texts and languages. The workshop begins with an introductory overview of Transformer architectures and LLMs, preparing participants for hands-on interaction with BERT models trained specifically on classical Chinese, along with open-source generative AI tools. Excerpts from the RegInfra project's training corpus of stele inscriptions illustrate the processing pipeline. We aim to demonstrate how state-of-the-art language technologies can facilitate information extraction from extensive historical sources, making these tools accessible to researchers and students in the humanities.
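As a taste of the hands-on portion, the sketch below shows how one might query a classical-Chinese BERT for masked-character prediction using the Hugging Face transformers library. The model name SIKU-BERT/sikubert refers to one publicly available classical-Chinese BERT and is used here only as an illustrative assumption; it is not necessarily the model featured in the workshop.

```python
# A minimal sketch of masked-token prediction with a classical-Chinese BERT.
# Assumptions: the Hugging Face `transformers` library is installed, and
# "SIKU-BERT/sikubert" stands in for whichever classical-Chinese BERT the
# workshop actually uses.
from transformers import pipeline

# Fill-mask pipeline: the model predicts the character hidden behind [MASK].
fill_mask = pipeline("fill-mask", model="SIKU-BERT/sikubert")

# A classical Chinese sentence (from the Analects) with one character masked.
results = fill_mask("子曰：學而時習之，不亦[MASK]乎。")

# Print the top candidate characters with their prediction scores.
for r in results:
    print(f"{r['token_str']}  (score: {r['score']:.3f})")
```

The same fill-mask interface generalizes to other encoder models, which is why an overview of the Transformer architecture comes first: once participants see how an encoder produces contextual representations, swapping in a domain-specific model is a one-line change.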