Scots and Scottish English Corpora

Historical and present-day Scots and Scottish English corpora

The following lists of Scots and Scottish English language corpora were compiled at the end of 2016. There are three lists in total: Historical Scots, i.e. corpora containing pre-1700 materials; Modern Scots, i.e. corpora containing post-1700 materials; and corpora under construction. None of these lists claims to be exhaustive, nor do they specify the format of each corpus, its degree of linguistic annotation (if any) or any copyright restrictions that may apply. Such information can be obtained via the URL or email address provided.

If you know of other materials that could usefully be added to this list, please send a brief description to AMC@ed.ac.uk.

Historical Scots (pre-1700)

  • A Linguistic Atlas of Older Scots (LAOS) is an online linguistic atlas that shows what non-literary written Scots was like between about 1380 and 1500. The atlas has been compiled from some 1,200 documentary texts, which have been transcribed and linguistically annotated.
  • The Helsinki Corpus of Older Scots covers the period 1450-1700 and supports studies of the last stages of the differentiation of the northern English dialect, the rise of a distinctive Scottish variety of English and the anglicization of Scots. It is available on CD-ROM from:
  • The records of the Parliaments of Scotland 1424–1707 (with modern translations) are viewable online.
  • The Corpus of Biblical Texts in Scots comprises the texts in Graham Tulloch’s A History of the Scots Bible (Aberdeen University Press, 1989).

Modern Scots (post-1700)

  • Glasgow University’s Corpus of Modern Scottish Writing contains some 350 documents from the period 1700–1945.
  • The Linguistic Atlas of Scotland (3 volumes, available via EUL) is a comprehensive dialectological study of Lowland Scotland, Orkney and Shetland (and Northern Ireland, Northumberland and Cumberland) by the Linguistic Survey of Scotland. It provides a wealth of word-geographical material and phonological findings as well as detailed cartographic analyses of Scottish dialects.
  • A Corpus of Dramatic Texts from Glasgow was compiled in the mid-1980s and comprises a collection of (then) contemporary plays. Access may be granted on written application to its compiler:
    • jk at etinu dot com
  • Katja Lenz compiled a corpus of twelve post-WWII dramatic texts in Scots, which she would be willing to share with interested researchers on written application to:
    • katja dot lenz at uni-koeln dot de
  • Glasgow University’s Scottish Corpus of Texts and Speech contains written and spoken samples of post-1940 Scots and Scottish English.
  • The University of Edinburgh’s Phonetics Recording Archive contains recordings made by staff and students from the mid- to late 20th century. Many of the recordings are of Scots and Scottish English speakers.
  • The Sounds of the City project at Glasgow University investigates the speech sounds of Glasgow, past and present, using its ever-expanding corpus of recordings. The corpus is not publicly available but access may be granted on written application.
  • The University of Edinburgh’s HCRC Map Task Corpus contains 128 digitally recorded and transcribed unscripted dialogues involving 64 (mostly Glaswegian) undergraduate students at the University of Glasgow.
  • Glasgow University can provide access to two Shetland Scots corpora: one of vernacular Shetland Scots gathered from 30 speakers stratified by age and gender, the other a follow-up corpus of bidialectal data. These corpora are not publicly available but access may be granted on written application to:
    • Jennifer dot Smith at glasgow dot ac dot uk
  • As part of a study of how Polish immigrants deal with the variation that exists in the English spoken around them, speech data were collected from 21 Edinburgh-born and 16 Poland-born adolescents living in Edinburgh.
  • The West Fife High Pipe Band Corpus consists of 38 hours of conversation from a group of 54 speakers. The corpus is not publicly available but access may be granted on written application to:
    • lynn dot clark at canterbury dot ac dot nz
  • Thorsten Brato collected around 40 hours of speech data from c. 100 Aberdonians in 2006/07. Most of the speakers are children and teenagers from different parts of the city and from different social backgrounds, but there are also recordings of adults. The corpus is not publicly available but access may be granted on written application to:
    • Thorsten dot Brato at sprachlit dot uni-regensburg dot de
  • The Fisher Speak project investigated lexical attrition in five Scottish East Coast fishing communities. Its data, drawn from dictionaries, wordlists and fieldwork, are not publicly available but access may be granted on written application to:
    • millar at abdn dot ac dot uk

Corpora Under Construction

  • The FITS project (From Inglis to Scots) will culminate in Spring 2018 with the publication of an online corpus of spelling-sound correspondences for every form of every item of Germanic origin in the LAOS corpus. (The LAOS corpus is described in the ‘Historical Scots’ list above.) The FITS corpus will document the impressive range of spelling variants and explain in unprecedented detail the historical development of each of these forms from its pre-Scots etymon.
  • The ICE-Scotland project aims to compile by 2018 a 1-million-word corpus of spoken and written 21st-century Scottish English. The corpus will contain the text categories and annotations specified by the parent project (the International Corpus of English), with additional linguistic annotation such as part-of-speech tagging and phonetic transcription.
  • The Aberdeen Corpus of Older Scots (1375-1513, ‘ACOS’) is currently under construction but will provide online access to the Brus and all poems by Dunbar and Douglas. For further information, contact Charles-Henri Discry:
    • c dot h dot discry at uu dot nl
  • The Scots Syntax Atlas aims to provide access to grammaticality judgements and transcribed audio recordings from speakers in 122 locations across Scotland. The project is scheduled to finish in July 2019.