The Santiago University Learner of English Corpus (SULEC)

SULEC is a project managed by a group of researchers from SPERTUS (Spoken English Research Team), based at the Department of English and German Philology of the University of Santiago de Compostela. The project was initiated in October 2002 with funding from the Galician Department of Education. The original aim was to create a learner corpus of at least 1,000,000 words, made up of oral and written samples produced by learners of English at all levels (A1 to C2).

Spoken data were collected through semi-structured interviews, short oral presentations and brief story descriptions, all of which were audio-recorded and then transcribed following some basic conventions. The written component of the corpus was gathered from compositions and argumentative essays, following criteria similar to those used in the compilation of ICLE (International Corpus of Learner English). All the data were transcribed, computerised and finally tagged automatically using Freeling.

The subsequent comprehensive analysis of these data will allow corpus users to conduct research at different levels.

The corpus currently (November 2022) contains samples from 1,374 students, amounting to 406,690 grammatical elements and 365,030 words. In September 2022, we intend to start a second round of data collection, following the guidelines used before but introducing some improvements. We are currently looking for teachers of English at all levels who may be able to help us with this. If you would like to collaborate with us, please get in touch using the contact form. Certificates will be issued to all project participants.