Background Readings

A basic reading list has been developed by Sara White & Brett Hashimoto.

Develop readings to help non-linguists gain some basic, limited familiarity with corpus linguistics and its application.

With the exception of Wayne Schneider, the supporting cast started completely ignorant of linguistic principles and methods. It quickly became apparent that we would need trained linguists to bring new team members up to speed as quickly as possible. Mark Davies and Bill Eggington presented to the law faculty twice early in the project. Sara White was hired as a visiting linguistics fellow for the 2017/2018 academic year and developed the first draft of this list. Brett Hashimoto was then hired to replace Sara for the 2019/2020 academic year and revised and updated the list.

General Corpus Linguistics

Biber, D., & Reppen, R. (Eds.). (2015). The Cambridge handbook of English corpus linguistics. Cambridge: Cambridge University Press.

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.

Kennedy, G. (2014). An introduction to corpus linguistics. New York: Routledge.

McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge: Cambridge University Press.

McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced resource book. London: Routledge.

AntConc: A freeware corpus analysis toolkit for concordancing and text analysis.

Laurence Anthony has a whole suite of software devoted to corpus development and analysis that can be found at

WordSmith: Lexical analysis software

LancsBox: A new-generation software package for the analysis of language data and corpora

The online interface to Mark Davies’s suite of corpora:

Sketch Engine: Online tool for corpus development and analysis with a variety of uses

Other Important Readings


Baker, P. (2006). Glossary of corpus linguistics. Edinburgh University Press.

Corpus Design:

Biber, D. (1993). Representativeness in corpus design. Literary and linguistic computing, 8(4), 243-257.

Biber, D. (1993). Using register-diversified corpora for general language studies. Computational linguistics, 19(2), 219-241.

Discourse Analysis:

Baker, P. (2006). Using corpora in discourse analysis. A&C Black.

McEnery, A., & Baker, P. (Eds.). (2015). Corpora and discourse studies: Integrating discourse and corpora. Springer.

Multidimensional Analysis:

Biber, D. (1988). Variation across speech and writing. Cambridge University Press.

Sardinha, T. B., & Pinto, M. V. (Eds.). (2014). Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber (Vol. 60). John Benjamins Publishing Company.

Keyword Analysis:

Scott, M., & Tribble, C. (2006). Textual patterns: Key words and corpus analysis in language education (Vol. 22). John Benjamins Publishing.

Corpus Statistics:

Gries, S. T. (2009). Quantitative corpus linguistics with R: A practical introduction. Routledge.

Register Studies:

Biber, D., & Conrad, S. (2019). Register, genre, and style. Cambridge University Press.

(Updated 10/8/2019)