All Projects

Images / Aurora Samperio

Congressional Record Corpus (CRC)

The Congressional Record Corpus (CRC) contains the entirety of the Congressional Record as published from 1873-2021 by the Government Publishing Office. The Congressional Record is a detailed record of legislative history, including bills introduced, transcripts of floor debate and remarks, conference and committee reports, statements by legislators and other proceedings.

View Project

Courtesy of The History of English

BYU-Corpus of Early Modern English (BYU-COEME)

The BYU-Corpus of Early Modern English covers texts from 1475 to 1800 that were included in the Evans Bibliography, Early English Books Online (EEBO), and Eighteenth Century Collections Online (ECCO), as corrected by the Text Creation Partnership (TCP) (University of Michigan).

View Project

Corpus of State Conventions on the Adoption of the Constitution (COSCAC)

View Project

Corpus of the Current US Code (COCUSC)

View Project

Corpus of US Caselaw (CUSC)

View Project

Law & Corpus Linguistics — Background

Corpus linguistics is an approach to language research that utilizes a principled collection of texts (i.e., a corpus) in order to better understand patterns of language use. Analysis of these patterns can produce insight into, among other things, the meaning of words and phrases. Linguists (and lexicographers) have long understood that corpora are a far better guide to interpretation than native speaker intuition or even dictionaries. With advances in computer technology, the use of corpus linguistics for research has expanded dramatically. Legal scholars and judges have only recently begun to tap the potential of this method because most are unaware of its possibilities.

View Project

Public Beta Version 3.00

Law & Corpus Linguistics UI

Law & Corpus Linguistics Interface

Build an interface that delivers essential corpus linguistics tools and incorporates more than 20 years of library interface design.

View Project

Stock Montage/Getty Images

Corpus of Founding Era American English (COFEA)

The Corpus of Founding Era American English (COFEA) was originally conceived as a set of texts from the period 1760-1799.

View Project

U.S. Statutes at Large

Corpus of Early Statutes at Large (CESAL)

US Statutes at Large

View Project

Constitutional Convention of 1787

Corpus of the Records of the Constitutional Convention (CORCC)

Max Farrand's The Records of the Federal Convention of 1787.

View Project

Supreme Court

Corpus of Supreme Court Opinions of the United States (COSCO-US)

United States Supreme Court opinions, including dissents from denials of cert. Future sources to include are briefs, oral arguments, and enhanced descriptive metadata through 2017.

View Project

Background Readings

Develop readings to help non-linguists develop some basic and limited familiarity with corpus linguistics and its application.

View Project

Interface Helps

Coming Soon!

View Project

J. Reuben Clark Law School

BYU Photo

Undergraduate Internship

Harness volunteer talent as a force multiplier for the BYU Law School Law & Corpus Linguistics Project.

View Project

Principled Text De-duping

Develop a process for de-duping text within any given corpus.

View Project

Optical Character Recognition Confidence Count (OCRCC)

Develop a method to establish an OCR confidence score, based on a sample of principled text.

View Project

Elliot’s Debates

Elliot’s Debates: The Debates in the Several State Conventions on the Adoption of the Federal Constitution.

View Project

Alternate Spellings (AS)

Develop an approach to deal with variation in spelling between the 21st century and previous centuries.

View Project


Cover page of CFR

Code of Federal Regulations (CFR)

Develop a corpus using volumes and editions of the Code of Federal Regulations.

View Project