Law & Corpus Linguistics Interface

Current Status:

Version 5.3.0 Public Beta now live on lawcorpus site

What’s in a number? . . . Versioning explained

Since we are constantly trying to improve the research experience, a new platform enumeration has been implemented.

Major releases are reflected in the initial whole number (e.g. 3.xx). The first decimal position (tens column) (e.g x.0x) records bug fixes and minor interface upgrades to the current major release. The second decimal position (hundreds columns) (e.g x.x0) reflects upgrades to the underlying principled text of a corpus. In version 3.00, we have attempted to correct any OCR errors that could be corrected through batch processes. When words that break across lines are combined the total word count of a corpus typically decreases, but search results improve.

Version 3.0 Public Beta Release January 2019

The initial releases of the Law & Corpus Linguistics research platform were focused on a guided search process. Users selected the Corpus they wanted to search, entered a word or phrase and were taken to a series of tabs and associated bread crumbs that were intended to guide researchers through a process.

After searching, users would see a frequency list of matches and then could drill down into concordance lines, and then into individual full context view. Users who wanted to do collocate analysis would click on the collocate tab to see an additional search box for additional word entry.

Version 3 attempts to include all of the features of previous releases with a simplified interface. It also develops the compare function that should make it easier to compare law related corpora within the same research session.

Summary of Development:

Consistent Search Bar for search/analysis modes
Modes for search/analysis moved from tabs to toolbar commands
Frequency mode now includes “Document Counts” as measure of token distribution
Collocates mode always available from initial search
Enhancement to compare
Mixed modes
Mixed Corpora
Collocate Compare with Ratios and Score [NOTE: MI score again has been defaulted to 3]
“Builder” distributed throughout interface and advanced options/downloads tool

Minimum: Mutual Information Score Control Now Working Correctly

In Version the 2.1, in the Collocate Tab, the minimum MI was defaulted to the number 3. This decision was made to help researchers using the wildcard feature initially see only results that likely had more than a random relationship to each other. Unfortunately, the control malfunctioned, failed to display the default value of 3, and refused to return any value less than 3. As of December 3, 2018, the control is working correctly and the default has been removed.

For those who used the interface before the control was repaired, it is suggested that cached data for the lawcorpus.byu.edu site be refreshed.

For basic instructions with for various browsers and operating systems, there is a helpful thread on the FILECLOUD Blog, “Tech tip: How to do hard refresh in Chrome, Firefox and IE?”

Open Public Beta Version 2.1 Released
Version 2.1 was released on June 21, 2018. It included three new corpora (Farrand’s Proceedings, Elliott’s Debates, and US Statutes at Large), each can be search as an individual corpus, but are also included with COFEA. The ability to download screen views, bug fixes for navigation, and some underlying architecture for authentication.

Goal:
Develop Research Platform that supports Law & Corpus Linguistics including standard Corpus Linguistics research features, such as text coded by parts of speech, concordance lines, frequency counts, filters by sections. Where possible allow linking back to the original source, but prevent harvesting of licensed content.

Background:
Initially conceived as a five-year project. Howard W. Hunter Law Library librarians (Wayne Schneider, Shawn Nevers & David Armond) partnered with designers, engineers, and programmers from BYU’s Harold B Lee Library Discovery Systems. Curtis Thacker and Charles Draper from BYU were joined by professional designer Grant Zabriskie. To develop an interface that would support academic researchers, students, practicing lawyers and judges. BYU Law’s visiting Linguistics fellow Sara White and visiting professor James Philips developed interface requirements and tested early prototypes. White’s experience working as a graduate student with both Mark Davies (BYU) and Jesse Egbert (North Arizona) provided invaluable assistance to Lee Library developers. Philip’s identification of accessible digital collections and supervision of initial harvesting provided the initial a set of text delivered as COFEA. Shawn Nevers worked with legal information vendors to secure test data sets. Wayne Schneider assisted with data harvesting, text tagging, and developed the first functional BYU Law Corpus. His background in computer science combined with formal coursework in corpus linguistics insured that the project continued to move forward.