Corpus software

I should do some more work on that, especially at the rate the corpus is falling apart.

Corpus Structure

The base naming scheme is:

Participants who have not given consent to have their name associated with their data are named:

Those whose interviews are informational only are named either:

These naming schemes will collectively be called BASENAME going forward.

Audio files are named:

Transcripts are named:

Alignments are named:

Data extracted from the above are placed in a folder named "extractions". Each kind of extracted data is assigned a standard name, denoted by <ID>, and extracted files always have a txt extension. Extracted data are named:

Currently, the two assigned extraction IDs are:
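As a rough sketch of the extraction conventions described in this section, the check below tests only the two properties stated in prose: the file sits in a folder named "extractions" and has a txt extension. The exact file-name pattern and the assigned IDs are not reproduced here, so the function does not try to validate them. The function name and the example paths are hypothetical and are not part of the corpus software.

```python
from pathlib import PurePosixPath


def looks_like_extraction(path):
    """Return True if path is a .txt file directly inside a folder named 'extractions'.

    Hypothetical helper: it checks only the folder name and the extension,
    not the BASENAME or <ID> parts of the file name.
    """
    p = PurePosixPath(path)
    return p.parent.name == "extractions" and p.suffix == ".txt"


if __name__ == "__main__":
    # Example paths are made up for illustration.
    for candidate in ["corpus/extractions/example.txt", "corpus/audio/example.wav"]:
        print(candidate, looks_like_extraction(candidate))
```

A check like this could be run over the corpus directory to flag files that have drifted out of the expected layout before any stricter name validation is attempted.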



VoicesOfCaliforniaCorpus (last edited 2019-10-04 18:42:38 by ChristianBrickhouse)