Corpus software
I should do some more work on that, especially given the rate at which the corpus is falling apart.
Corpus Structure
The base naming scheme is:
SITE_<Last>_<First>
Participants who have not given consent to have their name associated with their data are named:
SITE_Confid_<###>
Those whose interviews are informational only are named either:
INF_<Last>_<First>
INF_Confid_<###>
These naming schemes will collectively be called BASENAME going forward.
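The four BASENAME forms can be checked mechanically. The sketch below is a minimal illustration, not an existing corpus tool. It assumes details the notes leave open: SITE is an uppercase alphanumeric code, names contain only letters, apostrophes or hyphens, and <###> is exactly three digits. The function name `classify_basename` and the site code "ABC" in the examples are hypothetical.

```python
import re
from typing import Optional

# Assumptions (not specified in the notes): SITE is an uppercase
# alphanumeric code, names use letters, apostrophes or hyphens,
# and <###> is exactly three digits.
NAME = r"[A-Za-z][A-Za-z'-]*"

PATTERNS = [
    # INF_ forms are tried first, because "INF" would also fit the SITE pattern.
    ("informational_confidential", re.compile(r"INF_Confid_(?P<num>\d{3})")),
    ("informational", re.compile(rf"INF_(?P<last>{NAME})_(?P<first>{NAME})")),
    ("confidential", re.compile(r"(?P<site>[A-Z0-9]+)_Confid_(?P<num>\d{3})")),
    ("named", re.compile(rf"(?P<site>[A-Z0-9]+)_(?P<last>{NAME})_(?P<first>{NAME})")),
]


def classify_basename(basename: str) -> Optional[str]:
    """Return which BASENAME scheme a string follows, or None if it fits none."""
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(basename):
            return kind
    return None


if __name__ == "__main__":
    for name in ["ABC_Smith_Jane", "ABC_Confid_007", "INF_Doe_John", "Smith_Jane"]:
        print(name, classify_basename(name))
```

Using `fullmatch` rather than `match` means a name with trailing junk (for example "ABC_Confid_0071") is rejected instead of silently accepted.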
Audio files are named:
- BASENAME.wav
Transcripts are named:
- BASENAME.trs
- BASENAME.eaf
Alignments are named:
- BASENAME.TextGrid
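Given a BASENAME, the expected audio, transcript and alignment files can be looked up directly. This is a minimal sketch that assumes all of these files sit together in one flat directory (the notes do not say where they live) and that a transcript counts as present if either the .trs or the .eaf version exists. The function name `check_speaker_files` is hypothetical.

```python
from pathlib import Path
from typing import Dict

# Extensions from the naming scheme. Assumption: a transcript may exist
# as .trs, .eaf, or both, and either one is enough.
AUDIO_EXT = ".wav"
TRANSCRIPT_EXTS = (".trs", ".eaf")
ALIGNMENT_EXT = ".TextGrid"


def check_speaker_files(corpus_dir: Path, basename: str) -> Dict[str, bool]:
    """Report which expected files exist for one BASENAME in corpus_dir."""

    def exists(ext: str) -> bool:
        return (corpus_dir / f"{basename}{ext}").is_file()

    return {
        "audio": exists(AUDIO_EXT),
        "transcript": any(exists(ext) for ext in TRANSCRIPT_EXTS),
        "alignment": exists(ALIGNMENT_EXT),
    }
```

Running this over every BASENAME in the corpus would give a quick list of speakers who are missing an alignment or transcript.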
Data extracted from the files above are placed in a folder named "extractions". Each kind of extracted data is assigned a standard identifier, denoted <ID>, and extraction files always have a .txt extension. Extracted data are named:
BASENAME_<ID>.txt
Currently, only two extraction IDs are assigned (formants and laughter), giving files named:
- BASENAME_formants.txt
- BASENAME_laughter.txt
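Because BASENAME itself contains underscores, an extraction filename is best split on its last underscore. The sketch below builds and parses BASENAME_<ID>.txt names under two assumptions: the "extractions" folder sits directly under the corpus root, and IDs never contain an underscore. The helper names `extraction_path` and `parse_extraction_name` are hypothetical.

```python
from pathlib import Path
from typing import Tuple

# The extraction IDs currently assigned.
KNOWN_IDS = {"formants", "laughter"}


def extraction_path(corpus_dir: Path, basename: str, extraction_id: str) -> Path:
    """Build extractions/BASENAME_<ID>.txt (assumed to live under corpus_dir)."""
    if extraction_id not in KNOWN_IDS:
        raise ValueError(f"unassigned extraction ID: {extraction_id!r}")
    return corpus_dir / "extractions" / f"{basename}_{extraction_id}.txt"


def parse_extraction_name(filename: str) -> Tuple[str, str]:
    """Split 'BASENAME_<ID>.txt' into (BASENAME, ID).

    Splits on the last underscore, which assumes IDs never contain one.
    """
    path = Path(filename)
    if path.suffix != ".txt":
        raise ValueError(f"extraction files must end in .txt: {filename!r}")
    basename, sep, extraction_id = path.stem.rpartition("_")
    if not sep or extraction_id not in KNOWN_IDS:
        raise ValueError(f"no known extraction ID in {filename!r}")
    return basename, extraction_id
```

Keeping the assigned IDs in one set means adding a new extraction type is a one-line change, and stray files with unregistered IDs are reported rather than misread.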