VoicesOfCaliforniaCorpus

Corpus software

I should do some more work on that. Especially at the rate the corpus is falling apart.

Corpus Structure

The base naming scheme is:

SITE_<Last>_<First>

Participants who have not given consent to have their name associated with their data are named:

SITE_Confid_<###>

Those whose interviews are informational only use the prefix INF and are named either:

INF_<Last>_<First>
INF_Confid_<###>

These naming schemes will collectively be called BASENAME going forward.
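As a rough illustration, a BASENAME can be split into its parts by breaking on underscores. This is only a sketch under some assumptions the scheme above does not state: that SITE stands for a site code, that <###> is a run of digits, and that name parts never contain underscores. The function name parse_basename and the site code "SAC" are hypothetical.

```python
def parse_basename(basename):
    """Split a BASENAME into its parts.

    Assumptions (not confirmed by the naming scheme itself):
    - SITE is a placeholder for a site code; INF marks informational interviews.
    - <###> consists of digits only.
    - Neither the prefix nor the name parts contain underscores.
    """
    parts = basename.split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid BASENAME: {basename!r}")

    prefix, second, third = parts
    informational = prefix == "INF"
    info = {
        "site": None if informational else prefix,
        "informational": informational,
        "confidential": second == "Confid",
        "number": None,
        "last": None,
        "first": None,
    }

    if info["confidential"]:
        # Confidential participants carry a number instead of a name.
        if not third.isdigit():
            raise ValueError(f"expected digits after 'Confid': {basename!r}")
        info["number"] = third
    else:
        info["last"], info["first"] = second, third
    return info
```

For example, parse_basename("SAC_Confid_001") would report a confidential participant numbered "001" at the hypothetical site "SAC", while parse_basename("INF_Doe_Jane") would report a named, informational-only interview.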

Audio files are named:

BASENAME.wav

Transcripts are named:

BASENAME.trs
BASENAME.eaf

Alignments are named:

BASENAME.TextGrid
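The file names above follow mechanically from a BASENAME, so a small helper can list them and report which are absent. This is a sketch, not existing corpus software. It assumes all of a participant's files sit in one directory and that both transcript formats are expected for every participant, neither of which is stated above. The names expected_files and missing_files are hypothetical.

```python
from pathlib import Path

# Extensions taken from the naming scheme, grouped by file type.
SUFFIXES = {
    "audio": [".wav"],
    "transcript": [".trs", ".eaf"],
    "alignment": [".TextGrid"],
}


def expected_files(basename):
    """Return the file names the scheme implies for one BASENAME."""
    return {
        kind: [basename + suffix for suffix in suffixes]
        for kind, suffixes in SUFFIXES.items()
    }


def missing_files(corpus_dir, basename):
    """List expected file names that do not exist in corpus_dir.

    Assumption: every file for a participant lives directly in corpus_dir.
    """
    corpus_dir = Path(corpus_dir)
    return [
        name
        for names in expected_files(basename).values()
        for name in names
        if not (corpus_dir / name).exists()
    ]
```

Running missing_files over every BASENAME in the corpus would give a quick inventory of participants with incomplete data.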

Data extracted from the files above are placed in a folder named "extractions". Each kind of extracted data is assigned a standard name, denoted by <ID>, and always has a txt extension. Extracted data files are named:

BASENAME_<ID>.txt

Currently, two extraction IDs are assigned, formants and laughter, giving:

BASENAME_formants.txt
BASENAME_laughter.txt
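The extraction naming rule could be captured in a short helper that builds the path and rejects unassigned IDs. This is a sketch under one assumption: the page does not say where the "extractions" folder sits, so the helper takes the parent directory as a parameter. The names extraction_path and KNOWN_IDS are hypothetical.

```python
from pathlib import Path

# The two extraction IDs currently assigned.
KNOWN_IDS = {"formants", "laughter"}


def extraction_path(root, basename, extraction_id):
    """Build the path root/extractions/BASENAME_<ID>.txt.

    Assumption: the "extractions" folder is a direct child of root.
    """
    if extraction_id not in KNOWN_IDS:
        raise ValueError(f"unassigned extraction ID: {extraction_id!r}")
    return Path(root) / "extractions" / f"{basename}_{extraction_id}.txt"
```

Adding a new kind of extraction would then mean adding its ID to KNOWN_IDS, which keeps the list of assigned IDs in one place.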

CategoryBrickhouse

VoicesOfCaliforniaCorpus (last edited 2019-10-04 18:42:38 by ChristianBrickhouse)