Stata System Variables
Stata system variables expose details about Stata's current operating context through 'underscore variables'. Their values are entirely context-sensitive, so take care when programming around them.
System Variables
_N stores the number of cases (observations) in the current data set. When a command runs under a by prefix, _N instead holds the number of cases in the current by-group.
_n refers to a case's position in the current sort order of the data set. Under a by prefix it similarly restarts at 1 within each by-group.
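To make the effect of by concrete, here is a minimal sketch on a made-up five-row data set. The variable names (group, value, obs_overall, and so on) are illustrative assumptions, not part of any real data set.

```stata
* Toy data: five cases in two groups (all names are hypothetical)
clear
input str1 group value
"a" 10
"a" 20
"a" 30
"b" 40
"b" 50
end

* Without by: _n counts cases across the whole data set, _N is its size
generate obs_overall   = _n
generate total_overall = _N

* With by: _n restarts in each group and _N is the size of that group
bysort group (value): generate obs_in_group = _n
by group: generate group_size = _N

list, sepby(group)
```

Sorting on value inside each group (the variable in parentheses) keeps the within-group order, and therefore _n, reproducible from run to run.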
To create an 8-digit unique identifier number, try:
generate double UniqueID = 10000000 + _n
These two can be combined in a simple deduplication implementation, where dup is 0 for a unique KEYVAR value and otherwise numbers the cases sharing that value:
sort KEYVAR
by KEYVAR: generate dup=cond(_N==1,0,_n)
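As a rough illustration of how the dup flag might then be used, the sketch below builds a small data set with repeated keys and keeps one case per key. The variable id is a hypothetical stand-in for KEYVAR, and which duplicate survives depends on the sort order, so this is only suitable when the duplicates are interchangeable.

```stata
* Toy data with repeated keys (id stands in for KEYVAR)
clear
input id
1
2
2
3
3
3
end

sort id
by id: generate dup = cond(_N==1, 0, _n)

* dup is 0 for unique keys and 1, 2, ... within repeated keys,
* so dropping dup > 1 keeps exactly one case per key
drop if dup > 1

* isid exits with an error if id does not uniquely identify the cases
isid id
```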
_rc stores the return code of the last command or program. This would commonly be accessed like:
capture noisily assert dup==0
if (_rc!=0) {
    display "There are duplicates!"
}
Statistical programmers should be advised that Stata follows the same convention as POSIX exit statuses: 0 indicates success, and any nonzero value indicates an error.
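Because every capture resets _rc, one common precaution is to copy it into a local macro before running anything else. The following is a sketch of that pattern using the auto data set that ships with Stata; the variable name no_such_var is deliberately made up so that the first check fails.

```stata
sysuse auto, clear

* This check fails because the variable does not exist
capture confirm variable no_such_var

* Save the return code immediately; a later capture would overwrite it
local rc = _rc

* This check succeeds, so _rc is reset to 0
capture confirm variable price

if (`rc' != 0) {
    display "first check failed with return code `rc'"
}
display "second check returned " _rc
```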
Model Variables
For the most-recent model, several system variables are stored:
_b[VAR] is the coefficient for VAR
_coef[VAR] is a reference to _b[VAR]
_se[VAR] is the standard error for VAR
_cons is always 1 when accessed directly, but refers to the model's constant term when accessed indirectly (as through _b[_cons])
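The variables listed above can be tried out after any estimation command. This is a minimal sketch, assuming the auto data set that ships with Stata and an arbitrary regression chosen only for illustration.

```stata
sysuse auto, clear
regress price mpg weight

* Coefficient and standard error for mpg
display _b[mpg]
display _se[mpg]

* _coef is interchangeable with _b
display _coef[mpg]

* The two can be combined, for example into a t statistic
display _b[mpg] / _se[mpg]

* Indirect access returns the estimated constant term
display _b[_cons]

* Direct access evaluates to 1
display _cons
```

These values describe only the most recent model, so they change as soon as another estimation command is run.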
In the context of multiple-equation models, an additional bit of syntax is necessary to indicate the equation number. This can either be specified in brackets preceding the system variable ([#2]_b[VAR]), or inside the brackets preceding the variable specification (_b[#2:VAR]). If an equation number is not specified, #1 is implied. In the context of a single-equation model, #1 is the only valid reference and generally is not specified.
There are several aliases enabled by this syntax. All of the following are equivalent.
_b[VAR]
_coef[VAR]
[#1]_b[VAR]
[#1]_coef[VAR]
[#1][VAR]
_b[#1:VAR]
_coef[#1:VAR]
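For a model with more than one equation, the equation-number syntax might be exercised as in the sketch below. It assumes the auto data set and uses sureg purely as a convenient two-equation estimator; the choice of variables is arbitrary.

```stata
sysuse auto, clear

* Two equations: #1 models price, #2 models weight
sureg (price mpg foreign) (weight length)

* Coefficient on length in the second equation, written both ways
display [#2]_b[length]
display _b[#2:length]

* The two forms refer to the same stored value
assert [#2]_b[length] == _b[#2:length]

* With no equation given, #1 is implied
assert _b[mpg] == [#1]_b[mpg]

* Standard errors accept the same prefix
display [#2]_se[length]
```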