Differences between revisions 2 and 3
Revision 2 as of 2025-04-04 01:03:04
Size: 1553
Comment: Standardize
Revision 3 as of 2025-06-10 16:27:13
Size: 2514
Comment: Content
Stata xtset

The xtset command declares a dataset to be panel data by naming its panel variable and, optionally, its time variable.


Usage

General use is:

. webuse nlswork
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtset idcode year

Panel variable: idcode (unbalanced)
 Time variable: year, 68 to 88, but with gaps
         Delta: 1 unit

The xtset settings are saved with the dataset. For datasets such as the demo datasets, the panel variable and time variable are preset, so xtset can be called without any arguments to display the current settings.
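
As a minimal sketch of the no-argument form, the example below declares the panel explicitly first (so it does not rely on the dataset being preset) and then calls xtset again with no arguments. It also reads back the stored result r(panelvar), which xtset leaves behind.

```stata
webuse nlswork, clear
* Declare the panel explicitly so the example does not depend on preset settings
xtset idcode year
* With no arguments, xtset redisplays the current settings
xtset
* The panel variable name is also available as a stored result
display "`r(panelvar)'"
```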

Variable formats

The date and time format of the time variable influences the output of panel commands. Consider:

. webuse patienttimes

. xtset pid tod

Panel variable: pid (unbalanced)
 Time variable: tod, 1.449e+12 to 1.449e+12, but with gaps
         Delta: 1 unit

. format tod %tc

. xtset pid tod

Panel variable: pid (unbalanced)
 Time variable: tod, 03dec2005 06:30:00 to 03dec2005 18:00:00, but with gaps
         Delta: .001 seconds

Equivalent to using the format command is using the format option, i.e. xtset pid tod, format(%tc).
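
The one-step form can be sketched as follows, reusing the patienttimes dataset from the transcript above. It is only a compact restatement of the two-command sequence, not a different technique.

```stata
webuse patienttimes, clear
* Declare the panel and give tod a datetime display format in a single command
xtset pid tod, format(%tc)
```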

Balanced data

Balanced and unbalanced refer to whether each panel unit has a measurement for each time period. Compare the above unbalanced datasets to:

. webuse invest2

. xtset company time

Panel variable: company (strongly balanced)
 Time variable: time, 1 to 20
         Delta: 1 unit
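
The balance status shown in parentheses can also be checked programmatically. This sketch assumes only that xtset returns the status in r(balanced), which is useful in do-files that should stop when a panel is not balanced.

```stata
webuse invest2, clear
xtset company time
* r(balanced) holds the same text that xtset prints in parentheses
display "`r(balanced)'"
* Stop with an error if the panel is not strongly balanced
assert "`r(balanced)'" == "strongly balanced"
```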

A common issue is misspecifying the time units (e.g., specifying a wave as the time variable for panel data that had cohorts added and removed across waves).

More generally, unbalanced panel datasets are common. This is not necessarily an issue, depending on the planned analysis.

For models where unbalanced data cannot be used, the 'issue' can be 'corrected' using tsfill to add observations with missing values for all gaps. The full option will additionally add observations for the leading and trailing 'gaps', i.e. the time periods before and after the entity was actually being measured.
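
A rough sketch of both forms on the nlswork data follows. The local macro n_before is a hypothetical name used only to compare observation counts; every variable other than the panel and time variables is missing in the added observations.

```stata
webuse nlswork, clear
xtset idcode year
* Record the number of observations before filling
count
local n_before = r(N)
* Fill only the interior gaps within each panel
tsfill
* Additionally fill the leading and trailing periods of every panel
tsfill, full
count
display "observations added: " r(N) - `n_before'
* Redisplay the settings to see the new balance status
xtset
```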

carryforward (a community-contributed command, installable with ssc install carryforward) can be used to populate the gaps with the most recent non-missing value.

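webuse nlswork
xtset idcode year
* Sort within panel by time so values carry forward chronologically
sort idcode year
* Either write the filled values to a new variable...
by idcode: carryforward birth_yr, generate(birth_yr2)
* ...or overwrite the original variable in place
by idcode: carryforward birth_yr, replace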

ipolate can be used to populate the gaps with interpolated values.
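
A minimal sketch, assuming the gaps are first turned into observations with tsfill (ipolate fills missing values in existing observations; it does not add rows). The choice of ln_wage and the new variable name ln_wage_ip are illustrative assumptions only.

```stata
webuse nlswork, clear
xtset idcode year
* Create observations for the interior gap years
tsfill
* Linearly interpolate ln_wage over year, separately within each panel
bysort idcode (year): ipolate ln_wage year, generate(ln_wage_ip)
* Count the gap observations that received an interpolated value
count if missing(ln_wage) & !missing(ln_wage_ip)
```

By default ipolate does not extrapolate beyond the first and last observed values of each panel.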


Operators

See tsset for the time series operators.
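
As a brief illustration, once a dataset is xtset the lag (L.) and difference (D.) operators work within each panel rather than across panel boundaries. The variable invest comes from the invest2 demo dataset; the names invest_lag and invest_diff are hypothetical.

```stata
webuse invest2, clear
xtset company time
* Previous period's value within the same company
generate invest_lag = L.invest
* Change from the previous period within the same company
generate invest_diff = D.invest
* The first period of each company has no lag, so both are missing there
list company time invest invest_lag invest_diff if time <= 2
```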


CategoryRicottone

Stata/XtSet (last edited 2025-06-10 16:27:13 by DominicRicottone)