Revision 13 as of 2025-03-05 02:10:25
[[https://www.stata.com/manuals/ddatetime.pdf|Stata datetimes]] |
Stata Data Formats
The Stata data file format encodes variable metadata including display formats. While these formats primarily affect visualization, they can also encode critical information about how a variable should be used.
See also the underlying data types.
Display Formats
A display format is set like:
{{{
generate double my_datetime = clock(some_string, "YMDhms")
format my_datetime %tc
}}}
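Because a display format is metadata, applying one should not alter the stored value. The following is a minimal sketch of that point; the variable name `x` and the value are illustrative assumptions, not taken from any real dataset.

```stata
* Build a one-observation dataset with a double-precision value.
clear
set obs 1
generate double x = 3.14159

* Attach a two-decimal fixed format; only the presentation changes.
format x %9.2f

* list honors the variable's display format.
list x

* display of the raw expression does not use the variable's format,
* so the full stored value is still visible.
display x[1]
```

Here `list` should show the rounded presentation while `display x[1]` still shows the unrounded value.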
Numeric Formats
For the most part, all numeric types operate the same way in Stata.
See Numeric Functions for operating on this type of data.
Each numeric data type has a default display format:
||'''Type''' ||'''Format''' ||
||`double` ||`%10.0g` ||
||`float` ||`%9.0g` ||
||`long` ||`%12.0g` ||
||`int` ||`%8.0g` ||
||`byte` ||`%8.0g` ||
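The defaults can be checked directly. This sketch creates one variable per numeric type and inspects the result; the variable names are hypothetical and chosen only for illustration.

```stata
* One variable of each numeric storage type.
clear
set obs 1
generate byte   v_byte   = 1
generate int    v_int    = 1
generate long   v_long   = 1
generate float  v_float  = 1
generate double v_double = 1

* describe reports the storage type and display format of each variable.
describe
```

The display format column of the `describe` output should match the defaults listed above.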
The available numeric formats are:
 * '''general format''' (`g`) indicating that the number of decimal places should be shifted to improve readability
 * '''fixed width format''' (`f`) indicating that a fixed number of decimal places should be shown
 * '''scientific format''' (`e`) indicating that scientific notation should be used
As an example:
||'''Value''' ||'''With format `%9.4g`''' ||'''With format `%9.2f`''' ||'''With format `%9.2e`''' ||
||3.14159 ||3.142 ||3.14 ||3.14e+00 ||
||314.159 ||314.2 ||314.16 ||3.14e+02 ||
A `c` can be appended to a numeric format to display commas (for example, `%9.0gc` or `%12.2fc`).
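A format can also be tried on a literal value with `display`, without creating a variable. This is a small sketch; the widths and values are arbitrary choices for illustration.

```stata
* The same value under the general, fixed, and scientific formats.
display %9.4g 314.159
display %9.2f 314.159
display %9.2e 314.159

* A fixed format with commas; a wider field leaves room for the separators.
display %12.2fc 1234567.891
```

The last line should print `1,234,567.89`, padded on the left to the requested width.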
Date and Datetime Formats
Dates and datetimes are numeric data with a standardized, encoded meaning. The display format is what indicates the intended encoding. For the most part, this type of data counts days or milliseconds from the Stata epoch: 01jan1960 00:00:00.000.
See Datetime Functions for operating on this type of data.
Specifically, the date and datetime formats are:
||'''Format''' ||'''Unit''' ||
||`%tc` ||milliseconds '''ignoring''' leap seconds ||
||`%tC` ||milliseconds '''with''' leap seconds ||
||`%td` ||days ||
||`%tw` ||weeks ||
||`%tm` ||months ||
||`%tq` ||quarters ||
||`%th` ||half-years ||
||`%ty` ||years ||
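To see how the format determines the meaning of a stored number, the same kind of `display` check can be used. This sketch shows the value zero under two encodings, then values built with the matching constructor functions; the specific dates are arbitrary.

```stata
* Zero is the epoch under both the daily and the millisecond encodings.
display %td 0
display %tc 0

* Each encoding has its own constructor; the number only makes sense
* when displayed with the matching format.
display %td mdy(3, 5, 2025)
display %tw yw(2025, 10)
display %tm ym(2025, 3)
display %tq yq(2025, 1)
display %th yh(2025, 1)
display %ty 2025
```

The first two lines should show `01jan1960` and `01jan1960 00:00:00`. Applying, say, `%tm` to a value built with `mdy()` would display a plausible-looking but wrong month, which is why the format carries critical information.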
These formats can be further customized, for visualization purposes only, with specific components.
||'''Component''' ||'''Specification''' ||'''Displays As''' ||
||Century ||`CC` ||`01`-`99` ||
||Century ||`cc` ||`1`-`99` ||
||Year ||`YY` ||`00`-`99` ||
||Year ||`yy` ||`0`-`99` ||
||Day of year ||`JJJ` ||`001`-`366` ||
||Day of year ||`jjj` ||`1`-`366` ||
||Month ||`Mon` ||`Jan`-`Dec` ||
||Month ||`Month` ||`January`-`December` ||
||Month ||`mon` ||`jan`-`dec` ||
||Month ||`month` ||`january`-`december` ||
||Month ||`NN` ||`01`-`12` ||
||Month ||`nn` ||`1`-`12` ||
||Day ||`DD` ||`01`-`31` ||
||Day ||`dd` ||`1`-`31` ||
||Day of week ||`DAYNAME` ||`Sunday`-`Saturday` (aligned) ||
||Day of week ||`Dayname` ||`Sunday`-`Saturday` (unaligned) ||
||Day of week ||`Day` ||`Sun`-`Sat` ||
||Day of week ||`Da` ||`Su`-`Sa` ||
||Day of week ||`day` ||`sun`-`sat` ||
||Day of week ||`da` ||`su`-`sa` ||
||Half-year ||`h` ||`1` or `2` ||
||Quarter ||`q` ||`1`-`4` ||
||Week ||`WW` ||`01`-`52` ||
||Week ||`ww` ||`1`-`52` ||
||Hour ||`HH` ||`00`-`23` ||
||Hour ||`Hh` ||`00`-`12` ||
||Hour ||`hH` ||`0`-`23` ||
||Hour ||`hh` ||`0`-`12` ||
||Minute ||`MM` ||`00`-`59` ||
||Minute ||`mm` ||`0`-`59` ||
||Second ||`SS` ||`00`-`60` (due to leap second) ||
||Second ||`ss` ||`0`-`60` (due to leap second) ||
||Tenths ||`.s` ||`.0`-`.9` ||
||Hundredths ||`.ss` ||`.00`-`.99` ||
||Thousandths ||`.sss` ||`.000`-`.999` ||
||AM/PM ||`am`/`pm` ||`am` or `pm` ||
||AM/PM ||`a.m.`/`p.m.` ||`a.m.` or `p.m.` ||
||AM/PM ||`AM`/`PM` ||`AM` or `PM` ||
||AM/PM ||`A.M.`/`P.M.` ||`A.M.` or `P.M.` ||
||Period ||`.` ||`.` ||
||Comma ||`,` ||`,` ||
||Colon ||`:` ||`:` ||
||Hyphen ||`-` ||`-` ||
||Space ||`_` ||a blank space ||
||Forward slash ||`/` ||`/` ||
||Back slash ||`\` ||`\` ||
||Literal character ||`!c` ||`c` ||
A plus sign (+) can optionally delimit components for human readability. It is ignored otherwise.
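The components are written directly after the base format. As a sketch of how they combine, the following builds an ISO-style date and datetime; the input values are arbitrary examples.

```stata
* Four-digit year (century plus year), zero-padded month and day.
display %tdCCYY-NN-DD mdy(3, 5, 2025)

* The same format with plus signs as visual separators; they are not displayed.
display %tdCCYY+NN+DD mdy(3, 5, 2025)

* A datetime with a literal T, written as !T, between the date and the time.
display %tcCCYY-NN-DD!THH:MM:SS clock("2025-03-05 02:10:25", "YMDhms")
```

The first line should display `2025-03-05` and the second `20250305`. In every case only the presentation changes; the stored number is the same.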
String Formats
Alignment is controlled by the presence or absence of a minus sign (`-`) ahead of the width. A string variable formatted as `%-18s` is left-justified; formatted as `%18s`, it is right-justified.
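The effect of the minus sign can be seen with `display`. In this sketch the square brackets are only markers that make the padding visible; the string is an arbitrary example.

```stata
* Left-justified: the padding follows the text.
display "[" %-18s "abc" "]"

* Right-justified: the padding precedes the text.
display "[" %18s "abc" "]"
```

In both cases the field is 18 characters wide; only the side that receives the padding differs.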