Stata Data Formats

The Stata data file format encodes variable metadata including display formats. While these formats primarily affect visualization, they can also encode critical information about how a variable should be used.

See also the underlying data types.


Display Formats

A display format is assigned to a variable with the format command:

generate double my_datetime = clock(some_string, "YMDhms")
format my_datetime %tc
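
The snippet above assumes a string variable named some_string already exists. A minimal self-contained sketch, using a hypothetical one-row dataset and an illustrative timestamp, might look like this:

```stata
* Hypothetical one-row dataset; some_string stands in for real input data
clear
input str19 some_string
"2025-03-05 02:10:25"
end

* Millisecond datetimes exceed float precision, so store them as double
generate double my_datetime = clock(some_string, "YMDhms")
format my_datetime %tc

* describe reports the storage type and display format of the variable
describe my_datetime
list
```

The underlying value of my_datetime is unchanged by format; only the way it is displayed changes.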


Numeric Formats

For the most part, all numeric types operate the same way in Stata.

See Numeric Functions for operating on this type of data.

Each numeric data type is assigned a default display format:

||'''Type''' ||'''Format'''||
||`double` ||`%10.0g` ||
||`float` ||`%9.0g` ||
||`long` ||`%12.0g` ||
||`int` ||`%8.0g` ||
||`byte` ||`%8.0g` ||
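
One way to check these defaults is to create a variable of each type and read back its format. The sketch below uses hypothetical variable names (x_double and so on) and the extended macro function for formats:

```stata
* Create one variable of each numeric type in a throwaway dataset
clear
set obs 1
generate double x_double = 1
generate float  x_float  = 1
generate long   x_long   = 1
generate int    x_int    = 1
generate byte   x_byte   = 1

* Read back the display format Stata assigned to each variable
foreach v of varlist x_* {
    local fmt : format `v'
    display "`v': `fmt'"
}
```

The formats printed should match the defaults listed for each type.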

The available numeric formats are:

  • general format (g): the number of decimal places is adjusted to improve readability

  • fixed format (f): a fixed number of decimal places is shown

  • scientific format (e): scientific notation is used

As an example:

||'''Value''' ||'''With format %9.4g'''||'''With format %9.2f'''||'''With format %9.2e'''||
||`3.14159` ||`3.142` ||`3.14` ||`3.14e+00` ||
||`314.159` ||`314.2` ||`314.16` ||`3.14e+02` ||

A c can be appended to a general or fixed format (for example, %9.0gc or %12.2fc) to display commas.
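
The numeric formats can be tried interactively with display, which accepts a format ahead of an expression. This is a small sketch with illustrative values; the comments note the digits expected, not the leading padding:

```stata
* General format: decimal places adapt to the magnitude of the value
display %9.4g 3.14159
display %9.4g 314.159

* Fixed format: the number after the period sets the decimal places
display %9.2f 3.14159
* shows 3.14
display %9.4f 3.14159
* shows 3.1416

* Scientific format
display %9.2e 314.159
* shows 3.14e+02

* Trailing c adds commas to a fixed or general format
display %12.2fc 1234567.891
display %12.0gc 1234567
```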


Date and Datetime Formats

Dates and datetimes are numeric data with a standardized, encoded meaning. The display format is what indicates the intended encoding. For the most part, this type of data counts days or milliseconds from the Stata epoch: 01jan1960 00:00:00.000.

See Datetime Functions for operating on this type of data.
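
The role of the epoch can be seen by displaying small integers under date formats. The following is a minimal sketch using the td() and tc() literal functions:

```stata
* Zero is the epoch under both the daily and the millisecond encodings
display %td 0
display %tc 0

* td() and tc() return the encoded numeric value of a literal
display td(01jan1960)
display tc(01jan1960 00:00:00)

* One day after the epoch: 1 as a date, 86400000 as a datetime
display td(02jan1960)
display tc(02jan1960 00:00:00)
```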

Specifically, the date and datetime formats are:

||'''Format'''||'''Unit''' ||
||`%tc` ||milliseconds ignoring leap seconds||
||`%tC` ||milliseconds with leap seconds ||
||`%td` ||days ||
||`%tw` ||weeks ||
||`%tm` ||months ||
||`%tq` ||quarters ||
||`%th` ||half-years ||
||`%ty` ||years ||
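
Because each format implies a different unit, a value has to be converted before it is shown under another format. The sketch below takes an illustrative date and converts it with the built-in conversion functions; the final line deliberately applies the wrong format to show why the pairing matters:

```stata
* An illustrative daily date stored in a local macro
local d = td(05mar2025)

* Convert to each unit, then display with the matching format
display %td `d'
display %tw wofd(`d')
display %tm mofd(`d')
display %tq qofd(`d')
display %th hofd(`d')
display %ty yofd(`d')
display %tc cofd(`d')

* Mismatch: a count of days read as a count of months gives a wrong date
display %tm `d'
```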

These formats can be further customized, for visualization purposes only, with specific components.

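||'''Component''' ||'''Specification'''||'''Displays As''' ||
||Century ||`CC` ||`01`-`99` ||
||Century ||`cc` ||`1`-`99` ||
||Year ||`YY` ||`00`-`99` ||
||Year ||`yy` ||`0`-`99` ||
||Day of year ||`JJJ` ||`001`-`366` ||
||Day of year ||`jjj` ||`1`-`366` ||
||Month ||`Mon` ||`Jan`-`Dec` ||
||Month ||`Month` ||`January`-`December` ||
||Month ||`mon` ||`jan`-`dec` ||
||Month ||`month` ||`january`-`december` ||
||Month ||`NN` ||`01`-`12` ||
||Month ||`nn` ||`1`-`12` ||
||Day ||`DD` ||`01`-`31` ||
||Day ||`dd` ||`1`-`31` ||
||Day of week ||`DAYNAME` ||`Sunday`-`Saturday` (aligned) ||
||Day of week ||`Dayname` ||`Sunday`-`Saturday` (unaligned)||
||Day of week ||`Day` ||`Sun`-`Sat` ||
||Day of week ||`Da` ||`Su`-`Sa` ||
||Day of week ||`day` ||`sun`-`sat` ||
||Day of week ||`da` ||`su`-`sa` ||
||Half-year ||`h` ||`1` or `2` ||
||Quarter ||`q` ||`1`-`4` ||
||Week ||`WW` ||`01`-`52` ||
||Week ||`ww` ||`1`-`52` ||
||Hour ||`HH` ||`00`-`23` ||
||Hour ||`Hh` ||`00`-`12` ||
||Hour ||`hH` ||`0`-`23` ||
||Hour ||`hh` ||`0`-`12` ||
||Minute ||`MM` ||`00`-`59` ||
||Minute ||`mm` ||`0`-`59` ||
||Second ||`SS` ||`00`-`60` (due to leap second) ||
||Second ||`ss` ||`0`-`60` (due to leap second) ||
||Tenths ||`.s` ||`.0`-`.9` ||
||Hundredths ||`.ss` ||`.00`-`.99` ||
||Thousandths ||`.sss` ||`.000`-`.999` ||
||AM/PM ||`am`/`pm` ||`am` or `pm` ||
||AM/PM ||`a.m.`/`p.m.` ||`a.m.` or `p.m.` ||
||AM/PM ||`AM`/`PM` ||`AM` or `PM` ||
||AM/PM ||`A.M.`/`P.M.` ||`A.M.` or `P.M.` ||
||Period ||`.` ||`.` ||
||Comma ||`,` ||`,` ||
||Colon ||`:` ||`:` ||
||Hyphen ||`-` ||`-` ||
||Space ||` ` ||` ` ||
||Forward slash ||`/` ||`/` ||
||Back slash ||`\` ||`\` ||
||Literal character||`!c` ||`c` ||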

A plus sign (+) can optionally delimit components for human readability. It is ignored otherwise.
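
As a rough illustration of how components combine, the sketch below displays one illustrative date and one illustrative datetime under a few custom formats. The particular combinations are examples chosen here, not formats the page prescribes:

```stata
* Illustrative date and datetime values held in local macros
local d = td(05mar2025)
local t = tc(05mar2025 14:30:00)

* ISO-style date: 4-digit year from CC and YY, then month and day
display %tdCCYY-NN-DD `d'

* Day first, separated by slashes
display %tdDD/NN/CCYY `d'

* Abbreviated month name with an unpadded day
display %tdMon-dd `d'

* Plus signs only separate components; the text says they are otherwise ignored
display %tdCC+YY+NN+DD `d'

* Time of day only, from a datetime value
display %tcHH:MM:SS `t'
```

The same specifications can be applied to a variable with format, for example a %tdCCYY-NN-DD format on a daily date variable.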


String Formats

Alignment is controlled by the presence or absence of a negative sign (-) ahead of the width. A string variable formatted as %-18s is left-justified; formatted as %18s, it is right-justified.
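
The effect of the sign can be previewed with display, using brackets to make the padding visible. This is a minimal sketch; the dataset and variable name are hypothetical:

```stata
* Left-justified in an 18-character field: padding follows the text
display "[" %-18s "left" "]"

* Right-justified in an 18-character field: padding precedes the text
display "[" %18s "right" "]"

* The same formats applied to a string variable
clear
input str18 name
"Stata"
end
format name %-18s
describe name
```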


See also

Stata datetimes: https://www.stata.com/manuals/ddatetime.pdf



Stata/DataFormats (last edited 2025-03-05 02:10:25 by DominicRicottone)