Stata Data Types
Stata exposes numeric and string data types.
Date and time data are numerics with a special display format.
Mata has its own data types. See here for more details.
Bytes
A byte is the smallest numeric type exposed in Stata's syntax. It occupies one byte and holds integers from -127 to 100. An excellent way to save on file size and computational overhead is to generate categorical variables as bytes.
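As a minimal sketch of the advice above, the following creates a categorical variable directly as a byte and then lets compress shrink an oversized variable. The variable names group and big are hypothetical and used only for illustration.

```stata
clear
set obs 5
* Hypothetical categorical variable, stored explicitly as a byte
generate byte group = mod(_n, 3)
describe group
* A variable created wider than it needs to be
generate double big = _n
* compress demotes variables to the smallest type that holds their values
compress big
describe big
```

Running compress before saving is a common way to get the same savings on variables that were not declared as bytes up front.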
Integers
An integer (int) is two bytes and holds integers from -32,767 to 32,740.
Long Numbers
A long number (long) is four bytes and holds integers from -2,147,483,647 to 2,147,483,620.
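The relationship between the three integer types can be seen by letting replace push a value past each type's limit. This is a small sketch on a one-observation dataset. The variable name n is hypothetical, and the type changes described in the comments are what Stata is expected to report.

```stata
clear
set obs 1
generate byte n = 100
* 101 does not fit in a byte, so replace promotes the variable to int
replace n = 101
describe n
* 40000 does not fit in an int, so the variable is promoted again to long
replace n = 40000
describe n
```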
Floating Point Numbers
Floating point numbers are four bytes with a variably-placed decimal point, which gives roughly 7 significant digits of precision. Henceforward they are referred to as floats.
Range
Stata stores three scalars that describe the system's limits for floats.
c(minfloat) is the most negative number (the negative number largest in magnitude) that can be stored as a float
c(maxfloat) is the largest positive number that can be stored as a float
c(epsfloat) is the smallest nonzero, positive number (epsilon) that, when added to 1 and stored as a float, does not equal 1
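The three float limits can be inspected directly with display. The sketch below also uses the float() function, which rounds a value to float precision, to show what epsilon means in practice.

```stata
* Inspect the system limits for floats
display c(minfloat)
display c(maxfloat)
display c(epsfloat)
* Adding epsilon to 1 survives rounding to float precision: prints 1
display float(1 + c(epsfloat)) != 1
* A quarter of epsilon is lost when rounded to float precision: prints 1
display float(1 + c(epsfloat)/4) == 1
* 0.1 has no exact binary representation, so its float and double
* versions differ: prints 0
display float(0.1) == 0.1
```

The last line is the usual reason a comparison such as x == 0.1 fails on a float variable. Comparing against float(0.1) is the common workaround.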
Doubles
Doubles are floating point numbers stored in 8 bytes, which gives roughly 16 significant digits of precision.
Range
Stata stores three scalars that describe the system's limits for doubles.
c(mindouble) is the most negative number (the negative number largest in magnitude) that can be stored as a double
c(maxdouble) is the largest positive number that can be stored as a double
c(epsdouble) is the smallest nonzero, positive number (epsilon) that, when added to 1 and stored as a double, does not equal 1
There is also c(smallestdouble), which is the smallest full-precision double that is bigger than zero.
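To illustrate the practical difference between the two floating point types, the sketch below stores the same integer, 16,777,217 (one more than 2^24), in a float and in a double. The variable names x_f and x_d are hypothetical.

```stata
* Inspect the system limits for doubles
display c(mindouble)
display c(maxdouble)
display c(epsdouble)
display c(smallestdouble)

clear
set obs 1
* A float cannot hold 16,777,217 exactly, so it is stored as 16,777,216
generate float x_f = 16777217
* A double holds the value exactly
generate double x_d = 16777217
format x_f x_d %12.0f
list
```

This is why long identifiers and similar large integers are usually stored as long or double rather than float.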
Strings
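Strings store encoded text in fixed-length types named str1, str2, and so on. When replace assigns a value longer than the current type allows, Stata automatically promotes the variable to a longer string type.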
Starting with Stata 13, strings can be up to 2045 bytes long. Previously the limit was 244.
Note that 2045 bytes does not mean 2045 characters in a multi-byte encoding such as UTF-8, where a single character can occupy more than one byte. (Support for Unicode, encoded as UTF-8, was introduced in Stata 14.)
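The following sketch, which assumes Stata 14 or later, shows both behaviors described in this section: a string variable growing on replace, and the gap between byte length and character length. The variable name s is hypothetical. ustrunescape() is used so the example does not depend on how the do-file itself is encoded.

```stata
clear
set obs 1
* Created as str1, the shortest type that holds the value
generate s = "a"
* Promoted to str6 automatically by replace
replace s = "abcdef"
describe s
* strlen() counts bytes: prints 5, because the accented e takes two bytes
display strlen(ustrunescape("caf\u00e9"))
* ustrlen() counts Unicode characters: prints 4
display ustrlen(ustrunescape("caf\u00e9"))
```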
Long Strings
Long strings (strL) store text or binary data up to 2,000,000,000 bytes long. This data type was added in Stata 13, and datasets that contain it cannot be saved in the file formats of earlier versions.
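A long string variable is requested by naming the strL type explicitly, as in this minimal sketch for Stata 13 or later. The variable name memo is hypothetical.

```stata
clear
set obs 2
* Declare the variable as a long string rather than a fixed-length str#
generate strL memo = "a short note"
describe memo
```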
Dates and Datetimes
Dates and datetimes are ordinary numbers displayed with special date formats. There are a variety of formats counting in different units (days, milliseconds, weeks, months, and so on), but generally they count from January 1, 1960.
There are no date or datetime literals, but the date and datetime pseudo-functions, such as td() and tc(), operate much like a date literal.
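As a brief illustration, the sketch below uses the td() and tc() pseudo-functions to show the underlying numbers and then applies a display format to a variable. The variable name d is hypothetical.

```stata
* Daily dates count days since 01jan1960: prints 0
display td(01jan1960)
* Prints 21915
display td(01jan2020)
* The same number shown with a date format: prints 01jan2020
display %td td(01jan2020)
* Datetimes count milliseconds since 01jan1960 00:00:00: prints 1000
display tc(01jan1960 00:00:01)

clear
set obs 1
* Store a date as a number, then attach a display format to it
generate long d = td(15mar2021)
format d %td
list
```

Because datetime values are millisecond counts, they quickly exceed what a float can hold exactly, so datetime variables should be stored as doubles.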