Stata Python

Stata (version 16 and later) can call out to an embedded Python interpreter.


Installation

Most system configuration is done with the python set command.

Stata can list recognized Python environments with python search. To add an unrecognized environment, point Stata at that installation's Python executable:

python set exec "C:\path\to\python.exe"

To add directories to Python's module search path (appended by default, or prepended with the prepend option), use:

python set userpath "C:\foo" "C:\bar" "C:\baz" [, prepend]

To make these settings persist across Stata sessions, add the permanently option.
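To confirm that python set userpath took effect, the module search path can be inspected from the Python subsession. The helper below is a hypothetical sketch (missing_user_paths is not part of Stata or sfi); it only reads sys.path and reports which of the given directories are absent. It compares strings exactly, so a path spelled differently (case, trailing separator) will be reported as missing.

```python
import sys


def missing_user_paths(paths):
    """Return the directories from `paths` that are not on sys.path (hypothetical helper)."""
    # Exact string comparison: entries are stored in sys.path as they were added.
    return [p for p in paths if p not in sys.path]


if __name__ == "__main__":
    # Inside Stata, run this from a python: block after python set userpath.
    print(missing_user_paths([r"C:\foo", r"C:\bar", r"C:\baz"]))
```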


Usage

Interactive Prompt

Within a Stata interactive session, enter a Python interactive subsession with the python command. Stata local macros can be referenced inside it with the usual `name' syntax. For example:

. local int_var = 3
. local str_var = "This is a Stata string"
. python
---------------------------------------- python (type end to exit) -----------
>>> `int_var'
3
>>> "`str_var'".split(" ")
['This', 'is', 'a', 'Stata', 'string']
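The macro is replaced with its text before Python evaluates the line: `int_var' becomes the literal 3, and "`str_var'" becomes a quoted string literal. Without the surrounding double quotes, the string macro would expand to bare words and Python would reject the line. The sketch below mimics that text replacement in plain Python; expand_locals is a hypothetical name, and the pattern only covers simple `name' references (no nested macros or extended macro functions). As in Stata, an undefined macro expands to an empty string.

```python
import re

# Matches a simple Stata local macro reference such as `int_var'
_LOCAL_REF = re.compile(r"`(\w+)'")


def expand_locals(line, macros):
    """Replace `name' references with their text; undefined macros expand to ""."""
    return _LOCAL_REF.sub(lambda m: macros.get(m.group(1), ""), line)


macros = {"int_var": "3", "str_var": "This is a Stata string"}

print(expand_locals("`int_var'", macros))  # 3

expanded = expand_locals("\"`str_var'\".split(\" \")", macros)
print(expanded)        # "This is a Stata string".split(" ")
print(eval(expanded))  # ['This', 'is', 'a', 'Stata', 'string']
```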

To run a Stata command from the Python subsession, prefix it with stata:.

>>> stata: webuse auto, clear
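When the command is built in Python code, it can be passed as a string to sfi.SFIToolkit.stata, which runs it in Stata. The sketch below wraps that call in a hypothetical run_commands helper; the runner argument is an assumption added only so the logic can be exercised outside Stata, where sfi cannot be imported.

```python
def run_commands(commands, runner=None):
    """Run Stata commands in order (hypothetical helper around sfi.SFIToolkit.stata)."""
    if runner is None:
        # sfi is only importable from Python embedded in Stata.
        from sfi import SFIToolkit
        runner = SFIToolkit.stata
    for command in commands:
        runner(command)


# Inside Stata:
# run_commands(["webuse auto, clear", "summarize price"])
```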

Scope

To run a single Python statement from Stata and return immediately to the Stata session, use the python: command with the statement on the same line. Note the colon (:).

python: print("Hello, world")

Programs

Use Python within an ado-file (or do-file) with the python: command. Much like a Stata program definition or an interactive Python session, all lines between python: and end are interpreted by the Python subsession. For example:

python:

import sqlite3
import pandas as pd

con = sqlite3.connect("example.db")
df = pd.read_sql_query("SELECT * from example", con)
con.close()

end

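Note that objects in the __main__ namespace persist across python blocks and calls for the rest of the Stata session; this is why df is still available in the next example. If the con sqlite3.Connection object had not been closed, it would have stayed open until it was deleted or the Stata process ended.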
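Because names persist in __main__, it helps to release resources deterministically. A minimal sketch using only the standard library: contextlib.closing closes the connection even if the query raises. A temporary file and made-up rows stand in for example.db. Note that using a sqlite3.Connection directly in a with statement only commits or rolls back the transaction; it does not close the connection.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def read_rows(db_path, query):
    """Run a query and close the connection even if the query fails."""
    with closing(sqlite3.connect(db_path)) as con:
        return con.execute(query).fetchall()


# Build a throwaway database standing in for example.db (illustrative data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.db")
    with closing(sqlite3.connect(path)) as con:
        with con:  # commits the inserts on success
            con.execute("CREATE TABLE example (id INTEGER, name TEXT)")
            con.executemany("INSERT INTO example VALUES (?, ?)",
                            [(1, "alpha"), (2, "beta")])
    rows = read_rows(path, "SELECT * FROM example ORDER BY id")

print(rows)  # [(1, 'alpha'), (2, 'beta')]
```

To keep the pandas workflow, pd.read_sql_query(query, con) can replace the fetchall call inside the same with block.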

Interface Module

To move data between Python and Stata, use the sfi (Stata Function Interface) module.

python:

from sfi import Data

# df is the DataFrame created by the previous python: block

# initialize N observations
Data.setObsTotal(len(df))

# initialize variables
Data.addVarDouble("id")
Data.addVarStr("name", 5)

# copy columnar data (obs=None stores every observation)
Data.store("id", None, df["id"].tolist(), None)
Data.store("name", None, df["name"].tolist(), None)

# free memory
del df

end

This module can be imported into both programs and interactive sessions. It ships with Stata and can only be imported by Python running inside Stata; it is not distributed as a standalone package.
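The block above hard-codes a str5 width for name, which truncates longer values. A minimal sketch of deriving the width from the data instead; plan_string_width is a hypothetical helper. Stata measures str# widths in bytes and caps them at 2045, so the sketch counts UTF-8 bytes and refuses wider text, which would need a strL variable.

```python
STR_MAX = 2045  # widest str# type in Stata; longer text needs a strL variable


def plan_string_width(values):
    """Return the smallest str# width, in bytes, that holds every value."""
    # Count UTF-8 bytes, not characters, because str# widths are byte counts.
    width = max((len(str(v).encode("utf-8")) for v in values), default=1)
    width = max(width, 1)  # an empty column still needs at least str1
    if width > STR_MAX:
        raise ValueError(f"values need {width} bytes; use a strL variable instead")
    return width


print(plan_string_width(["alice", "bob", "Zoë Ramírez"]))  # 13

# Inside Stata (hypothetical usage, replacing the hard-coded 5):
# Data.addVarStr("name", plan_string_width(df["name"]))
```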

Mixing Python and Stata Programs

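When designing a general-purpose Python routine for use from within Stata, a common pattern is a thin Stata program that parses the syntax and marks the sample, paired with a Python function defined in the same ado-file: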

program varsum, rclass
    version 16.0
    syntax varname [if] [in]
    marksample touse

    python: _varsum("`varlist'", "`touse'")
    display as txt "sum of `varlist': " as res r(sum)
end

python:
from sfi import Data, Scalar
def _varsum(varname, touse):
    # touse selects the observations allowed by if/in
    x = Data.get(varname, None, touse)
    Scalar.setValue("r(sum)", sum(x))
end

. webuse auto
(1978 Automobile Data)

. varsum price
sum of price: 456229

. varsum price if foreign
sum of price: 140463
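The third argument to Data.get is the selection variable: only observations where touse is not 0 are returned, which is how the if and in conditions reach the Python function (marksample also marks out observations where the variable is missing). The sketch below reproduces that selection in plain Python so the logic can be checked outside Stata; selected_sum and the sample values are hypothetical.

```python
def selected_sum(values, touse):
    """Sum the values whose touse flag is nonzero, mimicking Data.get(var, None, touse)."""
    if len(values) != len(touse):
        raise ValueError("values and touse must have the same length")
    return sum(v for v, keep in zip(values, touse) if keep != 0)


values = [10, 20, 30, 40]  # illustrative data only
print(selected_sum(values, [1, 0, 1, 0]))  # 40
print(selected_sum(values, [1, 1, 1, 1]))  # 100
```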


See also

Stata manual on PyStata integration


CategoryRicottone

Stata/Python (last edited 2023-09-26 18:26:25 by DominicRicottone)