Skip to content

AST Method Matrix

This page documents how the typedframes static checker handles each DataFrame operation. Operations fall into three categories: schema-modifying (the checker updates its internal column model), row-passthrough (the checker assumes the schema is unchanged), and untracked (the variable is dropped from tracking to avoid false positives).


Schema-Modifying Operations

The checker updates its column model when it sees these operations, so subsequent accesses are validated against the new schema.

Operation Effect on schema Example
df["col"] = val Adds "col" to the schema df["score"] = df["value"] * 2
del df["col"] Removes "col" from the schema del df["temp"]
df.drop(columns=[…]) Removes listed columns df.drop(columns=["a", "b"])
df.drop([…]) Removes listed columns (positional) df.drop(["a", "b"])
df.assign(col=…) Adds new column(s) to the schema df.assign(full_name=…)
df.rename(columns={…}) Renames columns in the schema df.rename(columns={"a": "b"})
df.select([…]) Narrows schema to selected columns df.select(["id", "name"])
df.select(pl.col("…")) Narrows schema to the named column df.select(pl.col("id"))
df.pop("col") Removes "col" from the schema df.pop("score")
df.insert(pos, "col", val) Adds "col" to the schema df.insert(0, "rank", …)
df[["c1", "c2"]] Narrows schema to selected columns subset = df[["id", "name"]]
pd.merge(left, right, …) Merges both schemas merged = pd.merge(a, b, on="id")
pd.concat([df1, df2], …) Unions both schemas combined = pd.concat([a, b])

Row-Passthrough Operations

The checker leaves the schema unchanged for these operations — the output variable inherits the same column model as the input.

Operation Notes
df.filter(…) Row filter; columns unchanged
df.query(…) pandas query string; columns unchanged
df.head(n) First n rows; columns unchanged
df.tail(n) Last n rows; columns unchanged
df.sample(…) Random sample; columns unchanged
df.sort_values(…) Row sort; columns unchanged
df.sort(…) polars row sort; columns unchanged
df.reset_index(…) Index reset; columns unchanged
df.nlargest(n, col) Top n rows; columns unchanged
df.nsmallest(n, col) Bottom n rows; columns unchanged
df.fillna(…) Fill NaN values; columns unchanged
df.dropna(…) Drop NaN rows; columns unchanged
df.ffill() / df.bfill() Forward/back fill; columns unchanged

Untracked Operations

For these operations the result variable is not tracked — the checker won't report false positives on it, but it also won't validate column references against it.

These operations require runtime information (joined keys, pivot categories, melt id-vars, explosion depth, etc.) that is not available to a static AST pass. Tracking them correctly would require evaluating expressions at compile time, which is out of scope for a static checker.

Operation Why untracked
df.join(other, …) Output schema depends on join keys and how= parameter
df.merge(other, …) (pandas instance method) Same as join
df.pivot(…) Output columns are derived from cell values at runtime
df.pivot_table(…) Same as pivot
df.melt(…) Converts columns to rows; output schema varies by id_vars
df.explode(col) Schema depends on list column depth
pd.get_dummies(df, …) Columns come from categorical values, unknown at lint time
df.stack(…) Pivots column level to row index
df.unstack(…) Pivots row index to column level
df.apply(fn, …) Output depends on the return type of fn
df.map(fn, …) Output depends on fn
df.transform(fn, …) Output depends on fn
df.groupby(…).agg(…) Output columns are determined by aggregation spec
df.with_columns(…) polars column addition/mutation; schema not narrowed statically

Error Code Reference

Code Severity Message Default
unknown-column Error Column '<name>' not found in <Schema>. Did you mean '<suggestion>'? Always reported
reserved-name Error Renamed-from column '<name>' not found in <Schema> Always reported
untracked-dataframe Warning Columns unknown at lint time — annotate with a schema to enable column checking Off by default
dropped-unknown-column Warning Dropped column '<name>' does not exist in <Schema> Off by default

untracked-dataframe is suppressed unless --strict-ingest is passed to the CLI. This keeps the checker quiet on exploratory scripts that load data without a schema annotation.

unknown-column reports the closest column name as a typo suggestion when the edit distance is small (≤ 2 characters), which helps catch common capitalization and spelling mistakes.