Readers & Writers
Readers and writers are the I/O layer of CESM. A reader loads data from an external format into a common in-memory representation (a dictionary of pandas DataFrames), and a writer serialises that representation back to an external format. This page describes the conventions that all readers and writers follow, the DataFrame structures they operate on, and the specifics of each supported format.
See also: Architecture | Adding a New Format
Design overview
Abstract base classes
The module `src/core/interfaces.py` defines two abstract base classes:
```python
class DataReader(ABC):
    @abstractmethod
    def read(self, source: Any) -> Database: ...


class DataWriter(ABC):
    @abstractmethod
    def write(self, data: Database, target: Any) -> None: ...
```
These classes establish the contract that every reader returns a `Database` object and every writer accepts one.
In practice, the current implementations mostly use standalone functions that operate on `Dict[str, pd.DataFrame]` directly, because the DataFrame dictionary is the central interchange format within the pipeline.
Typical function signatures
Readers expose a function that returns the dictionary:
```python
def dataframes_from_duckdb(
    db_path: str,
    tables: Optional[list] = None,
) -> Dict[str, pd.DataFrame]: ...


def spine_to_dataframes(
    db_url: str,
    scenario: str,
) -> Dict[str, pd.DataFrame]: ...
```
Writers expose a function that accepts the dictionary:
```python
def dataframes_to_duckdb(
    dataframes: Dict[str, pd.DataFrame],
    db_path: str,
    overwrite: bool = True,
) -> None: ...


def dataframes_to_spine(
    dataframes: Dict[str, pd.DataFrame],
    db_url: str,
    import_datetime: Optional[str] = None,
    purge_before_import: bool = True,
) -> None: ...
```
The YAML reader follows a two-step pattern: first load the YAML into a LinkML-generated `Dataset` object, then convert that object to DataFrames with `yaml_to_df()`.
DataFrame conventions
The `Dict[str, pd.DataFrame]` returned by readers (and consumed by writers) uses a consistent set of naming and structural conventions.
Dictionary keys
Each key in the dictionary identifies an entity class or a typed parameter group.
- **Entity DataFrames**: plain class names such as `node`, `unit`, `connection`, or dot-separated names for multi-dimensional classes such as `unit.outputNode`.
- **Time-series DataFrames**: `<class>.ts.<parameter>` — for example, `node.ts.inflow`.
- **String / categorical (Map) DataFrames**: `<class>.str.<parameter>` — for example, `node.str.inflow`.
- **Array DataFrames**: `<class>.array.<parameter>` — for example, `unit.array.capacity_profile`.
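As a sketch of how these naming conventions can be consumed, a small helper (hypothetical, not part of CESM) classifies dictionary keys by their markers:

```python
def classify_key(key: str) -> dict:
    """Classify a DataFrame dictionary key by the naming conventions above.

    Illustrative helper only; CESM does not ship this function.
    """
    for kind in ('ts', 'str', 'array'):
        marker = f'.{kind}.'
        if marker in key:
            cls, param = key.split(marker, 1)
            return {'kind': kind, 'class': cls, 'parameter': param}
    # No marker: a plain entity class, possibly dot-separated (multi-dimensional)
    return {'kind': 'entity', 'class': key, 'parameter': None}


print(classify_key('node.ts.inflow'))    # {'kind': 'ts', 'class': 'node', 'parameter': 'inflow'}
print(classify_key('unit.outputNode'))   # {'kind': 'entity', 'class': 'unit.outputNode', 'parameter': None}
```

Splitting on the full `.<kind>.` marker (rather than on every dot) keeps multi-dimensional class names like `unit.outputNode` intact.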
Entity DataFrames
Entity DataFrames hold scalar parameter values. Rows represent entities; columns represent parameters.
```python
# Single-dimensional entity class
dataframes['node'] = pd.DataFrame({
    'annual_flow': [100000.0, 80000.0, None],
    'penalty_up': [10000.0, 10000.0, 10000.0],
}, index=pd.Index(['west', 'east', 'heat'], name='node'))
```
For multi-dimensional entity classes the index is a `pd.MultiIndex`:
```python
# Multi-dimensional entity class
dataframes['unit.outputNode'] = pd.DataFrame({
    'capacity': [100.0, 50.0],
    'efficiency': [0.9, 0.95],
}, index=pd.MultiIndex.from_tuples(
    [('coal_plant', 'west'), ('gas_plant', 'east')],
    names=['unit', 'outputNode'],
))
```
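Values can then be retrieved with ordinary pandas indexing, using the entity name (or entity tuple) as the row label. A minimal illustration:

```python
import pandas as pd

# Entity DataFrames following the conventions above
nodes = pd.DataFrame({
    'annual_flow': [100000.0, 80000.0],
}, index=pd.Index(['west', 'east'], name='node'))

units = pd.DataFrame({
    'capacity': [100.0, 50.0],
}, index=pd.MultiIndex.from_tuples(
    [('coal_plant', 'west'), ('gas_plant', 'east')],
    names=['unit', 'outputNode'],
))

# Single-dimensional class: the row label is the entity name
flow = nodes.loc['west', 'annual_flow']              # 100000.0

# Multi-dimensional class: the row label is the full entity tuple
cap = units.loc[('coal_plant', 'west'), 'capacity']  # 100.0
```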
Time-series and indexed DataFrames
Time-series, map, and array DataFrames share a transposed layout: the index carries the time or category dimension, and columns carry entity names.
```python
# Time-series example: node.ts.inflow
dataframes['node.ts.inflow'] = pd.DataFrame({
    'west': [-1002.1, -980.7, -968.0],
    'east': [-1002.1, -980.7, -968.0],
    'heat': [-30.0, -40.0, -50.0],
}, index=pd.date_range('2023-01-01', periods=3, freq='h', name='datetime'))
```
For multi-dimensional entity classes the columns become a `pd.MultiIndex` with levels `[name, dimension1, dimension2, …]`.
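A sketch of such a DataFrame, assuming an illustrative joined entity-name format for the first level (CESM's exact name format may differ):

```python
import pandas as pd

# Time-series data for the two-dimensional class unit.outputNode.
# First column level: entity name (illustrative format); remaining levels: dimensions.
cols = pd.MultiIndex.from_tuples(
    [('coal_plant|west', 'coal_plant', 'west'),
     ('gas_plant|east', 'gas_plant', 'east')],
    names=['name', 'unit', 'outputNode'],
)
ts = pd.DataFrame(
    [[80.0, 40.0], [85.0, 42.0], [90.0, 45.0]],
    index=pd.date_range('2023-01-01', periods=3, freq='h', name='datetime'),
    columns=cols,
)

# Select all columns belonging to one unit via the dimension level
coal = ts.xs('coal_plant', axis=1, level='unit')
```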
Column naming for MultiIndex columns
When a DataFrame has MultiIndex columns, the levels are encoded with a `::` separator for storage in flat formats such as DuckDB:

```text
("region", "north", "heat")  ->  "region::north::heat"
```

The `::` separator is used instead of `.` because entity names may already contain dots.
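A minimal sketch of the encoding and its inverse (hypothetical helpers mirroring the convention, not CESM's actual code):

```python
SEP = '::'  # level separator; '.' would be ambiguous because entity names may contain dots

def encode_column(levels):
    """Join MultiIndex column levels into one flat column name."""
    return SEP.join(str(level) for level in levels)

def decode_column(flat):
    """Split a flat column name back into its level tuple."""
    return tuple(flat.split(SEP))


print(encode_column(('region', 'north', 'heat')))  # region::north::heat
print(decode_column('region::north::heat'))        # ('region', 'north', 'heat')
```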
Existing readers
Readers live under `src/readers/`.
| Module | Entry point | Notes |
|---|---|---|
| | CLI script; calls `yaml_to_df()` | Loads a CESM YAML file via the LinkML runtime and converts the resulting `Dataset` object to DataFrames. |
| | | Class-based reader that wraps the function-based implementation behind the `DataReader` interface. |
| | `dataframes_from_duckdb()` | Reads DuckDB tables and uses the `_dataframe_metadata` table to reconstruct each DataFrame's index and column structure. |
| | `spine_to_dataframes()` | Reads a Spine Toolbox database filtered by scenario. Separates scalar, time-series, map, and array parameters into appropriately keyed DataFrames. |
Existing writers
Writers live under `src/writers/`.
| Module | Entry point | Notes |
|---|---|---|
| | `dataframes_to_duckdb()` | Flattens DataFrames (resets indexes, encodes MultiIndex columns with `::`) and writes one table per DataFrame plus the `_dataframe_metadata` table. |
| | `dataframes_to_spine()` | Writes to a Spine Toolbox SQLite database. Creates entity classes, entities, and parameter values. Supports scalars, time series, maps, and arrays. |
Format-specific details
DuckDB
The DuckDB format provides a one-to-one mapping between DataFrames and database tables.
- **Metadata table (`_dataframe_metadata`)**: stores the structural information needed to reconstruct DataFrames on read:

  | Column | Description |
  |---|---|
  | `table_name` | Original dictionary key (e.g., `node.ts.inflow`) |
  | `sql_table_name` | SQL-safe table name (dots and hyphens replaced with underscores) |
  | `index_type` | One of `single`, `datetime`, or `multi` |
  | `index_count` | Number of leading columns that form the index |
  | `columns_multiindex` | Boolean: whether columns are MultiIndex |
  | `columns_levels` | JSON array of level names (if MultiIndex columns) |

- **Datetime handling**: all `DatetimeIndex` values are normalised to UTC (timezone-naive) before storage. On read, the first `index_count` columns are converted back to the appropriate index type.
Spine Toolbox
The Spine DB writer maps CESM concepts to Spine Toolbox concepts as follows:
| CESM concept | Spine DB concept |
|---|---|
| Entity class (single-dimensional) | Object class |
| Entity class (multi-dimensional, e.g., `unit.node`) | Relationship class (`unit__node`) |
| Entity name | Object / relationship entity |
| Scalar parameter | Parameter value (float, string) |
| Time-series parameter | Time-series parameter value |
| Map parameter | Map parameter value |
| Array parameter | Array parameter value |
- **Class name conversion**: dots in CESM class names are replaced with double underscores for Spine DB (e.g., `unit.node` becomes `unit__node`).
- **Alternatives and scenarios**: each import creates a timestamped alternative (e.g., `cesm-2025_10_02-15_30`) and attaches it to a `base` scenario. By default the writer purges existing parameter values, entities, and alternatives before importing.
- **Dependencies**: the Spine DB reader and writer require the `spinedb_api` package.
Data flow summary
The following diagram shows the typical data flow through the reader/writer layer:
```text
External format              In-memory representation              External format
(YAML / DuckDB /    -->    Dict[str, pd.DataFrame]      -->    (DuckDB / Spine DB
 Spine DB)                            |                          / other)
                                      v
                             Transformation layer
                             (filtering, mapping,
                              unit conversion, ...)
```
Readers populate the dictionary. The transformation layer may modify it (see Transformer Developer Guide). Writers consume the dictionary and persist it in the target format.