Adding a New Format
This guide walks through the process of adding support for a new energy system model format to the CESM conversion framework. The goal is to enable bidirectional conversion between your format and CESM (Common Energy System Model), so that data can flow between your tool and any other supported format.
Throughout this guide, we use PLEXOS as a hypothetical case study and reference the existing FlexTool and GridDB implementations as concrete examples to follow.
See also: Architecture, Readers & Writers, Transformer Developer Guide.
Overview of the conversion pipeline
A complete format integration consists of these components:
```
Source format
  -> Reader (src/readers/from_{format}.py)
  -> Dict[str, pd.DataFrame]
  -> YAML Transformer (entity + parameter mapping)
  -> Python Transformer (complex logic)
  -> CESM Dict[str, pd.DataFrame]
  -> Writer (src/writers/to_{format}.py)
  -> Target format
```
Not every format requires all components. If your source data is already in a Spine database or DuckDB, you can reuse the existing readers. Similarly, if the target is a standard format, you can reuse an existing writer. The transformer layer — both YAML and Python — is always required and is where most of the work happens.
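Conceptually, the pipeline is function composition over `Dict[str, pd.DataFrame]`. A minimal sketch of the chain (the `run_pipeline` helper and its arguments are illustrative, not part of the framework's API):

```python
from typing import Callable, Dict, List

import pandas as pd

Frames = Dict[str, pd.DataFrame]


def run_pipeline(read: Callable[[str], Frames],
                 transforms: List[Callable[[Frames], Frames]],
                 write: Callable[[Frames, str], None],
                 src: str, dst: str) -> Frames:
    """Read source data, apply each transformer stage in order, write the result."""
    frames = read(src)
    for stage in transforms:  # e.g. [yaml_transform, python_transform]
        frames = stage(frames)
    write(frames, dst)
    return frames
```

In the real framework the YAML stage is driven by `transform_data` and the Python stage by `transform_to_cesm`, as described in Steps 3–5.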
Step 1: Analyze the source format
Before writing any code, study the source format thoroughly. Identify the following:
- Entity types: What are the main objects in the model? Generators, nodes, lines, fuels, storage units?
- Parameters: What properties does each entity type have? Capacities, costs, efficiencies, limits?
- Relationships: How are entities connected? Which generator connects to which node? Which line connects two nodes?
- Temporal structure: How is time represented? Hourly time series? Period-indexed data? Named timesteps or datetime indexes?
- Scenarios: Does the format support multiple scenarios or alternatives?
PLEXOS case study: format analysis
The file `data/PLEXOS-World 2015 Gold V1.1.xml` provides an example of a complex relational XML format.
PLEXOS uses a set of relational tables encoded as XML elements:
- `t_class`: Defines entity types (Generator, Fuel, Storage, Node, Line, Region, Zone, etc.)
- `t_object`: Instances of classes (individual generators, nodes, etc.)
- `t_membership`: Relationships between objects (which generator belongs to which node)
- `t_data`: Parameter values for objects (capacity, heat rate, cost, etc.)
- `t_attribute`: Attribute definitions with metadata (units, defaults, enumerations)
- `t_property`: Property definitions that parameters reference
- `t_collection`: Defines how classes relate to each other (Generator.Nodes, Line.NodeFrom, etc.)
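To make the relational encoding concrete, here is a hypothetical miniature of the XML (element names follow the `t_*` tables above; the real PLEXOS schema carries more columns) and how the tables join:

```python
import xml.etree.ElementTree as ET

XML = """
<MasterDataSet>
  <t_class><class_id>1</class_id><name>Generator</name></t_class>
  <t_class><class_id>2</class_id><name>Node</name></t_class>
  <t_object><object_id>10</object_id><class_id>1</class_id><name>gas_plant</name></t_object>
  <t_object><object_id>20</object_id><class_id>2</class_id><name>north</name></t_object>
  <t_membership><parent_object_id>10</parent_object_id><child_object_id>20</child_object_id></t_membership>
</MasterDataSet>
"""

root = ET.fromstring(XML)

# t_class: class_id -> class name
classes = {c.findtext("class_id"): c.findtext("name") for c in root.findall("t_class")}

# t_object: object_id -> (class name, object name)
objects = {o.findtext("object_id"): (classes[o.findtext("class_id")], o.findtext("name"))
           for o in root.findall("t_object")}

# t_membership: resolve id pairs to (class, name) tuples
links = [(objects[m.findtext("parent_object_id")], objects[m.findtext("child_object_id")])
         for m in root.findall("t_membership")]
```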
Key entity classes in PLEXOS and their CESM equivalents:
| PLEXOS Class | CESM Entity | Notes |
|---|---|---|
| Generator | unit | Primary conversion device |
| Node | balance | Electrical nodes (no storage) |
| Line | link | Transmission connections between nodes |
| Storage | storage | Storage devices |
| Fuel | commodity | Fuel types |
| Region / Zone | group | Spatial groupings |
Step 2: Create a reader
If your format requires custom reading logic (not a standard Spine database or DuckDB), implement a reader in src/readers/from_{format}.py.
The reader must return a Dict[str, pd.DataFrame] following the standard naming conventions:
- Entity DataFrames: Named after the entity class (e.g., `unit`, `node`, `connection`). The index contains entity names; columns contain scalar parameter values.
- Time series DataFrames: Named `{class}.ts.{parameter}` (e.g., `unit.ts.availability`). The index is a `DatetimeIndex`; columns are entity names.
- Map DataFrames: Named `{class}.str.{parameter}` (e.g., `solve.str.period_timeset`). The index contains string keys; columns are entity names.
- Array DataFrames: Named `{class}.array.{parameter}` (e.g., `solve.array.realized_periods`). The index is numeric; columns are entity names.
- Multi-dimensional entities: For relationship entities (e.g., `unit.outputNode`), use a `pd.MultiIndex` with levels `[name, dimension1, dimension2]`.
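To make the conventions concrete, the following sketch builds a minimal frame dictionary by hand (the entity and parameter names are invented for illustration):

```python
import pandas as pd

frames = {}

# Entity DataFrame: index = entity names, columns = scalar parameters
frames["unit"] = pd.DataFrame({"capacity": [400.0, 150.0]}, index=["coal_1", "wind_1"])

# Time series DataFrame: DatetimeIndex, one column per entity
idx = pd.date_range("2025-01-01", periods=3, freq="h")
frames["unit.ts.availability"] = pd.DataFrame({"wind_1": [0.8, 0.6, 0.9]}, index=idx)

# Relationship entity: MultiIndex with levels [name, dimension1, dimension2]
mi = pd.MultiIndex.from_tuples(
    [("coal_1__north", "coal_1", "north")], names=["name", "unit", "node"])
frames["unit.outputNode"] = pd.DataFrame(index=mi)
```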
Example: Spine database reader
The existing src/readers/from_spine_db.py demonstrates the pattern.
It reads entity classes, extracts scalar parameters into entity DataFrames, and builds separate DataFrames for time series (ts), maps (str), and arrays (array):
```python
def spine_to_dataframes(db_url: str, scenario: str) -> Dict[str, pd.DataFrame]:
    """
    Read Spine database and convert to dataframes.

    Returns:
        Dict mapping dataframe names to DataFrames:
        - Entity dataframes: 'class_name' or 'class1.class2'
        - Time series: 'class_name.ts.parameter_name'
        - Maps: 'class_name.str.parameter_name'
        - Arrays: 'class_name.array.parameter_name'
    """
```
PLEXOS case study: reader design
A PLEXOS XML reader would:

1. Parse the XML and build lookup tables for `t_class`, `t_object`, `t_membership`, `t_property`, and `t_collection`.
2. For each class (Generator, Node, Line, Storage, Fuel), create an entity DataFrame with `t_object` names as the index.
3. Extract `t_data` entries as parameter values. Scalar values go into the entity DataFrame columns; time-indexed data goes into `.ts.` DataFrames.
4. Build multi-dimensional DataFrames for membership relationships (e.g., Generator-to-Node connections).
```python
# src/readers/from_plexos_xml.py
import xml.etree.ElementTree as ET


def plexos_xml_to_dataframes(xml_path: str) -> Dict[str, pd.DataFrame]:
    """Read PLEXOS XML and return Dict[str, pd.DataFrame]."""
    tree = ET.parse(xml_path)
    root = tree.getroot()

    # Build lookup tables
    classes = {elem.find('class_id').text: elem.find('name').text
               for elem in root.findall('t_class')}
    objects = parse_objects(root, classes)
    memberships = parse_memberships(root, classes, objects)
    data = parse_data(root, objects)

    # Build DataFrames per class
    dataframes = {}
    for class_name, entities in objects.items():
        dataframes[class_name] = build_entity_df(entities, data)
        # Time series go into separate DataFrames
        for param, ts_data in extract_timeseries(entities, data).items():
            dataframes[f'{class_name}.ts.{param}'] = ts_data
    return dataframes
```
Step 3: Create entity mapping in YAML transformer
The YAML transformer handles the mapping of source entities to CESM entities. Transformer files are stored in:
src/transformers/{format}/cesm_{version}/{source_version}/from_{format}.yaml
For example, FlexTool’s transformer is at:
src/transformers/irena_flextool/cesm_v0.1.0/v3.14.0/from_flextool.yaml
Basic entity mapping
Entity mapping rules use the `-entities` suffix.
The first item is the source class; the second is the target class:
```yaml
# Simple 1:1 mapping: FlexTool unit -> CESM unit
unit-entities:
  - unit
  - unit

# Simple 1:1 mapping: FlexTool commodity -> CESM commodity
commodity-entities:
  - commodity
  - commodity
```
Conditional entity mapping
When a single source class maps to different CESM types based on parameter values, use if_parameter and if_not_parameter:
```yaml
# FlexTool nodes with has_balance but NOT has_storage -> CESM balance
balance-entities:
  - node
  - balance
  - - if_parameter: has_balance
    - if_not_parameter: has_storage

# FlexTool nodes with has_storage -> CESM storage
storage-entities:
  - node
  - storage:
      if_parameter: has_storage
```
This is a common pattern when the source format uses a single entity class (like "node") for what CESM separates into distinct types (balance vs. storage).
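Conceptually, `if_parameter` / `if_not_parameter` act as boolean filters on the source entity DataFrame. A pandas sketch of the node-splitting pattern above (this illustrates the semantics, not the framework's implementation):

```python
import pandas as pd

nodes = pd.DataFrame(
    {"has_balance": ["yes", "yes", None], "has_storage": [None, "yes", None]},
    index=["n1", "n2", "n3"])

# if_parameter: has_balance  AND  if_not_parameter: has_storage
balance = nodes[nodes["has_balance"].notna() & nodes["has_storage"].isna()]

# if_parameter: has_storage
storage = nodes[nodes["has_storage"].notna()]
```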
Dimensional entity mapping with reordering
Multi-dimensional entities often need dimension reordering.
The `order` directive specifies which source dimensions map to which target dimensions:
```yaml
# FlexTool unit.outputNode -> CESM unit_to_node
# Source dimensions: [name, unit, node] at indexes [0, 1, 2]
# Target dimensions: [name, source, sink] - same order
unit_to_node-entities:
  - unit.outputNode
  - unit_to_node:
      order: [[0], [1], [2]]
  - dimensions: [source, sink]

# FlexTool unit.inputNode -> CESM node_to_unit
# Source: [name, unit, node] -> Target: [name, source=node, sink=unit]
# Note the swap: dimension 2 (node) becomes source, dimension 1 (unit) becomes sink
node_to_unit-entities:
  - unit.inputNode
  - node_to_unit:
      order: [[0], [2], [1]]
  - dimensions: [source, sink]
```
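In pandas terms, the `order` directive is a permutation of `MultiIndex` levels. A sketch of what `order: [[0], [2], [1]]` does to a `unit.inputNode` relationship (the helper and data are illustrative):

```python
from typing import List

import pandas as pd

src = pd.DataFrame(index=pd.MultiIndex.from_tuples(
    [("rel1", "gas_1", "north")], names=["name", "unit", "node"]))


def reorder(df: pd.DataFrame, order: List[int], names: List[str]) -> pd.DataFrame:
    """Permute MultiIndex levels according to `order` and rename them."""
    out = df.copy()
    out.index = out.index.reorder_levels(order)
    out.index.names = names
    return out


# order [[0], [2], [1]] -> name stays; node becomes source, unit becomes sink
node_to_unit = reorder(src, [0, 2, 1], ["name", "source", "sink"])
```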
Link entities with node extraction
Links (transmission lines) often require extracting node endpoints from a higher-dimensional relationship:
```yaml
# FlexTool connection.node.node -> CESM link
# Source: [name, connection, nodeA, nodeB]
# Target: [name, node_A, node_B] - skip connection dimension
link-entities:
  - connection.node.node
  - link:
      order: [[0], [2], [3]]
  - dimensions: [node_A, node_B]
```
PLEXOS case study: entity mapping
For PLEXOS, the entity mapping would look something like:
```yaml
# PLEXOS Generator -> CESM unit
unit-entities:
  - generator
  - unit

# PLEXOS Node -> CESM balance
balance-entities:
  - node
  - balance

# PLEXOS Line -> CESM link
# Line has NodeFrom and NodeTo memberships
link-entities:
  - line
  - link

# PLEXOS Storage -> CESM storage
storage-entities:
  - storage
  - storage

# PLEXOS Fuel -> CESM commodity
commodity-entities:
  - fuel
  - commodity
```
Step 4: Create parameter mapping in YAML transformer
Parameter mappings define how source parameters translate to CESM parameters. Each mapping rule has a source (class + parameter), a target (class + parameter), and optional operations.
Basic parameter mapping
```yaml
# Direct mapping: same class, parameter renamed, no value transformation
commodity_price:
  - commodity: price
  - commodity: price_per_unit
```
Unit conversions
Use `operation` to apply arithmetic transformations.
This is essential when source and target use different units:
```yaml
# FlexTool stores discount_rate as 0-1 ratio; CESM uses percentage (0-100)
unit_discount_rate:
  - unit: interest_rate
  - unit: discount_rate
  - operation:
      - multiply:
          with: 100

# Reverse: CESM percentage -> FlexTool ratio
unit_discount_rate:
  - unit: discount_rate
  - unit: interest_rate
  - operation:
      - multiply:
          with: 0.01
```
Data type conversions
When a parameter needs to change between scalar and indexed types, specify the types in brackets:
```yaml
# FlexTool stores inflow as a string-indexed map (str);
# CESM stores flow_profile as a time series (ts)
balance_inflow:
  - node: [inflow, [str]]
  - balance: [flow_profile, [ts]]
  - - if_parameter: has_balance
    - if_not_parameter: has_storage
```
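Conceptually, the `[str]` → `[ts]` conversion re-indexes string-keyed values onto the model timeline. A minimal sketch, assuming the string keys align positionally with the timeline steps (illustrative data):

```python
import pandas as pd

# Source: string-indexed map (FlexTool-style timestep labels)
inflow = pd.DataFrame({"north": [5.0, 7.0, 6.0]}, index=["t0001", "t0002", "t0003"])

# Timeline established elsewhere (e.g., by the Python transformer)
timeline = pd.date_range("2025-01-01", periods=3, freq="h")

# Replace the string index with datetimes, positionally
flow_profile = inflow.set_axis(timeline, axis=0)
```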
Value renaming
When source and target use different enumeration values for the same concept:
```yaml
# FlexTool and CESM use different names for conversion methods
conversion_method:
  - unit: conversion_method
  - unit: conversion_method
  - rename:
      constant_efficiency: constant_efficiency
      min_load_efficiency: two_point_efficiency

# Different solve mode names
solve_mode:
  - solve: solve_mode
  - solve_pattern: solve_mode
  - rename:
      single_solve: single_solve
      rolling_window: rolling_solve
```
Dimension reduction with aggregation
When mapping from a higher-dimensional entity to a lower one, use `order` with `aggregate`:
```yaml
# CESM unit_to_node capacity -> FlexTool unit virtual_unitsize
# Reduces from (unit, node) to just (unit), taking the max across nodes
units_capacity:
  - unit_to_node: capacity
  - unit: virtual_unitsize
  - - order: [[1]]
      aggregate: max
```
Supported aggregation methods: `sum`, `average`, `max`, `min`, `first`.
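In pandas terms, `order: [[1]]` keeps only index level 1 (the unit) and `aggregate: max` collapses the remaining duplicates. A sketch with illustrative data:

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("r1", "gas_1", "north"), ("r2", "gas_1", "south"), ("r3", "wind_1", "north")],
    names=["name", "unit", "node"])
capacity = pd.DataFrame({"capacity": [300.0, 200.0, 150.0]}, index=mi)

# Keep only the `unit` level (index 1), aggregating across nodes with max
virtual_unitsize = capacity.groupby(level="unit")["capacity"].max()
```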
Setting constant values
Use `value` to set a constant parameter value during entity creation:
# When creating FlexTool nodes from CESM balance entities,
# set has_balance = 'yes'
balance-has_balance:
- balance
- node: has_balance
- value: 'yes'
PLEXOS case study: parameter mapping challenges
PLEXOS has over 3000 property definitions across its classes, while CESM uses a method-based approach with far fewer parameters. Key challenges:
- Attribute explosion: Many PLEXOS properties have no direct CESM equivalent. Focus on the properties that affect dispatch and investment decisions: capacity, cost, efficiency, availability, constraints.
- Property-band structure: PLEXOS uses a band system where a single property can have multiple bands (e.g., heat rate at different operating points). CESM uses multiarray DataFrames for this purpose.
- Missing temporal structure: PLEXOS encodes temporal resolution through Data File references and pattern objects, which need to be reconstructed into CESM's explicit timeline.
Step 5: Create Python transformer for complex logic
Some transformations cannot be expressed in YAML configuration alone. These require a Python transformer module at:
src/transformers/{format}/cesm_{version}/{source_version}/to_cesm.py
The module must implement a transform_to_cesm function that receives both the source and partially-transformed CESM DataFrames:
```python
def transform_to_cesm(
    source: Dict[str, pd.DataFrame],
    cesm: Dict[str, pd.DataFrame],
    start_time: datetime
) -> Dict[str, pd.DataFrame]:
    """
    Apply Python-based transformations after YAML transformer.

    Args:
        source: Original source DataFrames (read-only reference)
        cesm: Partially transformed CESM DataFrames (from YAML stage)
        start_time: Start datetime for timeline reconstruction

    Returns:
        Updated cesm dictionary
    """
```
When to use Python transformers
Use Python when the transformation involves:
- Timeline reconstruction: Converting between different temporal representations. FlexTool uses string timestep indexes (`t0001`, `t0002`) that must be converted to datetime. The `time_from_spine` function in the FlexTool transformer handles this.
- Cross-entity lookups: When a target parameter depends on data from multiple source entities. For example, FlexTool's capacity transformation (`capacities_from_spine`) looks up `unit.outputNode` relationships to determine how `virtual_unitsize` maps to `unit_to_node.capacity`.
- Multi-source calculations: When a CESM parameter is computed from multiple source parameters. For example, the efficiency transformation (`efficiency_to_cesm`) combines `efficiency`, `min_load`, and `efficiency_at_min_load` into CESM's `conversion_rates` format.
- Configuration-driven logic: The GridDB transformer uses a YAML configuration file (`to_cesm_config.yaml`) for technology-specific mappings like heat rate conversion factors and prime-mover-to-fuel lookups.
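As an illustration of the timeline case, string timesteps can be mapped onto datetimes positionally. This is a simplified sketch of what `time_from_spine` does conceptually; the real function also reconstructs solve patterns and periods:

```python
from datetime import datetime
from typing import List

import pandas as pd


def timesteps_to_datetimes(steps: List[str], start: datetime,
                           resolution: str = "1h") -> pd.DatetimeIndex:
    """Map ordered string timesteps (t0001, t0002, ...) onto a datetime index."""
    return pd.date_range(start=start, periods=len(steps), freq=resolution)


idx = timesteps_to_datetimes(["t0001", "t0002", "t0003"], datetime(2025, 1, 1))
```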
Example: FlexTool Python transformer
The FlexTool Python transformer at `src/transformers/irena_flextool/cesm_v0.1.0/v3.14.0/to_cesm.py` handles four categories of complex logic:

- `time_from_spine`: Reconstructs the CESM timeline, solve patterns, and periods from FlexTool's timeset/timeline structure.
- `capacities_from_spine`: Maps unit capacity and cost data to the correct port entities, handling the virtual_unitsize vs. existing distinction.
- `profile_to_cesm`: Transforms profile limit data by looking up profile connections across unit-node-profile relationships.
- `efficiency_to_cesm`: Converts efficiency ratios to CESM percentage format, with optional two-point efficiency curves as multiarray DataFrames.
Example: GridDB Python transformer
The GridDB transformer at `src/transformers/griddb/cesm_v0.1.0/v0.2.0/to_cesm.py` is entirely Python-based (no YAML transformer) because GridDB's relational SQL structure requires complex joins and lookups.
It uses a configuration file (`to_cesm_config.yaml`) for:

- Heat rate to efficiency conversion factors
- Default efficiencies per technology type
- Prime mover and fuel type mappings
Step 6: Create reverse transformer (CESM to source format)
For bidirectional conversion, create a reverse transformer. The reverse YAML file mirrors the forward file with source and target swapped:
src/transformers/{format}/cesm_{version}/{source_version}/to_{format}.yaml
Key differences in the reverse direction:
- Source and target are swapped in every rule.
- Rename mappings are inverted (source_value and target_value swap).
- Multiply operations become divide (or use the reciprocal factor).
- Dimension reordering indexes are adjusted to reverse the mapping.
Example: FlexTool forward vs. reverse
Forward (`from_flextool.yaml`):

```yaml
inflation_rate:
  - model: discount_rate
  - system: inflation_rate
  - operation:
      - multiply:
          with: 100
```

Reverse (`to_flextool.yaml`):

```yaml
inflation_rate:
  - system: inflation_rate
  - model: discount_rate
  - operation:
      - multiply:
          with: 0.01
```

Forward (entity dimension reordering):

```yaml
# From FlexTool: unit.inputNode [name, unit, node]
# To CESM: node_to_unit [name, source=node, sink=unit]
node_to_unit-entities:
  - unit.inputNode
  - node_to_unit:
      order: [[0], [2], [1]]
```

Reverse:

```yaml
# From CESM: node_to_unit [name, source, sink]
# To FlexTool: unit.inputNode [name, unit=sink, node=source]
node_to_unit-entities:
  - node_to_unit
  - unit.inputNode:
      order: [[0], [2], [1]]
```
Step 7: Create a writer
If your target format requires custom writing logic, implement a writer in src/writers/to_{format}.py.
The writer receives a Dict[str, pd.DataFrame] and writes it to the target format.
Existing writers include:
- `src/writers/to_duckdb.py`: Writes DataFrames to a DuckDB database with metadata for round-trip fidelity.
- `src/writers/to_spine_db.py`: Writes DataFrames to a Spine Toolbox database, creating entity classes, entities, and parameter values.
If your target format is one of the supported databases (DuckDB, Spine), you can reuse the existing writer and skip this step.
Step 8: Create a processing script
Create a command-line script in scripts/processing/ that ties the reader, transformer, and writer together.
Follow the pattern established by existing scripts.
Script structure
```python
# scripts/processing/{format}_to_cesm.py
"""
Read {Format} data and convert to CESM DuckDB format.
"""
import argparse
from datetime import datetime
from pathlib import Path

from core.transform_parameters import transform_data
from readers.from_{format} import {format}_to_dataframes
from writers.to_duckdb import dataframes_to_duckdb


def main():
    parser = argparse.ArgumentParser(
        description="Convert {Format} data to CESM DuckDB format"
    )
    parser.add_argument("input", type=str, help="Input file path")
    parser.add_argument("output", type=str, help="Output DuckDB file path")
    args = parser.parse_args()

    # Step 1: Read source data
    source_dfs = {format}_to_dataframes(args.input)

    # Step 2: Apply YAML transformer
    transformer_yaml = "src/transformers/{format}/cesm_v0.1.0/v1.0.0/from_{format}.yaml"
    cesm = transform_data(source_dfs, transformer_yaml)

    # Step 3: Apply Python transformer (if needed)
    from transformers.{format}.cesm_v0_1_0.v1_0_0.to_cesm import transform_to_cesm
    # start_time: adjust to your model's timeline origin
    cesm = transform_to_cesm(source_dfs, cesm, start_time=datetime(2025, 1, 1))

    # Step 4: Write output
    dataframes_to_duckdb(cesm, args.output)


if __name__ == "__main__":
    main()
```
Existing scripts for reference
- `scripts/processing/flextool_to_cesm.py`: Reads from a Spine database, applies the YAML + Python transformer, writes to DuckDB. Supports scenario selection, version flags, and an optional datetime start time.
- `scripts/processing/griddb_to_cesm.py`: Reads from GridDB SQLite, applies the Python-only transformer, writes to DuckDB.
- `scripts/processing/cesm_to_flextool.py`: Reverse direction: reads CESM DuckDB, applies the reverse YAML + Python transformer, writes to a Spine database.
Step 9: Add to Spine Toolbox project (optional)
If you want to integrate your conversion into a visual workflow, you can add it to the Spine Toolbox project. This is optional but useful for users who prefer a graphical interface over command-line scripts.
The Spine Toolbox project files are in the repository root. Add your processing script as a Tool item and connect it to the appropriate data stores.
Step 10: Test with round-trip verification
The most reliable way to validate a conversion is round-trip testing:
1. Start with data in your source format.
2. Convert to CESM.
3. Convert back to your source format.
4. Compare original and round-tripped data.
What to verify
- Entity preservation: All entities from the source should appear in the round-tripped output. Check that no entities were lost or duplicated.
- Parameter accuracy: Scalar parameters should match exactly (accounting for unit conversion round-trips). Time series should match within floating-point tolerance.
- Relationship integrity: Multi-dimensional relationships (unit-to-node connections, link endpoints) must survive the round trip with correct dimension ordering.
- Temporal structure: Timeline indexes, solve patterns, and period definitions should reconstruct correctly.
Testing approach
The test suite is in the tests/ directory.
Write tests that exercise the full round trip, for example:

```python
def test_round_trip():
    """Verify source -> CESM -> source preserves data."""
    # Read original
    original = read_source_format("test_data/sample.ext")

    # Forward: source -> CESM
    cesm = transform_to_cesm(original)

    # Reverse: CESM -> source
    result = transform_to_source(cesm)

    # Compare entity counts
    assert set(original.keys()) == set(result.keys())

    # Compare parameter values
    for name in original:
        pd.testing.assert_frame_equal(
            original[name], result[name],
            check_dtype=False, atol=1e-10
        )
```
Known round-trip limitations
Some information loss is expected in round trips:
- Parameters with no CESM equivalent will be dropped during forward conversion.
- Default values in the source format may not be preserved if CESM represents them differently.
- Temporal resolution differences (e.g., CESM's explicit timeline vs. a source format's pattern-based time representation) may cause minor differences.
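When documenting these limitations, a small diff helper makes the losses explicit (illustrative, not part of the framework):

```python
from typing import Dict, List

import pandas as pd


def round_trip_losses(original: Dict[str, pd.DataFrame],
                      restored: Dict[str, pd.DataFrame]) -> Dict[str, List[str]]:
    """Report DataFrames and columns present in `original` but missing after the round trip."""
    losses = {"dataframes": sorted(set(original) - set(restored)), "columns": []}
    for name in set(original) & set(restored):
        for col in original[name].columns.difference(restored[name].columns):
            losses["columns"].append(f"{name}.{col}")
    return losses
```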
Document these known limitations in your transformer module’s docstrings.
Summary checklist
Use this checklist when adding support for a new format:
| Step | Deliverable |
|---|---|
| 1. Analyze format | Document entity types, parameters, relationships, temporal structure |
| 2. Reader | `src/readers/from_{format}.py` (if custom reading logic is needed) |
| 3. Entity mapping | YAML file with `-entities` rules |
| 4. Parameter mapping | YAML file with parameter rules (conversions, renames, operations) |
| 5. Python transformer | `to_cesm.py` with `transform_to_cesm` (for complex logic) |
| 6. Reverse transformer | `to_{format}.yaml` and reverse Python transformer |
| 7. Writer | `src/writers/to_{format}.py` (if custom writing logic is needed) |
| 8. Processing script | `scripts/processing/{format}_to_cesm.py` |
| 9. Spine Toolbox | Optional workflow integration |
| 10. Tests | Round-trip verification in `tests/` |