6. GridPath Data Toolkit

The GridPath Data Toolkit provides functionality to create GridPath scenario inputs from raw data. The user may provide their own data and use the Toolkit to convert the data to GridPath CSV input format for use in building a GridPath database. The Toolkit also includes functionality to download raw data from PUDL and from the GridPath RA Toolkit.

6.1. Obtaining Raw Data

6.1.1. PUDL

The Public Utility Data Liberation (PUDL) project collates publicly available data from a range of sources, and puts the data into a single database after cleaning, standardizing, and cross-linking the various datasets.

Download Datasets

To download data from PUDL, use the gridpath_get_pudl_data command. This will download the pudl.sqlite database as well as the RA Toolkit wind and solar profiles Parquet file, and the EIA930 hourly interchange data Parquet file. See the --help menu for options and defaults, e.g., download location, the Zenodo record number for each dataset, skipping datasets, etc.

Convert to GridPath Raw Format

GridPath can currently utilize a subset of the downloaded PUDL data, including:

Form EIA-860: generator-level specific information about existing and planned generators

Form EIA-930: hourly operating data about the high-voltage bulk electric power grid in the Lower 48 states collected from the electricity balancing authorities (BAs) that operate the grid

EIA AEO Table 54 (Electric Power Projections by Electricity Market Module Region): fuel price forecasts

GridPath RA Toolkit variable generation profiles created for the 2026 Western RA Study: these include hourly wind profiles by WECC BA based on assumed 2026 wind buildout for weather years 2007-2014 and hourly solar profiles by WECC BA based on assumed 2026 buildout for weather years 1998-2019; see the study for how profiles were created and note the study was conducted in 2022.

First, the data must be converted to the GridPath raw data CSV format. For this purpose, use the gridpath_pudl_to_gridpath_raw command.

This will query the PUDL database and process the Parquet files downloaded in the previous step in order to create the following files in the user-specified raw data directory.

pudl_eia860_generators.csv

pudl_eia930_hourly_interchange.csv

pudl_eiaaeo_fuel_prices.csv

pudl_ra_toolkit_var_profiles.csv

For options, including the download and raw data directories as well as query filters, see the --help menu. By default, we currently use 2024-01-01 as the EIA860 report date and “western_electricity_coordinating_council” as the EIA AEO electricity market to get data for.

6.1.2. GridPath RA Toolkit

The GridPath RA Toolkit datasets were developed to support the 2026 Western US case resource adequacy study.

6.2. Using the GridPath Data Toolkit

The various functionalities available in the GridPath Data Toolkit can be accessed via the gridpath_run_data_toolkit command. See the --help menu for the available individual Toolkit steps. You may run individual steps only or list the steps you want to run with their respective arguments in a settings file you can point to with the --settings_csv argument. Descriptions of the individual steps available in the Toolkit are below.

6.2.1. Building the Raw Data Database

The first step in using the GridPath Data Toolkit is to create a raw data database. You may do so with the following command:

>>> gridpath_run_data_toolkit --single_step create_database --database PATH/TO/RAW/DB --db_schema ./raw_data_db_schema.sql --omit_data

6.2.2. Loading Raw Data

Load data into the GridPath raw data database. See the documentation of each GridPath Data Toolkit module for data prerequisites. Use the files_to_import.csv file to tell GridPath which CSV files should be loaded into which database table.

What this step does

This module is a generic bulk loader for raw CSV data into the GridPath database. It reads a file named files_to_import.csv located in the directory given by --csv_location. Each row of that file describes one CSV file: an import flag (whether the file should be loaded), the CSV filename (relative to --csv_location), and the database table the file should be loaded into.

The loader iterates over the CSV file rows and, for each row whose import flag is True, reads the corresponding CSV from --csv_location and appends its contents to the named database table (existing rows are preserved; data is inserted with if_exists="append"). Rows whose import flag is False are skipped.

This generic loader is used throughout the Data Toolkit workflow to populate raw_data tables (e.g., VER profiles and their unit mapping, hydro operating characteristics) that later Data Toolkit steps depend on.
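The manifest-driven loop described above can be sketched as follows. This is a minimal illustration, not the Toolkit's own code: it uses only the standard library (csv and sqlite3) in place of whatever the module uses internally, the function name load_raw_data is hypothetical, and it assumes that the manifest has a header row, that the import flag is written as the text True or False, and that each data CSV's header matches the column names of its destination table.

```python
import csv
import sqlite3
from pathlib import Path


def load_raw_data(conn, csv_location):
    """Append every flagged CSV listed in files_to_import.csv to its table.

    Hypothetical helper; returns the names of the tables that were loaded.
    """
    csv_location = Path(csv_location)
    loaded = []
    with open(csv_location / "files_to_import.csv", newline="") as f:
        manifest_rows = list(csv.reader(f))[1:]  # assumed: first row is a header
    # Assumed column order: import flag, CSV filename, destination table
    for flag, filename, table in manifest_rows:
        if flag.strip().lower() != "true":
            continue  # rows whose import flag is False are skipped
        with open(csv_location / filename, newline="") as data_file:
            rows = list(csv.reader(data_file))
        header, records = rows[0], rows[1:]
        columns = ", ".join(header)
        placeholders = ", ".join("?" for _ in header)
        # A plain INSERT keeps existing rows, i.e., append semantics
        conn.executemany(
            f"INSERT INTO {table} ({columns}) VALUES ({placeholders})",
            records,
        )
        loaded.append(table)
    conn.commit()
    return loaded
```

Because rows are only ever appended, running the loader twice against the same manifest would insert the same records twice.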

Usage

>>> python -m data_toolkit.load_raw_data --database PATH/TO/DATABASE --csv_location PATH/TO/CSV/DIRECTORY

Settings

database

csv_location

The --csv_location directory must contain a files_to_import.csv manifest with columns for the import flag, the CSV filename, and the destination database table, in that order.

6.2.3. Load Zone Inputs

EIA 930 BAs

Create GridPath load_zone inputs (load_zone_scenario_id) based on BAs in Form EIA 930.

Usage

>>> gridpath_run_data_toolkit --single_step eia930_load_zone_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This script depends on the Form EIA 930 hourly interchange data having been loaded and on a region having been defined for each BA in the user_defined_baa_key table (in order to filter BAs if needed). It assumes the following raw input database tables have been populated:

raw_data_eia930_hourly_interchange

user_defined_baa_key

Settings

database

lz_output_directory

load_zone_scenario_id

load_zone_scenario_name

lb_output_directory

load_balance_scenario_id

load_balance_scenario_name

allow_overgeneration

overgeneration_penalty_per_mw

allow_unserved_energy

unserved_energy_penalty_per_mwh

max_unserved_load_penalty_per_mw

avg_unserved_load_penalty_per_mwa

export_penalty_cost_per_mwh

unserved_energy_stats_threshold_mw

6.2.4. Temporal Inputs

Monte Carlo Weather Iteration Draws

The Monte Carlo approach employed in the GridPath RA Toolkit study synthesizes multiple years of plausible hourly load, wind availability, solar availability, and temperature-driven thermal derate data over which the system operations can be simulated. Synthetic days are built by combining load, wind, solar, and temperature derate shapes from different but similar days in the historical record. For a detailed description of the methodology, see Appendix B of the report available at https://gridlab.org/wp-content/uploads/2022/10/GridLab_RA-Toolkit-Report-10-12-22.pdf.

Methodology

This module produces, for each Monte Carlo iteration, a full synthetic study-year sequence of weather day bins. A weather day bin is a categorical label (e.g., one of five quintiles per month/day_type) that indicates the historical day’s weather (e.g., based on maximum or average temperature). The bins themselves are produced upstream by the user and are stored in user_defined_weather_bins; this module only resamples them into new synthetic chronologies. The downstream create_monte_carlo_weather_draw_profiles step then maps each drawn bin to an actual historical day’s load/wind/solar/derate shapes.

First-order Markov bin chain

The synthetic bin sequence is generated as a first-order Markov chain over the historical record, in order to preserve realistic day-to-day persistence of weather (e.g., heat waves and cold snaps span multiple days):

1. The first day of the year (Jan 1) is seeded by drawing uniformly at random from all historical January bins (see starting_weather_bin).

2. For each subsequent calendar day, given the prior day’s bin b, we look across the historical record for every day in the current month whose bin equals b, take the bin of the day that immediately followed each such day, and draw uniformly at random from that set of “following-day” bins. That draw becomes the current day’s bin and the prior bin for the next step. In effect, step 2 samples from the empirically estimated transition probability P(bin_today | bin_yesterday) for the relevant month.
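The two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the module's implementation: the historical record is represented as an in-memory list of (month, weather_bin) tuples in chronological order (a stand-in for user_defined_weather_bins), the function names are hypothetical, and the fallback used when a bin has no recorded follower in a month is a choice made for this sketch only.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def build_transitions(history):
    """Map (month, prior_bin) -> list of bins observed on the following day."""
    transitions = defaultdict(list)
    for (month, prior_bin), (_, next_bin) in zip(history, history[1:]):
        # Duplicates are kept so that a uniform draw from the list follows
        # the empirical transition frequencies
        transitions[(month, prior_bin)].append(next_bin)
    return transitions


def draw_bin_sequence(history, study_year, seed=None):
    """Draw one synthetic year of weather bins as a first-order Markov chain."""
    rng = random.Random(seed)  # seed=None gives a different sequence per run
    transitions = build_transitions(history)
    bins_by_month = defaultdict(list)
    for month, weather_bin in history:
        bins_by_month[month].append(weather_bin)

    day = date(study_year, 1, 1)
    # Step 1: seed Jan 1 uniformly from all historical January bins
    current_bin = rng.choice(bins_by_month[1])
    sequence = [(day, current_bin)]
    day += timedelta(days=1)
    while day.year == study_year:
        # Step 2: draw uniformly from the bins that followed current_bin
        # on historical days in the current month
        followers = transitions.get((day.month, current_bin))
        if not followers:
            # Fallback for this sketch only: any bin seen in the month
            followers = bins_by_month[day.month]
        current_bin = rng.choice(followers)
        sequence.append((day, current_bin))
        day += timedelta(days=1)
    return sequence
```

Passing the same integer seed twice returns the same sequence, which mirrors the role the seed plays for reproducible draws.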

Reproducibility (seeding)

By default no seed is set (weather_draws_seed defaults to None), so each run produces a different random ensemble of synthetic weather years. To get reproducible draws, pass --weather_draws_seed <int>. The seed actually used is recorded in aux_weather_draws_info alongside the draws.

Other notes

Modeled distributions may deviate from historical averages depending on within-month weather drift. In shoulder seasons, day-to-day directional trends can cause modeled averages to deviate from historical means by ~1–3% (skewing lower in warming months and higher in cooling months).

Usage

>>> gridpath_run_data_toolkit --single_step create_monte_carlo_weather_draws --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

user_defined_weather_bins

Settings

database

weather_bins_id

weather_draws_id

weather_draws_seed

n_iterations

study_year

timeseries_iteration_draw_initial_seed

Temporal Scenarios

This is a very basic module that copies over the base CSVs created by the user and calls the temporal iterations method to create the iterations.csv file if needed. The location of the base CSVs and the iterations description CSV are specified in a settings file you can point to with the --csv_path argument.

Usage

>>> gridpath_run_data_toolkit --single_step create_temporal_scenarios --settings_csv PATH/TO/SETTINGS/CSV

Settings

csv_path

6.2.5. Load Inputs

Sync Loads

Create GridPath sync load profile inputs.

Usage

>>> gridpath_run_data_toolkit --single_step create_sync_load_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_system_load
user_defined_load_zone_units

Settings

database

output_directory

load_scenario_id

load_scenario_name

overwrite

Monte Carlo Loads

Create GridPath Monte Carlo load profile inputs. Before running this module, you will need to create weather draws with the create_monte_carlo_weather_draws module (see Monte Carlo Weather Iteration Draws).

What this step does

This module reads the synthetic per-iteration load implied by the weather draws produced upstream and writes it out as the GridPath load input CSVs that the model actually consumes. The synthetic per-iteration load is constructed by joining the drawn synthetic weather days (aux_weather_iterations, indexed by weather_bins_id and weather_draws_id) back to the historical hourly load in raw_data_system_load. The resulting CSVs are tagged with the given load / load-components / load-levels scenario ids and names and are written to --output_directory.

Methodology

The module produces up to three CSVs, each of which can be suppressed with a corresponding --skip_* flag:

Load scenario CSV (create_load_scenario_csv): written to --output_directory as <load_scenario_id>_<load_scenario_name>.csv. It is a one-row mapping that ties the load_scenario_id to its load_components_scenario_id and load_levels_scenario_id.

Load components CSV (create_load_components_scenario_csv): written to the load_components subdirectory as <load_components_scenario_id>_<load_components_scenario_name>.csv. It lists one load_component (named via --load_component, default all) per load_zone found in user_defined_load_zone_units.

Load levels CSV (create_load_levels_csv): written to the load_levels subdirectory as <load_levels_scenario_id>_<load_levels_scenario_name>.csv. This file holds the actual hourly load_mw per load_zone, weather_iteration, stage_id, and timepoint.

The load levels are built one synthetic weather iteration at a time. For every drawn day in aux_weather_iterations (matched on weather_bins_id and weather_draws_id), the module identifies the corresponding historical calendar day and, for each load_zone, sums the hourly load from raw_data_system_load across that zone’s constituent unit rows, each scaled by its unit_weight from user_defined_load_zone_units. Timepoint IDs are derived from the draw number and hour_of_day and are offset by --study_year (so they start at 1 by default, or at YYYY0001 when a study year is provided). Rows for successive draws are appended to the single load-levels CSV for the scenario.
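As a rough illustration of the two calculations in this paragraph, the sketch below computes a zone's weighted hourly load for one day and a timepoint ID. The function names and data layout are hypothetical. The timepoint formula is an assumption chosen only to reproduce the behavior stated above (IDs start at 1, or at YYYY0001 with a study year); it further assumes 1-based draw numbers and an hour_of_day running from 1 to 24, and the module's actual arithmetic may differ.

```python
def zone_hourly_load(unit_loads, unit_weights):
    """Weighted sum of hourly unit loads for one load zone and one day.

    unit_loads: {unit: [24 hourly MW values]}
    unit_weights: {unit: unit_weight}
    """
    return [
        sum(unit_loads[unit][hour] * weight for unit, weight in unit_weights.items())
        for hour in range(24)
    ]


def timepoint_id(draw_number, hour_of_day, study_year=None):
    """Assumed formula: consecutive hourly IDs, prefixed by the study year."""
    offset = 0 if study_year is None else study_year * 10000
    return offset + (draw_number - 1) * 24 + hour_of_day
```

Under these assumptions, the first hour of the first draw gets ID 1 without a study year and 20260001 with a study year of 2026.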

These CSVs are the files the GridPath model consumes for load; they are not loaded back into the database by this step.

Overwrite behavior

By default each output file is created only if it does not already exist. The three *_overwrite flags allow the corresponding scenario CSVs to be regenerated in place when they already exist:

--load_scenario_overwrite for the load scenario CSV,

--load_components_overwrite for the load components CSV, and

--load_levels_overwrite for the load levels CSV.

For the load levels CSV, overwrite resets the file (writes a fresh header) only on the first draw of the run and then appends the remaining draws, so the file is rebuilt for the full ensemble rather than truncated mid-write.

Usage

>>> gridpath_run_data_toolkit --single_step create_monte_carlo_load_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

aux_weather_iterations (see the create_monte_carlo_weather_draws step for how to create synthetic weather years and populate this table)
raw_data_system_load
user_defined_load_zone_units

You must load the raw data and run create_monte_carlo_weather_draws before running this module, so that the database contains both the raw data and the synthetic weather draws.

Settings

database

weather_bins_id

weather_draws_id

output_directory

load_scenario_id

load_scenario_name

load_components_scenario_id

load_components_scenario_name

load_levels_scenario_id

load_levels_scenario_name

stage_id

study_year

load_component

load_scenario_overwrite

load_components_overwrite

load_levels_overwrite

skip_load_scenario

skip_load_components

skip_load_levels

6.2.6. Project Inputs

Form EIA 860 Project Portfolios

This module creates project portfolios from EIA 860 data.

The project capacity_types will be based on the data in the user_defined_eia_gridpath_key table.

Wind, solar, and hydro are aggregated to the BA level.

Note

Hybrid projects are currently not treated separately by this module. Their renewable generation components are lumped with wind/solar, and the storage components show up as individual units.

Project portfolios are created based on the data from a particular report date. The user selects the region (determines the subset of generators to use) and the study date (determines which generators are operational, i.e., after their online date and before their retirement date in the EIA data).

Usage

>>> gridpath_run_data_toolkit --single_step eia860_to_project_portfolio_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia860_generators
user_defined_eia_gridpath_key
user_defined_baa_key

Settings

database

output_directory

study_year

region

project_portfolio_scenario_id

project_portfolio_scenario_name

TODO: disaggregate the hybrids out of the wind/solar projects and combine them with their battery components

Form EIA 860 Project Load Zones

This module creates project load zone input CSVs for an EIA860-based project portfolio based on the user-defined mapping in the user_defined_eia_gridpath_key table.

Note

The query in this module is consistent with the project selection from eia860_to_project_portfolio_input_csvs.

Usage

>>> gridpath_run_data_toolkit --single_step eia860_to_project_load_zone_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia860_generators
user_defined_eia_gridpath_key

Settings

database

output_directory

study_year

region

project_load_zone_scenario_id

project_load_zone_scenario_name

Form EIA 860 Project Availability

Create availability type CSV for an EIA860-based project portfolio. Availability types are set to ‘exogenous’ for all projects with no exogenous profiles specified (i.e., always available).

Note

The query in this module is consistent with the project selection from eia860_to_project_portfolio_input_csvs.

Usage

>>> gridpath_run_data_toolkit --single_step eia860_to_project_availability_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia860_generators
user_defined_eia_gridpath_key

Settings

database

output_directory

study_year

region

project_availability_scenario_id

project_availability_scenario_name

Availability Iteration Inputs

Run unit outage simulation and create availability iteration inputs.

What this step does

This module runs a Monte Carlo unit-outage simulation and writes the resulting exogenous availability (derate) input CSVs. Using the per-unit availability parameters loaded from --outage_params_input_csv (into the raw_data_unit_availability_params table) – forced-outage rates (unit_for), mean time to repair (unit_mttr), the number of units (n_units), the unit weight, and the per-unit outage model (unit_fo_model) – it simulates --n_iterations independent outage timelines for each project, drawing random forced and (under the sequential model) repair/maintenance transitions.

The outage model is selected per unit via the unit_fo_model column and may be one of:

Derate – a static derate 1 - unit_for applied in every timepoint.

MC_independent – each timepoint’s outage state is drawn independently from a uniform distribution against the forced-outage rate.

MC_sequential – a sequential (exponential) failure/repair process driven by the forced-outage rate and unit_mttr (the implied mean time to failure is mttr * (1 / for - 1)), preserving outage persistence across timepoints.

historical_year – instead of simulating, a random historical year is sampled for the unit from --historical_availability_csv and that year’s hourly derate series is used directly. (This can be used for units whose availability is taken from a historical record rather than simulated; the choice is driven by the unit’s unit_fo_model value, not by project type.)
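The three simulated models above can be illustrated with a small standard-library sketch. It is not the module's code: the function name is hypothetical, the unit is assumed to start each timeline in the available state, state durations are rounded to whole timepoints with a minimum of one, and unit_mttr is assumed to be positive and expressed in timepoints. The historical_year model is omitted because it samples from a file rather than simulating.

```python
import random


def simulate_availability(model, unit_for, unit_mttr, n_timepoints, seed=None):
    """Return a list of per-timepoint availabilities for a single unit."""
    rng = random.Random(seed)
    if model == "Derate":
        # Static derate applied in every timepoint
        return [1 - unit_for] * n_timepoints
    if model == "MC_independent":
        # The unit is out whenever the uniform draw falls below the outage rate
        return [0.0 if rng.random() < unit_for else 1.0 for _ in range(n_timepoints)]
    if model == "MC_sequential":
        if unit_for <= 0:
            return [1.0] * n_timepoints  # never fails
        if unit_for >= 1:
            return [0.0] * n_timepoints  # never available
        # Implied mean time to failure from the outage rate and repair time
        mttf = unit_mttr * (1 / unit_for - 1)
        availability = []
        available = True  # assumed starting state
        while len(availability) < n_timepoints:
            mean_duration = mttf if available else unit_mttr
            # Exponentially distributed time spent in the current state
            duration = max(1, round(rng.expovariate(1 / mean_duration)))
            availability.extend([1.0 if available else 0.0] * duration)
            available = not available
        return availability[:n_timepoints]
    raise ValueError(f"Unsupported outage model in this sketch: {model}")
```

The sequential model produces runs of consecutive outage timepoints, whereas the independent model can switch state at every timepoint.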

For each project the per-unit availability adjustments are combined using each unit’s unit_weight to form a weighted project-level derate. Hybrid-storage projects (hybrid_stor set) additionally get a separately simulated derate for the storage component. By default only rows whose derate differs from 1 are written (as default availability in GridPath is 1); pass --print_ones to retain all rows.

Output is written to --output_directory as one CSV per project, named <project>-<project_availability_scenario_id>-<project_availability_scenario_name>.csv. --n_parallel_projects parallelizes the simulation across projects and --overwrite replaces existing files (otherwise existing files are appended to). --sort re-sorts each output file at the end. These outage iterations are intended to align with the weather/hydro iterations to form complete Monte Carlo draws.

Reproducibility (seeding)

By default seeding is OFF: --user_provided_seeding is not set, so the outage simulation is fully random and non-reproducible from run to run. When seeding is off, all of the seeding flags below are ignored – the seed arguments are replaced with None before the simulation runs, and NumPy’s global RNG is never explicitly seeded.

To get reproducible outages, set --user_provided_seeding together with a --starting_project_iteration_seed <int> (defaults to 0). With seeding on:

Per-project, non-overlapping seed ranges. Each project is assigned a starting seed of starting_project_iteration_seed + project_idx * n_iterations. Within a project the per-iteration seed starts at that value and is incremented by 1 for each of the n_iterations iterations, so the seed ranges of distinct projects do not overlap.

Per-unit seeds within an iteration. For a given project iteration, the per-iteration seed is used to seed NumPy’s RNG, which then draws one integer seed per unit via np.random.randint(1, max_integer_for_unit_outage_seeding, size=n_units_in_project). Each unit’s outage timeline is then simulated from its own seed. --max_integer_for_unit_outage_seeding defaults to 1000000.

Hybrid-storage offset. For hybrid-storage projects, the storage component is simulated with a seed offset from the generator component’s unit seed by --hybrid_storage_seed_increment (defaults to 1000).
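The seed arithmetic described in these three items can be sketched as below. The function names are hypothetical, and Python's random module stands in for the NumPy calls named above so the example needs no third-party packages; the integers it draws will therefore differ from those NumPy would produce. Treating the hybrid-storage offset as a simple addition is also an assumption.

```python
import random


def project_iteration_seeds(project_idx, n_iterations, starting_seed=0):
    """Seeds for one project's iterations; ranges of distinct projects are disjoint."""
    first = starting_seed + project_idx * n_iterations
    return list(range(first, first + n_iterations))


def unit_seeds(iteration_seed, n_units, max_integer=1_000_000):
    """One integer seed per unit, derived from the per-iteration seed.

    Stand-in for seeding NumPy and calling
    np.random.randint(1, max_integer, size=n_units).
    """
    rng = random.Random(iteration_seed)
    return [rng.randrange(1, max_integer) for _ in range(n_units)]


def hybrid_storage_seed(unit_seed, increment=1000):
    """Assumed: the storage component's seed is the unit seed plus the increment."""
    return unit_seed + increment
```

For example, with n_iterations set to 3 and the default starting seed, project 0 uses seeds 0 to 2 and project 1 uses seeds 3 to 5.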

Every project / unit / iteration still draws its own independent random outage timeline, but the whole simulation reproduces exactly when re-run with the same seed settings. Again, these flags are ignored unless --user_provided_seeding is set. Caution advised when seeding.

Usage

>>> gridpath_run_data_toolkit --single_step create_availability_iteration_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database table has been populated:

raw_data_unit_availability_params

This table can be populated ahead of time, or loaded at run time by passing --outage_params_input_csv. Units that use the historical_year outage model additionally read their derate series from the CSV passed via --historical_availability_csv.

Settings

database

outage_params_input_csv

historical_availability_csv

stage_id

n_iterations

study_year

project_availability_scenario_id

project_availability_scenario_name

output_directory

overwrite

sort

print_ones

n_parallel_projects

user_provided_seeding

starting_project_iteration_seed

max_integer_for_unit_outage_seeding

hybrid_storage_seed_increment

Weather Derates (Sync)

Create GridPath sync weather iteration availability inputs.

Usage

>>> gridpath_run_data_toolkit --single_step create_sync_gen_weather_derate_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_availability_profiles
raw_data_unit_availability_params

Settings

database

output_directory

exogenous_availability_weather_scenario_id

exogenous_availability_weather_scenario_name

overwrite

n_parallel_projects

Weather Derates (Monte Carlo)

Create GridPath Monte Carlo weather iteration availability inputs. Before running this module, you will need to create weather draws with the create_monte_carlo_weather_draws module (see Monte Carlo Weather Iteration Draws).

Usage

>>> gridpath_run_data_toolkit --single_step create_monte_carlo_weather_derate_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_unit_availability_params
raw_data_availability_profiles
aux_weather_iterations (see the create_monte_carlo_weather_draws step for how to create synthetic weather years and populate this table)

Settings

database

output_directory

exogenous_availability_weather_scenario_id

exogenous_availability_weather_scenario_name

overwrite

n_parallel_projects

weather_bins_id

weather_draws_id

Form EIA 860 Project Capacity

Create specified capacity CSV for an EIA860-based project portfolio.

Note

The query in this module is consistent with the project selection from eia860_to_project_portfolio_input_csvs.

Usage

>>> gridpath_run_data_toolkit --single_step eia860_to_project_specified_capacity_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia860_generators
user_defined_eia_gridpath_key

Settings

database

output_directory

study_year

region

project_specified_capacity_scenario_id

project_specified_capacity_scenario_name

Form EIA 860 Projects – Create CSV with Fixed Costs Set to Zero

Create fixed cost CSV for an EIA860-based project portfolio with fixed costs set to zero, as fixed cost data are not available at this time. The CSV must still be created because fixed costs are currently a required GridPath input.

Note

The query in this module is consistent with the project selection from eia860_to_project_portfolio_input_csvs.

Usage

>>> gridpath_run_data_toolkit --single_step eia860_to_project_fixed_cost_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia860_generators
user_defined_eia_gridpath_key

Settings

database

output_directory

study_year

region

project_fixed_cost_scenario_id

project_fixed_cost_scenario_name

Form EIA 860 Projects User-Defined Operating Characteristics

Create opchar CSV for an EIA860-based project portfolio. Note that most operating characteristics are user-defined in the user_defined_eia_gridpath_key table and will take default values until more detailed data are available.

Note

The query in this module is consistent with the project selection from eia860_to_project_portfolio_input_csvs.

Usage

>>> gridpath_run_data_toolkit --single_step eia860_to_project_opchar_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia860_generators
user_defined_eia_gridpath_key

Settings

database

output_directory

study_year

region

project_operational_chars_scenario_id

project_operational_chars_scenario_name

project_fuel_scenario_id

variable_generator_profile_scenario_id

hydro_operational_chars_scenario_id

Form EIA 860 Project Fuels

Create project fuels CSV for an EIA860-based project portfolio.

Note

Some fuel regions in the EIA AEO are more disaggregated than the BA in Form EIA 860 (e.g., CA South and North regions in the AEO, and CISO BA in Form EIA 860). This module currently can only assign one fuel region to each BA. If you need the extra resolution, you will need to modify it.

Note

The query in this module is consistent with the project selection from eia860_to_project_portfolio_input_csvs.

Usage

>>> gridpath_run_data_toolkit --single_step eia860_to_project_fuel_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia860_generators
user_defined_eia_gridpath_key
user_defined_baa_key

Settings

database

output_directory

study_year

region

project_fuel_scenario_id

project_fuel_scenario_name

Form EIA 860 Project Heat Rates (User-Defined by Tech)

Create project heat rate CSV for an EIA860-based project portfolio.

Note

Heat rates are user-specified and generic by technology. If you need more granular heat rates by, say, project, you would need to modify this module.

Note

The query in this module is consistent with the project selection from eia860_to_project_portfolio_input_csvs.

Usage

>>> gridpath_run_data_toolkit --single_step eia860_to_project_heat_rate_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia860_generators
user_defined_eia_gridpath_key
user_defined_heat_rate_curve

Settings

database

output_directory

study_year

region

project_hr_scenario_id

project_hr_scenario_name

Variable Gen Profiles (Sync)

Create GridPath sync variable generation profile inputs.

Usage

>>> gridpath_run_data_toolkit --single_step create_sync_var_gen_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_var_profiles
raw_data_var_project_units

Settings

database

output_directory

variable_generator_profile_scenario_id

variable_generator_profile_scenario_name

overwrite

n_parallel_projects

Variable Gen Profiles (Monte Carlo)

Create GridPath Monte Carlo variable generation profile inputs. Before running this module, you will need to create weather draws with the create_monte_carlo_weather_draws module (see Monte Carlo Weather Iteration Draws).

What this step does

This is the variable energy resource (VER) counterpart to the load-CSV step. It reads the synthetic per-iteration variable generation profiles – assembled from raw_data_var_profiles (the raw hourly unit-level cap_factor data) and raw_data_var_project_units (the project-to-unit mapping and per-unit weights), resampled according to the weather draws stored in aux_weather_iterations – and writes them out as GridPath variable-generator profile input CSVs in --output_directory, tagged with the given --variable_generator_profile_scenario_id and --variable_generator_profile_scenario_name. These CSVs are the files the GridPath model consumes for variable generation.

Methodology

For each project, the per-unit cap_factor values from raw_data_var_profiles are multiplied by their unit_weight and summed to produce a single project-level cap_factor time series. The weather draws in aux_weather_iterations (selected by --weather_bins_id and --weather_draws_id) determine, for each Monte Carlo weather_iteration and draw_number, which historical day’s data to pull, and the draw number is used to compute the timepoint ID. One output CSV is written per project, named {project}-{scenario_id}-{scenario_name}.csv, with an accompanying iterations CSV written to an iterations subdirectory of --output_directory.

--n_parallel_projects N processes up to N projects concurrently (via a multiprocessing pool over the project pool) to speed things up. --overwrite deletes any existing CSVs with the matching project/scenario filename before writing; without it, output is appended to existing files.

Usage

>>> gridpath_run_data_toolkit --single_step create_monte_carlo_var_gen_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_var_profiles
raw_data_var_project_units
aux_weather_iterations (see the create_monte_carlo_weather_draws step for how to create synthetic weather years and populate this table)

You must run create_monte_carlo_draw_profiles before running this module to populate the database with the raw data and the synthetic weather draws.

Settings

database

output_directory

variable_generator_profile_scenario_id

variable_generator_profile_scenario_name

overwrite

n_parallel_projects

weather_bins_id

weather_draws_id

Hydro Gen Inputs

Create hydro iteration input CSVs from year/month data.

Usage

>>> gridpath_run_data_toolkit --single_step create_hydro_iteration_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_project_hydro_opchars_by_year_month
raw_data_hydro_years
user_defined_balancing_type_horizons

Settings

database

output_directory

hydro_operational_chars_scenario_id

hydro_operational_chars_scenario_name

overwrite

n_parallel_projects

What this step does

This module builds GridPath hydro operational-characteristics input CSVs from the year/month hydro data loaded earlier (raw_data_project_hydro_opchars_by_year_month, raw_data_hydro_years, and the user-defined balancing-type horizons in user_defined_balancing_type_horizons). For each hydro iteration it derives the per-horizon hydro operating parameters – the average, minimum, and maximum power fractions – and writes them to --output_directory under the given hydro_operational_chars_scenario_id and hydro_operational_chars_scenario_name. --n_parallel_projects N runs up to N projects at once, and --overwrite replaces existing CSVs.

Methodology

The distinct projects to process are read from raw_data_project_hydro_opchars_by_year_month, and one CSV is written per project, named <project>-<scenario_id>-<scenario_name>.csv in --output_directory. Projects are processed in a multiprocessing pool sized by --n_parallel_projects (defaults to 1).

Hydro iterations and balancing-type horizons

The set of hydro years is read from raw_data_hydro_years and each year is treated as one hydro iteration (written into the hydro_iteration column). The set of (balancing_type, horizon) pairs is read from user_defined_balancing_type_horizons; if --hydro_balancing_type is supplied, the pairs are filtered to that single balancing type (e.g. day, week, month), otherwise all balancing types are included. For every combination of hydro year and balancing-type horizon, one output row is produced.
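The row enumeration described above can be sketched in a few lines of Python. This is a minimal illustration, not the module's actual code: the function name build_iteration_rows and the in-memory lists standing in for the raw_data_hydro_years and user_defined_balancing_type_horizons tables are assumptions made for the example.

```python
from itertools import product


def build_iteration_rows(hydro_years, bt_horizons, hydro_balancing_type=None):
    """Return one row per (hydro year, balancing-type horizon) combination.

    hydro_years: iterable of years (stand-in for raw_data_hydro_years).
    bt_horizons: list of (balancing_type, horizon) pairs (stand-in for
        user_defined_balancing_type_horizons).
    hydro_balancing_type: optional filter; None keeps all balancing types.
    """
    if hydro_balancing_type is not None:
        # Keep only the pairs for the requested balancing type
        bt_horizons = [
            (bt, hz) for bt, hz in bt_horizons if bt == hydro_balancing_type
        ]

    # Each hydro year is treated as one hydro iteration
    return [
        {"hydro_iteration": year, "balancing_type": bt, "horizon": hz}
        for year, (bt, hz) in product(hydro_years, bt_horizons)
    ]


years = [2001, 2002]
pairs = [("month", 1), ("month", 2), ("week", 1)]
all_rows = build_iteration_rows(years, pairs)
month_rows = build_iteration_rows(years, pairs, hydro_balancing_type="month")
```

With two hydro years and three balancing-type horizons, the unfiltered call yields six rows; filtering to the "month" balancing type yields four.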

Deriving per-horizon power fractions

The average_power_fraction, min_power_fraction, and max_power_fraction for each horizon are computed by month-weighting the raw monthly opchar values. For a given balancing-type horizon, the module reads its hour_ending_of_year_start and hour_ending_of_year_end from user_defined_balancing_type_horizons and walks each hour of the year in that range, mapping the hour to a calendar month (via a pandas.Timestamp anchored at January 1 of the hydro year) and counting the number of hours that fall in each month. These hour counts become the per-month weights for the horizon.

For each month touched by the horizon, the module looks up the project’s average_power_fraction, min_power_fraction, and max_power_fraction for that hydro year and month in raw_data_project_hydro_opchars_by_year_month, multiplies each by the month’s hour-count weight, sums across months, and divides by the total number of hours in the horizon. The result is an hours-weighted average of the monthly fractions for each of the three parameters, written as a single row keyed by balancing_type_project and horizon (with weather_iteration set to 0, i.e. no weather iteration). Note we take the weighted averages of the mins and maxes, not the mins of the mins or the maxes of the maxes.
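The hours-weighted averaging can be illustrated with a short, self-contained sketch. It is not the module's implementation: it uses the standard-library datetime in place of pandas.Timestamp, the function names are hypothetical, and it assumes that hour ending h is assigned to the month in which that hour begins (the module's exact convention may differ).

```python
from collections import Counter
from datetime import datetime, timedelta

PARAMS = ("average_power_fraction", "min_power_fraction", "max_power_fraction")


def month_hour_weights(hydro_year, hour_start, hour_end):
    """Count the hours of a horizon that fall in each calendar month.

    hour_start and hour_end play the role of hour_ending_of_year_start and
    hour_ending_of_year_end (inclusive, 1-indexed).
    """
    jan1 = datetime(hydro_year, 1, 1)
    weights = Counter()
    for hour_ending in range(hour_start, hour_end + 1):
        # Assumption: an hour belongs to the month in which it begins
        month = (jan1 + timedelta(hours=hour_ending - 1)).month
        weights[month] += 1
    return dict(weights)


def weighted_fractions(monthly_opchars, weights):
    """Hours-weighted average of the monthly fractions for each parameter."""
    total_hours = sum(weights.values())
    return {
        p: sum(monthly_opchars[m][p] * w for m, w in weights.items()) / total_hours
        for p in PARAMS
    }


# Illustrative values only: a two-day horizon straddling January and February
weights = month_hour_weights(2021, 721, 768)
monthly = {
    1: {"average_power_fraction": 0.4, "min_power_fraction": 0.1, "max_power_fraction": 0.8},
    2: {"average_power_fraction": 0.6, "min_power_fraction": 0.3, "max_power_fraction": 1.0},
}
result = weighted_fractions(monthly, weights)
```

In this example the horizon has 24 hours in each month, so each output fraction is the simple midpoint of the January and February values. Note that the minimum and maximum are averaged in the same way as the average fraction, matching the behavior described above.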

Writing and overwriting output

Rows are appended to the project’s CSV as they are generated, with the header written only when the file does not yet exist. When --overwrite is set, any existing CSV for the project is deleted before processing begins so it is rebuilt from scratch; without --overwrite, new rows are appended to any existing file.
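The append-and-overwrite behavior can be sketched as follows. This is a hedged illustration rather than the module's code: the helper names are hypothetical, and the column order in FIELDS is an assumption (only the column names themselves come from the description above).

```python
import csv
import os

# Assumed column order; the names are taken from the text above
FIELDS = [
    "balancing_type_project",
    "horizon",
    "hydro_iteration",
    "weather_iteration",
    "average_power_fraction",
    "min_power_fraction",
    "max_power_fraction",
]


def project_csv_path(output_directory, project, scenario_id, scenario_name):
    """Build the <project>-<scenario_id>-<scenario_name>.csv path."""
    return os.path.join(
        output_directory, f"{project}-{scenario_id}-{scenario_name}.csv"
    )


def prepare_output(path, overwrite):
    """Delete an existing CSV up front when overwrite is requested."""
    if overwrite and os.path.exists(path):
        os.remove(path)


def append_row(path, row):
    """Append one row, writing the header only if the file is new."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

One consequence of this pattern is that rerunning a step without --overwrite adds rows to the existing file rather than replacing them, so duplicate rows are possible if the same scenario is rebuilt.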

If the corresponding --*_input_csv paths are provided, the raw-data tables (raw_data_project_hydro_opchars_by_year_month, raw_data_hydro_years, user_defined_balancing_type_horizons) are loaded from those CSVs before the inputs are built; otherwise the data is assumed to already be present in the database.

6.2.7. Fuel Inputs

EIA AEO Fuel Chars (User-Defined)

Create GridPath fuel chars inputs (fuel_scenario_id) for fuels in the EIA AEO. The fuel characteristics are user-defined.

Usage

>>> gridpath_run_data_toolkit --single_step eiaaeo_to_fuel_chars_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eiaaeo_fuel_prices

user_defined_eia_gridpath_key

user_defined_generic_fuel_intensities

user_defined_eiaaeo_region_key

Settings

database

output_directory

model_case

report_year

fuel_scenario_id

fuel_scenario_name

EIA AEO Fuel Prices

Create GridPath fuel price inputs (fuel_scenario_id) based on the EIA AEO.

Warning

The user is responsible for ensuring that all prices and costs in their model are in a consistent real currency year.

Usage

>>> gridpath_run_data_toolkit --single_step eiaaeo_fuel_price_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eiaaeo_fuel_prices

user_defined_eiaaeo_region_key

Settings

database

output_directory

model_case

report_year

fuel_price_id

6.2.8. Transmission Inputs

Form EIA 930 Transmission Portfolio

This module creates a transmission line portfolio input CSV for an EIA930-based transmission portfolio. The transmission capacity type is set to “tx_spec” for all lines.

Usage

>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_portfolio_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia930_hourly_interchange

Settings

database

output_directory

region

transmission_portfolio_scenario_id

transmission_portfolio_scenario_name

Form EIA 930 Transmission Load Zones

Create load zone input CSV for an EIA930-based transmission portfolio.

Note

The query in this module is consistent with the transmission selection from eia930_to_transmission_portfolio_input_csvs.

Usage

>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_load_zone_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia930_hourly_interchange

Settings

database

output_directory

region

transmission_load_zone_scenario_id

transmission_load_zone_scenario_name

Form EIA 930 Transmission Availability

Create availability type CSV for an EIA930-based project portfolio. Availability types are set to ‘exogenous’ for all transmission lines, with no exogenous profiles specified (i.e., the lines are always available).

Note

The query in this module is consistent with the project selection from eia930_to_transmission_portfolio_input_csvs.

Usage

>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_availability_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia930_hourly_interchange
user_defined_baa_key

Settings

database

output_directory

region

transmission_availability_scenario_id

transmission_availability_scenario_name

Form EIA 930 Transmission Capacity

Create specified capacity CSV for an EIA930-based transmission portfolio.

Note

The query in this module is consistent with the transmission selection from eia930_to_transmission_portfolio_input_csvs.

Warning

Only minimal, manual data cleaning has been conducted on this dataset. More robust processing is required for usability past the demo stage.

Usage

>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_specified_capacity_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia930_hourly_interchange

Settings

database

output_directory

study_year

region

transmission_specified_capacity_scenario_id

transmission_specified_capacity_scenario_name

Form EIA 930 Transmission Opchar

This module creates a transmission opchar input CSV for an EIA930-based transmission portfolio. The transmission operational type is set to “tx_simple” and the losses are set to 2% by default.

Usage

>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_opchar_input_csvs --settings_csv PATH/TO/SETTINGS/CSV

Input prerequisites

This module assumes the following raw input database tables have been populated:

raw_data_eia930_hourly_interchange

Settings

database

output_directory

tx_simple_loss_factor

region

transmission_operational_chars_scenario_id

transmission_operational_chars_scenario_name