6. GridPath Data Toolkit
The GridPath Data Toolkit provides functionality to create GridPath scenario inputs from raw data. Users may provide their own data and use the Toolkit to convert it to the GridPath CSV input format for use in building a GridPath database. The Toolkit also includes functionality to download raw data from PUDL and from the GridPath RA Toolkit.
6.1. Obtaining Raw Data
6.1.1. PUDL
The Public Utility Data Liberation (PUDL) project collates publicly available data from a range of sources, and puts the data into a single database after cleaning, standardizing, and cross-linking the various datasets.
Download Datasets
To download data from PUDL, use the gridpath_get_pudl_data command.
This will download the pudl.sqlite database as well as the RA Toolkit
wind and solar profiles Parquet file, and the EIA930 hourly interchange
data Parquet file. See the --help menu for options and defaults, e.g., download
location, the Zenodo record number for each dataset, skipping datasets, etc.
Convert to GridPath Raw Format
GridPath can currently utilize a subset of the downloaded PUDL data, including:
Form EIA-860: generator-level specific information about existing and planned generators
Form EIA-930: hourly operating data about the high-voltage bulk electric power grid in the Lower 48 states collected from the electricity balancing authorities (BAs) that operate the grid
EIA AEO Table 54 (Electric Power Projections by Electricity Market Module Region): fuel price forecasts
GridPath RA Toolkit variable generation profiles created for the 2026 Western RA Study: these include hourly wind profiles by WECC BA based on assumed 2026 wind buildout for weather years 2007-2014 and hourly solar profiles by WECC BA based on assumed 2026 buildout for weather years 1998-2019; see the study for how profiles were created and note the study was conducted in 2022.
First, the data must be converted to the GridPath raw data CSV format. For this
purpose, use the gridpath_pudl_to_gridpath_raw command.
This will query the PUDL database and process the Parquet files downloaded in the previous step in order to create the following files in the user-specified raw data directory.
pudl_eia860_generators.csv
pudl_eia930_hourly_interchange.csv
pudl_eiaaeo_fuel_prices.csv
pudl_ra_toolkit_var_profiles.csv
For options, including the download and raw data directories as well as query filters, see the --help menu. By default, we currently use 2024-01-01 as the EIA860 report date and “western_electricity_coordinating_council” as the EIA AEO electricity market to get data for.
6.1.2. GridPath RA Toolkit
The GridPath RA Toolkit datasets were developed to support the 2026 Western US case resource adequacy study.
6.2. Using the GridPath Data Toolkit
The various functionalities available in the GridPath Data Toolkit can be
accessed via the gridpath_run_data_toolkit command. See the --help
menu for the available individual Toolkit steps. You may run individual steps
only or list the steps you want to run with their respective arguments in a
settings file you can point to with the --settings_csv argument.
Descriptions of the individual steps available in the Toolkit are below.
6.2.1. Building the Raw Data Database
The first step in using the GridPath Data Toolkit is to create a raw data database. You may do so with the following command:
>>> gridpath_run_data_toolkit --single_step create_database --database PATH/TO/RAW/DB --db_schema ./raw_data_db_schema.sql --omit_data
6.2.2. Loading Raw Data
Load data into the GridPath raw data database. See the documentation of each
GridPath Data Toolkit module for data prerequisites. Use the
files_to_import.csv file to tell GridPath which CSV files should be loaded
into which database table.
What this step does
This module is a generic bulk loader for raw CSV data into the GridPath
database. It reads a file named files_to_import.csv located in the
directory given by --csv_location. Each row of that file describes one
CSV file: an import flag (whether the file should be loaded), the CSV
filename (relative to --csv_location), and the database table the file
should be loaded into.
The loader iterates over the CSV file rows and, for each row whose import flag
is True, reads the corresponding CSV from --csv_location and appends its
contents to the named database table (existing rows are preserved; data is
inserted with if_exists="append"). Rows whose import flag is False are
skipped.
This generic loader is used throughout the Data Toolkit workflow to populate
raw_data tables (e.g., VER profiles and their unit mapping, hydro operating
characteristics) that later Data Toolkit steps depend on.
Usage
>>> python -m data_toolkit.load_raw_data --database PATH/TO/DATABASE --csv_location PATH/TO/CSV/DIRECTORY
Settings
database
csv_location
The --csv_location directory must contain a files_to_import.csv
manifest with columns for the import flag, the CSV filename, and the
destination database table, in that order.
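The manifest-driven loading described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the module's actual implementation: the helper name load_manifest is hypothetical, the manifest columns are read by position (import flag, filename, table) as described above, and the destination tables are assumed to exist already.

```python
import csv
import sqlite3
from pathlib import Path


def load_manifest(conn, csv_location):
    """Append every flagged CSV listed in files_to_import.csv to its table.

    Hypothetical helper: column order (import flag, filename, table) follows
    the description above; the real loader may differ in detail.
    """
    csv_dir = Path(csv_location)
    loaded = []
    with open(csv_dir / "files_to_import.csv", newline="") as manifest:
        reader = csv.reader(manifest)
        next(reader)  # skip the manifest header row
        for row in reader:
            if len(row) < 3:
                continue  # ignore blank or incomplete manifest rows
            flag, filename, table = row[:3]
            if flag.strip().lower() not in ("true", "1"):
                continue  # import flag is False: skip this file
            with open(csv_dir / filename, newline="") as data:
                rows = list(csv.reader(data))
            header, records = rows[0], rows[1:]
            columns = ", ".join(f'"{c}"' for c in header)
            placeholders = ", ".join("?" for _ in header)
            # A plain INSERT keeps existing rows, i.e. append semantics.
            conn.executemany(
                f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})',
                records,
            )
            loaded.append(table)
    conn.commit()
    return loaded
```

Because rows are appended, running the loader twice on the same manifest would duplicate the data; set the import flag to False for files that have already been loaded.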
6.2.3. Load Zone Inputs
EIA 930 BAs
Create GridPath load_zone inputs (load_zone_scenario_id) based on BAs in Form EIA 930.
Usage
>>> gridpath_run_data_toolkit --single_step eia930_load_zone_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
This script depends on having loaded the Form EIA 930 hourly interchange data and having defined a region for each BA in the user_defined_baa_key table (in order to filter BAs if needed). It assumes the following raw input database tables have been populated:
raw_data_eia930_hourly_interchange
user_defined_baa_key
Settings
database
lz_output_directory
load_zone_scenario_id
load_zone_scenario_name
lb_output_directory
load_balance_scenario_id
load_balance_scenario_name
allow_overgeneration
overgeneration_penalty_per_mw
allow_unserved_energy
unserved_energy_penalty_per_mwh
max_unserved_load_penalty_per_mw
avg_unserved_load_penalty_per_mwa
export_penalty_cost_per_mwh
unserved_energy_stats_threshold_mw
6.2.4. Temporal Inputs
Monte Carlo Weather Iteration Draws
The Monte Carlo approach employed in the GridPath RA Toolkit study synthesizes multiple years of plausible hourly load, wind availability, solar availability, and temperature-driven thermal derate data over which the system operations can be simulated. Synthetic days are built by combining load, wind, solar, and temperature derate shapes from different but similar days in the historical record. For a detailed description of the methodology, see Appendix B of the report available at https://gridlab.org/wp-content/uploads/2022/10/GridLab_RA-Toolkit-Report-10-12-22.pdf.
Methodology
This module produces, for each Monte Carlo iteration, a full synthetic
study-year sequence of weather day bins. A weather day bin is a categorical
label (e.g., one of five quintiles per month/day_type) that indicates the
historical day’s weather (e.g., based on maximum or average temperature). The
bins themselves are produced upstream by the user and are stored in
user_defined_weather_bins; this module only resamples them into
new synthetic chronologies. The downstream
create_monte_carlo_weather_draw_profiles step then maps each drawn bin to
an actual historical day’s load/wind/solar/derate shapes.
First-order Markov bin chain
The synthetic bin sequence is generated as a first-order Markov chain over the historical record, in order to preserve realistic day-to-day persistence of weather (e.g., heat waves and cold snaps span multiple days):
1. The first day of the year (Jan 1) is seeded by drawing uniformly at random from all historical January bins (see starting_weather_bin).
2. For each subsequent calendar day, given the prior day’s bin b, we look across the historical record for every day in the current month whose bin equals b, take the bin of the day that immediately followed each such day, and draw uniformly at random from that set of “following-day” bins. That draw becomes the current day’s bin and the prior bin for the next step.
In effect, step 2 samples from the empirically estimated transition probability P(bin_today | bin_yesterday) for the relevant month.
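The two steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the module's code: the function name draw_weather_bins and the in-memory history format (a chronological list of (month, bin) tuples) are hypothetical, and the sketch raises an error when no matching historical day exists because the text does not say how that case is handled.

```python
import calendar
import random


def draw_weather_bins(history, study_year, seed=None):
    """Draw one synthetic year of weather bins as a first-order Markov chain.

    history: chronological list of (month, bin) tuples, one per historical day.
    """
    rng = random.Random(seed)
    # Month of each calendar day in the study year (365 or 366 entries).
    months = [
        m
        for m in range(1, 13)
        for _ in range(calendar.monthrange(study_year, m)[1])
    ]
    # Step 1: seed Jan 1 uniformly from all historical January bins.
    january_bins = [b for m, b in history if m == 1]
    sequence = [rng.choice(january_bins)]
    # Step 2: draw each later day from the bins that followed historical
    # days in the current month whose bin equals the prior day's bin.
    for month in months[1:]:
        prior_bin = sequence[-1]
        following = [
            history[i + 1][1]
            for i in range(len(history) - 1)
            if history[i] == (month, prior_bin)
        ]
        if not following:
            raise ValueError(f"no historical day in month {month} with bin {prior_bin}")
        sequence.append(rng.choice(following))
    return sequence
```

Drawing uniformly from the list of following-day bins (with repeats kept) is what makes the draw proportional to the empirical transition frequencies.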
Reproducibility (seeding)
By default no seed is set (weather_draws_seed defaults to None), so each
run produces a different random ensemble of synthetic weather years. To get
reproducible draws, pass --weather_draws_seed <int>. The seed actually used
is recorded in aux_weather_draws_info alongside the draws.
Other notes
Modeled distributions may deviate from historical averages depending on within-month weather drift. In shoulder seasons, day-to-day directional trends can cause modeled averages to deviate from historical means by ~1–3% (skewing lower in warming months and higher in cooling months).
Usage
>>> gridpath_run_data_toolkit --single_step create_monte_carlo_weather_draws --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
user_defined_weather_bins
Settings
database
weather_bins_id
weather_draws_id
weather_draws_seed
n_iterations
study_year
timeseries_iteration_draw_initial_seed
Temporal Scenarios
This is a very basic module that copies over the base CSVs created by the user
and calls the temporal iterations method to create the iterations.csv file if
needed. The location of the base CSVs and the iterations description CSV are
specified in a settings file you can point to with the --csv_path argument.
Usage
>>> gridpath_run_data_toolkit --single_step create_temporal_scenarios --settings_csv PATH/TO/SETTINGS/CSV
Settings
csv_path
6.2.5. Load Inputs
Sync Loads
Create GridPath sync load profile inputs.
Usage
>>> gridpath_run_data_toolkit --single_step create_sync_load_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_system_load
user_defined_load_zone_units
Settings
database
output_directory
load_scenario_id
load_scenario_name
overwrite
Monte Carlo Loads
Create GridPath Monte Carlo load profile inputs. Before running this module,
you will need to create weather draws with the create_monte_carlo_draws
module (see Monte Carlo Weather Iteration Draws).
What this step does
This module reads the synthetic per-iteration load implied by the weather
draws produced upstream and writes it out as the GridPath load input CSVs that
the model actually consumes. The synthetic per-iteration load is constructed
by joining the drawn synthetic weather days (aux_weather_iterations,
indexed by weather_bins_id and weather_draws_id) back to the historical
hourly load in raw_data_system_load. The resulting CSVs are tagged with
the given load / load-components / load-levels scenario ids and names and are
written to --output_directory.
Methodology
The module produces up to three CSVs, each of which can be suppressed with a
corresponding --skip_* flag:
Load scenario CSV (create_load_scenario_csv): written to --output_directory as <load_scenario_id>_<load_scenario_name>.csv. It is a one-row mapping that ties the load_scenario_id to its load_components_scenario_id and load_levels_scenario_id.
Load components CSV (create_load_components_scenario_csv): written to the load_components subdirectory as <load_components_scenario_id>_<load_components_scenario_name>.csv. It lists one load_component (named via --load_component, default all) per load_zone found in user_defined_load_zone_units.
Load levels CSV (create_load_levels_csv): written to the load_levels subdirectory as <load_levels_scenario_id>_<load_levels_scenario_name>.csv. This file holds the actual hourly load_mw per load_zone, weather_iteration, stage_id, and timepoint.
The load levels are built one synthetic weather iteration at a time. For every
drawn day in aux_weather_iterations (matched on weather_bins_id and
weather_draws_id), the module identifies the corresponding historical
calendar day and, for each load_zone, sums the hourly load from
raw_data_system_load across that zone’s constituent unit rows, each
scaled by its unit_weight from user_defined_load_zone_units. Timepoint
IDs are derived from the draw number and hour_of_day and are offset by
--study_year (so they start at 1 by default, or at YYYY0001 when a
study year is provided). Rows for successive draws are appended to the single
load-levels CSV for the scenario.
These CSVs are the files the GridPath model consumes for load; they are not loaded back into the database by this step.
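The per-zone aggregation described above (hourly unit loads scaled by unit_weight and summed by load_zone) can be illustrated with a short sketch. The function name zone_hourly_load and the in-memory data layout are hypothetical; the actual module works from the database tables named above.

```python
from collections import defaultdict


def zone_hourly_load(unit_loads, zone_units):
    """Sum weighted unit loads into zone loads for one historical day.

    unit_loads: {(unit, hour_of_day): load_mw}, standing in for rows of
        raw_data_system_load for the drawn historical day.
    zone_units: list of (load_zone, unit, unit_weight), standing in for
        rows of user_defined_load_zone_units.
    """
    totals = defaultdict(float)
    for load_zone, unit, unit_weight in zone_units:
        for (load_unit, hour_of_day), load_mw in unit_loads.items():
            if load_unit == unit:
                # Each constituent unit contributes its weighted load.
                totals[(load_zone, hour_of_day)] += unit_weight * load_mw
    return dict(totals)
```

For example, a zone made up of unit A with weight 1.0 and unit B with weight 0.5 gets 100 + 0.5 * 50 = 125 MW in an hour where A reports 100 MW and B reports 50 MW.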
Overwrite behavior
By default each output file is created only if it does not already exist. The
three *_overwrite flags allow the corresponding scenario CSVs to be
regenerated in place when they already exist:
--load_scenario_overwrite for the load scenario CSV,
--load_components_overwrite for the load components CSV, and
--load_levels_overwrite for the load levels CSV.
For the load levels CSV, overwrite resets the file (writes a fresh header) only on the first draw of the run and then appends the remaining draws, so the file is rebuilt for the full ensemble rather than truncated mid-write.
Usage
>>> gridpath_run_data_toolkit --single_step create_monte_carlo_load_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
aux_weather_iterations (see the create_monte_carlo_draws step for how to create synthetic weather years and populate this table)
raw_data_system_load
user_defined_load_zone_units
You must run create_monte_carlo_draws before running this module to populate the database with the raw data and the synthetic weather draws.
Settings
database
weather_bins_id
weather_draws_id
output_directory
load_scenario_id
load_scenario_name
load_components_scenario_id
load_components_scenario_name
load_levels_scenario_id
load_levels_scenario_name
stage_id
study_year
load_component
load_scenario_overwrite
load_components_overwrite
load_levels_overwrite
skip_load_scenario
skip_load_components
skip_load_levels
6.2.6. Project Inputs
Form EIA 860 Project Portfolios
This module creates project portfolios from EIA 860 data.
The project capacity_types will be based on the data in the user_defined_eia_gridpath_key table.
Wind, solar, and hydro are aggregated to the BA level.
Note
Hybrid projects are currently not treated separately by this module. Their renewable generation components are lumped with wind/solar, and the storage components show up as individual units.
Project portfolios are created based on the data from a particular report date. The user selects the region (determines subset of generators to use) and the study date (determines which generators are operational, i.e., after their online date and before their retirement date in the EIA data.)
Usage
>>> gridpath_run_data_toolkit --single_step eia860_to_project_portfolio_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_eia860_generators
user_defined_eia_gridpath_key
user_defined_baa_key
Settings
database
output_directory
study_year
region
project_portfolio_scenario_id
project_portfolio_scenario_name
- TODO: disaggregate the hybrids out of the wind/solar project and combine
with their battery components
Form EIA 860 Project Load Zones
This module creates project load zone input CSVs for an EIA860-based project portfolio based on the user-defined mapping in the user_defined_eia_gridpath_key table.
Note
The query in this module is consistent with the project selection
from eia860_to_project_portfolio_input_csvs.
Usage
>>> gridpath_run_data_toolkit --single_step eia860_to_project_load_zone_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_eia860_generators
user_defined_eia_gridpath_key
Settings
database
output_directory
study_year
region
project_load_zone_scenario_id
project_load_zone_scenario_name
Form EIA 860 Project Availability
Create availability type CSV for an EIA860-based project portfolio. Availability types are set to ‘exogenous’ for all projects with no exogenous profiles specified (i.e., always available).
Note
The query in this module is consistent with the project selection
from eia860_to_project_portfolio_input_csvs.
Usage
>>> gridpath_run_data_toolkit --single_step eia860_to_project_availability_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_eia860_generators
user_defined_eia_gridpath_key
Settings
database
output_directory
study_year
region
project_availability_scenario_id
project_availability_scenario_name
Availability Iteration Inputs
Run unit outage simulation and create availability iteration inputs.
What this step does
This module runs a Monte Carlo unit-outage simulation and writes the resulting
exogenous availability (derate) input CSVs. Using the per-unit availability
parameters loaded from --outage_params_input_csv (into the
raw_data_unit_availability_params table) – forced-outage rates
(unit_for), mean time to repair (unit_mttr), the number of units
(n_units), the unit weight, and the per-unit outage model
(unit_fo_model) – it simulates --n_iterations independent outage
timelines for each project, drawing random forced and (under the sequential
model) repair/maintenance transitions.
The outage model is selected per unit via the unit_fo_model column and may
be one of:
Derate – a static derate 1 - unit_for applied in every timepoint.
MC_independent – each timepoint’s outage state is drawn independently from a uniform distribution against the forced-outage rate.
MC_sequential – a sequential (exponential) failure/repair process driven by the forced-outage rate and unit_mttr (the implied mean time to failure is mttr * (1 / for - 1)), preserving outage persistence across timepoints.
historical_year – instead of simulating, a random historical year is sampled for the unit from --historical_availability_csv and that year’s hourly derate series is used directly. (This can be used for units whose availability is taken from a historical record rather than simulated; the choice is driven by the unit’s unit_fo_model value, not by project type.)
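To illustrate the MC_sequential idea, the sketch below alternates exponentially distributed in-service and repair durations, using the mean time to failure implied by the forced-outage rate and mean time to repair. It is a simplified stand-in, not the Toolkit's implementation: the function name, the use of Python's random module, the assumption that the unit starts in service, and the rounding of durations up to whole timepoints are all choices made for this example.

```python
import math
import random


def simulate_sequential_outages(unit_for, unit_mttr, n_timepoints, seed=None):
    """Return a list of 0/1 availability flags, one per timepoint.

    Requires 0 < unit_for < 1. Durations are in timepoints (e.g. hours).
    """
    rng = random.Random(seed)
    # Implied mean time to failure: mttr * (1 / for - 1).
    mttf = unit_mttr * (1 / unit_for - 1)
    availability = []
    available = True  # assumption: the unit starts in service
    while len(availability) < n_timepoints:
        mean_duration = mttf if available else unit_mttr
        # Exponential time in the current state, rounded up to whole
        # timepoints (a simplification for this sketch).
        duration = max(1, math.ceil(rng.expovariate(1 / mean_duration)))
        availability.extend([1 if available else 0] * duration)
        available = not available
    return availability[:n_timepoints]
```

With unit_for = 0.1 and unit_mttr = 24, the implied mean time to failure is 24 * (1 / 0.1 - 1) = 216, so the long-run share of time on outage is 24 / (24 + 216) = 0.1, which recovers the forced-outage rate.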
For each project the per-unit availability adjustments are combined using each
unit’s unit_weight to form a weighted project-level derate. Hybrid-storage
projects (hybrid_stor set) additionally get a separately simulated derate
for the storage component. By default only rows whose derate differs from 1 are
written (as default availability in GridPath is 1); pass --print_ones to
retain all rows.
Output is written to --output_directory as one CSV per project, named
<project>-<project_availability_scenario_id>-<project_availability_scenario_name>.csv.
--n_parallel_projects parallelizes the simulation across projects and
--overwrite replaces existing files (otherwise existing files are appended
to). --sort re-sorts each output file at the end. These outage iterations
are intended to align with the weather/hydro iterations to form complete Monte
Carlo draws.
Reproducibility (seeding)
By default seeding is OFF: --user_provided_seeding is not set, so the
outage simulation is fully random and non-reproducible from run to run. When
seeding is off, all of the seeding flags below are ignored – the seed
arguments are replaced with None before the simulation runs, and NumPy’s
global RNG is never explicitly seeded.
To get reproducible outages, set --user_provided_seeding together with a
--starting_project_iteration_seed <int> (defaults to 0). With seeding
on:
Per-project, non-overlapping seed ranges. Each project is assigned a starting seed of starting_project_iteration_seed + project_idx * n_iterations. Within a project the per-iteration seed starts at that value and is incremented by 1 for each of the n_iterations iterations, so the seed ranges of distinct projects do not overlap.
Per-unit seeds within an iteration. For a given project iteration, the per-iteration seed is used to seed NumPy’s RNG, which then draws one integer seed per unit via np.random.randint(1, max_integer_for_unit_outage_seeding, size=n_units_in_project). Each unit’s outage timeline is then simulated from its own seed. --max_integer_for_unit_outage_seeding defaults to 1000000.
Hybrid-storage offset. For hybrid-storage projects, the storage component is simulated with a seed offset from the generator component’s unit seed by --hybrid_storage_seed_increment (defaults to 1000).
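The per-project seed arithmetic can be checked with a small sketch. The helper name project_iteration_seeds is hypothetical; it reproduces only the range calculation stated above and leaves out the per-unit NumPy draw and the hybrid-storage offset.

```python
def project_iteration_seeds(
    project_idx, n_iterations, starting_project_iteration_seed=0
):
    """Return the per-iteration seeds assigned to one project."""
    # Starting seed: starting_project_iteration_seed + project_idx * n_iterations.
    start = starting_project_iteration_seed + project_idx * n_iterations
    # The seed is incremented by 1 for each of the n_iterations iterations.
    return list(range(start, start + n_iterations))


# With 3 iterations, project 0 uses seeds 0-2 and project 1 uses seeds 3-5,
# so the ranges of distinct projects never overlap.
first = project_iteration_seeds(0, 3)
second = project_iteration_seeds(1, 3)
```

The ranges stay disjoint only as long as n_iterations is the same for every project in the run, since the stride between projects is n_iterations.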
Every project / unit / iteration still draws its own independent random outage
timeline, but the whole simulation reproduces exactly when re-run with the same
seed settings. Again, these flags are ignored unless --user_provided_seeding
is set. Caution advised when seeding.
Usage
>>> gridpath_run_data_toolkit --single_step create_availability_iteration_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database table has been populated:
raw_data_unit_availability_params
This table can be populated ahead of time, or loaded at run time by passing
--outage_params_input_csv. Units that use the historical_year outage
model additionally read their derate series from the CSV passed via
--historical_availability_csv.
Settings
database
outage_params_input_csv
historical_availability_csv
stage_id
n_iterations
study_year
project_availability_scenario_id
project_availability_scenario_name
output_directory
overwrite
sort
print_ones
n_parallel_projects
user_provided_seeding
starting_project_iteration_seed
max_integer_for_unit_outage_seeding
hybrid_storage_seed_increment
Weather Derates (Sync)
Create GridPath sync weather iteration availability inputs.
Usage
>>> gridpath_run_data_toolkit --single_step create_sync_gen_weather_derate_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_availability_profiles
raw_data_unit_availability_params
Settings
database
output_directory
exogenous_availability_weather_scenario_id
exogenous_availability_weather_scenario_name
overwrite
n_parallel_projects
Weather Derates (Monte Carlo)
Create GridPath Monte Carlo weather iteration availability inputs. Before
running this module, you will need to create weather draws with the
create_monte_carlo_draws module (see Monte Carlo Weather Iteration Draws).
Usage
>>> gridpath_run_data_toolkit --single_step create_monte_carlo_weather_derate_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_unit_availability_params
raw_data_availability_profiles
aux_weather_iterations (see the create_monte_carlo_draws step for how to create synthetic weather years and populate this table)
Settings
database
output_directory
exogenous_availability_weather_scenario_id
exogenous_availability_weather_scenario_name
overwrite
n_parallel_projects
weather_bins_id
weather_draws_id
Form EIA 860 Project Capacity
Create specified capacity CSV for an EIA860-based project portfolio.
Note
The query in this module is consistent with the project selection
from eia860_to_project_portfolio_input_csvs.
Usage
>>> gridpath_run_data_toolkit --single_step eia860_to_project_specified_capacity_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_eia860_generators
user_defined_eia_gridpath_key
Settings
database
output_directory
study_year
region
project_specified_capacity_scenario_id
project_specified_capacity_scenario_name
Form EIA 860 Projects – Create CSV with Fixed Costs Set to Zero
Create fixed cost CSV for an EIA860-based project portfolio with fixed costs set to zero, as fixed cost data are not available at this time. The CSV must still be created because fixed costs are currently a required GridPath input.
Note
The query in this module is consistent with the project selection
from eia860_to_project_portfolio_input_csvs.
Usage
>>> gridpath_run_data_toolkit --single_step eia860_to_project_fixed_cost_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_eia860_generators
user_defined_eia_gridpath_key
Settings
database
output_directory
study_year
region
project_fixed_cost_scenario_id
project_fixed_cost_scenario_name
Form EIA 860 Projects User-Defined Operating Characteristics
Create opchar CSV for an EIA860-based project portfolio. Note that most operating characteristics are user-defined in the user_defined_eia_gridpath_key table and will take default values until more detailed data are available.
Note
The query in this module is consistent with the project selection
from eia860_to_project_portfolio_input_csvs.
Usage
>>> gridpath_run_data_toolkit --single_step eia860_to_project_opchar_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_eia860_generators
user_defined_eia_gridpath_key
Settings
database
output_directory
study_year
region
project_operational_chars_scenario_id
project_operational_chars_scenario_name
project_fuel_scenario_id
variable_generator_profile_scenario_id
hydro_operational_chars_scenario_id
Form EIA 860 Project Fuels
Create project fuels CSV for an EIA860-based project portfolio.
Note
Some fuel regions in the EIA AEO are more disaggregated than the BAs in Form EIA 860 (e.g., the CA South and North regions in the AEO versus the CISO BA in Form EIA 860). This module can currently assign only one fuel region to each BA. If you need the extra resolution, you will need to modify it.
Note
The query in this module is consistent with the project selection
from eia860_to_project_portfolio_input_csvs.
Usage
>>> gridpath_run_data_toolkit --single_step eia860_to_project_fuel_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_eia860_generators
user_defined_eia_gridpath_key
user_defined_baa_key
Settings
database
output_directory
study_year
region
project_fuel_scenario_id
project_fuel_scenario_name
Form EIA 860 Project Heat Rates (User-Defined by Tech)
Create project heat rate CSV for an EIA860-based project portfolio.
Note
Heat rates are user-specified and generic by technology. If you need more granular heat rates by, say, project, you would need to modify this module.
Note
The query in this module is consistent with the project selection
from eia860_to_project_portfolio_input_csvs.
Usage
>>> gridpath_run_data_toolkit --single_step eia860_to_project_heat_rate_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_eia860_generators
user_defined_eia_gridpath_key
user_defined_heat_rate_curve
Settings
database
output_directory
study_year
region
project_hr_scenario_id
project_hr_scenario_name
Variable Gen Profiles (Sync)
Create GridPath sync variable generation profile inputs.
Usage
>>> gridpath_run_data_toolkit --single_step create_sync_var_gen_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_var_profiles
raw_data_var_project_units
Settings
database
output_directory
variable_generator_profile_scenario_id
variable_generator_profile_scenario_name
overwrite
n_parallel_projects
Variable Gen Profiles (Monte Carlo)
Create GridPath Monte Carlo variable generation profile inputs. Before running
this module, you will need to create weather draws with the
create_monte_carlo_draws module (see Monte Carlo Weather Iteration Draws).
What this step does
This is the variable energy resource (VER) counterpart to the load-CSV step. It
reads the synthetic per-iteration variable generation profiles – assembled from
raw_data_var_profiles (the raw hourly unit-level cap_factor data) and
raw_data_var_project_units (the project-to-unit mapping and per-unit
weights), resampled according to the weather draws stored in
aux_weather_iterations – and writes them out as GridPath variable-generator
profile input CSVs in --output_directory, tagged with the given
--variable_generator_profile_scenario_id and
--variable_generator_profile_scenario_name. These CSVs are the files the
GridPath model consumes for variable generation.
Methodology
For each project, the per-unit cap_factor values from raw_data_var_profiles
are multiplied by their unit_weight and summed to produce a single
project-level cap_factor time series. The weather draws in
aux_weather_iterations (selected by --weather_bins_id and
--weather_draws_id) determine, for each Monte Carlo weather_iteration and
draw_number, which historical day’s data to pull, and the draw number is used
to compute the timepoint ID. One output CSV is written per project, named
{project}-{scenario_id}-{scenario_name}.csv, with an accompanying iterations
CSV written to an iterations subdirectory of --output_directory.
--n_parallel_projects N processes up to N projects concurrently (via a
multiprocessing pool over the project pool) to speed things up. --overwrite
deletes any existing CSVs with the matching project/scenario filename before
writing; without it, output is appended to existing files.
Usage
>>> gridpath_run_data_toolkit --single_step create_monte_carlo_var_gen_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_var_profiles
raw_data_var_project_units
aux_weather_iterations (see the create_monte_carlo_draws step for how to create synthetic weather years and populate this table)
You must run create_monte_carlo_draw_profiles before running this module to populate the database with the raw data and the synthetic weather draws.
Settings
database
output_directory
variable_generator_profile_scenario_id
variable_generator_profile_scenario_name
overwrite
n_parallel_projects
weather_bins_id
weather_draws_id
Hydro Gen Inputs
Create hydro iteration input CSVs from year/month data.
Usage
>>> gridpath_run_data_toolkit --single_step create_hydro_iteration_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
- This module assumes the following raw input database tables have been populated:
raw_data_project_hydro_opchars_by_year_month
raw_data_hydro_years
user_defined_balancing_type_horizons
Settings
database
output_directory
hydro_operational_chars_scenario_id
hydro_operational_chars_scenario_name
overwrite
n_parallel_projects
What this step does
This module builds GridPath hydro operational-characteristics input CSVs from
the year/month hydro data loaded earlier
(raw_data_project_hydro_opchars_by_year_month, raw_data_hydro_years,
and the user-defined balancing-type horizons in
user_defined_balancing_type_horizons). For each hydro iteration it derives
the per-horizon hydro operating parameters – the average, minimum, and maximum
power fractions – and writes them to --output_directory under the given
hydro_operational_chars_scenario_id and
hydro_operational_chars_scenario_name. --n_parallel_projects N runs up
to N projects at once, and --overwrite replaces existing CSVs.
Methodology
The distinct projects to process are read from
raw_data_project_hydro_opchars_by_year_month, and one CSV is written per
project, named <project>-<scenario_id>-<scenario_name>.csv in
--output_directory. Projects are processed in a multiprocessing pool sized
by --n_parallel_projects (defaults to 1).
Hydro iterations and balancing-type horizons
The set of hydro years is read from raw_data_hydro_years and each year is
treated as one hydro iteration (written into the hydro_iteration column).
The set of (balancing_type, horizon) pairs is read from
user_defined_balancing_type_horizons; if --hydro_balancing_type is
supplied, the pairs are filtered to that single balancing type (e.g. day,
week, month), otherwise all balancing types are included. For every
combination of hydro year and balancing-type horizon, one output row is
produced.
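The pairing of hydro years with balancing-type horizons can be pictured as a filtered cross product. The following is a minimal sketch, not the module's actual code: the function names, the sample years, and the sample horizons are hypothetical, and the data is held in plain lists rather than read from the database.

```python
from itertools import product


def select_horizons(bt_horizons, hydro_balancing_type=None):
    """Keep all (balancing_type, horizon) pairs, or only those of one type."""
    if hydro_balancing_type is None:
        return list(bt_horizons)
    return [(bt, h) for bt, h in bt_horizons if bt == hydro_balancing_type]


def build_rows(hydro_years, bt_horizons, hydro_balancing_type=None):
    """One row per (hydro year, balancing-type horizon) combination."""
    selected = select_horizons(bt_horizons, hydro_balancing_type)
    return [
        {"hydro_iteration": year, "balancing_type_project": bt, "horizon": h}
        for year, (bt, h) in product(hydro_years, selected)
    ]


# Hypothetical stand-ins for raw_data_hydro_years and
# user_defined_balancing_type_horizons.
hydro_years = [2001, 2002]
bt_horizons = [("month", 1), ("month", 2), ("week", 1)]

rows_all = build_rows(hydro_years, bt_horizons)
rows_month = build_rows(hydro_years, bt_horizons, hydro_balancing_type="month")
```

With these sample inputs, the unfiltered call yields six rows (two years times three horizons) and the call filtered to the month balancing type yields four.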
Deriving per-horizon power fractions
The average_power_fraction, min_power_fraction, and
max_power_fraction for each horizon are computed by month-weighting the raw
monthly opchar values. For a given balancing-type horizon, the module reads its
hour_ending_of_year_start and hour_ending_of_year_end from
user_defined_balancing_type_horizons and walks each hour of the year in that
range, mapping the hour to a calendar month (via a pandas.Timestamp anchored
at January 1 of the hydro year) and counting the number of hours that fall in
each month. These hour counts become the per-month weights for the horizon.
For each month touched by the horizon, the module looks up the project’s
average_power_fraction, min_power_fraction, and max_power_fraction
for that hydro year and month in
raw_data_project_hydro_opchars_by_year_month, multiplies each by the month’s
hour-count weight, sums across months, and divides by the total number of hours
in the horizon. The result is an hours-weighted average of the monthly
fractions for each of the three parameters, written as a single row keyed by
balancing_type_project and horizon (with weather_iteration set to
0, i.e. no weather iteration). Note that the module takes the weighted averages
of the mins and maxes, not the minimum of the mins or the maximum of the maxes.
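The hours-weighted averaging described above can be sketched as follows. This is an illustration under stated assumptions rather than the module's implementation: the function names are hypothetical, the monthly values are made up, and the sketch assumes that hour ending h is assigned to the month in which that hour begins (an offset of h - 1 hours from January 1), which may differ from the module's exact convention.

```python
import pandas as pd


def month_hour_weights(hydro_year, hour_start, hour_end):
    """Count how many hours of the horizon fall in each calendar month.

    hour_start and hour_end are hour-ending-of-year values (inclusive).
    """
    jan1 = pd.Timestamp(year=hydro_year, month=1, day=1)
    weights = {}
    for hour in range(hour_start, hour_end + 1):
        # Assumption: hour ending `hour` begins (hour - 1) hours after Jan 1.
        month = (jan1 + pd.Timedelta(hours=hour - 1)).month
        weights[month] = weights.get(month, 0) + 1
    return weights


def weighted_fraction(monthly_values, weights):
    """Hours-weighted average of monthly values over the horizon."""
    total_hours = sum(weights.values())
    return sum(monthly_values[m] * w for m, w in weights.items()) / total_hours


# A one-week horizon straddling January and February of a non-leap year:
# hours 697-744 are the last 48 hours of January, 745-864 fall in February.
weights = month_hour_weights(2021, 697, 864)

# Hypothetical monthly average_power_fraction values for one project.
monthly_avg = {1: 0.4, 2: 0.6}
avg = weighted_fraction(monthly_avg, weights)
```

Here the weights are 48 hours for January and 120 for February, so the horizon value is (0.4 × 48 + 0.6 × 120) / 168. The same function would be applied separately to the monthly minimum and maximum fractions.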
Writing and overwriting output
Rows are appended to the project’s CSV as they are generated, with the header
written only when the file does not yet exist. When --overwrite is set, any
existing CSV for the project is deleted before processing begins so it is
rebuilt from scratch; without --overwrite, new rows are appended to any
existing file.
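The append-versus-overwrite behavior can be illustrated with a short sketch. The helper names, the file name, and the row values below are hypothetical, and the sketch uses the standard csv module, which is not necessarily how the module itself writes its files.

```python
import csv
import os
import tempfile


def prepare_output(path, overwrite):
    """Delete an existing CSV up front when overwrite is requested."""
    if overwrite and os.path.exists(path):
        os.remove(path)


def append_row(path, row):
    """Append one row, writing the header only if the file is new."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)


def count_lines(path):
    with open(path, newline="") as f:
        return sum(1 for _ in f)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical <project>-<scenario_id>-<scenario_name>.csv file name.
    path = os.path.join(tmp, "Hydro_A-1-base.csv")
    row = {"hydro_iteration": 2001, "horizon": 1, "average_power_fraction": 0.5}

    # Without overwrite: rows accumulate under a single header.
    prepare_output(path, overwrite=False)
    append_row(path, row)
    append_row(path, row)
    lines_appended = count_lines(path)

    # With overwrite: the old file is removed and rebuilt from scratch.
    prepare_output(path, overwrite=True)
    append_row(path, row)
    lines_overwritten = count_lines(path)
```

One consequence of the append behavior is that rerunning the step without --overwrite against an existing file adds the same rows again, so --overwrite is the safer choice when regenerating inputs.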
If the corresponding --*_input_csv paths are provided, the raw-data tables
(raw_data_project_hydro_opchars_by_year_month, raw_data_hydro_years,
user_defined_balancing_type_horizons) are loaded from those CSVs before the
inputs are built; otherwise the data is assumed to already be present in the
database.
6.2.7. Fuel Inputs
EIA AEO Fuel Chars (User-Defined)
Create GridPath fuel chars inputs (fuel_scenario_id) for fuels in the EIA AEO. The fuel characteristics are user-defined.
Usage
>>> gridpath_run_data_toolkit --single_step eiaaeo_to_fuel_chars_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
This module assumes the following raw input database tables have been populated:
raw_data_eiaaeo_fuel_prices
user_defined_eia_gridpath_key
user_defined_generic_fuel_intensities
user_defined_eiaaeo_region_key
Settings
database
output_directory
model_case
report_year
fuel_scenario_id
fuel_scenario_name
EIA AEO Fuel Prices
Create GridPath fuel price inputs (fuel_scenario_id) based on the EIA AEO.
Warning
The user is responsible for ensuring that all prices and costs in their model are in a consistent real currency year.
Usage
>>> gridpath_run_data_toolkit --single_step eiaaeo_fuel_price_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
This module assumes the following raw input database tables have been populated:
raw_data_eiaaeo_fuel_prices
user_defined_eiaaeo_region_key
Settings
database
output_directory
model_case
report_year
fuel_price_id
6.2.8. Transmission Inputs
Form EIA 930 Transmission Portfolio
This module creates a transmission line portfolio input CSV for an EIA930-based transmission portfolio. The transmission capacity type is set to “tx_spec” for all lines.
Usage
>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_portfolio_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
This module assumes the following raw input database tables have been populated:
raw_data_eia930_hourly_interchange
Settings
database
output_directory
region
transmission_portfolio_scenario_id
transmission_portfolio_scenario_name
Form EIA 930 Transmission Load Zones
Create the load zone input CSV for an EIA930-based transmission portfolio.
Note
The query in this module is consistent with the transmission selection
from eia930_to_transmission_portfolio_input_csvs.
Usage
>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_load_zone_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
This module assumes the following raw input database tables have been populated:
raw_data_eia930_hourly_interchange
Settings
database
output_directory
region
transmission_load_zone_scenario_id
transmission_load_zone_scenario_name
Form EIA 930 Transmission Availability
Create the availability type CSV for an EIA930-based transmission portfolio. Availability types are set to ‘exogenous’ for all transmission lines, with no exogenous profiles specified (i.e., the lines are always available).
Note
The query in this module is consistent with the transmission selection
from eia930_to_transmission_portfolio_input_csvs.
Usage
>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_availability_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
This module assumes the following raw input database tables have been populated:
raw_data_eia930_hourly_interchange
user_defined_baa_key
Settings
database
output_directory
region
transmission_availability_scenario_id
transmission_availability_scenario_name
Form EIA 930 Transmission Capacity
Create the specified capacity CSV for an EIA930-based transmission portfolio.
Note
The query in this module is consistent with the transmission selection
from eia930_to_transmission_portfolio_input_csvs.
Warning
Only minimal, manual data cleaning has been conducted on this dataset. More robust processing is required for usability past the demo stage.
Usage
>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_specified_capacity_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
This module assumes the following raw input database tables have been populated:
raw_data_eia930_hourly_interchange
Settings
database
output_directory
study_year
region
transmission_specified_capacity_scenario_id
transmission_specified_capacity_scenario_name
Form EIA 930 Transmission Opchar
This module creates a transmission opchar input CSV for an EIA930-based transmission portfolio. The transmission operational type is set to “tx_simple” and the losses are set to 2% by default.
Usage
>>> gridpath_run_data_toolkit --single_step eia930_to_transmission_ochar_input_csvs --settings_csv PATH/TO/SETTINGS/CSV
Input prerequisites
This module assumes the following raw input database tables have been populated:
raw_data_eia930_hourly_interchange
Settings
database
output_directory
tx_simple_loss_factor
region
transmission_operational_chars_scenario_id
transmission_operational_chars_scenario_name