Energy Analytics Project - Part 1 : Back Story

Energy Analytics Project CAISO Fuel Prices

Back Story

I took this summer (2019) off to address the climate crisis. Somehow, there must be a way for software engineers, like myself, to affect climate change… and what better avenue than the energy sector?

For starters, I attended a couple of energy conferences. That’s where I learned about the price of electricity going negative in California. Apparently, solar electricity generation peaks a few hours before consumer demand for cooling peaks. It’s also where I learned about our Pacific Northwest hydroelectricity prices going negative during certain flood events. Apparently, during flood events that coincide with salmon runs, the dams run excess water through the turbines to protect the fish. However, because this energy was not asked for and is just dumped onto the grid, the price of electricity goes negative. I also learned that there are significant amounts of wind energy in Idaho and Montana… (Unfortunately, these presentations were anecdotal and I do not have the data to back up any of these assertions. Not yet, anyway.)

So the question is, how can we integrate and optimize the energy sector for renewable energy and, most importantly, decarbonize it?

About this time a friend told me about the data available on the CAISO OASIS site, and something clicked. You see, this site exposes an enormous amount of energy-related data for the state of California (in fact, CAISO stands for California Independent System Operator). If someone could consolidate data feeds in the energy sector across the western region (CA, OR, WA, ID, MT), then perhaps we could analyze that data from the perspective of decarbonization? How can we most effectively use renewable and non-renewable energy sources?

To answer these questions, I created the Energy Analytics Project (EAP) and it is currently pulling down over 90 separate data feeds from CAISO OASIS, totalling about 1.5 million time-series data points.

In this blog post I’m going to introduce this project, describe why I think it’s interesting, and why I think you should join me and help out.

So, read on to find out more about this project…

Primary Objective

Decarbonize the energy sector.

I’ve written a background piece on why I started this project here.

That should give you a general grounding for project intent, namely, let’s get enough data to make informed decisions about our investments in the energy sector.


Create a community of developers, researchers, data scientists, and policy advocates, each leveraging the skills of the others.


Developers, like me, help by providing the infrastructure and tooling to get data into one place, in an accessible platform. The platform I chose to start with is:

  • source code in github
  • data in S3 buckets
  • databases in Sqlite3
  • reports in Jupyter Notebooks
  • distributed via github

My hope is that researchers and data scientists will start consuming this data and generating reports. Then, we need policy advocates to start using these reports to affect the decisions being made at federal, regional, state, and local levels.


Provide adequate tooling for managing all this data. I’ll defer further discussion of the tooling to a later post as that’s the rabbit hole that I am always drawn to.


Data scientists use a common set of tools, so this project should accommodate that. To that end, the tooling is based on Python3, Jupyter Notebooks, and Pandas. However, you can use whatever tooling you want, write your own data pipeline, whatever, it doesn’t matter. One thing that may be unusual about this project, however, is that all the data is surfaced as Sqlite3 databases. The primary artifact of each data feed is a Sqlite3 database.
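To give a feel for what "one Sqlite3 database per data feed" could mean in practice, here is a minimal sketch using only Python's built-in sqlite3 module. The table name, column names, region labels, and prices are all made up for illustration; they are assumptions, not the project's actual schema or real CAISO data.

```python
import sqlite3


def load_feed(conn, rows):
    """Create a hypothetical feed table and insert (ts, region, price) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fuel_prices ("
        "ts TEXT NOT NULL, region TEXT NOT NULL, price REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO fuel_prices VALUES (?, ?, ?)", rows)
    conn.commit()


def average_price_by_region(conn):
    """Return {region: average price} computed with a plain SQL query."""
    cur = conn.execute(
        "SELECT region, AVG(price) FROM fuel_prices "
        "GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())


# An in-memory database stands in for a downloaded feed file;
# a real file would be opened with sqlite3.connect("<path to .db file>").
conn = sqlite3.connect(":memory:")

# Invented sample rows, purely illustrative.
load_feed(
    conn,
    [
        ("2019-07-01T00:00", "REGION_A", 2.0),
        ("2019-07-01T01:00", "REGION_A", 4.0),
        ("2019-07-01T00:00", "REGION_B", 3.0),
    ],
)

averages = average_price_by_region(conn)
print(averages)
```

If you prefer Pandas, the same connection and SQL string can be passed to `pandas.read_sql_query(sql, conn)` to get a DataFrame instead of a dictionary.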

Why is this interesting?

For me, it’s intrinsically interesting to get such detailed information about the world’s fifth-largest economy and how its energy sector functions. But what’s more interesting is taking the CAISO model and evangelizing this type of data transparency to other agencies in the West. If we can get some small subset of the OASIS data feeds from entities like Bonneville Power Administration (BPA), and the western states such as Oregon, Washington, Idaho and Montana… then we can start to understand the potential costs and benefits of different types of energy resources, pricing strategies, etc. at a regional scale.

As the need to decarbonize the energy sector becomes more and more apparent, we will need to make important decisions about where to invest our time and money. Do we build nuclear reactors, and if so why and where? What is the time-cost and ROI for this? Should we invest in solar, wind, or hydro? Are these systems built in a way that is complementary, e.g. are we using our hydro for capacitive storage? What is the cost for each of these solutions? How much of the demand can be shifted by live pricing?

There are a lot of questions to be answered here, and we need this data to be easily and publicly accessible so that we, any and all of us, can get started with trying to make sense out of the situation.

How can you get involved?

Please join the Energy Analytics Project mailing list.

See Energy Analytics Project - Part 2 : How To for instructions on how to get started with this platform.