
A Reinforcement Learning Approach to Augment Conventional PID Control in Nuclear Power Plant Transient Operation

Aidan Rigby, Mike Wagner, Daniel Mikkelson, Ben Lindley

Nuclear Technology / Volume 212 / Number 2 / February 2026 / Pages 427-445

Regular Research Article / dx.doi.org/10.1080/00295450.2025.2466137

Received: September 17, 2024
Accepted: February 6, 2025
Published: February 6, 2026

The ability of nuclear reactors to operate their power conversion cycles more flexibly will enhance their value to energy grids with variable pricing. Current nuclear control systems are typically classical controllers, often based on proportional-integral-derivative (PID) control. This paper presents a method of augmenting existing PID control for difficult transient operations in nuclear power plants using a reinforcement learning–derived feedforward signal applied in real time. The agents, trained on a test thermal load-following problem, are designed to improve steam generator outlet temperature control for a range of fast load-following scenarios covering ramp rates from 9%/min to 15%/min.
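
As a rough illustration of this kind of architecture, the sketch below adds an externally supplied feedforward term to a textbook discrete-time PID loop. It is a minimal sketch in Python under stated assumptions: the class name, gains, and agent interface are hypothetical, since the abstract does not specify the paper's plant model or implementation.

    class PIDWithFeedforward:
        """Discrete-time PID controller augmented by an external feedforward term."""

        def __init__(self, kp, ki, kd, dt):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral = 0.0
            self.prev_error = 0.0

        def step(self, setpoint, measurement, feedforward=0.0):
            # Classical PID action on the tracking error.
            error = setpoint - measurement
            self.integral += error * self.dt
            derivative = (error - self.prev_error) / self.dt
            self.prev_error = error
            pid_out = self.kp * error + self.ki * self.integral + self.kd * derivative
            # The RL agent's real-time signal is summed with the PID output,
            # so the classical loop remains intact if the agent drops out.
            return pid_out + feedforward

A call such as controller.step(setpoint, measured_temp, feedforward=agent_output) would be evaluated once per control interval; the additive structure is what lets the PID keep operating unaided when the agent's signal is absent.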

Several reinforcement learning algorithms were initially investigated for training the feedforward agents, with deep Q-learning (DQN) and proximal policy optimization (PPO) found to be the most promising. The DQN controllers utilize discrete actions, giving them better disturbance rejection at steady state but an inconsistent response to initial temperature deviations. In contrast, the PPO-trained agents, which take continuous actions except for a dead zone around zero, were shown to have the best combination of high disturbance rejection at steady state and good tracking of the desired temperature value. The real-time capability of the PPO agent was also examined, with the average decision-making time found to be on the order of 1 ms.
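
The dead zone around zero described for the PPO agents can be pictured as a simple post-processing step on the continuous action. The snippet below is a hypothetical illustration only; the function name and threshold value are assumptions, not taken from the paper.

    def apply_dead_zone(action, threshold=0.05):
        """Zero out small continuous actions so the agent stays quiet near steady state."""
        # threshold is an assumed value; the paper's dead-zone width is not given here.
        return 0.0 if abs(action) < threshold else float(action)

Suppressing small actions in this way would keep the feedforward term from injecting noise when the plant is already near its setpoint, which is consistent with the steady-state disturbance rejection reported for the PPO agents.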

The fault behavior of the controller under loss of the reinforcement learning agent's feedforward signal was also examined. The controller showed strong performance under “no-signal” faults, but was less effective at handling “stuck-at” faults, where the feedforward signal remains frozen at a set value. In both cases, however, the PID was able to maintain stability, eventually returning the system to a steady state. It is hoped that this work will allow the proposed control architecture to be examined for more difficult control problems, such that it may eventually be used to adapt existing nuclear plants for more aggressive load-following on the grids of the future.
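
The two fault modes can be sketched as a fault-injection wrapper on the feedforward signal: under a “no-signal” fault the term drops to zero and the PID acts alone, while under a “stuck-at” fault it freezes at a fixed value. The code below is an illustrative sketch; the mode names and interface are assumptions layered on the PID sketch above, not the paper's test harness.

    def faulted_feedforward(agent_signal, mode="nominal", stuck_value=0.0):
        """Inject the two feedforward fault modes described in the abstract."""
        if mode == "no-signal":
            return 0.0          # feedforward drops out entirely; PID acts alone
        if mode == "stuck-at":
            return stuck_value  # feedforward frozen at a fixed value
        return agent_signal     # nominal operation: pass the agent's output through

Because the feedforward term is purely additive, the “no-signal” case reduces exactly to the unaugmented PID loop, which is consistent with the PID maintaining stability in both fault scenarios.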