A Gentle Introduction to Deep Reinforcement Learning in JAX | by Ryan Pégoud | Nov, 2023


Implementing the update function for DQN is slightly more complex, so let's break it down:

  • First, the _loss_fn function implements the squared error described previously for a single experience.
  • Then, _batch_loss_fn acts as a wrapper for _loss_fn and decorates it with vmap, applying the loss function to a batch of experiences. We then return the average error for this batch.
  • Finally, update acts as a final layer on top of our loss function, computing its gradient with respect to the online network parameters (the target network parameters and the batch of experiences are treated as fixed inputs). We then use Optax (a JAX library commonly used for optimization) to perform an optimizer step and update the online parameters, as sketched after this list.
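
To make this more concrete, here is a minimal sketch of how these three pieces could fit together. The DQNAgent class name, its constructor arguments, and the discount attribute are illustrative assumptions rather than the exact code from the accompanying notebook:

from functools import partial

import haiku as hk
import jax.numpy as jnp
import optax
from jax import grad, jit, vmap


class DQNAgent:
    def __init__(self, model: hk.Transformed, optimizer: optax.GradientTransformation, discount: float):
        self.model = model
        self.optimizer = optimizer
        self.discount = discount

    @partial(jit, static_argnums=(0,))
    def update(self, online_net_params, target_net_params, optimizer_state,
               states, actions, rewards, next_states, dones):
        def _loss_fn(online_net_params, target_net_params,
                     state, action, reward, next_state, done):
            # TD target: r + gamma * max_a' Q_target(s', a'), masked on terminal states
            target = reward + (1 - done) * self.discount * jnp.max(
                self.model.apply(target_net_params, None, next_state)
            )
            prediction = self.model.apply(online_net_params, None, state)[action]
            return jnp.square(target - prediction)

        def _batch_loss_fn(online_net_params, target_net_params,
                           states, actions, rewards, next_states, dones):
            # vmap applies the single-experience loss over the whole batch
            batched_loss = vmap(_loss_fn, in_axes=(None, None, 0, 0, 0, 0, 0))
            return jnp.mean(batched_loss(online_net_params, target_net_params,
                                         states, actions, rewards, next_states, dones))

        # gradient with respect to the online parameters only (the first argument)
        grads = grad(_batch_loss_fn)(online_net_params, target_net_params,
                                     states, actions, rewards, next_states, dones)
        updates, optimizer_state = self.optimizer.update(grads, optimizer_state)
        online_net_params = optax.apply_updates(online_net_params, updates)
        return online_net_params, optimizer_state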

Notice that, similarly to the replay buffer, the model and optimizer are pure functions operating on an external state: the parameters and optimizer state are passed in and returned explicitly rather than stored internally. The following line serves as a good illustration of this principle:

updates, optimizer_state = optimizer.update(grads, optimizer_state)

This also explains why we can use a single model for both the online and target networks, as the parameters are stored and updated externally.

# target network predictions
self.model.apply(target_net_params, None, state)
# online network predictions
self.model.apply(online_net_params, None, state)

For context, the model we use in this article is a multi-layer perceptron defined as follows:

import haiku as hk
import jax.numpy as jnp
from jax import random, vmap

N_ACTIONS = 2
NEURONS_PER_LAYER = [64, 64, 64, N_ACTIONS]
# create one PRNG key per network from consecutive seeds
online_key, target_key = vmap(random.PRNGKey)(jnp.arange(2) + RANDOM_SEED)

@hk.transform
def model(x):
    # simple multi-layer perceptron
    mlp = hk.nets.MLP(output_sizes=NEURONS_PER_LAYER)
    return mlp(x)

online_net_params = model.init(online_key, jnp.zeros((STATE_SHAPE,)))
target_net_params = model.init(target_key, jnp.zeros((STATE_SHAPE,)))

prediction = model.apply(online_net_params, None, state)

Replay Buffer

Now, let's take a step back and look more closely at replay buffers. They are widely used in reinforcement learning for a variety of reasons:

  • Generalization: By sampling from the replay buffer, we break the correlation between consecutive experiences by mixing up their order. This way, we avoid overfitting to specific sequences of experiences.
  • Diversity: As the sampling is not limited to recent experiences, we generally observe a lower variance in updates and prevent overfitting to the latest experiences.
  • Increased sample efficiency: Each experience can be sampled multiple times from the buffer, enabling the model to learn more from individual experiences.

Finally, we can use several sampling schemes for our replay buffer:

  • Uniform sampling: Experiences are sampled uniformly at random. This type of sampling is straightforward to implement and lets the model learn from experiences independently of the timestep at which they were collected.
  • Prioritized sampling: This category includes different algorithms such as Prioritized Experience Replay (“PER”, Schaul et al. 2015) or Gradient Experience Replay (“GER”, Lahire et al., 2022). These methods attempt to prioritize the selection of experiences according to some metric related to their “learning potential” (the amplitude of the TD error for PER and the norm of the experience’s gradient for GER).

For the sake of simplicity, we’ll implement a uniform replay buffer in this article. However, I plan to cover prioritized sampling extensively in the future.

As promised, the uniform replay buffer is quite easy to implement; however, there are a few subtleties related to the use of JAX and functional programming. As always, we have to work with pure functions that are devoid of side effects. In other words, we are not allowed to define the buffer as a class instance with a variable internal state.

Instead, we initialize a buffer_state dictionary that maps keys to empty arrays with predefined shapes, as JAX requires constant-sized arrays when jit-compiling code to XLA.

buffer_state = {
    "states": jnp.empty((BUFFER_SIZE, STATE_SHAPE), dtype=jnp.float32),
    "actions": jnp.empty((BUFFER_SIZE,), dtype=jnp.int32),
    "rewards": jnp.empty((BUFFER_SIZE,), dtype=jnp.float32),
    "next_states": jnp.empty((BUFFER_SIZE, STATE_SHAPE), dtype=jnp.float32),
    "dones": jnp.empty((BUFFER_SIZE,), dtype=jnp.bool_),
}
We will use a UniformReplayBuffer class to interact with the buffer state (a sketch follows the list below). This class has two methods:

  • add: Unwraps an experience tuple and maps its components to a specific index. idx = idx % self.buffer_size ensures that when the buffer is full, adding new experiences overwrites older ones.
  • sample: Samples a sequence of random indexes from a uniform distribution. The sequence length is set by batch_size, while the indexes range over [0, current_buffer_size-1]. This ensures that we do not sample empty array slots while the buffer is not yet full. Finally, we use JAX's vmap in combination with tree_map to return a batch of experiences.
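
Here is a minimal sketch of such a class, assuming the buffer_state dictionary defined above; the exact method signatures may differ from the accompanying notebook:

from functools import partial

from jax import jit, random, vmap
from jax.tree_util import tree_map


class UniformReplayBuffer:
    def __init__(self, buffer_size: int, batch_size: int):
        self.buffer_size = buffer_size
        self.batch_size = batch_size

    @partial(jit, static_argnums=(0,))
    def add(self, buffer_state, experience, idx):
        state, action, reward, next_state, done = experience
        idx = idx % self.buffer_size  # overwrite the oldest experiences once the buffer is full
        return {
            "states": buffer_state["states"].at[idx].set(state),
            "actions": buffer_state["actions"].at[idx].set(action),
            "rewards": buffer_state["rewards"].at[idx].set(reward),
            "next_states": buffer_state["next_states"].at[idx].set(next_state),
            "dones": buffer_state["dones"].at[idx].set(done),
        }

    @partial(jit, static_argnums=(0,))
    def sample(self, key, buffer_state, current_buffer_size):
        # draw batch_size indexes uniformly in [0, current_buffer_size - 1]
        key, subkey = random.split(key)
        indexes = random.randint(subkey, shape=(self.batch_size,),
                                 minval=0, maxval=current_buffer_size)
        # gather one experience per sampled index, for every array in the buffer
        experiences = tree_map(lambda arr: vmap(lambda i: arr[i])(indexes), buffer_state)
        return experiences, key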

Now that our DQN agent is ready for training, we'll quickly implement a vectorized CartPole environment using the same framework introduced in an earlier article. CartPole is a classic control environment with a continuous observation space, which makes it a relevant test bed for our DQN.

Visual representation of the CartPole Environment (credits and documentation: OpenAI Gymnasium, MIT license)

The process is quite straightforward: we reuse most of OpenAI's Gymnasium implementation while making sure we use JAX arrays and lax control flow instead of Python or NumPy alternatives. For instance:

# Python implementation
force = self.force_mag if action == 1 else -self.force_mag
# Jax implementation
force = lax.select(jnp.all(action) == 1, self.force_mag, -self.force_mag)

# Python
costheta, sintheta = math.cos(theta), math.sin(theta)
# Jax
cos_theta, sin_theta = jnp.cos(theta), jnp.sin(theta)

# Python
if not terminated:
    reward = 1.0
    ...
else:
    reward = 0.0
# Jax
reward = jnp.float32(jnp.invert(done))
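
Putting these conversions together, here is a condensed, illustrative sketch of a pure, jit-compatible step function. The constants mirror Gymnasium's CartPole implementation; reset logic, truncation, and random-key handling are omitted for brevity:

from functools import partial

import jax.numpy as jnp
from jax import jit, lax


class CartPole:
    def __init__(self):
        # physical constants, copied from Gymnasium's CartPole implementation
        self.gravity = 9.8
        self.masscart = 1.0
        self.masspole = 0.1
        self.total_mass = self.masspole + self.masscart
        self.length = 0.5  # half the pole's length
        self.polemass_length = self.masspole * self.length
        self.force_mag = 10.0
        self.tau = 0.02  # seconds between state updates
        self.theta_threshold_radians = 12 * 2 * jnp.pi / 360
        self.x_threshold = 2.4

    @partial(jit, static_argnums=(0,))
    def step(self, state, action):
        x, x_dot, theta, theta_dot = state
        force = lax.select(jnp.all(action) == 1, self.force_mag, -self.force_mag)
        cos_theta, sin_theta = jnp.cos(theta), jnp.sin(theta)

        # dynamics, translated one-to-one from the NumPy implementation
        temp = (force + self.polemass_length * theta_dot**2 * sin_theta) / self.total_mass
        theta_acc = (self.gravity * sin_theta - cos_theta * temp) / (
            self.length * (4.0 / 3.0 - self.masspole * cos_theta**2 / self.total_mass)
        )
        x_acc = temp - self.polemass_length * theta_acc * cos_theta / self.total_mass

        # Euler integration
        x = x + self.tau * x_dot
        x_dot = x_dot + self.tau * x_acc
        theta = theta + self.tau * theta_dot
        theta_dot = theta_dot + self.tau * theta_acc
        new_state = jnp.array([x, x_dot, theta, theta_dot])

        # episode ends when the cart leaves the track or the pole falls too far
        done = jnp.logical_or(jnp.abs(x) > self.x_threshold,
                              jnp.abs(theta) > self.theta_threshold_radians)
        reward = jnp.float32(jnp.invert(done))
        return new_state, reward, done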

For the sake of brevity, the full environment code is available here:

The last part of our implementation of DQN is the training loop (also called rollout). As mentioned in previous articles, we have to respect a specific format in order to take advantage of JAX’s speed.

The rollout function might appear daunting at first, but most of its complexity is purely syntactic, as we've already covered most of the building blocks. Here's a pseudo-code walkthrough, followed by a JAX sketch of the loop:

1. Initialization:
* Create empty arrays that will store the states, actions, rewards
and done flags for each timestep. Initialize the networks and optimizer
with dummy arrays.
* Wrap all the initialized objects in a val tuple

2. Training loop (repeat for i steps):
* Unpack the val tuple
* (Optional) Decay epsilon using a decay function
* Take an action depending on the state and model parameters
* Perform an environment step and observe the next state, reward
and done flag
* Create an experience tuple (state, action, reward, new_state, done)
and add it to the replay buffer
* Sample a batch of experiences depending on the current buffer size
(i.e. sample only from the slots that have already been filled)
* Update the model parameters using the experience batch
* Every N steps, update the target network's weights
(set target_params = online_params)
* Store the experience's values for the current episode and return
the updated `val` tuple
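
Below is an illustrative JAX sketch of this loop based on lax.fori_loop. The env, agent, and buffer objects and their method signatures (reset, step, act) stand in for the components described above and are assumptions rather than the notebook's exact API:

import jax.numpy as jnp
from jax import lax, random


def rollout(key, env, agent, buffer, online_net_params, target_net_params,
            optimizer_state, buffer_state, n_steps, target_update_freq):
    # preallocated arrays storing one scalar per timestep
    all_actions = jnp.zeros(n_steps, dtype=jnp.int32)
    all_rewards = jnp.zeros(n_steps, dtype=jnp.float32)
    all_dones = jnp.zeros(n_steps, dtype=jnp.bool_)

    key, reset_key = random.split(key)
    state = env.reset(reset_key)  # assumed to return the initial observation
    init_val = (key, state, online_net_params, target_net_params,
                optimizer_state, buffer_state, all_actions, all_rewards, all_dones)

    def _step(i, val):
        (key, state, online_net_params, target_net_params,
         optimizer_state, buffer_state, all_actions, all_rewards, all_dones) = val

        # epsilon-greedy action selection with an illustrative linear decay
        key, action_key, buffer_key = random.split(key, 3)
        epsilon = jnp.clip(1.0 - i / n_steps, 0.05, 1.0)
        action = agent.act(action_key, online_net_params, state, epsilon)

        # environment step (the environment is assumed to auto-reset when done)
        next_state, reward, done = env.step(state, action)

        # store the transition in the replay buffer
        experience = (state, action, reward, next_state, done)
        buffer_state = buffer.add(buffer_state, experience, i)

        # sample a batch from the filled portion of the buffer and update the online network
        current_buffer_size = jnp.minimum(i + 1, buffer.buffer_size)
        batch, _ = buffer.sample(buffer_key, buffer_state, current_buffer_size)
        online_net_params, optimizer_state = agent.update(
            online_net_params, target_net_params, optimizer_state,
            batch["states"], batch["actions"], batch["rewards"],
            batch["next_states"], batch["dones"],
        )

        # every `target_update_freq` steps, copy the online weights into the target network
        target_net_params = lax.cond(
            i % target_update_freq == 0,
            lambda _: online_net_params,
            lambda _: target_net_params,
            None,
        )

        # log the current transition and return the updated `val` tuple
        all_actions = all_actions.at[i].set(action)
        all_rewards = all_rewards.at[i].set(reward)
        all_dones = all_dones.at[i].set(done)
        return (key, next_state, online_net_params, target_net_params,
                optimizer_state, buffer_state, all_actions, all_rewards, all_dones)

    return lax.fori_loop(0, n_steps, _step, init_val)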

We can now run DQN for 20,000 steps and observe its performance. After around 45 episodes, the agent achieves decent performance, consistently balancing the pole for more than 100 steps.

The green bars indicate that the agent managed to balance the pole for more than 200 steps, solving the environment. Notably, the agent set its record on the 51st episode, with 393 steps.

Performance report for DQN (made by the author)

The 20,000 training steps were executed in just over a second, at a rate of 15,807 steps per second (on a single CPU)!

This performance hints at JAX's impressive scaling capabilities, allowing practitioners to run large-scale parallelized experiments with minimal hardware requirements.

Running for 20,000 iterations: 100%|██████████| 20000/20000 [00:01<00:00, 15807.81it/s]

We’ll take a closer look at parallelized rollout procedures to run statistically significant experiments and hyperparameter searches in a future article!

In the meantime, feel free to reproduce the experiment and dabble with hyperparameters using this notebook:

As always, thanks for reading this far! I hope this article provided a decent introduction to Deep RL in JAX. Should you have any questions or feedback related to the content of this article, make sure to let me know; I'm always happy to have a little chat 😉

Until next time 👋
