Eligibility Trace

An eligibility trace is a tool used in reinforcement learning to address the challenge of credit assignment: the task of determining which past actions should receive credit or blame for a current outcome. Eligibility traces address this problem by keeping a short-term, decaying record of recently visited states and actions, so that credit for an outcome can be spread across the steps that led to it.

Memory Vector

An eligibility trace is represented as a short-term memory vector $\textbf{z}\_{t}$ that parallels the long-term weight vector $\textbf{w}\_{t}$. The weight vector holds the parameters the agent is learning over the long term, such as the weights of its value function or policy.

When a component of $\textbf{w}\_{t}$ participates in producing an estimated value, the corresponding component of $\textbf{z}\_{t}$ is incremented, after which it begins to fade away. Because this happens at every time step, components that contribute repeatedly accumulate larger traces. The trace decay parameter $\lambda \in [0, 1]$ determines the rate at which the trace falls back to zero.
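
A minimal sketch of this bookkeeping in Python is shown below. It assumes linear function approximation, so the components of $\textbf{w}\_{t}$ that participate in the estimate are exactly the active features of the current state, and the trace is incremented by the feature vector itself; the names `update_trace`, `z`, and `x` are hypothetical.

```python
import numpy as np

def update_trace(z, x, gamma, lam):
    """Accumulating-trace update: z <- gamma * lam * z + x.

    Every component first decays toward zero; then the components
    that participated in the current value estimate (the active
    features x of the current state) are bumped up.
    """
    return gamma * lam * z + x

# Illustrative example: 5 binary features, two of them active.
z = np.zeros(5)
x = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
z = update_trace(z, x, gamma=0.99, lam=0.9)
print(z)  # [0. 1. 0. 1. 0.] after the first step, since z started at zero
```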

This memory vector captures two heuristics that are central to credit assignment: a frequency heuristic and a recency heuristic. The frequency heuristic says that states visited more often deserve more credit; the recency heuristic says that states visited more recently deserve more credit. The two work together: a trace grows with each visit and shrinks as time passes.

Formula

The eligibility trace is updated using the following formula:

$$E\_{0}\left(s\right) = 0$$

$$E\_{t}\left(s\right) = \gamma\lambda E\_{t-1}\left(s\right) + \textbf{1}\left(S\_{t} = s\right)$$

Here, $E\_{t}\left(s\right)$ is the eligibility of state $s$ at time $t$, $\gamma$ is the discount factor (how much future rewards are valued relative to immediate ones), and $\textbf{1}\left(S\_{t} = s\right)$ is an indicator that equals $1$ when the state visited at time $t$ is $s$ and $0$ otherwise.
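
To see how this formula produces both heuristics, consider an illustrative case (the constants are chosen purely for the example) with $\gamma = 1$ and $\lambda = 0.5$, where state $s$ is visited at times $t = 1$ and $t = 3$:

$$E\_{1}\left(s\right) = 0.5 \cdot 0 + 1 = 1, \quad E\_{2}\left(s\right) = 0.5 \cdot 1 = 0.5, \quad E\_{3}\left(s\right) = 0.5 \cdot 0.5 + 1 = 1.25$$

Between visits the trace decays from $1$ to $0.5$ (recency), and the second visit pushes it above the $1$ that a single fresh visit would produce (frequency).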

Learning with Eligibility Traces

Eligibility traces allow a reinforcement learning agent to learn more effectively from past experience. As the agent navigates an environment, its traces record which states and actions it has visited recently and how often. When a new outcome arrives, typically in the form of a temporal-difference error, the traces determine how much credit or blame each past step receives, and the agent's value estimates or policy parameters are adjusted in proportion. Over time, these updates improve the agent's behavior, leading to more successful outcomes in the environment.
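
As a concrete illustration, here is a minimal sketch of tabular TD($\lambda$) prediction on a hypothetical five-state random walk (reward $1$ for exiting on the right, $0$ on the left); the environment and all constants are illustrative assumptions, not part of the original text.

```python
import numpy as np

n_states = 5
V = np.zeros(n_states)            # long-term memory: value estimates
gamma, lam, alpha = 1.0, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(200):
    E = np.zeros(n_states)        # short-term memory: eligibility traces
    s = n_states // 2             # start in the middle state
    done = False
    while not done:
        s_next = s + rng.choice([-1, 1])
        if s_next < 0:                        # exited on the left
            reward, v_next, done = 0.0, 0.0, True
        elif s_next >= n_states:              # exited on the right
            reward, v_next, done = 1.0, 0.0, True
        else:
            reward, v_next = 0.0, V[s_next]
        # TD error: how much better or worse things went than expected
        delta = reward + gamma * v_next - V[s]
        # Decay all traces, then mark the current state as eligible
        E = gamma * lam * E
        E[s] += 1.0
        # Credit or blame every eligible state in proportion to its trace
        V += alpha * delta * E
        s = s_next

print(V)  # approaches the true values [1/6, 2/6, 3/6, 4/6, 5/6]
```

Because every eligible state is updated on every step, credit from the terminal reward propagates back along the whole trajectory within a single episode, rather than one state per episode as in one-step TD.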

Overall, eligibility traces are an important tool in the field of reinforcement learning. By storing information about past actions and their contributions to outcomes, they can help agents learn more effectively and make better decisions.
