Stochastic Optimization Methods for Policy Evaluation in Reinforcement Learning

Yi Zhou
and Shaocong Ma

Book

Format
Book, paperback
English
60 pages

Part of the series
Foundations and Trends® in Optimization

Regular price: DKK 564.95



Description

This monograph introduces various value-based approaches for solving the policy evaluation problem in the online reinforcement learning (RL) scenario, which aims to learn the value function associated with a specific policy under a single Markov decision process (MDP). The approaches differ depending on whether they are implemented in an on-policy or off-policy manner. In the on-policy setting, where the policy is evaluated using data generated by that same policy, classical techniques such as TD(0), TD(λ), and their extensions with function approximation or variance reduction are employed. For off-policy evaluation, where samples are collected under a different behavior policy, this monograph introduces gradient-based two-timescale algorithms such as GTD2, TDC, and variance-reduced TDC. These algorithms are designed to minimize the mean-squared projected Bellman error (MSPBE) as their objective function. The monograph also discusses their finite-sample convergence upper bounds and sample complexity.
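To give a concrete sense of the simplest on-policy method named above, here is a minimal sketch of tabular TD(0). It assumes a five-state random walk (a standard textbook example, not taken from this monograph) in which the evaluated policy steps left or right with equal probability and reaching the right end pays a reward of 1. The function name `td0_random_walk` and all parameter values are hypothetical illustrative choices, not the book's notation or settings.

```python
import random


def td0_random_walk(num_states=5, episodes=5000, alpha=0.01, gamma=1.0, seed=0):
    """Tabular TD(0) evaluation of the uniform random policy on a random walk.

    Illustrative assumption: states 1..num_states are non-terminal, states 0 and
    num_states + 1 are terminal, and only the transition into the right terminal
    yields reward 1 (all other rewards are 0).
    """
    rng = random.Random(seed)
    # Value estimates for all states; terminal entries are never updated, so they stay 0.
    v = [0.0] * (num_states + 2)
    for _ in range(episodes):
        s = (num_states + 1) // 2  # every episode starts in the middle state
        while 0 < s < num_states + 1:
            # On-policy: the data come from the same policy being evaluated.
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == num_states + 1 else 0.0
            # TD(0) update: move V(s) toward the bootstrapped target r + gamma * V(s').
            td_error = r + gamma * v[s_next] - v[s]
            v[s] += alpha * td_error
            s = s_next
    return v[1:num_states + 1]


estimates = td0_random_walk()
print([round(x, 2) for x in estimates])
```

Because the walk is undiscounted and the policy is uniform, the true values in this toy example are 1/6, 2/6, ..., 5/6, so the estimates can be checked directly; with a constant step size, the TD(0) estimates keep fluctuating slightly around these values rather than converging exactly.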


Details

Language: English
Pages: 60
Publication date: 15-08-2024
ISBN-13: 9781638283706
Publisher: Now Publishers Inc
Format: Paperback

Size and weight

Weight: 108 g

Depth: 0.4 cm

10 cm

15.6 cm

23.4 cm
