UCB proof is using an assumption that is never justified.

26 Views Asked by Bumbble Comm At 08 Apr 2026 - 2:52

I'm working through DeepMind's course on reinforcement learning: https://www.youtube.com/watch?v=aQJP3Z2Ho8U&t=4940s

At timestamp 1:14:40, the lecturer assumes there is a time step $m \leq t$ (where $t$ is the current time step) for which:

$N_m(a)\Delta_a \leq x_a \log(m)$

where:

$N_m(a)$ is the number of times we picked action $a$ up to time step $m$.
$\Delta_a$ is the expected regret for picking action $a$.
$x_a$ is some constant we have not determined yet.

The rest of the proof seems to rely on this assumption, but the lecturer never stops to justify it. What if no such time step exists? Doesn't the whole proof then become useless?
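
To make the quantities concrete, here is a minimal sketch of how one might check the inequality empirically. It is a simulation under stated assumptions, not the lecture's construction: the Bernoulli arms, the function names `run_ucb` and `gaps`, and the exploration bonus $\sqrt{c \log(t) / N_t(a)}$ with `c = 2.0` are all hypothetical choices made for illustration.

```python
import math
import random


def run_ucb(means, horizon, c=2.0, seed=0):
    """Run a UCB-style rule on a Bernoulli bandit; return pull counts N_t(a).

    Assumption: the bonus is sqrt(c * log(t) / N_t(a)); the constant used
    in the lecture may differ.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # N_t(a): number of pulls of each action so far
    sums = [0.0] * k   # total reward collected from each action
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # pull every action once so the averages are defined
        else:
            a = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(c * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += reward
    return counts


def gaps(means):
    """Delta_a: gap between the best mean and each action's mean."""
    best = max(means)
    return [best - m for m in means]


means = [0.9, 0.8, 0.5]  # hypothetical arm means
horizon = 10_000
counts = run_ucb(means, horizon)
for a, (n, d) in enumerate(zip(counts, gaps(means))):
    # Smallest x_a for which N_t(a) * Delta_a <= x_a * log(t) holds at m = t.
    print(a, n, n * d / math.log(horizon))
```

The last column is the smallest $x_a$ that would make the inequality hold at $m = t$ for this particular run. A single simulated run only illustrates the quantities involved; it does not justify the assumption in the proof.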
