Target networks stabilize bootstrapping in RL 🛡️
But induce slow-moving targets 🐢
Online networks adapt fast ⚡
But can diverge with function approximation 💥
𝗠𝗜𝗡𝗧𝗢 🌿 uses the online network 𝗼𝗻𝗹𝘆 𝗶𝗳 𝗶𝘁 𝗰𝗮𝗻 — yielding faster 𝘢𝘯𝘥 more stable RL.
Here’s how 👇
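A rough sketch of what "online only if it can" might look like in code 🧪 The post itself doesn't spell out the rule, so this assumes one plausible reading: bootstrap from the online estimate when it does not exceed the target network's estimate, and otherwise fall back to the target (i.e. take the minimum of the two). The function name `minto_style_target` and all of its arguments are hypothetical, chosen for illustration only.

```python
def minto_style_target(reward, gamma, q_online_next, q_target_next, done):
    """Illustrative TD target under an ASSUMED min(online, target) rule.

    q_online_next / q_target_next: per-action value estimates for the next
    state, from the online and target networks respectively.
    This is a sketch of one possible reading, not the authors' implementation.
    """
    online_max = max(q_online_next)   # fast-moving, possibly overestimated
    target_max = max(q_target_next)   # slow-moving, more conservative anchor
    # Assumption: trust the online estimate only when it is not larger
    # than the target estimate; otherwise use the target estimate.
    bootstrap = min(online_max, target_max)
    # No bootstrapping past a terminal transition.
    return reward + (0.0 if done else gamma * bootstrap)


# Online estimate is the smaller one, so it is used for bootstrapping.
print(minto_style_target(1.0, 0.5, [2.0, 3.0], [5.0, 1.0], done=False))
# Online estimate is larger, so the target estimate caps it.
print(minto_style_target(1.0, 0.5, [10.0, 3.0], [5.0, 1.0], done=False))
```

Under this assumed rule, the target tracks the fast-moving online network whenever that is the more conservative choice, and the target network only steps in when the online estimate runs ahead of it.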