The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment.
More at length here:
gist.github.com/yoavg/3eb3e7...
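A minimal sketch (mine, not from the linked gist) of what that distinction could look like in code, assuming a toy environment and agent whose names and details are purely illustrative: the environment returns only observations, and the reward is computed by the agent itself rather than handed back by the environment.

```python
import random


class LineWorld:
    """Toy environment that emits observations only -- no reward signal."""

    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def step(self, action):
        # action: -1 (move left) or +1 (move right)
        self.pos = max(0, min(self.size - 1, self.pos + action))
        return {"position": self.pos, "at_goal": self.pos == self.size - 1}


class Agent:
    """Agent that owns its reward computation (illustrative, hypothetical)."""

    def __init__(self):
        self.value = {}  # crude per-state value estimates

    def policy(self, obs):
        return random.choice([-1, 1])

    def compute_reward(self, obs, action, next_obs):
        # The agent's internal evaluation of the outcome, not the environment's.
        return 1.0 if next_obs["at_goal"] else 0.0

    def update(self, obs, action, reward, next_obs):
        s = obs["position"]
        self.value[s] = self.value.get(s, 0.0) + 0.1 * (reward - self.value.get(s, 0.0))


env, agent = LineWorld(), Agent()
obs = {"position": env.pos, "at_goal": False}
for _ in range(20):
    action = agent.policy(obs)
    next_obs = env.step(action)                           # observations only
    reward = agent.compute_reward(obs, action, next_obs)  # reward lives in the agent
    agent.update(obs, action, reward, next_obs)
    obs = next_obs
```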
The jump from:
A) people can be thought of as agents who observe an environment, act, observe the outcome, and update their beliefs
to:
B) let's model all things as a POMDP with a numeric reward function!
is just way too big for me
but assuming we do delete facts, is deleting considered learning in your definition?