MSAI Project weekly update 20200203

Byron, 03 February 2020

Completion (2020/02/03 - 2020/02/16)

Ideas

  • About the projects: the assignments of both the Multi-agent course and the AI introduction course are related to reinforcement learning, so I would like to combine these assignments into the first phase of the RL projects.

Questions

Basic

  • Q: What’s the core difference between value iteration and policy iteration in MDP?
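  • A: Both are dynamic programming methods. The core difference: policy iteration alternates full policy evaluation (computing the value function of the current policy) with greedy policy improvement, while value iteration skips full evaluation and applies the Bellman optimality backup (a max over actions) in every sweep, extracting the policy only at the end.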
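  • Q: Can PyTorch be used from Java?
  • A: Not as a general training framework: PyTorch's main API is Python, with a C++ frontend (LibTorch). Java bindings exist mainly for running already-trained models, e.g. PyTorch Mobile on Android.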
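  • Q: What is the difference between on-policy and off-policy training?
  • A: Compare SARSA and Q-learning. SARSA (on-policy) updates toward the value of the action the behaviour policy actually takes next; Q-learning (off-policy) updates toward the greedy (max) action, so it learns the optimal policy while following an exploratory one.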
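To make the value/policy iteration and on-/off-policy answers concrete, here is a minimal sketch on a hypothetical 4-state chain MDP (an illustration only, not the project's environment). The environment, `step`, `td_control`, and all constants (GAMMA, ALPHA, EPSILON) are assumptions chosen for the example. It shows that both DP methods reach the same optimal policy by different routes, and that SARSA and Q-learning differ only in the bootstrap target.

```python
import random

# Hypothetical 4-state chain MDP for illustration only (not the project's env):
# states 0..3, actions 0 = left, 1 = right; state 3 is absorbing and
# entering it yields reward 1.
N_STATES, ACTIONS, GAMMA, TERMINAL = 4, (0, 1), 0.9, 3
ALPHA, EPSILON = 0.1, 0.1


def step(s, a):
    """Deterministic transition: returns (next_state, reward)."""
    if s == TERMINAL:
        return s, 0.0
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == TERMINAL else 0.0)


def q_from_v(V, s, a):
    # One-step lookahead: r + gamma * V(s').
    s2, r = step(s, a)
    return r + GAMMA * V[s2]


def greedy(V):
    # Greedy policy w.r.t. V (ties go to the first action).
    return [max(ACTIONS, key=lambda a: q_from_v(V, s, a)) for s in range(N_STATES)]


def value_iteration(tol=1e-8):
    # Each sweep applies the Bellman *optimality* backup (max over actions);
    # the policy is extracted only once V has converged.
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            v_new = max(q_from_v(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V, greedy(V)


def policy_iteration(tol=1e-8):
    # Alternate full policy evaluation (Bellman *expectation* backup until
    # convergence) with greedy improvement, until the policy stops changing.
    policy, V = [0] * N_STATES, [0.0] * N_STATES
    while True:
        while True:
            delta = 0.0
            for s in range(N_STATES):
                v_new = q_from_v(V, s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        new_policy = greedy(V)
        if new_policy == policy:
            return V, policy
        policy = new_policy


def epsilon_greedy(Q, s, rng):
    # Explore with probability EPSILON, else pick a greedy action (random tie-break).
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(Q[s])
    return rng.choice([a for a in ACTIONS if Q[s][a] == best])


def td_control(on_policy, episodes=2000, seed=0):
    """SARSA if on_policy else Q-learning; only the bootstrap target differs."""
    rng = random.Random(seed)
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(Q, s, rng)
        while s != TERMINAL:
            s2, r = step(s, a)
            a2 = epsilon_greedy(Q, s2, rng)
            if on_policy:   # SARSA: value of the action actually taken next
                target = r + GAMMA * Q[s2][a2]
            else:           # Q-learning: value of the greedy action
                target = r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s, a = s2, a2
    return Q
```

Both `value_iteration()` and `policy_iteration()` return the policy [1, 1, 1, 0] here (always move right; the action in the absorbing state 3 is arbitrary). Value iteration gets there with cheap sweeps, while policy iteration needs only a few improvement steps but a full evaluation inside each one. After training, both `td_control(True)` (SARSA) and `td_control(False)` (Q-learning) should prefer action 1 in states 0-2, although SARSA's values reflect the exploratory policy it actually follows.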

Project related

  • Q: What is policy understanding in model-free DRL? Is it aimed at the pain points of sparse rewards and a huge search space?
  • A: Policy understanding in DRL tries to map (compress) state-action sequences into a low-dimensional vector. In an adversarial environment, such a low-dimensional representation of the (rule-based) adversarial agent generally helps our learning agent converge faster during training. It could also help our learning agent respond quickly, dynamically, and accurately.
  • Q: Will hierarchical RL or imitation learning help with this target?
  • A: No, they belong to different research fields.
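The policy-understanding answer describes compressing the opponent's state-action sequence into a low-dimensional vector. Below is a minimal sketch of that idea under loudly stated assumptions: a fixed Gaussian random projection stands in for the encoder (in practice it would more likely be learned, e.g. a recurrent network or autoencoder), the opponent's (state, action) pairs are averaged into a `dim`-vector, and that vector is appended to the learning agent's own observation. All function names and sizes are hypothetical.

```python
import random


def make_projection(n_states, n_actions, dim, seed=0):
    # Hypothetical stand-in for a learned encoder: a fixed Gaussian random
    # projection from each one-hot (state, action) pair to a dim-vector.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(n_states * n_actions)]


def embed_trajectory(traj, proj, n_actions):
    """Compress an opponent's (state, action) sequence into one dim-vector
    by averaging the projected one-hot codes of its pairs."""
    dim = len(proj[0])
    z = [0.0] * dim
    for s, a in traj:
        row = proj[s * n_actions + a]  # projection of the one-hot code of (s, a)
        for i in range(dim):
            z[i] += row[i]
    return [v / len(traj) for v in z] if traj else z


def augment_observation(obs, opponent_traj, proj, n_actions):
    # The learning agent conditions on its own observation plus the
    # low-dimensional summary of the (rule-based) opponent's behaviour.
    return list(obs) + embed_trajectory(opponent_traj, proj, n_actions)


if __name__ == "__main__":
    # 100 states x 5 actions = 500 one-hot dimensions compressed to 8.
    proj = make_projection(n_states=100, n_actions=5, dim=8)
    opponent_traj = [(3, 1), (4, 1), (5, 2)]  # hypothetical observed opponent moves
    x = augment_observation([0.2, -1.0], opponent_traj, proj, n_actions=5)
    print(len(x))  # 2 observation features + 8 embedding features = 10
```

Averaging discards the order of the opponent's moves; a learned sequence encoder would keep it, at the cost of training the encoder alongside (or before) the agent.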

Next Step