MSAI Project weekly update 20200203

Byron, 03 February 2020

Completed the first assignment for Deep Learning and NLP (the first language model). Along the way, I set up the environment, reviewed NumPy and pandas usage, and got hands-on practice with PyTorch.
Studied lectures 1-3 of the online RL course (David Silver).
Became familiar with Markdown.
Paper read:

About the projects: The assignments for both the Multi-agent and AI Introduction courses are related to reinforcement learning. I would like to combine these assignments into the first phase of the RL project.

Basic

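Q: What’s the core difference between value iteration and policy iteration in an MDP?
A: Both are based on dynamic programming. Policy iteration alternates between fully evaluating the current policy and greedily improving it, while value iteration applies the Bellman optimality backup directly to the value function and extracts the policy only at the end.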
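To make the difference concrete, here is a minimal sketch of both methods on a tiny MDP. The three-state MDP `P`, the discount `GAMMA`, and all function names are made up for illustration and do not come from the course material.

```python
# Hypothetical 3-state MDP: P[s][a] = [(prob, next_state, reward), ...]
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing state
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8):
    """Repeat the Bellman optimality backup until V stops changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # The policy is read off only once, after V has converged.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}
    return V, policy


def policy_iteration(tol=1e-8):
    """Alternate full policy evaluation with greedy policy improvement."""
    policy = {s: 0 for s in P}
    V = {s: 0.0 for s in P}
    while True:
        # Policy evaluation: Bellman expectation backup for the fixed policy.
        while True:
            delta = 0.0
            for s in P:
                v = q_value(V, s, policy[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: switch only to strictly better actions.
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(V, s, a))
            if q_value(V, s, best) > q_value(V, s, policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy
```

On this toy MDP both functions should arrive at the same values and the same greedy policy; only the route differs (one sweep type versus an evaluate/improve loop).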
Q: Can PyTorch be used from Java?
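A: Not natively. PyTorch is primarily a Python and C++ library, although experimental Java bindings exist for running exported TorchScript models.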
Q: On-policy training vs. off-policy training?
A: Refer to SARSA (on-policy) and Q-learning (off-policy).
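The two algorithms differ only in the bootstrap target of the update, which a small sketch can show. The step size `ALPHA`, discount `GAMMA`, the Q-table layout, and the example numbers below are assumptions chosen for illustration.

```python
ALPHA = 0.5  # assumed step size
GAMMA = 0.9  # assumed discount factor


def sarsa_update(Q, s, a, r, s2, a2):
    """On-policy: bootstrap from a2, the action the behaviour policy actually took."""
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])


def q_learning_update(Q, s, a, r, s2, actions):
    """Off-policy: bootstrap from the greedy action in s2, whatever was taken."""
    best_next = max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])


def fresh_table():
    """Hypothetical Q-table over two states and two actions."""
    return {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}


# Same transition (s=0, a=0, r=1, s'=1), where the behaviour policy
# explored and picked the non-greedy action a'=0 in s'.
Q_sarsa = fresh_table()
sarsa_update(Q_sarsa, 0, 0, 1.0, 1, 0)

Q_qlearn = fresh_table()
q_learning_update(Q_qlearn, 0, 0, 1.0, 1, actions=[0, 1])
```

With the exploratory next action, SARSA learns from the value of what was actually done, while Q-learning learns from the best available action, so the two tables diverge after the same transition.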

Project related

Q: What is policy understanding in model-free DRL? Does it aim to address the pain points of sparse rewards and a huge search space?
A: Policy understanding in DRL tries to map (compress) state-action sequences into a low-dimensional vector. In an adversarial environment, a low-dimensional representation of the adversarial (rule-based) agent will generally help our learning agent converge faster during training. It could also help our learning agent respond quickly, dynamically, and accurately.
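One possible way to compress a state-action sequence into a low-dimensional vector is a small recurrent encoder. This is only a sketch under assumptions: discrete states and actions, a GRU as the sequence model, and the class name `TrajectoryEncoder` and all dimensions are hypothetical, not a method fixed by the project.

```python
import torch
import torch.nn as nn


class TrajectoryEncoder(nn.Module):
    """Hypothetical encoder: (state, action) sequence -> low-dimensional vector."""

    def __init__(self, n_states, n_actions, embed_dim=16, latent_dim=4):
        super().__init__()
        self.state_emb = nn.Embedding(n_states, embed_dim)
        self.action_emb = nn.Embedding(n_actions, embed_dim)
        # The GRU reads the concatenated state/action embeddings step by step.
        self.gru = nn.GRU(2 * embed_dim, latent_dim, batch_first=True)

    def forward(self, states, actions):
        # states, actions: integer tensors of shape (batch, seq_len)
        x = torch.cat([self.state_emb(states), self.action_emb(actions)], dim=-1)
        _, h = self.gru(x)  # h: (num_layers, batch, latent_dim)
        return h[-1]        # final hidden state as the policy representation


# Usage on random opponent trajectories (2 trajectories, 7 steps each).
encoder = TrajectoryEncoder(n_states=5, n_actions=3)
states = torch.randint(0, 5, (2, 7))
actions = torch.randint(0, 3, (2, 7))
z = encoder(states, actions)
```

Here `z` would be the vector fed to the learning agent alongside its own observation; how the encoder is trained (for example, by predicting the opponent's next action) is left open.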
Q: Will hierarchical RL or imitation learning help with this goal?
A: No, they are different fields of study.