MSAI Project weekly update 20200203
Byron, 03 February 2020
Completion (2020/02/03 - 2020/02/16)
- Completed the first Deep Learning and NLP assignment (the first language model). Along the way, I set up the environment, reviewed NumPy and pandas usage, and got hands-on with PyTorch
- Worked through lectures 1-3 of David Silver's online RL course
- Became familiar with Markdown
- Paper read:
Ideas
- About the projects: the assignments for both the Multi-agent and AI Introduction courses are related to reinforcement learning, so I would like to combine them into the first phase of the RL project
Questions
Basic
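- Q: What’s the core difference between value iteration and policy iteration in an MDP?
- A: Both are based on dynamic programming. Policy iteration alternates full policy evaluation with greedy policy improvement, whereas value iteration applies the Bellman optimality backup directly and extracts the policy only at the end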
- Q: Can PyTorch be used from Java?
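- A: Not directly. PyTorch is primarily a Python library with a C++ frontend; Java support is limited to bindings for running exported TorchScript models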
- Q: What is the difference between on-policy and off-policy training?
- A: Compare SARSA (on-policy) with Q-learning (off-policy)
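For the value iteration vs. policy iteration question above, a minimal sketch may help. The 2-state, 2-action MDP below (the arrays `P` and `R` and the discount `GAMMA`) is invented purely for illustration and is not taken from the course material.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
# P[a, s, t] = probability of moving from state s to state t under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.1, 0.9], [0.7, 0.3]],   # action 1
])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
GAMMA = 0.9


def q_from_v(v):
    """One-step lookahead: Q[s, a] = R[s, a] + GAMMA * sum_t P[a, s, t] * v[t]."""
    return R + GAMMA * np.einsum("ast,t->sa", P, v)


def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality backup; extract the policy only at the end."""
    v = np.zeros(R.shape[0])
    while True:
        v_new = q_from_v(v).max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q_from_v(v_new).argmax(axis=1)
        v = v_new


def policy_iteration():
    """Alternate exact policy evaluation with greedy policy improvement."""
    n_states = R.shape[0]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Evaluation: solve (I - GAMMA * P_pi) v = R_pi for the current policy.
        p_pi = P[policy, states]
        r_pi = R[states, policy]
        v = np.linalg.solve(np.eye(n_states) - GAMMA * p_pi, r_pi)
        # Improvement: act greedily with respect to the evaluated values.
        new_policy = q_from_v(v).argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return v, policy
        policy = new_policy


v_vi, pi_vi = value_iteration()
v_pi, pi_pi = policy_iteration()
```

Both routines should agree on the optimal values and policy for this toy MDP. The difference is in how they get there: value iteration never holds an explicit policy until the end, while policy iteration fully evaluates each intermediate policy.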
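For the on-policy vs. off-policy question above, the contrast shows up in the TD target of each update rule. This is a rough sketch on a hypothetical 2x2 Q-table; the step size `ALPHA`, the discount `GAMMA` and the sample transition are assumptions chosen only to make the arithmetic easy to follow.

```python
import numpy as np

ALPHA, GAMMA = 0.5, 0.9  # illustrative step size and discount, not tuned


def sarsa_target(q, reward, next_state, next_action):
    """On-policy: bootstrap from the action the behaviour policy actually takes next."""
    return reward + GAMMA * q[next_state, next_action]


def q_learning_target(q, reward, next_state):
    """Off-policy: bootstrap from the greedy action, whatever is taken next."""
    return reward + GAMMA * q[next_state].max()


def td_update(q, state, action, target):
    """Shared TD step: move Q(s, a) a fraction ALPHA towards the target."""
    q[state, action] += ALPHA * (target - q[state, action])


# Hypothetical transition: from state 0, action 1 yields reward 1.0 and lands in
# state 1, where an exploratory (non-greedy) action 0 is chosen next.
q_sarsa = np.array([[0.0, 0.0], [1.0, 3.0]])
q_qlearn = q_sarsa.copy()

td_update(q_sarsa, 0, 1, sarsa_target(q_sarsa, 1.0, 1, 0))
td_update(q_qlearn, 0, 1, q_learning_target(q_qlearn, 1.0, 1))
```

With the same transition, SARSA uses the value of the exploratory action (target 1.9), while Q-learning uses the greedy value (target 3.7), so the two tables diverge after a single update.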
Project related
- Q: What is policy understanding in model-free DRL? Does it aim to address the pain points of sparse rewards and a huge search space?
- A: Policy understanding in DRL tries to map (compress) state-action sequences into a low-dimensional vector. In an adversarial environment, a low-dimensional representation of the rule-based adversarial agent will generally help our learning agent converge faster during training. It could also help our learning agent respond quickly, dynamically and accurately.
- Q: Will hierarchical RL or imitation learning help with this goal?
- A: No, they are different fields of study
Next Step
- Complete the Multi-agent assignment literature review (due by 2020/02/23)
- Go through the papers, blogs and related materials mentioned by Dr. Zheng