https://github.com/mdeib/berkeley-deep-RL-pytorch-solutions/blob/47da61101d144e14926975f3732af7ac020382b3/hw2/cs285/policies/MLP_policy.py#L115

Should the loss on this line be averaged over N? I apologize if I am wrong; I do not have much experience with RL. Thanks.
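
To make the question concrete, here is a minimal sketch of the difference I have in mind. It assumes the loss on that line is a sum of log-probability times advantage over the samples. The values and the names `log_probs`, `advantages`, `loss_sum`, and `loss_mean` are made up for illustration and are not taken from the repository.

```python
# Hypothetical per-sample values; not taken from the repository.
log_probs = [-0.5, -1.2, -0.3, -2.0]   # log pi(a_i | s_i) for each sample
advantages = [1.0, -0.5, 2.0, 0.25]    # advantage estimate for each sample

N = len(log_probs)
weighted = [lp * adv for lp, adv in zip(log_probs, advantages)]

loss_sum = -sum(weighted)        # surrogate loss summed over samples
loss_mean = -sum(weighted) / N   # surrogate loss averaged over samples

# The two losses differ only by the constant factor N, so their gradients
# with respect to the policy parameters differ by the same factor.
ratio = loss_sum / loss_mean
print(loss_sum, loss_mean, ratio)
```

If that assumption holds, the summed and averaged versions point in the same gradient direction, and the sum just scales the step by N, so the effective step size would change with the batch size.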