Update README.md
## Reward Curves
<img src="https://github.com/kvablack/ddpo-pytorch/assets/12429600/593c9be3-e2a7-45d8-b1ae-ca4f77197c18" width="49%">
<img src="https://github.com/kvablack/ddpo-pytorch/assets/12429600/d12fef0a-68b8-4cef-a9b8-cb1b6878fcec" width="49%">
<img src="https://github.com/kvablack/ddpo-pytorch/assets/12429600/68c6a7ac-0c31-4de6-a7a0-1f9bb20202a4" width="49%">
<img src="https://github.com/kvablack/ddpo-pytorch/assets/12429600/c1605879-a7e4-4d64-886b-eb71800b646a" width="49%">
<img src="https://github.com/kvablack/ddpo-pytorch/assets/12429600/393a929e-36af-46f2-8022-33384bdae1c8" width="49%">
As the aesthetic experiment shows, if you train for long enough the algorithm eventually becomes unstable. This might be remedied by decaying the learning rate. Interestingly, the qualitative samples you get after the instability are mostly still fine -- the drop in the mean reward is caused by a few low-scoring outliers. This is clear from the full reward histogram, which you can see by opening an individual run in wandb.
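As a rough sketch of the learning-rate decay idea, one option is an exponential schedule so that late-training updates shrink; the helper below is illustrative only (the `gamma` value and schedule shape are assumptions, not the repo's actual config):

```python
def decayed_lr(base_lr: float, gamma: float, step: int) -> float:
    """Exponential learning-rate decay: lr_t = base_lr * gamma**step.

    A hypothetical schedule; gamma=0.97 is an illustrative choice,
    not a tuned value from these experiments.
    """
    return base_lr * gamma ** step

# After 100 epochs with gamma=0.97, the LR has shrunk to roughly 5% of its start.
print(decayed_lr(3e-4, 0.97, 100))
```

In a PyTorch training loop the same effect can be had by wrapping the optimizer in `torch.optim.lr_scheduler.ExponentialLR` and calling `scheduler.step()` once per epoch.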
<img src="https://github.com/kvablack/ddpo-pytorch/assets/12429600/eda43bef-6363-45b5-829d-466502e0a0e3" width="50%">