Commit Graph

17 Commits

Author       SHA1        Message                                         Date
mhz          91d4e3c7ad  try to get the original perf                    2024-09-16 22:45:12 +02:00
mhz          c867aef5a6  now we add reward wait to test                  2024-09-15 22:21:09 +02:00
mhz          1ad520d248  can run but need to test which pth is           2024-09-15 22:18:56 +02:00
mhz          94fe13756f  try to update reward func                       2024-09-14 23:56:36 +02:00
mhz          2ac17caa3c  need to update the model                        2024-09-12 23:40:42 +02:00
mhz          0c60171c71  need to update the model                        2024-09-10 16:57:42 +02:00
mhz          97fbdf91c7  try to deploy PPO policy                        2024-09-09 23:50:10 +02:00
mhz          5dccf590e7  add sample phase and try to get log prob        2024-09-08 23:26:49 +02:00
mhz          0c4b597dd2  train phase done                                2024-09-08 21:09:41 +02:00
mhz          11d9697e06  write some codes for integrate reward code      2024-09-08 20:28:14 +02:00
mhz          73324083ce  update the gpu id                               2024-07-03 15:25:46 +02:00
mhz          ba008ae54c  update the main function                        2024-07-01 10:02:51 +02:00
mhz          14186fa97f  write test code                                 2024-06-26 23:41:37 +02:00
mhz          062a27b83f  try update the api in DataInfo                  2024-06-26 22:10:07 +02:00
mhz          82299e5213  try to run the graph, commented sampling codes  2024-06-25 00:09:27 +02:00
Hanzhang Ma  4f8945ca07  add some comments                               2024-06-08 21:35:35 +02:00
gang liu     2c00828630  update_name                                     2024-05-25 15:32:36 -04:00