Commit Graph

14 Commits

Author       SHA1        Message                                         Date
mhz          94fe13756f  try to update reward func                       2024-09-14 23:56:36 +02:00
mhz          2ac17caa3c  need to update the model                        2024-09-12 23:40:42 +02:00
mhz          0c60171c71  need to update the model                        2024-09-10 16:57:42 +02:00
mhz          97fbdf91c7  try to deploy PPO policy                        2024-09-09 23:50:10 +02:00
mhz          5dccf590e7  add sample phase and try to get log prob        2024-09-08 23:26:49 +02:00
mhz          0c4b597dd2  train phase done                                2024-09-08 21:09:41 +02:00
mhz          11d9697e06  write some codes for integrate reward code      2024-09-08 20:28:14 +02:00
mhz          73324083ce  update the gpu id                               2024-07-03 15:25:46 +02:00
mhz          ba008ae54c  update the main function                        2024-07-01 10:02:51 +02:00
mhz          14186fa97f  write test code                                 2024-06-26 23:41:37 +02:00
mhz          062a27b83f  try update the api in DataInfo                  2024-06-26 22:10:07 +02:00
mhz          82299e5213  try to run the graph, commented sampling codes  2024-06-25 00:09:27 +02:00
Hanzhang Ma  4f8945ca07  add somecomments                                2024-06-08 21:35:35 +02:00
gang liu     2c00828630  update_name                                     2024-05-25 15:32:36 -04:00