|
94fe13756f
|
try to update reward func
|
2024-09-14 23:56:36 +02:00 |
|
|
2ac17caa3c
|
need to update the model
|
2024-09-12 23:40:42 +02:00 |
|
|
0c60171c71
|
need to update the model
|
2024-09-10 16:57:42 +02:00 |
|
|
97fbdf91c7
|
try to deploy PPO policy
|
2024-09-09 23:50:10 +02:00 |
|
|
5dccf590e7
|
add sample phase and try to get log prob
|
2024-09-08 23:26:49 +02:00 |
|
|
0c4b597dd2
|
train phase done
|
2024-09-08 21:09:41 +02:00 |
|
|
11d9697e06
|
write some code to integrate the reward code
|
2024-09-08 20:28:14 +02:00 |
|
|
73324083ce
|
update the gpu id
|
2024-07-03 15:25:46 +02:00 |
|
|
ba008ae54c
|
update the main function
|
2024-07-01 10:02:51 +02:00 |
|
|
14186fa97f
|
write test code
|
2024-06-26 23:41:37 +02:00 |
|
|
062a27b83f
|
try to update the API in DataInfo
|
2024-06-26 22:10:07 +02:00 |
|
|
82299e5213
|
try to run the graph, commented sampling code
|
2024-06-25 00:09:27 +02:00 |
|
Hanzhang Ma
|
4f8945ca07
|
add some comments
|
2024-06-08 21:35:35 +02:00 |
|
gang liu
|
2c00828630
|
update_name
|
2024-05-25 15:32:36 -04:00 |
|