Supported tasks: `PandaReach-v3` | `PandaPush-v3` | `PandaSlide-v3` | `PandaPickAndPlace-v3`
This project implements a generalized reinforcement learning framework for controlling a Franka robot arm via end-effector manipulation using various RL algorithms. The framework supports TD3 (Twin Delayed Deep Deterministic Policy Gradient), SAC (Soft Actor-Critic), TQC (Truncated Quantile Critics), and DDPG (Deep Deterministic Policy Gradient). In addition, wandb integration is enabled for logging training metrics, including Q-values, gradients, losses, rolling average reward, success rate, and TD error.
While all agents are supported, DDPG is the primary focus for training and deployment. This decision is based on ablation studies showing that SAC and TD3 require the removal of clipped Q-values to match DDPG's performance on manipulation tasks (see page 8). Nevertheless, the framework is flexible—agent types can be easily customized for training in other environments where SAC or TD3 may be more effective.
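For context, the clipped double-Q target used by TD3 (and, analogously, by SAC with an additional entropy bonus) bootstraps from the minimum of two target critics, whereas DDPG bootstraps from a single target critic (target policy smoothing noise omitted for brevity):

$$
y_{\text{TD3}} = r + \gamma \min_{i=1,2} Q_{\theta_i'}\big(s', \pi_{\phi'}(s')\big),
\qquad
y_{\text{DDPG}} = r + \gamma\, Q_{\theta'}\big(s', \mu_{\phi'}(s')\big)
$$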
Weights for DDPG are available under `resources/DDPG` for testing/evaluation.
- Clone the repository:

```bash
git clone https://github.com/CodeKnight314/Goal-Conditioned-RL-Framework.git
```

- Install the necessary Python packages via `pip`:

```bash
cd Goal-Conditioned-RL-Framework/
pip install -r requirements.txt
```
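As a quick sanity check, you can confirm that the Panda environments are available. The sketch below assumes `gymnasium` and `panda-gym` are installed by `requirements.txt`:

```python
import gymnasium as gym
import panda_gym  # noqa: F401  (importing registers the Panda*-v3 environments)

# Create one of the supported tasks and reset it to verify the installation.
env = gym.make("PandaReach-v3")
obs, info = env.reset(seed=0)
print(sorted(obs.keys()))  # ['achieved_goal', 'desired_goal', 'observation']
env.close()
```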
To train an agent, run the `main.py` script in `train` mode. You need to specify the agent type, environment ID, configuration file, and an output directory.

For example, to train a DDPG agent on the `PandaReach-v3` task:
```bash
python src/main.py \
    --mode train \
    --agent DDPG \
    --id reach \
    --c src/config/DDPG/config_ddpg_reach.yaml \
    --o resources/DDPG/reach
```
- `--agent`: Specify the agent (`DDPG`, `SAC`, `TD3`, `TQC`).
- `--id`: Specify the task (`reach`, `push`, `slide`, `pickplace`).
- `--c`: Path to the agent- and task-specific configuration file.
- `--o`: Path to the directory where trained models and logs will be saved.
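Since training logs metrics to wandb, make sure you are authenticated before launching a run. A minimal sketch using the standard wandb client (the framework's exact logging setup may differ):

```python
import wandb

# Authenticate once per machine; this reads WANDB_API_KEY from the
# environment if set, otherwise it prompts for an API key.
wandb.login()
```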
To evaluate a trained agent, run the `main.py` script in `test` mode. You need to provide the path to the trained model weights.

For example, to evaluate the DDPG agent trained on `PandaReach-v3`:
```bash
python src/main.py \
    --mode test \
    --agent DDPG \
    --id reach \
    --c src/config/DDPG/config_ddpg_reach.yaml \
    --o resources/DDPG/reach \
    --w resources/DDPG/reach \
    --neps 20 \
    --verbose
```
- `--w`: Path to the directory containing the trained `actor.pth` and `critic.pth` files.
- `--neps`: Number of episodes to run for evaluation.
- `--verbose`: Use this flag to render the environment and see the agent in action.
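To quickly verify a set of downloaded weights before a full run, here is a minimal sketch, assuming `actor.pth` and `critic.pth` are standard PyTorch state dicts (an assumption; the actual checkpoint format may differ):

```python
import torch

# Load the actor checkpoint on CPU and list its parameter tensors.
actor_state = torch.load("resources/DDPG/reach/actor.pth", map_location="cpu")
for name, tensor in actor_state.items():
    print(name, tuple(tensor.shape))
```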
The project uses Optuna for hyperparameter optimization. To start a search, run the `param_search.py` script.

For example, to search for optimal hyperparameters for a DDPG agent on the `PandaReach-v3` task:
```bash
python src/param_search.py \
    --agent DDPG \
    --env reach \
    --trials 100 \
    --study-name "ddpg_reach_search" \
    --storage "sqlite:///param_search/DDPG_reach/optuna_study.db"
```
- `--agent`: The agent to optimize.
- `--env`: The environment for the task.
- `--trials`: The number of trials to run.
- `--study-name`: A name for the Optuna study.
- `--storage`: The database URL for storing study results.
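Once trials have finished (or while the search is still running), you can inspect the results directly from the SQLite storage using the standard Optuna API:

```python
import optuna

# Re-open the study created by param_search.py and report its best trial.
study = optuna.load_study(
    study_name="ddpg_reach_search",
    storage="sqlite:///param_search/DDPG_reach/optuna_study.db",
)
print("Best value:", study.best_value)
print("Best hyperparameters:", study.best_params)
```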
This framework is designed to be transferable to custom environments that use a dictionary-based observation space. For seamless integration, your environment's `step()` function must return an observation dictionary containing the following keys (a minimal sketch follows this list):
- `observation`: The primary observation from the environment.
- `achieved_goal`: The goal the agent has currently achieved.
- `desired_goal`: The target goal for the agent.
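Below is a minimal sketch of a compatible environment, using a hypothetical 2D point-reaching task (not part of this repository) and the standard Gymnasium API:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class PointGoalEnv(gym.Env):
    """Hypothetical 2D point-reaching task with the required dict observation."""

    def __init__(self):
        box = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.observation_space = spaces.Dict(
            {"observation": box, "achieved_goal": box, "desired_goal": box}
        )
        self.action_space = spaces.Box(-0.1, 0.1, shape=(2,), dtype=np.float32)

    def _get_obs(self):
        return {
            "observation": self._pos.copy(),
            "achieved_goal": self._pos.copy(),
            "desired_goal": self._goal.copy(),
        }

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._pos = np.zeros(2, dtype=np.float32)
        self._goal = self.np_random.uniform(-1.0, 1.0, size=2).astype(np.float32)
        return self._get_obs(), {}

    def step(self, action):
        self._pos = np.clip(self._pos + np.asarray(action, dtype=np.float32), -1.0, 1.0)
        dist = float(np.linalg.norm(self._pos - self._goal))
        reward = 0.0 if dist <= 0.05 else -1.0  # sparse reward (0 at the goal, -1 otherwise)
        terminated = dist <= 0.05
        return self._get_obs(), reward, terminated, False, {}

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Not listed in the requirements above, but HER-style relabeling commonly
        # uses this GoalEnv-style hook; included here as an assumption.
        dist = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
        return np.where(dist <= 0.05, 0.0, -1.0)
```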
If your environment adheres to this structure, you can easily adapt it for use with this framework. To use a new environment, you can either:
- Modify the `HER_MAPPING` dictionary in `src/main.py` to include your custom environment's ID (see the sketch after this list).
- Remove the mapping logic from `src/main.py` and pass your environment's string ID directly to the `--id` argument.
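As a sketch of the first option: assuming `HER_MAPPING` maps the short `--id` names to registered Gymnasium environment IDs (matching how `reach` corresponds to `PandaReach-v3` in the examples above), an added entry for the hypothetical environment sketched earlier might look like this. The existing entries shown here are an assumption, not copied from the repository:

```python
from gymnasium.envs.registration import register

# Register the hypothetical PointGoalEnv so gym.make() can construct it.
# "my_envs:PointGoalEnv" is a placeholder module path.
register(id="PointGoalEnv-v0", entry_point="my_envs:PointGoalEnv")

# Sketch of the mapping in src/main.py (assumed structure):
HER_MAPPING = {
    "reach": "PandaReach-v3",
    "push": "PandaPush-v3",
    "slide": "PandaSlide-v3",
    "pickplace": "PandaPickAndPlace-v3",
    "pointgoal": "PointGoalEnv-v0",  # new entry for the custom environment
}
```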
This flexibility allows the agent and training pipeline to be applied to a wide range of goal-oriented robotics tasks with minimal code changes.