This is the repository for the CAST-STEM 2025 Summer Camp project. The project studies perception-driven robotic grasping, where a robot first recognizes objects and then plans its motion for grasping. Our website can be found here.
- Object 6D Pose Estimation: FoundationPose
- Object Detection and Segmentation: NIDS-Net
- Fetch Robot ROS Components for IRVL Lab: fetch_ros_IRVL
- Sim & Real Grasping Scene Generation & Implementation: SceneReplica
- CAST-STEM 2025 Summer Camp Project
- Main Referred Methods
- Contents
- Prerequisites
- Environment Setup (Docker)
- Objects used in the Project
- Example Usage
- Project Schedule
  - Week 1: Basic Knowledge Preparation
  - Week 2: Hands-on Practice for ROS, 6D Pose Estimation, and Fetch Gazebo
  - Week 3: Fetch Gazebo Simulation
  - Week 4: Integrate FoundationPose, NIDS-Net with Fetch Gazebo Simulation
  - Week 5: Run Grasping on Real Fetch Robot
- Linux/Unix:
  sudo apt-get install git
- Windows:
  - Option One: GitHub Desktop.
  - Option Two: Git for Windows.
- macOS:
  - Option One: GitHub Desktop.
  - Option Two: Homebrew.
Please refer to the official instructions, Installing Miniconda, to install Miniconda.
- You can install Visual Studio Code (VSCode) from the official website.
- Recommended extensions for Python development in VSCode:
# Clone the repository and enter the project directory
git clone https://github.com/JWRoboticsVision/CAST-STEM-2025.git && cd CAST-STEM-2025
Follow the instructions below to build the Docker image for your operating system.
- For Linux, follow the Linux Installation Guide.
- For Windows, follow the Windows Installation Guide.
- Run the ros1-base container
bash ./docker/container_handler.sh run
- Enter the container
bash ./docker/container_handler.sh enter
- Make sure you are not in the conda environment
conda deactivate
- Update rosdeps
rosdep update --rosdistro=$ROS_DISTRO
- Download the ROS packages and compile the workspace
# Go to the catkin workspace
mkdir -p ~/catkin_ws/src && cd ~/catkin_ws/src
# Fetch Robot ROS package
git clone -b ros1 https://github.com/IRVLUTD/fetch_ros_IRVL.git
# Fetch Gazebo Simulator
git clone -b gazebo11 https://github.com/ZebraDevs/fetch_gazebo.git
# Clone the urdf_tutorial
git clone -b ros1 https://github.com/ros/urdf_tutorial.git
# Compile the workspace
cd ~/catkin_ws && catkin_make -j$(nproc) -DPYTHON_EXECUTABLE=/usr/bin/python3
- Replace the files below so that the Gazebo simulation loads the Fetch robot URDF model correctly from the `fetch_ros_IRVL` package: `fetch.gazebo.xacro` for the `fetch_gazebo` package.
cp ~/code/docker/config/fetch.gazebo.xacro ~/catkin_ws/src/fetch_gazebo/fetch_gazebo/robots/
The following commands will create a conda environment in the `~/code/.env` directory and activate it. This is useful for keeping the environment isolated and organized within the project directory.
# Go to the code directory
cd ~/code
# Create the conda environment
conda create --prefix $PWD/.env python=3.11 libffi=3.4 pyside2=5.15 -y
# Activate the conda environment
conda activate $PWD/.env
- Install PyTorch 2.1.1
python -m pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cu118 --no-cache-dir
- Install the dependencies
python -m pip install -r requirements.txt --no-cache-dir
- Install the fetch_grasp package
python -m pip install -e source/fetch_grasp --no-cache-dir
- Download SAM2 checkpoint:
wget https://github.com/ultralytics/assets/releases/download/v8.3.0/sam2.1_t.pt -O ./checkpoints/sam2.1_t.pt
Download the `FoundationPose.zip` file from Box and extract it to the `./third-party` directory.
bash scripts/install_foundationpose.sh
Download the `NIDS-Net.zip` file from Box and extract it to the `./third-party` directory.
bash scripts/install_nidsnet.sh
Download the zip files from Box and extract them to the `~/code/datasets` directory. The datasets directory should look like this:
./datasets
├── final_scenes
├── grasp_data
├── models
├── pose_data
To load the models properly in Gazebo, we need to create a symbolic link for the models directory in the Gazebo model path. This allows Gazebo to find the models when launching the simulation.
# Go to the .gazebo directory
cd ~/.gazebo
# Remove the existing symbolic link (if exists)
rm -rf models
# Create a new symbolic link to the models directory in the datasets
ln -s ~/code/datasets/models
To keep the grasping scene simple, we use a subset of the YCB objects for the tabletop scene.
*(Figures: Object Models and Grasping Data.)*
- Run the Docker container:
bash docker/container_handler.sh run
- Enter the Docker container shell:
bash docker/container_handler.sh enter
- Stop the Docker container:
bash docker/container_handler.sh stop
The demo script will load the color image from `./demo/ros/color_image.png`, and the user can segment the image using SAM2. The segmentation results will be saved under `./demo/ros`:

- `mask_image_sam2.png`: the segmentation mask image.
- `mask_image_sam2_vis.png`: the visualization of the segmentation mask.
- `sam2_prompts.yaml`: the prompts used for segmentation.
python ~/code/tools/test_sam2_segmentation.py
- `Ctrl + Left Click`: prompt positive points.
- `Ctrl + Right Click`: prompt negative points.
- `Esc`: exit the segmentation toolkit.
- `Add Mask`: add the current segmentation mask with the shown label.
- `Remove Mask`: remove the last added mask.
- `Save Mask`: save the current segmentation mask to `./demo/ros/mask_image.png`.
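If you want to reuse the saved outputs in your own code, a minimal loading sketch could look like the one below. It assumes the mask is a single-channel label image and that `sam2_prompts.yaml` parses into a plain Python structure; adjust paths and keys to match your files.

```python
# Minimal sketch: load the SAM2 demo outputs for further processing.
# Assumes the mask is a single-channel label image and that
# sam2_prompts.yaml parses into a plain Python structure.
import cv2
import numpy as np
import yaml

mask = cv2.imread("./demo/ros/mask_image_sam2.png", cv2.IMREAD_UNCHANGED)
print("mask shape:", mask.shape, "labels:", np.unique(mask))

with open("./demo/ros/sam2_prompts.yaml", "r") as f:
    prompts = yaml.safe_load(f)
print("prompts:", prompts)
```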
The demo script will load the color image from `./demo/ros/color_image.png` and segment the objects in the image using NIDS-Net. The segmentation results will be saved under `./demo/ros/`:

- `mask_image_nidsnet.png`: the segmentation mask image.
- `mask_image_nidsnet_vis.png`: the visualization of the segmentation mask.
- `nidsnet_class_names.yaml`: the mapping of labels to class names in the segmentation mask.
python ~/code/tools/test_nidsnet.py
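To relate the mask labels to object names in downstream code, a small sketch like the following could be used. It assumes `nidsnet_class_names.yaml` parses to a dict keyed by integer label; verify against your actual file.

```python
# Minimal sketch: map NIDS-Net mask labels to class names.
# Assumes nidsnet_class_names.yaml parses to a dict keyed by integer label.
import cv2
import numpy as np
import yaml

mask = cv2.imread("./demo/ros/mask_image_nidsnet.png", cv2.IMREAD_UNCHANGED)
with open("./demo/ros/nidsnet_class_names.yaml", "r") as f:
    class_names = yaml.safe_load(f)

for label in np.unique(mask):
    if label == 0:  # skip background
        continue
    name = class_names.get(int(label), "unknown")
    pixels = int(np.count_nonzero(mask == label))
    print(f"label {label}: {name} ({pixels} px)")
```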
*(Figures: Color Image and Segmentation Mask Visual.)*
The demo script will load the inputs from `./demo/ros/` and estimate the 6D pose of the target object `035_power_drill` using FoundationPose. Results will be saved under `./demo/ros/`:

- `ob_in_cam_vis.png`: the rendered pose of the object in the camera frame.
- `ob_in_cam.txt`: the estimated 6D pose of the object in the camera frame.
python ~/code/tools/test_foundationpose.py
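The saved pose can be inspected with a few lines of NumPy; the sketch below assumes `ob_in_cam.txt` stores a 4x4 homogeneous object-to-camera transform.

```python
# Minimal sketch: read the estimated pose.
# Assumes ob_in_cam.txt stores a 4x4 homogeneous object-to-camera transform.
import numpy as np

ob_in_cam = np.loadtxt("./demo/ros/ob_in_cam.txt").reshape(4, 4)
rotation = ob_in_cam[:3, :3]     # 3x3 rotation of the object in the camera frame
translation = ob_in_cam[:3, 3]   # object position in the camera frame (meters)
print("translation:", translation)
print("rotation:\n", rotation)
```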
*(Figures: Segmentation Mask and Rendered Pose.)*
The demo script will load the NIDS-Net segmentation results from `./demo/ros/` and estimate the 6D poses of the labeled objects in the mask. Results will be saved under `./demo/ros/`:

- `ob_in_cam_poses_vis.png`: the rendered poses of the objects in the camera frame.
- `ob_in_cam_poses.npz`: the estimated 6D poses of the objects in the camera frame.
python ~/code/tools/test_nidsnet_and_fdpose.py
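The multi-object results can be iterated in a similar way; this sketch assumes the `.npz` archive stores one 4x4 pose array per object key (check `poses.files` for the actual keys).

```python
# Minimal sketch: iterate over the multi-object pose estimates.
# Assumes each entry in the .npz archive is a 4x4 object-to-camera transform.
import numpy as np

poses = np.load("./demo/ros/ob_in_cam_poses.npz")
for key in poses.files:
    pose = poses[key]
    print(key, "position in camera frame:", pose[:3, 3])
```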
*(Figures: Segmentation Mask and Rendered Pose.)*
- Terminal 1: Start the ROS master (if not already running)
roscore
- Terminal 2: Start the Fetch Gazebo simulation
roslaunch ~/code/config/launch/just_robot.launch
- Terminal 3: Start the MoveIt Planning Interface
roslaunch ~/code/config/launch/moveit_sim.launch
- Terminal 4: Start Rviz
rviz -d ~/code/config/rviz/grasp_sim.rviz
python ~/code/tools/fetch_grasp_sim.py --pose_method gazebo
The `fetch_grasp_sim.py` script will execute the following tasks:

- Lift the Fetch robot's torso and adjust the camera to look at the tabletop.
- Create the tabletop scene in Gazebo with a randomly placed YCB Cracker object.
- Set up the MoveIt Scene and do motion planning for each object.
- Once a `FULL grasp` is found, the Fetch robot will execute the grasping motion in the following order:
  - Reach to the target object.
  - Grasp the object with the gripper.
  - Move the gripper up to lift the object.
  - Open the gripper to drop the object.
  - Move the gripper back to the default position.
The `fetch_grasp_sim.py` script will estimate the 6D poses of the objects using NIDS-Net and FoundationPose instead of the Gazebo simulation ground-truth poses.
python ~/code/tools/fetch_grasp_sim.py --pose_method fdpose
The `test_image_listenser.py` script will execute the following tasks:
- Subscribe to the color and depth images published by the Fetch Gazebo simulation.
- Segment the objects in the color image using NIDS-Net.
- Estimate the 6D poses of the objects using FoundationPose.
- Publish the rendered poses and segmentation visuals via ROS topics.
# Place objects on the table in Gazebo simulation
python ~/code/tools/create_scene_sim.py
# Run the Image_Listener
python ~/code/tools/test_image_listenser.py
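For reference, a stripped-down version of such an image listener could look like the sketch below. The topic name follows the usual Fetch head-camera namespace but is an assumption; check `rostopic list` in your setup.

```python
# Minimal sketch: subscribe to the simulated Fetch color image and display it.
# The topic name is an assumption; verify it with `rostopic list`.
import rospy
import cv2
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

bridge = CvBridge()

def on_color(msg):
    # Convert the ROS Image message to an OpenCV BGR array and show it.
    color = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    cv2.imshow("color", color)
    cv2.waitKey(1)

rospy.init_node("image_listener_sketch")
rospy.Subscriber("/head_camera/rgb/image_raw", Image, on_color, queue_size=1)
rospy.spin()
```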
- Python_Basics.ipynb: introduces Python basics such as lists, tuples, sets, dictionaries, classes, functions, and loops.
- Numpy_Basics.ipynb: introduces NumPy basics such as arrays, matrices, and operations.
- Pytorch_Basics.ipynb: introduces PyTorch basics such as tensors and operations.
- Computer_Vision_Basics.pdf
  - Practice 1: CV_Transformation.ipynb: how to apply a transformation to 3D points.
  - Practice 2: CV_Deprojection.ipynb: how to deproject a 2D depth image to 3D points (see the NumPy sketch after this list).
- Introduction_to_ROS.pdf: introduces the basic concepts and useful commands in ROS.
- Introduction_to_6D_Pose_Estimation.pdf: introduces the basic concepts of 6D pose estimation and FoundationPose.
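For the two computer-vision practices, the core math fits in a few lines of NumPy. The sketch below uses made-up intrinsics, depth values, and a simple translation purely for illustration.

```python
# Minimal sketch: deproject a depth image to 3D points (Practice 2) and
# apply a rigid transform to those points (Practice 1).
# The intrinsics and depth values are made-up illustration values.
import numpy as np

K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])   # fx, fy, cx, cy
depth = np.full((480, 640), 1.2)        # fake depth image in meters

# Deprojection: pixel (u, v) with depth z -> 3D point in the camera frame.
v, u = np.indices(depth.shape)
z = depth
x = (u - K[0, 2]) * z / K[0, 0]
y = (v - K[1, 2]) * z / K[1, 1]
points = np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Transformation: apply a 4x4 homogeneous transform to the 3D points.
T = np.eye(4)
T[:3, 3] = [0.1, 0.0, 0.0]              # translate 10 cm along x
points_h = np.hstack([points, np.ones((len(points), 1))])
transformed = (T @ points_h.T).T[:, :3]
print(transformed.shape)                # (307200, 3)
```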
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
- SAM: Segment Anything
- SAM 2: Segment Anything in Images and Videos
- NIDS-Net: Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
- DINOv2: Learning Robust Visual Features without Supervision
- Python basics https://pythonbasics.org/
- Numpy https://numpy.org/doc/stable/user/basics.html
- OpenCV https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html
- 06_ROS_Publish_Image.ipynb: introduces how to publish images in ROS.
- 07_ROS_Subscribe_Image.ipynb: introduces how to subscribe to images in ROS.
- Write a ROS node to publish images from the recordings (see the sketch after this list).
- Write a ROS node to subscribe to images and display them in a window.
- Write a ROS node to subscribe to images published by the Fetch Gazebo simulation.
- Modify the FoundationPose code to:
- detect object poses using the images published by the Fetch Gazebo simulation.
- publish the detected poses to a ROS topic.
- draw the detected poses on the images and publish the images to a ROS topic.
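For the first practice item (publishing images from the recordings), a minimal publisher could be sketched as follows; the folder path, topic name, and frame rate are illustration-only assumptions.

```python
# Minimal sketch: publish recorded color images on a ROS topic.
# The folder path, topic name, and rate are illustration-only assumptions.
import glob
import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

rospy.init_node("recording_publisher_sketch")
pub = rospy.Publisher("/recordings/color", Image, queue_size=1)
bridge = CvBridge()
rate = rospy.Rate(10)  # publish at 10 Hz

for path in sorted(glob.glob("./recordings/*.png")):
    if rospy.is_shutdown():
        break
    frame = cv2.imread(path, cv2.IMREAD_COLOR)
    pub.publish(bridge.cv2_to_imgmsg(frame, encoding="bgr8"))
    rate.sleep()
```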
- ROS Tutorials
- ROS Python API
- ROS Messages
- MoveIt 1 Tutorials for ROS Noetic
This week, we will focus on the Fetch Gazebo simulation. The goal is to use the Fetch robot in the Gazebo simulation environment to perform grasping tasks.
Download and unzip the `my_demo.zip` file from Box and place it under the `./docker/ros/catkin_ws/src` directory.
- Run Docker container:
bash ./docker/container_handler.sh run ros1-user
- Enter the container:
bash ./docker/container_handler.sh enter ros1-user
- Compile the ROS workspace:
# Go to the catkin workspace
cd ~/catkin_ws && catkin_make -j$(nproc) -DPYTHON_EXECUTABLE=/usr/bin/python3
# Source the workspace
source ~/catkin_ws/devel/setup.zsh
- Link the models of my_demo to the Gazebo model path:
  To load the models properly in Gazebo, we need to link the `my_demo` models to the Gazebo model path. This allows Gazebo to find the models when launching the simulation.
cd ~/.gazebo && ln -s ~/catkin_ws/src/my_demo/models models
- Terminal 1: Start the ROS master
roscore
- Terminal 2: Start the Gazebo Simulation
The launch file below will start the Gazebo simulation environment with a simple scene containing a Fetch robot, a table, and the YCB Cracker object.
roslaunch my_demo table_ycb.launch
- Terminal 3: Launch the MoveIt Planning Interface
roslaunch fetch_moveit_config move_group.launch
- Terminal 4: Start Rviz
The `fetch_gazebo.rviz` configuration will display the color and depth images from the Fetch robot's cameras, the robot model, and the MoveIt PlanningScene.
rviz -d ${HOME}/code/config/rviz/fetch_gazebo.rviz
- Terminal 5: Run the Grasping
The `grasp_cracker.py` script will execute the following tasks (a simplified sketch of the grasp-selection loop follows the command below):

- Lift the Fetch robot's torso.
- Set up the PlanningScene.
- Adjust the Fetch robot's camera to look at the tabletop.
- Get the cracker's 6D pose directly from the Gazebo simulation.
- Load the grasp data for the cracker object.
- Plan the grasping motion using MoveIt:
  - First sort the grasps based on the distance to the gripper.
  - Then plan the grasping motion for each grasp until a valid grasp is found.
- Execute the grasping motion.
cd ~/catkin_ws/src/my_demo/scripts && python grasp_cracker.py
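Conceptually, the sort-then-plan loop above boils down to something like the sketch below, written against the MoveIt Python API. The group name, the grasp loader, and the distance metric are assumptions for illustration, not the actual `grasp_cracker.py` code.

```python
# Minimal sketch of a nearest-first grasp-selection loop (illustration only).
# `load_grasps_for_object` is a hypothetical helper, not part of the project.
import sys
import numpy as np
import rospy
import moveit_commander

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("grasp_selection_sketch")
group = moveit_commander.MoveGroupCommander("arm_with_torso")  # assumed group name

cur = group.get_current_pose().pose.position
gripper_pos = np.array([cur.x, cur.y, cur.z])

grasp_poses = load_grasps_for_object("003_cracker_box")  # hypothetical helper

def distance_to_gripper(pose):
    p = pose.position
    return np.linalg.norm(np.array([p.x, p.y, p.z]) - gripper_pos)

# Try the closest grasps first and execute the first one that can be planned.
for grasp in sorted(grasp_poses, key=distance_to_gripper):
    group.set_pose_target(grasp)
    ok, plan, _, _ = group.plan()  # MoveIt 1 (Noetic) returns a 4-tuple
    if ok:
        group.execute(plan, wait=True)
        break
```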
In this section, we will reimplement the SceneReplica benchmarking using the Fetch robot in the Gazebo simulation environment.
- Download the modified SceneReplica from Box.
- Unzip the downloaded file and place it in the `third-party` directory.
- Data Setup: Follow the instructions in the SceneReplica Data Setup to set up the data for SceneReplica.
Datasets
|--benchmarking
|--models/
|--grasp_data
|--refined_grasps
|-- fetch_gripper-{object_name}.json
|--sgrasps.pk
|--final_scenes
|--scene_data/
|-- scene_id_*.pk scene pickle files
|--metadata/
|-- meta-00*.mat metadata .mat files
|-- color-00*.png color images for scene
|-- depth-00*.png depth images for scene
|--scene_ids.txt : selected scene ids on each line
- Run the SceneReplica benchmarking in the Gazebo simulation environment using the Docker container.
- Run and Enter the Docker container:
bash docker/container_handler.sh run ros1-user
bash docker/container_handler.sh enter ros1-user
- Link the SceneReplica models to the Gazebo model path:
cd ~/.gazebo && rm models && ln -s ~/code/third-party/SceneReplica/Datasets/benchmarking/models
- Terminal 1: Start the ROS master (if not already running)
roscore
- Terminal 2: Start the Fetch Gazebo simulation with Just Robot
roslaunch ~/code/third-party/SceneReplica/launch/just_robot.launch
- Terminal 3: Start the MoveIt Planning Interface
roslaunch ~/code/third-party/SceneReplica/launch/moveit_sim.launch
- Terminal 4: Start Rviz
rviz -d ~/code/config/rviz/grasp_sim.rviz
- Terminal 5: Set up the desired scene in Gazebo. Available scene ids: 10, 25, 27, 33, 36, 38, 39, 48, 56, 65, 68, 77, 83, 84, 104, 122, 130, 141, 148, 161
cd ~/code/third-party/SceneReplica/src && python setup_scene_sim.py --data_dir ../../../datasets
# Select the scene id you want to set up
# For example, to set up scene id 10
- Terminal 6: Run the Model-based Grasping
# Go to the SceneReplica source directory
cd ~/code/third-party/SceneReplica/src && python bench_model_based_grasping.py \
--pose_method gazebo \
--obj_order nearest_first \
--data_dir ../../../datasets \
--scene_idx 10
This week, we will integrate FoundationPose and NIDS-Net with the Fetch Gazebo simulation environment.
Repeat the steps up to Terminal 3, as described in the Fetch Gazebo Simulation section, to set up the Fetch Gazebo simulation environment for scene id 10.
In this practice, we will subscribe to the RGBD images and CameraInfo published by the Fetch Gazebo simulation and save them under `/datasets/tmp` for later use.
- Practice 1: get the color and depth images and `cam_K` from the simulation.
  Complete the code in `/notebooks/08_FoundationPoseROS.py` to get the RGBD images and CameraInfo from the subscribed ROS topics published by the Fetch Gazebo simulation (see the intrinsics sketch below). The answer can be found here.
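For the `cam_K` part, the 3x3 intrinsic matrix can be recovered directly from the `CameraInfo` message; the topic name below is an assumption for the simulated head camera.

```python
# Minimal sketch: read the 3x3 intrinsics from a CameraInfo message.
# The topic name is an assumption; check `rostopic list` in the simulation.
import numpy as np
import rospy
from sensor_msgs.msg import CameraInfo

rospy.init_node("cam_info_sketch")
msg = rospy.wait_for_message("/head_camera/rgb/camera_info", CameraInfo)
cam_K = np.array(msg.K).reshape(3, 3)  # row-major intrinsic matrix
print(cam_K)
```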
*(Figures: Color Image and Depth Image.)*
- Practice 2: use the SAM2-based segmentation toolkit to get the segmentation mask for the Power Drill.
python tools/01_run_sam2_segmentation.py
The segmentation results will be saved under `/datasets/tmp`.
Ensure you have installed FoundationPose as described in the Environment Setup section.
Now that the inputs are ready, we can run FoundationPose to get the 6D poses of the objects in the scene.
- Notebook 05_FoundationPoseWrapper.ipynb: understand how to run FoundationPose to estimate the 6D poses of the objects in the scene. The notebook will guide you through preparing the inputs, running FoundationPose, and visualizing the results.
First, follow the steps in Run the Fetch Grasping in Gazebo Simulation to create the Gazebo simulation environment with the Fetch robot and the YCB Cracker object. No grasping is needed in this practice.
Next, finish the code in 09_fdpose_and_nidsnet_sim.py with the following tasks (a high-level sketch of the pipeline follows below):
- Subscribe to the color image, depth image, and camera info from the Fetch Gazebo simulation: refer to 08_FoundationPoseROS_answer.py for getting the color and depth images and camera info.
- Run the NIDS-Net segmentation on the subscribed color image: refer to tools/test_nidsnet.py for running NIDS-Net segmentation on the color image.
- Run FoundationPose on the NIDS-Net segmentation results: refer to tools/test_nidsnet_and_fdpose.py for running FoundationPose on the NIDS-Net segmentation results.
- The estimated poses from FoundationPose should be close to the ground-truth poses of the objects in the scene.
- The answer can be found here.
The results will be saved under `/datasets/tmp/fdpose_nidsnet_sim`.
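At a high level, the node written in this practice follows the pipeline sketched below; `run_nidsnet` and `run_foundationpose` are hypothetical placeholders standing in for the project's wrapper code, not real function names.

```python
# High-level sketch of the practice pipeline (placeholders, not real APIs).
def process_frame(color, depth, cam_K):
    # 1. Segment the objects in the color image (NIDS-Net wrapper).
    masks, label_to_name = run_nidsnet(color)
    # 2. Estimate a 6D pose per segmented object (FoundationPose wrapper).
    poses = {}
    for label, name in label_to_name.items():
        obj_mask = masks == label
        poses[name] = run_foundationpose(color, depth, obj_mask, cam_K)
    # 3. Compare the estimated poses with the Gazebo ground-truth poses.
    return poses
```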
*(Figures: Object Poses by FoundationPose and Object Poses by Gazebo (Ground Truth).)*
source ~/code/docker/source_env.sh
In our lab's ROS environment, the Fetch robot runs as the ROS master, and the desktop computer runs as a ROS client. To ensure the desktop computer can communicate with the Fetch robot, we need to source the environment variables every time we open a new terminal. This sets `ROS_MASTER_URI` to the Fetch robot's IP address and `ROS_IP` to the desktop computer's IP address.
The illustration of the ROS master and client relationship:
- Move the Fetch robot to a safe position, such as in front of the table.
- Set the gripper to a proper position.
- Lift the torso to a proper height.
- Adjust the camera to look at the tabletop.
- Terminal 1: Start the ROS master (if not already running)
source ~/code/docker/source_env.sh && roscore
- Terminal 2: Start the MoveIt Planning Interface
source ~/code/docker/source_env.sh && roslaunch ~/code/config/launch/moveit_real.launch
- Terminal 3: Start Rviz
source ~/code/docker/source_env.sh && rviz -d ~/code/config/rviz/grasp_real.rviz
- Terminal 4: Run the Fetch Grasping in Real
source ~/code/docker/source_env.sh && python ~/code/tools/fetch_grasp_real.py
Run the `fetch_grasp_real.py` script to execute the grasping. The script will execute the following tasks:
- Lift the Fetch robot's torso and adjust the camera to look at the tabletop.
- The NIDS-Net will detect and segment the observed objects in the color image.
- The FoundationPose will estimate the 6D poses of the segmented objects.
- Setup the MoveIt Scene and do motion planning for each object.
- Once a `FULL grasp` is found, the Fetch robot will execute the planned grasping motion.