
TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking

This is the official repository of [TAM-VT](https://arxiv.org/abs/2312.08514). We currently support training and evaluation on the VOST dataset. This work is under submission; please stay tuned!

TAM-VT

🏢 Environment setup

Ubuntu 20.04, Python 3.9, CUDA 11.3, PyTorch 1.12.1

  • conda create -n vost python=3.9
  • conda activate vost
  • conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
  • pip install -r requirements.txt
  • python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
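After installation, you can optionally run a short Python sanity check (a minimal sketch, using the versions listed above) to confirm that the key packages import and that CUDA is visible:

    # Optional sanity check: confirm the main dependencies import and CUDA is visible.
    import torch
    import torchvision
    import detectron2

    print("torch:", torch.__version__)              # expected 1.12.1
    print("torchvision:", torchvision.__version__)  # expected 0.13.1
    print("CUDA available:", torch.cuda.is_available())
    print("detectron2:", detectron2.__version__)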

Dataset

Please visit the VOST official website to download the dataset.

You can also download the training and validation videos and annotations from this link.
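Once downloaded, you can optionally verify the dataset location from Python. The snippet below is only a sketch: the folder names are assumptions (a DAVIS-style layout with JPEGImages, Annotations, ImageSets), and the root path is the one used in the evaluation example further down; adjust both to match your actual download.

    # Hypothetical layout check; folder names are assumptions (DAVIS-style layout),
    # adjust them to match the actual VOST download.
    from pathlib import Path

    vost_root = Path("../datasets/VOST/VOST")  # same path as the evaluation example below
    for folder in ["JPEGImages", "Annotations", "ImageSets"]:
        path = vost_root / folder
        print(f"{path}: {'found' if path.is_dir() else 'missing'}")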

🚀 Training

Please refer to the sbatch script scripts/vost_train.sh for more details.

  1. Please first modify the config to match the dataset path.

  2. Please download the weights pre-trained on static images and put the checkpoint in checkpoints/.

    Pretrained weight on Static datasets

  3. Run the training script.

    bash scripts/vost_train.sh
    

🚀 Evaluation on VOST

Please refer to the sbatch script scripts/vost_eval.sh for more details.

  1. Please download the model weights trained on VOST and put the checkpoint in checkpoints/.

    Model weight on VOST

  2. Run the evaluation script.

    bash scripts/vost_eval.sh
    

📐 VOST Evaluation

To evaluate the predictions with the official VOST evaluation scripts, we first need to export the predictions in the VOST format.

  1. First, add these arguments to the eval script:

    eval=True

    eval_flags.plot_pred=True

    eval_flags.vost.vis_only_no_cache=True

    eval_flags.vost.vis_only_pred_mask=True

This runs inference iteratively and plots all the results. The results will be saved in the specified "output" directory with the following file structure.

tamvt_eval
└───plot_pred
    ├───555_tear_aluminium_foil
    │   └───object_id_1
    │       ├───frames
    │       │   ├───frame00012.png
    │       │   ├───...
    │       │   └───frame00600.png
    │       └───reference_crop.jpg
    ├───556_cut_tomato
    │   ...
    └───10625_knead_dough
  2. To get the evaluation score, please follow the protocol in the VOST repo:
python3 evaluation/evaluation_method.py --set val --dataset_path [PATH_TO_VOST_DATASET] --results_path [PATH_TO_PRED_DIR]

Example:

python3 evaluation/evaluation_method.py --set val --dataset_path ../datasets/VOST/VOST/ --results_path ./checkpoints/tamvt_eval/plot_pred
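Before running the official evaluation, it can help to confirm that the prediction directory matches the layout shown above. The snippet below is only an illustrative check (the plot_pred path comes from the example command; everything else is an assumption) that counts the predicted mask frames per video:

    # Illustrative check of the plot_pred output: count predicted mask PNGs per video.
    # The path matches the example command above; adjust it to your own output directory.
    from pathlib import Path

    pred_root = Path("./checkpoints/tamvt_eval/plot_pred")
    for video_dir in sorted(p for p in pred_root.iterdir() if p.is_dir()):
        masks = list(video_dir.rglob("*.png"))
        print(f"{video_dir.name}: {len(masks)} predicted mask frames")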

Some tips for evaluation

You may need to run pip install pandas if pandas is not already installed.

Please use a machine with at least 32 GB of memory and an 8+ core CPU; otherwise the evaluation may hit the memory limit and be terminated.

Citation

If you find TAM-VT useful for your research and applications, please cite using this BibTeX:

@article{goyal2023m3t,
  title={TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking},
  author={Goyal, Raghav and Fan, Wan-Cyuan and Siam, Mennatullah and Sigal, Leonid},
  journal={arXiv preprint arXiv:2312.08514},
  year={2023}
}

@misc{goyal2023tamvt,
    title={TAM-VT: Transformation-Aware Multi-scale Video Transformer for Segmentation and Tracking},
    author={Raghav Goyal and Wan-Cyuan Fan and Mennatullah Siam and Leonid Sigal},
    year={2023},
    eprint={2312.08514},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
