This repository provides a simplified implementation of a variation of OpenAI's CLIP, with Gensim Doc2Vec / Hugging Face DistilBERT and Facebook DINOv2 / Google EfficientNet as the text and image encoders, respectively.
The repository also includes notebooks for training the models to fulfill the following tasks:
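CLIP learns a joint embedding space by contrasting matched image-caption pairs against mismatched ones within a batch. Below is a minimal NumPy sketch of the symmetric contrastive (InfoNCE) objective; the array shapes and temperature value are illustrative assumptions, not taken from this repository's code:

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products become cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature  # (batch, batch) similarity matrix

    def cross_entropy(logits, targets):
        # numerically stable log-softmax, then pick the target-class log-prob
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(targets)), targets].mean()

    targets = np.arange(logits.shape[0])  # the matched pair sits on the diagonal
    # average the image->text and text->image directions
    return (cross_entropy(logits, targets) + cross_entropy(logits.T, targets)) / 2
```

The loss is small when each image embedding is closest to its own caption's embedding and large otherwise.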
Objective 1: Read the sarcasm! Is this meme sarcastic or not? -- A classifier
Objective 2: Build a ranking system for MEMES
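Objective 2 can be served by the same embedding space: candidate memes are ranked by cosine similarity between a query caption's embedding and each meme image's embedding. A hedged sketch (the embeddings here are stand-ins for actual encoder outputs):

```python
import numpy as np

def rank_by_similarity(query_emb, image_embs):
    """Return meme indices sorted from most to least similar to the query.

    query_emb: (dim,) text embedding; image_embs: (n_memes, dim) image embeddings.
    """
    query_emb = query_emb / np.linalg.norm(query_emb)
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = image_embs @ query_emb  # cosine similarity of each meme to the query
    return np.argsort(-scores)       # indices in descending order of similarity
```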
Here is a flow chart drawn by an awesome artist. (ME.)
We used the Memotion 7k dataset as our training and testing dataset.
Dataset Class: Datasets/MemeDataset.py
Features: Images and Captions.
Dataset characterization
- Dataset size: 7000 (6931 after cleaning); train-test split = 8:2
- Text format: CSV file
- Image format: JPG
- Testing set size: 2000
- Text preprocessing: strip all special characters, watermarks, dates, and stop words; lemmatization
- Image preprocessing: remove corrupted files, resize
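The text-preprocessing steps above can be sketched with the standard library alone. The stop-word list and cleaning pattern below are illustrative stand-ins (a real pipeline would typically use NLTK's stop-word corpus and WordNetLemmatizer for the lemmatization step, which is omitted here):

```python
import re

# Illustrative subset of English stop words, not the full list used in the repo
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "in", "on", "of", "to"}

def clean_caption(text):
    """Strip special characters, digits (e.g. dates), and stop words from a caption."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation, digits, symbols
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```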
Three classes are included in MemeDatasets.py.
The implementation of the CLIP model is in the "custom_models" folder.
The trainer module "CLIP_Classifier2.py" and the model's training notebook are in the root folder.
The Datasets folder includes a sample of images and texts, and the Dataset class in CLIP_Datasets.py.
Results: accuracy 0.7436, AUROC 0.7969, F1 0.6392
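For context, accuracy and F1 are standard binary-classification metrics computed from predicted versus true labels; a minimal plain-Python sketch is below (AUROC requires sweeping score thresholds and is omitted). The labels here are toy values, not the repository's data:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = sarcastic), plain Python."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1
```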