


@laerdon commented Feb 1, 2025

ConvoKit currently does not let users plug in their own models. This matters most when those models have been fine-tuned on the datasets the users analyze with ConvoKit. The classifiers behind features such as politeness analysis and hypergraph representation are built on scikit-learn models, which are generally older and less capable than the models available through the Hugging Face Transformers library. We aim to give ConvoKit a more modular design that offers users a broader choice of models, so they can use their own models while still relying on ConvoKit for navigating conversational corpora.

At present, the Classifier class contains all of the functionality, including methods such as fit() and transform(). We aim to delegate that functionality to a ClassifierModel abstract class, which will be the type of Classifier's internal classification model, classifier_model.
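
The delegation described above can be illustrated in plain Python. This is a minimal, hypothetical sketch, not the PR's actual code: the method signatures, the dict-based stand-in for ConvoKit utterances, the `label_attr` parameter, the `"prediction"` metadata key, and the toy `MajorityClassModel` are all illustrative assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Hypothetical stand-in for a ConvoKit Utterance: {"text": ..., "meta": {...}}
Utterance = Dict[str, Any]


class ClassifierModel(ABC):
    """Hypothetical interface for the model a Classifier delegates to."""

    @abstractmethod
    def fit(self, texts: List[str], labels: List[int]) -> None:
        """Train the underlying model on raw texts and integer labels."""

    @abstractmethod
    def transform(self, texts: List[str]) -> List[int]:
        """Return one predicted label per input text."""


class MajorityClassModel(ClassifierModel):
    """Toy model standing in for, e.g., a fine-tuned transformer."""

    def __init__(self) -> None:
        self.majority = None

    def fit(self, texts: List[str], labels: List[int]) -> None:
        # Remember the most frequent label seen during training.
        self.majority = max(set(labels), key=labels.count)

    def transform(self, texts: List[str]) -> List[int]:
        return [self.majority for _ in texts]


class Classifier:
    """Handles corpus plumbing; all learning happens in classifier_model."""

    def __init__(self, classifier_model: ClassifierModel, label_attr: str = "label"):
        self.classifier_model = classifier_model
        self.label_attr = label_attr  # hypothetical: metadata key holding the label

    def fit(self, utterances: List[Utterance]) -> "Classifier":
        texts = [u["text"] for u in utterances]
        labels = [u["meta"][self.label_attr] for u in utterances]
        self.classifier_model.fit(texts, labels)
        return self

    def transform(self, utterances: List[Utterance]) -> List[Utterance]:
        preds = self.classifier_model.transform([u["text"] for u in utterances])
        # Write each prediction back into the utterance's metadata.
        for u, p in zip(utterances, preds):
            u["meta"]["prediction"] = p
        return utterances
```

With this split, swapping in a different model only requires a new ClassifierModel subclass; Classifier itself does not change.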

Tested on a local machine: fit and transform run successfully. More testing may be needed in a GPU-enabled environment.
An example is provided in convokit/examples/classifier/modular-classifier-example.ipynb.
This change deprecates pred_feats, an attribute of Classifier; users are now expected to produce their own torch Dataset containing this information. It also deprecates the evaluate_with_cv and evaluate_with_train_test_split methods.
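
Because the feature information that pred_feats used to carry now lives in a user-built dataset, one way to build such a dataset is sketched below. This is a hypothetical, dependency-free sketch: `FeatureDataset`, `from_records`, and the dict records standing in for utterance metadata are illustrative names, not part of ConvoKit. It implements the map-style protocol (`__len__` and `__getitem__`) used by PyTorch datasets; a real version would subclass `torch.utils.data.Dataset`.

```python
from typing import Any, Dict, List, Sequence, Tuple


class FeatureDataset:
    """Hypothetical map-style dataset of (feature vector, label) pairs.

    A real version would subclass torch.utils.data.Dataset; this sketch
    only implements __len__ and __getitem__ to stay dependency-free.
    """

    def __init__(self, features: List[List[float]], labels: List[int]) -> None:
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels

    @classmethod
    def from_records(
        cls,
        records: Sequence[Dict[str, Any]],
        feature_names: Sequence[str],
        label_name: str,
    ) -> "FeatureDataset":
        # feature_names plays the role the old pred_feats attribute played.
        features = [[float(r[name]) for name in feature_names] for r in records]
        labels = [r[label_name] for r in records]
        return cls(features, labels)

    def __len__(self) -> int:
        return len(self.features)

    def __getitem__(self, idx: int) -> Tuple[List[float], int]:
        return self.features[idx], self.labels[idx]
```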

