The Tweet Sentiment Analysis project aims to analyze the sentiment of tweets. Using natural language processing (NLP) techniques, this project classifies tweets into positive, negative, neutral and irrelevant categories. This can be useful for businesses, researchers, and individuals who want to gauge public sentiment on various topics.
- Fetches tweets based on keywords or hashtags.
- Preprocesses tweet text to clean and normalize the data.
- Uses machine learning models to classify the sentiment of each tweet.
- Visualizes the results with graphs and charts.
To run this project, you will need Python 3.x and the following libraries:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- nltk
- tweepy
- jupyter
You can install these dependencies using pip:
pip install pandas numpy matplotlib seaborn scikit-learn nltk tweepy jupyter
-
Clone this repository to your local machine.
git clone https://github.com/yourusername/tweet-sentiment-analysis.git
-
Navigate to the project directory.
cd tweet-sentiment-analysis
-
Obtain Twitter API credentials by creating a developer account on Twitter and creating an app. Save your API keys and tokens.
-
Create a configuration file (
config.py
) and add your Twitter API credentials:# config.py CONSUMER_KEY = 'your_consumer_key' CONSUMER_SECRET = 'your_consumer_secret' ACCESS_TOKEN = 'your_access_token' ACCESS_TOKEN_SECRET = 'your_access_token_secret'
-
Open the Jupyter notebook:
jupyter notebook TweetSentimentAnalysis.ipynb
-
Follow the instructions in the notebook to:
- Authenticate with the Twitter API.
- Fetch tweets using keywords or hashtags.
- Preprocess the tweet text.
- Train and evaluate a sentiment analysis model.
- Classify the sentiment of new tweets.
- Visualize the results.
The preprocessing steps include:
- Removing URLs, mentions, and hashtags.
- Converting text to lowercase.
- Removing punctuation and special characters.
- Tokenization and stopword removal.
- Lemmatization.
The notebook provides an example of training a machine learning model using the following steps:
- Split the data into training and testing sets.
- Vectorize the text data using TF-IDF.
- Train a classifier (e.g., Logistic Regression, SVM).
- Evaluate the model using accuracy, precision, recall, and F1-score.
The notebook includes code for visualizing the results, such as:
- Distribution of sentiments.
- Word clouds for positive, negative, and neutral tweets.
- Confusion matrix for model evaluation.
Contributions are welcome! If you have suggestions or improvements, please create an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
- The Natural Language Toolkit (nltk) library.
- The scikit-learn library for machine learning.
- The Twitter API for providing tweet data.
Feel free to modify this README file to better suit your project's specifics and details.