ML Challenge 2025 Problem Statement

Smart Product Pricing Challenge

In e-commerce, determining the optimal price point for products is crucial for marketplace success and customer satisfaction. Your challenge is to develop an ML solution that analyzes product details and predicts the price of the product. The relationship between product attributes and pricing is complex: factors such as brand, specifications, and product quantity directly influence pricing. Your task is to build a model that can analyze these product details holistically and suggest an optimal price.

Data Description:

The dataset consists of the following columns:

sample_id: A unique identifier for the input sample
catalog_content: Text field containing the title, product description, and Item Pack Quantity (IPQ), concatenated.
image_link: Public URL where the product image is available for download (example: https://m.media-amazon.com/images/I/71XfHPR36-L.jpg). To download images, use the download_images function from src/utils.py; see the sample code in src/test.ipynb.
price: Price of the product (Target variable - only available in training data)

Dataset Details:

Training Dataset: 75k products with complete product details and prices
Test Set: 75k products for final evaluation

Output Format:

The output file should be a CSV with 2 columns:

sample_id: The unique identifier of the data sample. Note the ID should match the test record sample_id.
price: A float value representing the predicted price of the product.

Note: Make sure to output a prediction for every sample ID. If the output file contains fewer or more rows than test.csv, it will not be evaluated.
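
The row-count and ID requirements above can be checked locally before uploading. The following is a minimal sketch, not part of the provided sample code: the function name check_submission is hypothetical, and it assumes both files are CSVs with a header row that includes sample_id (plus price in the output file). It also rejects non-positive prices, in line with the constraint that predicted prices be positive floats.

```python
import csv
import io
import math


def check_submission(test_file, out_file):
    """Return a list of problems found in a submission; an empty list means none.

    Both arguments are open text-file objects. Hypothetical helper, not part
    of src/utils.py.
    """
    test_ids = [row.get("sample_id") for row in csv.DictReader(test_file)]
    out_rows = list(csv.DictReader(out_file))
    out_ids = [row.get("sample_id") for row in out_rows]

    problems = []
    if len(out_ids) != len(test_ids):
        problems.append(f"expected {len(test_ids)} rows, found {len(out_ids)}")
    if len(set(out_ids)) != len(out_ids):
        problems.append("duplicate sample_id values")
    if set(out_ids) != set(test_ids):
        problems.append("sample_id values do not match the test file")

    for row in out_rows:
        try:
            price = float(row.get("price"))
        except (TypeError, ValueError):
            problems.append(f"non-numeric price for sample_id {row.get('sample_id')}")
            continue
        if not math.isfinite(price) or price <= 0:
            problems.append(f"price must be a positive float for sample_id {row.get('sample_id')}")
    return problems


# Tiny in-memory example; real use would pass open("dataset/test.csv") etc.
test_csv = io.StringIO("sample_id,catalog_content,image_link\n1,a,u\n2,b,u\n")
out_csv = io.StringIO("sample_id,price\n1,12.5\n2,-3\n")
print(check_submission(test_csv, out_csv))
```

In this toy example the second row is flagged because its price is negative.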

File Descriptions:

Source files

src/utils.py: Contains helper functions for downloading images from the image_link. You may need to retry a few times to download all images due to possible throttling issues.
sample_code.py: Sample dummy code that can generate an output file in the given format. Usage of this file is optional.
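
Because downloads may be throttled, a generic retry wrapper can help. This is a sketch under assumptions, not the behaviour of src/utils.py: retry_with_backoff is a hypothetical helper, and it deliberately takes a zero-argument callable so that it does not assume anything about the arguments of download_images.

```python
import time


def retry_with_backoff(task, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or max_attempts is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between
    attempts. The sleep argument exists so the wait can be replaced in tests.
    """
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

To use it, wrap whatever call you make to download_images in a function or lambda that takes no arguments and pass that as task.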

Dataset files

dataset/train.csv: Training file with labels (price).
dataset/test.csv: Test file without output labels (price). Generate predictions using your model/solution on this file's data and format the output file to match sample_test_out.csv.
dataset/sample_test.csv: Sample test input file.
dataset/sample_test_out.csv: Sample outputs for sample_test.csv. The output for test.csv must be formatted in exactly the same way. Note: The predictions in this file might not be correct.

Constraints:

You will be provided with a sample output file. Format your output to match the sample output file exactly.
Predicted prices must be positive float values.
The final model should be MIT- or Apache 2.0-licensed and have at most 8 billion parameters.

Evaluation Criteria:

Submissions are evaluated using Symmetric Mean Absolute Percentage Error (SMAPE): A statistical measure that expresses the relative difference between predicted and actual values as a percentage, while treating positive and negative errors equally.

Formula:

SMAPE = (1/n) * Σ |predicted_price - actual_price| / ((|actual_price| + |predicted_price|)/2) * 100%

Example: If actual price = $100 and predicted price = $120
SMAPE = |100-120| / ((|100| + |120|)/2) * 100% = 18.18%

Note: SMAPE is bounded between 0% and 200%. Lower values indicate better performance.
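
For local validation, the metric can be computed directly from the formula above. This is a minimal sketch rather than the organizers' scoring code: the function name smape is chosen here for illustration, inputs are assumed to be plain Python lists, and the handling of a pair where both values are zero is an assumption.

```python
def smape(actual, predicted):
    """Symmetric Mean Absolute Percentage Error, in percent (0 to 200)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Assumption: a pair where both values are 0 contributes no error.
        total += abs(p - a) / denom if denom else 0.0
    return 100.0 * total / len(actual)


# Reproduces the worked example: actual = 100, predicted = 120.
print(round(smape([100.0], [120.0]), 2))  # 18.18
```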

Leaderboard Information:

Public Leaderboard: During the challenge, rankings will be based on 25K samples from the test set to provide real-time feedback on your model's performance.
Final Rankings: The final decision will be based on performance on the complete 75K test set, together with each team's documentation of its approach.

Submission Requirements:

Upload a test_out.csv file to the Portal with exactly the same formatting as sample_test_out.csv.
All participating teams must also provide a 1-page document describing:
- Methodology used
- Model architecture/algorithms selected
- Feature engineering techniques applied
- Any other relevant information about the approach

Note: A sample template for this documentation is provided in Documentation_template.md

Academic Integrity and Fair Play:

⚠️ STRICTLY PROHIBITED: External Price Lookup

Participants are STRICTLY NOT ALLOWED to obtain prices from the internet, external databases, or any sources outside the provided dataset. This includes but is not limited to:

Web scraping product prices from e-commerce websites
Using APIs to fetch current market prices
Manual price lookup from online sources
Using any external pricing databases or services

Enforcement:

All submitted approaches, methodologies, and code pipelines will be thoroughly reviewed and verified
Any evidence of external price lookup or data augmentation from internet sources will result in immediate disqualification

Fair Play: This challenge is designed to test your machine learning and data science skills using only the provided training data. External price lookup defeats the purpose of the challenge.

Tips for Success:

Consider both textual features (catalog_content) and visual features (product images)
Explore feature engineering techniques for text and image data
Consider ensemble methods combining different model types
Pay attention to outliers and data preprocessing
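
One possible way to act on the last two tips is sketched below. It is an assumption, not a method prescribed by the challenge: prices are modelled in log space (which compresses very large values and suits a relative metric such as SMAPE), and predictions from several models are averaged in that space before being mapped back. All function names and the floor value are hypothetical.

```python
import math


def to_log(prices):
    """Map prices to log space; log1p keeps very small prices well-behaved."""
    return [math.log1p(p) for p in prices]


def from_log(log_prices, floor=0.01):
    """Map log-space predictions back to prices.

    floor is a hypothetical lower bound that keeps every output strictly
    positive, as the constraints require.
    """
    return [max(math.expm1(v), floor) for v in log_prices]


def blend_log_predictions(predictions_per_model, weights=None):
    """Weighted average of per-model predictions made in log space."""
    n_models = len(predictions_per_model)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weights by default
    n_samples = len(predictions_per_model[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions_per_model))
        for i in range(n_samples)
    ]
```

A model would be trained on to_log(train_prices), and the blended log-space predictions passed through from_log before being written to the output file.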
