BIM Element Semantic Classification using Sentence-Transformers

This project aims to develop a prototype system for automatic classification of BIM elements based on their textual descriptions in IFC files. The system assigns BIM elements to classes from multiple classification dictionaries supported by bSDD (including CCI, Uniclass, and others) by leveraging semantic embeddings.

Instead of relying on supervised fine-tuning (which requires labeled datasets), the approach uses sentence-transformers to generate semantic embeddings of element descriptions and classification classes, comparing them via cosine similarity. This enables flexible, multi-dictionary classification without binding to a fixed label set.
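A minimal sketch of the matching step. It assumes the element description and the dictionary classes have already been embedded; in practice this would be done with something like `SentenceTransformer(...).encode(...)`, but here they are plain lists of floats so the example is self-contained. The function names and toy vectors are illustrative and are not the project's code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_classes(
    element_vec: Sequence[float],
    class_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank dictionary classes by cosine similarity to one element embedding."""
    scored = [(label, cosine_similarity(element_vec, vec)) for label, vec in class_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Toy 3-dimensional "embeddings" (real sentence embeddings have hundreds of dimensions).
classes = {"Wall": [1.0, 0.0, 0.0], "Door": [0.0, 1.0, 0.0], "Window": [0.7, 0.7, 0.0]}
element = [0.9, 0.1, 0.0]
print([label for label, _ in top_k_classes(element, classes, k=2)])  # ['Wall', 'Window']
```

Classes from different dictionaries are simply more labelled vectors, so the same ranking works for CCI, Uniclass or any other bSDD dictionary without retraining.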

The project also explores improving embedding quality with contrastive learning and few-shot learning to increase classification accuracy. We initially planned to use TSDAE, but it requires a model with cross-attention layers, and we were committed to a multilingual sentence-transformer because the IFC descriptions in our BIM models are often in Polish.
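The contrastive objective can be sketched in a few lines. contrastive_test.ipynb uses Multiple Negatives Ranking Loss: each IFC description is pulled towards its paired bSDD class and pushed away from the other classes in the same batch. Unlike TSDAE, this loss only needs the encoder's sentence embeddings, so it works with a multilingual encoder. In sentence-transformers the loss is available as `losses.MultipleNegativesRankingLoss`, and the default `scale` of 20 is mirrored here. The pure-Python version below only illustrates the computation and is not the project's training code.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 for zero vectors to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def multiple_negatives_ranking_loss(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """In-batch contrastive loss: anchor i should match positive i, and every
    other positive in the batch acts as a negative for it."""
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must be paired one-to-one")
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarities of this anchor to every positive in the batch.
        logits = [scale * _cosine(anchor, p) for p in positives]
        # Cross-entropy with target index i, via a numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


ifc = [[1.0, 0.0], [0.0, 1.0]]   # embeddings of two IFC descriptions
bsdd = [[1.0, 0.0], [0.0, 1.0]]  # embeddings of their matching bSDD classes
print(multiple_negatives_ranking_loss(ifc, bsdd))        # close to 0: pairs already aligned
print(multiple_negatives_ranking_loss(ifc, bsdd[::-1]))  # close to 20: pairs swapped
```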

Key features

automatic extraction and parsing of textual descriptions from IFC files
embedding generation of BIM element descriptions and class labels using multilingual sentence-transformers
semantic similarity matching across multiple classification dictionaries (CCI, Uniclass, etc.)
optional fine-tuning via contrastive learning and few-shot learning to boost accuracy
evaluation using top-k accuracy, cosine similarity scores, F1-score and other metrics
export of classification results in JSON/CSV format compatible with BIMVision
modular pipeline designed for extensibility and further dictionary additions
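Top-k accuracy and F1-score can be computed from the ranked predictions as sketched below. The sketch uses only the standard library; `sklearn.metrics.f1_score(..., average="macro")` is the usual library route for F1. The class names are toy values, not project results.

```python
from collections import Counter
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Share of elements whose true class is among their k highest-ranked predictions."""
    hits = sum(1 for preds, label in zip(ranked, truth) if label in preds[:k])
    return hits / len(truth)


def macro_f1(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred, label in zip(predicted, truth):
        if pred == label:
            tp[label] += 1
        else:
            fp[pred] += 1   # predicted class gets a false positive
            fn[label] += 1  # true class gets a false negative
    scores = []
    for cls in set(predicted) | set(truth):
        precision = tp[cls] / (tp[cls] + fp[cls]) if tp[cls] + fp[cls] else 0.0
        recall = tp[cls] / (tp[cls] + fn[cls]) if tp[cls] + fn[cls] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


ranked = [["Wall", "Slab", "Door"], ["Door", "Window", "Wall"], ["Slab", "Beam", "Column"]]
truth = ["Wall", "Window", "Column"]
print(top_k_accuracy(ranked, truth, k=1), top_k_accuracy(ranked, truth, k=3))  # 0.333... and 1.0
print(macro_f1([preds[0] for preds in ranked], truth))  # 0.2
```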

Technologies & Tools

Language: Python 3.10+
NLP: sentence-transformers (paraphrase-multilingual-MiniLM), PyTorch
Data processing: pandas, numpy, matplotlib
APIs: bSDD REST API or local dictionary files
Evaluation metrics: cosine similarity, top-k accuracy, F1-score
Formats: JSON/CSV for data exchange with BIMVision (C# interface)
Development: Jupyter Notebook, Git, VSCode
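The JSON/CSV exchange with BIMVision might look roughly like this. The column names are hypothetical, since the README does not specify the schema the C# side expects.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical column names: the exact schema expected by the BIMVision (C#) side
# is not specified in this README.
FIELDS = ["global_id", "ifc_type", "class_code", "class_name", "score"]


def results_to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialise classification results to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def results_to_json(rows: List[Dict[str, object]]) -> str:
    """Serialise results to JSON; ensure_ascii=False keeps Polish characters readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Placeholder values for illustration only.
rows = [{"global_id": "GUID-001", "ifc_type": "IfcWall", "class_code": "X-01",
         "class_name": "Ściana zewnętrzna", "score": 0.91}]
print(results_to_csv(rows))
print(results_to_json(rows))
```

When writing this text to disk, opening the file with `newline=""` (as the `csv` module documentation recommends) and an explicit `encoding="utf-8"` avoids doubled line breaks and mangled Polish characters.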

Usage

If you want to see how the algorithm works using a simple GUI, you need to:

Clone the repository

git clone https://github.com/DevStranger/BIM-Classification.git

Change into the repository directory

cd BIM-Classification

Install the required libraries

pip install -r requirements.txt

Change into the code folder

cd code

Run the app :)

streamlit run app.py

*the suggested sample data file is /data/ifc_objects.csv

If you would like to see how the model was trained and enhanced, you need to:

Clone the repository

git clone https://github.com/DevStranger/BIM-Classification.git

Start JupyterLab (or Jupyter Notebook) from the command line

jupyter lab

Contents of the notebooks (in order of creation or recent updates):

test.ipynb - precomputes embeddings for the CCI data to save computation time in later tasks. Embeddings are loaded if they have already been computed, and generated and saved otherwise.
parsed_test.ipynb - performs semantic classification of IFC elements by matching them to bSDD classes using sentence embeddings and cosine similarity. It demonstrates an automated way to assign standardized building system classes to IFC elements based on their textual descriptions (note: this initial test used synthetic sample data, not real-life files).
full_bsdd_tests.ipynb - performs semantic classification of IFC elements against bSDD classes, like parsed_test.ipynb, but uses more detailed IFC information (including numeric properties and property set (Pset) attributes) together with the full bSDD class information pulled from the official bSDD site to improve classification accuracy. It demonstrates a more comprehensive semantic mapping pipeline.
tsdae_test.ipynb - fine-tunes a sentence embedding model using TSDAE (Transformer-based Sequential Denoising Auto-Encoder) on IFC data, then uses the fine-tuned model to classify IFC elements against CCI classes. The aim is to improve semantic embeddings for domain-specific text and provide more accurate similarity-based classification. It later turned out that this method was unsuitable in our case, because the multilingual sentence-transformer we use does not have the required cross-attention layers.
tsdae_model_train.ipynb - prepares a domain-specific TSDAE model for IFC and bSDD textual data. It consolidates multiple IFC CSV files, applies soft filtering on bSDD classes, and fine-tunes a sentence embedding model using a Denoising AutoEncoder.
mapped_ifc_test.ipynb - performs keyword-assisted semantic classification of IFC elements against bSDD classes using a fine-tuned TSDAE embedding model. It combines type-specific keyword filtering with embeddings to improve classification relevance.
ifc_bsdd_mapping.ipynb - demonstrates a semantic mapping and clustering pipeline for IFC elements against bSDD classes using sentence embeddings. It combines similarity-based matching with unsupervised clustering and visualization, providing both individual matches and global structure insights.
extra_layer_test.ipynb - demonstrates a hybrid classification pipeline that combines sentence embeddings from a fine-tuned TSDAE/few-shot model with a logistic regression classifier. It shows how to add an extra classification layer on top of embeddings to predict labels for IFC elements.
contrastive_test.ipynb - prepares a contrastive learning dataset from IFC and bSDD textual data and fine-tunes a sentence embedding model using the Multiple Negatives Ranking Loss (contrastive learning). The goal is to create embeddings that better capture semantic similarity between IFC elements and bSDD classes.
initial_tsdae_fewshot_test.ipynb - demonstrates few-shot fine-tuning of a contrastive sentence embedding model on a small domain-specific dataset and the addition of a logistic regression classification layer on top of embeddings. It creates a hybrid embedding + classifier model for IFC/bSDD text classification. Throughout the project, this notebook was used for various tasks and experiments along with the tester.ipynb notebook.
tester.ipynb - used throughout the project as a multipurpose tool, integrating several key workflows: it handles final evaluation of IFC object classification, performs post-processing and reranking of predictions, supports clustering of bSDD classes, and combines hybrid approaches including cosine similarity, few-shot embeddings, and a trained classifier. It also includes augmentation steps for IFC descriptions and generates final labels in CSV and JSON formats, alongside visualizations and summary statistics for analysis.
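The load-or-compute pattern described for test.ipynb can be sketched as follows. The function name, the JSON cache layout and the `encode` callable are assumptions for illustration; the notebook itself may store embeddings differently.

```python
import json
import os
from typing import Callable, List, Sequence

# Any function mapping a list of texts to a list of vectors (e.g. a model's encode method).
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def load_or_compute_embeddings(texts: Sequence[str], encode: Encoder, cache_path: str) -> List[List[float]]:
    """Return embeddings for `texts`, reusing `cache_path` when it holds the same texts."""
    texts = list(texts)
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cached = json.load(f)
        # Only trust the cache if it was built from exactly the same texts.
        if cached.get("texts") == texts:
            return cached["embeddings"]
    # Cache missing or stale: compute, convert to plain floats, and save.
    embeddings = [[float(x) for x in vec] for vec in encode(texts)]
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump({"texts": texts, "embeddings": embeddings}, f, ensure_ascii=False)
    return embeddings
```

With sentence-transformers, `encode` could be a loaded model's `encode` method. For large dictionaries a binary format such as NumPy's `.npy` would be more compact than JSON.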

Results and Evaluation

*conducted using the sample data file /data/ifc_objects.csv

Distribution of similarity values

Classification results

Classification results with cluster distribution

Classes distribution along a cluster

Simple GUI Demo

*using the sample data file /data/ifc_objects.csv

After running the app, you will see a simple GUI asking you to upload a file in either `.csv` or `.json` format.

When you upload the file, the app will tell you how many IFC objects it found in it.

Press the button to start the classification process (this may take a while, depending on the file size and contents).

When the algorithm finishes, you will see the results in a table showing the top 3 matches, their scores, and the output of the classification layer.

You can download the classification results as a `.csv` file.

Done!
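The upload step of the GUI could be handled roughly like this. `load_ifc_objects` is a hypothetical helper, and the column names in the example are made up, because the README does not document the schema of /data/ifc_objects.csv or the expected JSON layout. In a Streamlit app, the two arguments would typically come from the object returned by `st.file_uploader` (its `.name` and `.getvalue()`).

```python
import csv
import io
import json
from typing import Dict, List


def load_ifc_objects(filename: str, raw: bytes) -> List[Dict[str, object]]:
    """Parse an uploaded .csv or .json file into a list of IFC object records."""
    # "utf-8-sig" also strips the byte-order mark that Excel adds to CSV exports.
    text = raw.decode("utf-8-sig")
    name = filename.lower()
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    if name.endswith(".json"):
        data = json.loads(text)
        # Assumption: the JSON file holds an array of objects, one per IFC element.
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of IFC objects")
        return data
    raise ValueError(f"unsupported file type: {filename}")


# Hypothetical sample content with made-up columns.
sample = "Name,Description\nW-01,Ściana zewnętrzna\nD-01,Drzwi wejściowe\n".encode("utf-8")
objects = load_ifc_objects("ifc_objects.csv", sample)
print(f"Found {len(objects)} IFC objects")  # Found 2 IFC objects
```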

Disclaimer

This project was developed as part of an internship at Datacomp IT in Kraków, Poland. The work presented here reflects the scope and objectives of the internship and is intended for educational and prototypical purposes.
