AbAnalysis (Python 3 only)

AbAnalysis: Antibody Sequence Analysis Pipeline

Description: AbAnalysis is an antibody sequence analysis pipeline designed for processing, filtering, and annotating next-generation sequencing (NGS) antibody repertoires. It integrates germline assignment, junction identification, UAID parsing, and consensus sequence generation into a reproducible and automated workflow.

Key Features:

🔹 Paired-end read merging with PANDAseq

🔹 Germline assignment and junction identification using IgBLAST

🔹 Automatic filtering of non-functional sequences and sequencing artifacts

🔹 Frameshift correction for indel errors

🔹 UAID (Unique Antibody Identifier) parsing with configurable length (e.g., -u 20)

🔹 Data storage and querying via MongoDB integration

🔹 Consensus sequence generation using MUSCLE and Biopython

🔹 Cross-platform support with precompiled IgBLAST binaries for Linux, macOS, and Windows
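This README does not spell out AbAnalysis's exact criteria for "non-functional" sequences. As a rough illustration only, the sketch below treats a nucleotide sequence as functional when it is in frame (its length is divisible by 3) and contains no stop codon. `is_functional` and `filter_functional` are hypothetical helpers and are not part of AbAnalysis.

```python
from typing import Dict

# Standard DNA stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_functional(nt_seq: str, frame: int = 0) -> bool:
    """Return True if nt_seq, read from `frame`, is in frame and stop-free.

    Hypothetical helper: "in frame" here simply means the length from
    `frame` onward is a multiple of 3 (a proxy for no frameshifting indel).
    """
    seq = nt_seq.upper()[frame:]
    if not seq or len(seq) % 3 != 0:
        return False  # empty, or a length that suggests an indel/frameshift
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def filter_functional(seqs: Dict[str, str]) -> Dict[str, str]:
    """Keep only the named sequences that pass is_functional()."""
    return {name: s for name, s in seqs.items() if is_functional(s)}
```

A real pipeline would normally rely on the aligner's own reading-frame and productivity calls rather than a fixed frame.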

Workflow Summary:

1. Merge paired-end reads using PANDAseq.

2. Process merged reads through the IgBLAST-based analysis pipeline.

3. Filter out non-functional or artifact sequences; correct indels.

4. Parse UAIDs (Unique Antibody Identifiers) and populate sequence metadata.

5. Store annotated sequences in MongoDB for downstream analysis.

6. Bin sequences by UAID, discard singletons, and add germline variable gene sequences as tie-breakers.

7. Generate consensus sequences with MUSCLE and Biopython.

8. Re-analyze consensus sequences and store final results in a separate MongoDB database.
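The binning and consensus steps of this workflow can be sketched in plain Python. The sketch assumes each read has already been reduced to a (UAID, sequence) pair. It also assumes each bin has already been aligned together with its germline V gene sequence, a job AbAnalysis gives to MUSCLE. A simple column-wise majority vote with an explicit germline tie-break stands in for the Biopython consensus step. `bin_by_uaid` and `majority_consensus` are hypothetical names, not AbAnalysis functions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def bin_by_uaid(records: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (uaid, sequence) pairs by UAID and drop singleton bins."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for uaid, seq in records:
        bins[uaid].append(seq)
    return {uaid: seqs for uaid, seqs in bins.items() if len(seqs) > 1}


def majority_consensus(aligned: List[str], germline: str = "") -> str:
    """Column-wise majority vote over sequences that are already aligned.

    When several characters tie for the top count in a column, the
    character from the aligned `germline` sequence wins if it is among
    them; otherwise non-gap characters are preferred, then alphabetical
    order. Columns whose winner is a gap ('-') are dropped.
    """
    lengths = {len(s) for s in aligned}
    if len(lengths) != 1 or (germline and len(germline) not in lengths):
        raise ValueError("all sequences must be aligned to the same length")
    consensus = []
    for i, column in enumerate(zip(*aligned)):
        counts = Counter(column)
        top = max(counts.values())
        # Tied characters, sorted so gaps come last, then alphabetically.
        tied = sorted((b for b, c in counts.items() if c == top),
                      key=lambda b: (b == "-", b))
        base = germline[i] if germline and germline[i] in tied else tied[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)
```

The germline here only breaks ties among the reads and never outvotes a clear majority. That is one reasonable reading of "tie-breaker", and the actual pipeline may weigh the germline differently.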

Usage

To run AbAnalysis on a single FASTA or FASTQ file:
python ab_analysis.py -i <input-file> -o <output-directory> -t <temp-directory>

To iteratively run AbAnalysis on all files in an input directory:
python ab_analysis.py -i <input-directory> -o <output-directory> -t <temp-directory>

Additional options

-m, --merge: The input directory should contain paired FASTQ (or gzipped FASTQ) files. Paired files will be merged with PANDAseq prior to processing with AbAnalysis.

-u N, --uaid N: Sequences contain a unique antibody ID (UAID) of length N. The UAID will be parsed and added to the JSON output.

-s, --species: Select the species from which the input sequences are derived. Supported options are 'human', 'mouse', and 'macaque'. Default is 'human'.

-n, --next_seq: Use if the sequences were generated on a NextSeq sequencer. Multiple lane files from the same sample will be merged.
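The README does not say where the UAID sits within a read. The sketch below assumes the UAID is the first N bases, so that `-u 20` would correspond to `parse_uaid(read, 20)`. `parse_uaid`, `to_json_record`, and the JSON field names are all hypothetical and are not AbAnalysis's actual output schema.

```python
import json
from typing import Dict, Tuple


def parse_uaid(seq: str, uaid_len: int) -> Tuple[str, str]:
    """Split a read into (uaid, remaining_sequence).

    ASSUMPTION: the UAID is the first `uaid_len` bases of the read.
    """
    if uaid_len <= 0:
        raise ValueError("uaid_len must be a positive integer")
    if len(seq) < uaid_len:
        raise ValueError("read is shorter than the UAID length")
    return seq[:uaid_len], seq[uaid_len:]


def to_json_record(seq_id: str, seq: str, uaid_len: int) -> str:
    """Hypothetical JSON record carrying the parsed UAID next to the sequence."""
    uaid, remainder = parse_uaid(seq, uaid_len)
    record: Dict[str, str] = {"seq_id": seq_id, "uaid": uaid, "sequence": remainder}
    return json.dumps(record)
```

Stripping the UAID from the sequence before annotation keeps the barcode bases from being aligned against the germline. Whether AbAnalysis does this is not stated here.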

Helper scripts

Two helper scripts are included:
batch_merge.py performs PANDAseq merging on a directory of paired FASTQ (or gzipped FASTQ) files.
mongoimport.py iteratively imports a directory of JSON files into a MongoDB database.
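This README does not document how batch_merge.py finds file pairs. The sketch below assumes Illumina-style `_R1`/`_R2` file names and builds a PANDAseq command using its `-f` (forward reads), `-r` (reverse reads), and `-w` (output FASTA) options. `pair_fastqs`, `pandaseq_command`, `merge_all`, and the `.merged.fasta` output suffix are hypothetical.

```python
import subprocess
from typing import List, Tuple


def pair_fastqs(filenames: List[str]) -> List[Tuple[str, str]]:
    """Pair forward/reverse files by swapping the first '_R1' for '_R2'.

    ASSUMPTION: Illumina-style names such as sample_S1_L001_R1_001.fastq.gz.
    """
    names = set(filenames)
    pairs = []
    for name in sorted(filenames):
        if "_R1" in name:
            mate = name.replace("_R1", "_R2", 1)
            if mate in names:
                pairs.append((name, mate))
    return pairs


def pandaseq_command(forward: str, reverse: str, output: str) -> List[str]:
    """Argument list for PANDAseq: -f forward reads, -r reverse reads, -w output FASTA."""
    return ["pandaseq", "-f", forward, "-r", reverse, "-w", output]


def merge_all(filenames: List[str], out_suffix: str = ".merged.fasta") -> None:
    """Run PANDAseq on every detected pair (requires pandaseq on PATH)."""
    for forward, reverse in pair_fastqs(filenames):
        subprocess.run(pandaseq_command(forward, reverse, forward + out_suffix), check=True)
```

Building the argument list separately from running it keeps the pairing logic testable without PANDAseq installed.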

Requirements

Python >= 3.7
biopython >= 1.76

batch_merge.py requires PANDAseq (https://github.com/neufeld/pandaseq)
mongoimport.py requires MongoDB >= 2.6 (http://www.mongodb.org/) and pymongo >= 3.7
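mongoimport.py depends on pymongo. The following is a rough sketch of importing a directory of JSON files with pymongo's `insert_many`, not a copy of the script. It assumes each file is JSON Lines (one document per line) and that each file goes into a collection named after the file. `import_json_dir` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def import_json_dir(directory: str, db: Any, batch_size: int = 1000) -> Dict[str, int]:
    """Insert every *.json file in `directory` into `db`, one collection per file.

    ASSUMPTIONS: each file is JSON Lines (one document per line), and the
    collection is named after the file stem. `db` is anything supporting
    db[name].insert_many(list_of_dicts), such as a pymongo Database.
    """
    counts: Dict[str, int] = {}
    for path in sorted(Path(directory).glob("*.json")):
        collection = db[path.stem]
        batch: List[Dict[str, Any]] = []
        inserted = 0
        with path.open() as handle:
            for line in handle:
                if not line.strip():
                    continue  # skip blank lines
                batch.append(json.loads(line))
                if len(batch) >= batch_size:
                    collection.insert_many(batch)
                    inserted += len(batch)
                    batch = []
        if batch:  # flush the final partial batch
            collection.insert_many(batch)
            inserted += len(batch)
        counts[path.stem] = inserted
    return counts
```

With pymongo installed, `db` could be `MongoClient("mongodb://localhost:27017")["abanalysis"]`, where the URI and database name are placeholders. Inserting in batches keeps memory bounded for large repertoires.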

Notes

You don't need to install igblastn. The binaries are included in this repository.
AbAnalysis should work on Windows (x86, x64), Linux, and OS X.

Most of the requirements can be installed with pip or Anaconda.

Compiling PANDAseq for Windows requires some experience building software from source. Precompiled OS X and Linux versions are available under the Releases tab of the official GitHub repository (https://github.com/neufeld/pandaseq/releases).

For the Python 2 version, use the python2 branch or the original repository.

rmukh/abanalysis
