
rhondene/Codon-Usage-in-Python


Python tools for Codon Usage Bias Analysis

-------Software Setup----------:

  1. Open Terminal (Mac/Linux) or Command Prompt (Windows)
  2. Clone the repository:
    git clone https://github.com/rhondene/Codon-Usage-in-Python.git
  3. Navigate to the project folder:
    cd Codon-Usage-in-Python/codon-usage-gui
  4. Install the package:
    pip install -e .
  • See the test_data folder for examples of the outputs of each tool on the same input fasta file ('NB_CDS.fasta')

------Run Codon Analysis Via Browser Web App (Recommended) ------

🔬 Features

📊 Comprehensive Analysis Types:

  • Transcriptome-wide RSCU (Relative Synonymous Codon Usage)
  • Per-gene RSCU analysis for individual sequence patterns
  • Amino acid usage analysis (expected vs observed frequencies)
  • Codon usage per 1000 codons for normalization
  • Relative codon frequencies per gene

Open your terminal and type:

codon-usage-gui

A web page will open automatically in your browser. If it does not, click the Local URL printed in the terminal (http://localhost:xxx) to open the web app.


------How to Use Stand-alone Command-line Tool ------

Each tool is run with a single command that executes its .pyz file (a Python zip application) from the shell terminal.

Compute_RSCU_gene :

  • Computes the relative synonymous codon usage (RSCU) of each of the 59 degenerate codons for each coding sequence (CDS), according to Sharp and Li, 1986 (PMCID: PMC340524)
  • Input: FASTA file of N coding sequences (CDS)
  • Output: comma-separated table (csv) of the relative synonymous codon usage for each transcript: i.e. a matrix of N transcripts x 59 RSCU values
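
The per-gene calculation described above can be sketched as follows. This is a minimal illustration, not the code shipped in Compute_RSCU_gene.pyz: the function name `rscu` is hypothetical, the standard genetic code is assumed, and reporting NaN for codons of an amino acid that is absent from the sequence is an assumption about how missing values might be handled.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumed), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for the 59 degenerate codons of one CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, one family per amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # skip stop codons and the non-degenerate Met and Trp
        total = sum(counts[c] for c in family)
        for codon in family:
            # observed count / (family total / family size)
            result[codon] = (
                counts[codon] * len(family) / total if total else float("nan")
            )
    return result


values = rscu("GCTGCTGCCGCA")
print(values["GCT"], values["GCC"], values["GCG"])
```

In this toy sequence alanine is encoded four times (GCT twice, GCC and GCA once each), so GCT is used twice as often as expected under equal usage while GCG is not used at all.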

How to Use :

  1. Copy the Compute_RSCU_gene.pyz file from the Codon-Usage-in-Python/Compute_RSCU_gene folder into your project folder containing the input FASTA file.
  2. Open a terminal window (bash, gitbash, powershell, etc) in the same working folder.
  3. Type the following in the terminal, replacing the input and output arguments with your own file names:
	python Compute_RSCU_gene.pyz -CDS example_cds.fasta -out rscu_results
  • Also run python Compute_RSCU_gene.pyz --help for the help menu.

Compute_RSCU_tw :

  • Computes the relative synonymous codon usage (RSCU) and absolute counts of the 59 synonymous codons over the entire (aggregate) set of coding sequences ('transcriptome-wide'). Implemented according to Sharp and Li, 1986 (PMCID: PMC340524)
  • Input: single-sequence or multi-FASTA file of coding sequences (CDS)
  • Output: a comma-separated table (.csv) file of the 59 RSCU values

How to Use :

  1. Copy the Compute_RSCU_tw.pyz file from the Codon-Usage-in-Python/Compute_RSCU_tw folder into your working folder that contains the input FASTA file of CDS.

  2. Open a terminal window (bash, gitbash, powershell, etc) in the same working folder.

  3. To run the program, type the command below in the terminal shell (be sure to replace the arguments with the actual names of the input and output files):

    	python Compute_RSCU_tw.pyz -CDS example.fasta -out results

CodonCount:

Computes the length-normalized frequency of each of the 61 sense codons in a coding sequence (CDS) and returns a CSV file.

    Relative frequency of codon_i = (count of codon_i) / (total number of codons in CDS_j)
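
The formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CodonCount.pyz implementation: the function name `relative_codon_frequency` is hypothetical, and it is assumed that stop codons and codons containing ambiguous bases are left out of both the counts and the denominator.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)


def relative_codon_frequency(cds):
    """Return {codon: count / total sense codons} for a single CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Assumption: only unambiguous sense codons are counted.
    sense = [c for c in codons if c not in STOP_CODONS and set(c) <= set("ACGT")]
    total = len(sense)
    if total == 0:
        return {}
    return {codon: n / total for codon, n in Counter(sense).items()}


freqs = relative_codon_frequency("ATGGCTGCTTAA")
print(freqs)  # ATG and GCT share the three sense codons of this toy CDS
```

Because each value is divided by the length of its own CDS, frequencies from genes of different lengths can be compared directly.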

How to Use :

  1. Copy the CodonCount.pyz file from the Codon-Usage-in-Python/CodonCount folder into your working folder with the input FASTA file(s).
  2. Open a terminal window (bash, gitbash, powershell, etc) in the same working folder.
  3. To run the program, type the command below in the terminal shell (be sure to replace the arguments with the actual names of the input and output files):
    python CodonCount.pyz -CDS example.fasta -out example_output

Also run python CodonCount.pyz --help for the help menu.

CodonUsage_per_1000:

Computes codon usage per 1000 codons across the whole transcriptome.

  1. Copy the CodonUsage_per_1000.pyz file from the Codon-Usage-in-Python/CodonUsage_per_1000 folder into your working folder with the input FASTA file(s).
  2. Open a terminal window (bash, gitbash, powershell, etc) in the same working folder.
  3. To run the program, type the command below in the terminal shell (be sure to replace the arguments with the actual names of the input and output files):
    python CodonUsage_per_1000.pyz -CDS all_CDS.fasta -out results_cu

Also run python CodonUsage_per_1000.pyz --help for the help menu.
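
For readers who want to see what a per-1000 figure means, here is a rough sketch that pools codon counts over all sequences and rescales them. The function name `codon_usage_per_1000` is hypothetical, and whether the tool itself includes stop codons in the total is not stated in this README; this sketch counts every complete codon it reads.

```python
from collections import Counter


def codon_usage_per_1000(sequences):
    """Pool codon counts over all CDS and scale them to counts per 1000 codons."""
    pooled = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            pooled[seq[i:i + 3]] += 1
    total = sum(pooled.values())
    if total == 0:
        return {}
    return {codon: 1000 * n / total for codon, n in pooled.items()}


usage = codon_usage_per_1000(["ATGGCT", "GCTGCT"])
print(usage)  # GCT makes up three of the four pooled codons
```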

fasta2csv :

  • Converts a FASTA file to a two-column CSV table (Header | Sequence).
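
The conversion can be pictured with the short sketch below, which uses only the standard library. The names `fasta_to_rows` and `fasta_to_csv` are hypothetical and this is not the fasta2csv source; it assumes that everything after `>` on a header line is kept as the Header value.

```python
import csv
import io


def fasta_to_rows(lines):
    """Yield (header, sequence) pairs, joining sequences wrapped over lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_csv(fasta_text):
    """Return the two-column CSV (Header, Sequence) as a string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Header", "Sequence"])
    writer.writerows(fasta_to_rows(fasta_text.splitlines()))
    return out.getvalue()


print(fasta_to_csv(">g1 desc\nATG\nGCT\n>g2\nTAA\n"))
```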

aa_usage :

fix_fasta.py:

  • Corrects the issue of newlines within the same sequence.
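
As an illustration of the problem being fixed, the sketch below rewrites a FASTA string so that each sequence sits on a single line. The function name `unwrap_fasta` is hypothetical and the behaviour of fix_fasta.py itself may differ; dropping blank lines is an assumption.

```python
def unwrap_fasta(text):
    """Return FASTA text with every sequence joined onto one line."""
    records = []  # each item is [header_line, sequence]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            records.append([line, ""])
        elif records:
            records[-1][1] += line
    return "".join(f"{header}\n{seq}\n" for header, seq in records)


print(unwrap_fasta(">g1\nATG\nGCT\n\n>g2\nTAA"))
```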

Glossary of Codon Usage Metrics

Codon Usage Bias

The unequal usage of synonymous codons within a gene or genome, i.e. the deviation of synonymous codon usage from a uniform distribution due to a combination of natural selection, neutral mutational bias and genetic drift.

Relative Synonymous Codon Usage

  • The RSCU of a codon is computed as its observed frequency divided by its expected frequency within a gene or whole transcriptome under the null hypothesis of equal synonymous codon usage.
  • An RSCU greater than 1 means that the codon is used more often than expected by random chance [Sharp & Li 1987].
  • Codons with high RSCU in highly expressed genes are referred to as "optimal codons". For many species the optimal codons are selectively recognised by the abundant tRNAs, which is often taken as an indication of selection pressures shaping codon usage patterns [Ikemura 1983; Wint et al 2022].

Amino Acid Frequency

  • If a particular amino acid is in some way adaptive, then it should occur more frequently than expected by chance.
  • This can be tested by calculating the expected frequencies of amino acids and comparing them to the observed frequencies. The codons and observed frequencies of particular amino acids are given in the table.
  • The frequencies of the nucleotide bases in nature (written here as RNA bases) are 22.0% uracil, 30.3% adenine, 21.7% cytosine, and 26.1% guanine. The expected frequency of a particular codon is the product of the frequencies of its three bases. The expected frequency of an amino acid is the sum of the expected frequencies of the codons that code for it.
  • As an example, the RNA codons for tyrosine are UAU and UAC, so the random expectation for its frequency is (0.220)(0.303)(0.220) + (0.220)(0.303)(0.217) = 0.0291. Since 3 of the 64 codons are nonsense or stop codons, this frequency for each amino acid is multiplied by a correction factor of 1.057.
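
The tyrosine arithmetic above can be checked with a few lines of Python. The base frequencies and the 1.057 correction factor are taken directly from the text; the function name `expected_aa_frequency` is hypothetical and this is a sketch rather than the aa_usage implementation.

```python
# Base frequencies and stop-codon correction factor as quoted in the text.
BASE_FREQ = {"U": 0.220, "A": 0.303, "C": 0.217, "G": 0.261}
CORRECTION = 1.057


def expected_aa_frequency(codons, correct_for_stops=True):
    """Sum the products of base frequencies over an amino acid's RNA codons."""
    raw = sum(
        BASE_FREQ[codon[0]] * BASE_FREQ[codon[1]] * BASE_FREQ[codon[2]]
        for codon in codons
    )
    return raw * CORRECTION if correct_for_stops else raw


tyrosine = ["UAU", "UAC"]
print(round(expected_aa_frequency(tyrosine, correct_for_stops=False), 4))
print(round(expected_aa_frequency(tyrosine), 4))
```

The first value is the uncorrected expectation from the worked example; the second applies the correction factor for the three stop codons.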

    About

    My Python3 package to compute common codon usage statistics given a FASTA of DNA sequences
