A simple Python project to test PySpark algorithms for Data Vault transformations on Spark.

alpine-data/pyspark-playground

PySpark Playground

A simple Python project to test PySpark algorithms for Data Vault transformations on Spark.

Development

System Requirements

To start developing the project, create a local development environment with conda and install the dependencies with Poetry:

$ cd pyspark-datavault
$ conda create -p ./env python=3.8
$ conda activate ./env
$ poetry install

To run the project's tests, execute the following command from the project root:

$ python -m pytest tests
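
To illustrate what pytest collects from the tests directory, here is a minimal sketch of a test module. It is not taken from this repository: the file name tests/test_hash_key.py and the helper hub_hash_key are hypothetical, and the choice of an MD5 digest over trimmed, upper-cased business keys is an assumption based on common Data Vault practice. The helper is plain Python so the sketch runs without a Spark session.

```python
# tests/test_hash_key.py (hypothetical file, not part of this repository)
import hashlib


def hub_hash_key(*business_keys, delimiter="||"):
    """Return an MD5 hex digest over the normalized business key parts.

    Assumption: keys are trimmed and upper-cased before hashing, and the
    parts are joined with a delimiter so that part boundaries are preserved.
    """
    normalized = [str(part).strip().upper() for part in business_keys]
    return hashlib.md5(delimiter.join(normalized).encode("utf-8")).hexdigest()


def test_hub_hash_key_ignores_case_and_whitespace():
    # Formatting differences in the source system should not change the key.
    assert hub_hash_key(" c-001 ") == hub_hash_key("C-001")
    # An MD5 hex digest is always 32 characters long.
    assert len(hub_hash_key("C-001")) == 32


def test_hub_hash_key_respects_part_boundaries():
    # ("AB", "C") and ("A", "BC") must not collapse to the same key.
    assert hub_hash_key("AB", "C") != hub_hash_key("A", "BC")
```

pytest discovers functions whose names start with test_ in files named test_*.py, so a module like this would be picked up by the command above without further configuration.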

Test Data

Relational Schema

[Image: relational schema diagram]

Data Vault Schema

[Image: Data Vault schema diagram]
