This repository is related to Quantified Trees, a project funded by the Federal Ministry for the Environment, Nature Conservation and Nuclear Safety of Germany.
The main goal, in short, is to compute daily sensor nowcasts and forecasts for all street trees in Berlin. For more information on the project, or if you want to reach out to us with general inquiries, see the project's landing page.
The structure of this repository is as follows:
```
qtrees-ai-data/
├── qtrees/            Python packages for data retrieval and computing sensor now- and forecasts
├── scripts/           Python and shell scripts for regular tasks and initial setup
├── data/              Placeholder for downloaded and processed data
├── test/              Unit and integration tests
└── infrastructure/
    ├── database/      SQL database setup
    ├── scheduling/    Crontabs for scheduling of regular tasks
    └── terraform/     Terraform files for provisioning the required infrastructure
        └── ansible/   Ansible files for providing data and starting services on EC2
```
The following chapters are organized as follows:
- Use of Terraform to provision the infrastructure on AWS
- Use of Ansible to set up services on AWS EC2
- Running a local setup (without AWS)
- Everyday use
- Open issues and additional resources
The aim of the setup is to create:
- a PostGIS database as an AWS RDS service
- a postgREST server as a RESTful wrapper around the DB
In the following, we assume that we are using two AWS environments:
- `dev` environment with `set_variables.dev.sh`
- `prod` environment with `set_variables.prod.sh`

Note: in the current qtrees setup, these files are shared via 1Password.

The `set_variables` script sets all environment variables needed for
- `terraform`, for creating the infrastructure
- `ansible`, for setting up the EC2 instance
Note: skip chapter (1) if the infrastructure already exists.
Local requirements are:

For (1Password-)shared config files:
- Place the shared `set_variables.dev.sh` and `set_variables.prod.sh` in your local `infrastructure/terraform` directory.
- Place the (1Password-)shared `id_rsa_qtrees` into your local `~/.ssh` directory.

Or, if creating from scratch:
- Create a new public-private key pair (used to establish a connection between Ansible and the EC2 instance) via `ssh-keygen -t rsa`.
- Insert your credentials into `infrastructure/terraform/set_variables.sh` and adjust the variables.
The main Terraform file is `infrastructure/terraform/main.tf`. It contains the component

```hcl
backend "s3" {
  bucket = "qtrees-terraform"
  key    = "tfstate"
  region = "eu-central-1"
}
```
which makes Terraform store its state in an S3 bucket. Adjust this part if you are using a different S3 bucket, or remove it if you want to store the state locally.
In the following, we use the terminal and run all commands from the `terraform` directory. Therefore, go from the project root directory to the terraform directory:

```sh
cd infrastructure/terraform
```
Subnets

Currently, we are using only a single `public_subnet`. You can add the following line

```hcl
private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
```

to get an additional private subnet and assign e.g. the RDS instance to it.
Elastic IPs

Currently, we use static IPs for the dev and prod environments via

```hcl
# re-use existing elastic ip (if set)
data "aws_eip" "qtrees" {
  public_ip = "${var.ELASTIC_IP_EC2}"
}
```

If you want to use dynamic IPs, use the following lines instead:

```hcl
resource "aws_eip" "qtrees" {
  instance = aws_instance.qtrees.id
  tags = {
    Name = "${var.project_name}-iac-${var.qtrees_version}"
  }
}
```
Then:
- to receive the public IPv4 DNS, run `terraform output -raw dns_name`
- for the EC2 address, run `terraform output -raw eip_public_dns`
If using it for the first time, create workspaces for `dev` and `prod` first:

```sh
terraform workspace new dev
terraform init
```

Adjust accordingly for `prod`.
Note: in the current qtrees setup, dev is called staging and prod is called production.
You can list and switch workspaces via

```sh
terraform workspace list
terraform workspace select staging
```
To set up the dev environment:
- run `terraform workspace select dev` to select the proper workspace
- source `set_variables.dev.sh` to set the corresponding environment variables (here: the dev environment)
Run

```sh
terraform plan
```

If everything looks okay, run

```sh
terraform apply [-auto-approve]
```

and follow the instructions to apply the `main.tf` script.

To roll back the provisioned services when they are no longer needed, run

```sh
terraform destroy
```
Set environment variables

If you want to deploy on DEV, set the corresponding variables by running in a local terminal:

```sh
. infrastructure/set_variables_private.dev.sh
```

Or, for PROD, run

```sh
. infrastructure/set_variables_private.prod.sh
```

Note: make sure that the correct branch is defined by `GIT_BRANCH` within the script.
Host file

After a fresh Terraform run, the file `infrastructure/terraform/ansible/hosts` is generated automatically. If it is not available, create the file manually with the following lines:

```ini
[qtrees_server:vars]
ansible_ssh_private_key_file=~/.ssh/id_rsa_qtrees
[qtrees_server]
<ip-address>
```

The placeholder `<ip-address>` has to be set to the IP address of the DEV or PROD machine.
There are two playbooks available:
- run `playbooks/setup-ubuntu.yml` after a fresh Terraform run
- run `playbooks/update-ubuntu.yml` if you want to update the database and the code base only

In the following, we use `setup-ubuntu.yml`. Adjust the commands accordingly if you use the update playbook.
Run

```sh
ANSIBLE_STDOUT_CALLBACK=yaml ansible-playbook -vv -i ansible/hosts ansible/playbooks/setup-ubuntu.yml
```

in your local terminal from the `infrastructure` directory and make yourself a coffee.

Check `infrastructure/ansible/playbooks/setup-ubuntu.yml` for the entire deployment procedure and adjust the setup if needed.
Note:
- use the `-v` to `-vvv` options to receive additional debug information
- set `ANSIBLE_STDOUT_CALLBACK=yaml` to get human-readable output
Run

```sh
ssh -i ~/.ssh/id_rsa_qtrees ubuntu@$TF_VAR_ELASTIC_IP_EC2
```

You can also set the IP directly instead of using `TF_VAR_ELASTIC_IP_EC2`.

Beforehand, you need to
- source the proper environment file
- place `id_rsa_qtrees` properly
The main aim of the setup is to run:
- a PostGIS DB as a Docker service
- a postgREST container as a RESTful wrapper around the DB

Additionally, you might run:
- a swagger-ui container to inspect postgREST
- a pgAdmin container for database administration

Everything is a pre-defined, ready-to-use Docker image :)

Note: the local setup is helpful for debugging but not used for deployment.

For the local setup, you need the files:
- `set_environment.local.sh`
- `docker-compose.local.yml`

Note: instead of using an RDS database, the local setup runs a postgres DB as a local Docker service.
Install requirements:
- install `psql`: `brew install libpq` (you probably have to run `brew link --force libpq` as well)
- install `docker`: follow the instructions here
- install `docker-compose`: `brew install docker-compose`
- install Golang-Migrate: `brew install golang-migrate`
Check config files:
- adjust `set_environment.local.sh` or place the shared file (from 1Password)

From the project root directory, execute the following steps in the same shell, since the scripts depend on environment variables:

```sh
cd infrastructure/database
source set_environment.local.sh
docker-compose -f docker-compose.local.yml up -d
```
Note:
- running `docker container ls`, you should now see two running containers: `postgrest` at port `3000` and `postgis` at port `5432`
- the database is not yet configured, so you cannot interact with it yet
- ignore the error messages of postgREST, as it cannot connect to the DB before the configuration is done
Run:

```sh
source set_environment.local.sh
source init_db.sh
migrate -database postgresql://postgres:${POSTGRES_PASSWD}@${DB_QTREES}:5432/qtrees?sslmode=disable -path migrations up
docker-compose -f docker-compose.local.yml restart
```
Note:
- `docker-compose restart` is needed to make the changes visible
- you should now be able to access the database hosted in the Docker container with `PGPASSWORD=${POSTGRES_PASSWD} psql -p 5432 -h localhost -U postgres`
- type `\q` to exit the `psql` interactive shell
- in your browser, you should see some swagger output generated by PostgREST when accessing the address `localhost:3000` (see also the Python sketch below)
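As a quick sanity check, both containers can also be probed from Python. This is a minimal sketch (not part of the repository), assuming `POSTGRES_PASSWD` is exported via `set_environment.local.sh` and that `sqlalchemy` and `requests` are available in your Python environment:

```python
# Sketch: verify that the two local containers answer (postgis + postgrest).
import os

import requests
from sqlalchemy import create_engine, text

db_url = f"postgresql://postgres:{os.environ['POSTGRES_PASSWD']}@localhost:5432/qtrees"

# 1) postgis container: a trivial query should succeed after init_db.sh + migrate.
engine = create_engine(db_url)
with engine.connect() as conn:
    print("postgres says:", conn.execute(text("SELECT version()")).scalar())

# 2) postgrest container: the root endpoint serves the OpenAPI (swagger) description.
resp = requests.get("http://localhost:3000/")
resp.raise_for_status()
print("postgREST exposes", len(resp.json().get("paths", {})), "paths")
```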
To start the additional tools, run:

```sh
source set_environment.local.sh
docker-compose -f docker-compose.tools.yml up -d
```
Note:
- running `docker container ls`, you should now see four running containers: `postgrest` at port `3000`, `postgis` at port `5432`, `swagger-ui` at port `8080` and `pgadmin4` at port `5050`
- access `swagger-ui` via browser at the address `localhost:8080`
  - on top of the swagger-ui landing page, change the address to `localhost:3000` and click "Explore"; you should then be able to issue REST requests to the database
  - the database does not contain any data at the beginning
- access `pgadmin` via browser at the address `localhost:5050`
  - log in with user `admin@admin.com`; the password is `root`
- Make sure the database is running and configured as described above.
- Activate the conda environment with `conda activate qtrees`. See also the Mini-conda chapter.
- Go to the project root directory and run `export PYTHONPATH=$(pwd)` to make the module `qtrees` available.
- Run the Python scripts according to `infrastructure/terraform/ansible/playbooks/setup-ubuntu.yml`:
  - step `fill database (static data)`
  - step `fill database (dynamic data)`
- Optional:
  - load private data into the db (extra repo)
  - run training
  - run inference
- Restart Docker to make the changes available to other services via `docker-compose -f docker-compose.local.yml restart`.
Note:
- the setup might take up to a few hours!
- the Ansible playbook `infrastructure/terraform/ansible/playbooks/setup-ubuntu.yml` is the current single point of truth w.r.t. filling the database with content
- you might also consider the step `s3 sync` in the playbook as a first step to get shared data
Info: https://github.com/golang-migrate/migrate

golang-migrate is used to apply and keep track of database changes. The current migration steps can be found in the folder `infrastructure/database/migrations`. In the database, there is a table called `schema_migrations`, which tells which database version we currently have. This version corresponds to the first number of the migration .sql files. For example, `000002_230301_setup_users.up` would be database version 2, created on the 1st of March.
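To quickly check which version the database is currently at, you can query `schema_migrations` directly. A minimal sketch in Python, assuming the environment variables from `set_environment.local.sh` are set and `sqlalchemy` is available:

```python
# Sketch: read the current migration version (and dirty flag) from the database.
import os

from sqlalchemy import create_engine, text

db_url = (
    f"postgresql://postgres:{os.environ['POSTGRES_PASSWD']}"
    f"@{os.environ['DB_QTREES']}:5432/qtrees"
)
engine = create_engine(db_url)
with engine.connect() as conn:
    version, dirty = conn.execute(
        text("SELECT version, dirty FROM schema_migrations")
    ).one()
print(f"database is at migration version {version} (dirty={dirty})")
```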
To add a change to the database structure, go to the `infrastructure/database` folder and run:

```sh
source set_environment.local.sh
migrate create -ext sql -dir migrations -seq XXXXXX_informative_name_of_migration
```

where XXXXXX is the current date in the form YYMMDD. This will create two files in the `infrastructure/database/migrations` folder: one "up.sql" file and one "down.sql" file. Implement the change in the "up" file and reverse it in the "down" file.
To implement the change, run (if you work on AWS, you have to remove `?sslmode=disable` to connect to the database):

```sh
migrate -database postgresql://postgres:${POSTGRES_PASSWD}@${DB_QTREES}:5432/qtrees?sslmode=disable -path migrations up 1
```

To reverse the step, run:

```sh
migrate -database postgresql://postgres:${POSTGRES_PASSWD}@${DB_QTREES}:5432/qtrees?sslmode=disable -path migrations down 1
```
In case of a dirty database

A dirty database means that something went wrong with either the 'up' or 'down' migration calls, and this needs to be fixed manually. When fixed, you can let migrate know from which db version to start anew with the command:

```sh
migrate -path migrations -database postgresql://postgres:${POSTGRES_PASSWD}@${DB_QTREES}:5432/qtrees?sslmode=disable force 4
```

(starting anew from version 4)

When your new up and down migration files work without bugs on the local machine, you can push them to GitHub as usual. The changes will be applied with the `playbooks/update-ubuntu.yml` call, which runs all migrations starting from the current db version.
For more information on how to use golang-migrate, see: https://github.com/golang-migrate/migrate/blob/master/database/postgres/TUTORIAL.md
Instead of loading data into the db from scratch, you can also dump the data into a file and restore it from that file.
- Run `. scripts/script_backup_db_data.sh` to dump the data into a file. The DB structure is not part of the backup.
- Run `. scripts/script_restore_db_data.sh` to restore the data from that file.

By default, the data is stored in `data/db/<qtrees_version>`. If you run it locally, `qtrees_version` is set to `local`. If you want to store the data somewhere else, just provide the destination folder, e.g.

```sh
. scripts/script_backup_db_data.sh data/foo
```
Note:
- the postfix `<qtrees_version>` is automatically appended to the data directory
- if you get the message `pg_dump: error: aborting because of server version mismatch`, deactivating the conda env might be a quick fix
Just getting rid of the data but not the structure is quite simple. Run:

```sh
PGPASSWORD=${POSTGRES_PASSWD} psql --host $DB_QTREES -U postgres -d qtrees -c "SELECT * from private.truncate_tables()"
```

You want to start from scratch? You can drop all qtrees databases and roles via:

```sh
PGPASSWORD=${POSTGRES_PASSWD} psql --host $DB_QTREES -U postgres -c "DROP DATABASE qtrees WITH (FORCE);" -c "DROP DATABASE lab_gis;" -c "DROP ROLE gis_admin, ui_user, ai_user, authenticator, web_anon;"
```
For the local setup, there is also a simpler way:
- shut down the Docker containers
- delete the local directories `pgadmin` and `pgdata`
- start the Docker containers again
There are two tools for insights / administration:
- swagger-ui: visualizes the `swagger.json` generated by the postgREST API
- pgadmin: allows you to visualize and adapt the db

You can run them for remote use via:

```sh
docker-compose -f docker-compose.tools.yml up -d
```
To connect to the database, run the following lines in Python:

```python
import geopandas
from sqlalchemy import create_engine

db_connection_url = "postgresql://postgres:<your_password>@<host_db>:5432/qtrees"
con = create_engine(db_connection_url)

sql = "SELECT * FROM api.soil"
soil_gdf = geopandas.GeoDataFrame.from_postgis(sql, con, geom_col="geometry")
```

Of course, you have to adapt the parameters `<your_password>` and `<host_db>`.
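Instead of hard-coding them, you can also build the connection URL from the environment variables used throughout this README. A sketch, assuming `POSTGRES_PASSWD` and `DB_QTREES` have already been sourced:

```python
import os

from sqlalchemy import create_engine

# POSTGRES_PASSWD and DB_QTREES are exported by the set_environment/set_variables scripts.
db_connection_url = (
    f"postgresql://postgres:{os.environ['POSTGRES_PASSWD']}"
    f"@{os.environ['DB_QTREES']}:5432/qtrees"
)
con = create_engine(db_connection_url)
```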
One can also write data into the database, like:

```python
db_connection_url = "postgresql://postgres:<your_password>@<host_db>:5432/qtrees"
engine = create_engine(db_connection_url)
soil_gdf.to_postgis("soil", engine, if_exists="append", schema="api")
```

assuming that `soil_gdf` is a geopandas GeoDataFrame.
(1) Log in via

```sh
curl -X 'POST' \
  'http://<host_db>:3000/rpc/login' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "email": "string",
  "pass": "string"
}'
```

and remember the output as the token. Currently, a token is not needed for reading data.
(2) Set the token: `export TOKEN=<your_token>`
(3) Get data via

```sh
curl -X 'GET' \
  'http://<host_db>:3000/weather_stations' \
  -H 'accept: application/json' \
  -H 'Range-Unit: items'
```
(4) Write data. For writing data, a token is needed in general. You can provide the token like this:

```sh
curl -X 'POST' 'http://0.0.0.0:3000/trees' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
  "gml_id": "string",
  "baumid": "string",
  [..]
}' -H "Authorization: Bearer $TOKEN"
```
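The same flow also works from Python. The sketch below uses the `requests` package (an extra dependency, not necessarily part of the qtrees environment) and assumes that `rpc/login` returns a JSON object with a `token` field, as in the standard postgREST tutorial setup:

```python
# Sketch: log in, keep the JWT, and POST a row through postgREST.
import requests

base = "http://<host_db>:3000"  # adapt to your host

# rpc/login returns a JSON web token for the given user
# (the field name may differ depending on the login function).
token = requests.post(
    f"{base}/rpc/login",
    json={"email": "ai@qtrees.ai", "pass": "qweasd"},
).json()["token"]

# Writing requires the token; reading currently does not.
# The payload must of course contain all columns required by api.trees.
resp = requests.post(
    f"{base}/trees",
    json={"gml_id": "string", "baumid": "string"},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
```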
Note: the provided JWT_SECRET is used to sign and validate the token.
Note: the length of JWT_SECRET must be greater than or equal to 32 characters.
You can test and validate the JWT token via https://jwt.io/.
A JWT token consists of a header:

```json
{
  "alg": "HS256",
  "typ": "JWT"
}
```

and a payload, e.g.

```json
{
  "role": "ai_user",
  "email": "ai@qtrees.ai",
  "exp": 1655049037
}
```
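For debugging, such a token can also be created and validated locally, e.g. with the PyJWT package (an extra dependency; the secret must match the JWT_SECRET configured for postgREST). A minimal sketch:

```python
# Sketch: issue and verify an HS256 token like the one postgREST expects.
import datetime

import jwt  # PyJWT

secret = "veryveryveryverysafesafesafesafe"  # >= 32 characters, must equal JWT_SECRET

payload = {
    "role": "ai_user",
    "email": "ai@qtrees.ai",
    "exp": datetime.datetime.now(tz=datetime.timezone.utc) + datetime.timedelta(hours=1),
}
token = jwt.encode(payload, secret, algorithm="HS256")
print(token)

# Raises jwt.InvalidTokenError if the signature or expiry is wrong.
print(jwt.decode(token, secret, algorithms=["HS256"]))
```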
If you add new tables, also think of adding / updating the permissions of the user roles. For example:

```sql
grant usage on schema api to ai_user;
grant all on api.trees to ai_user;
grant all on api.radolan to ai_user;
grant all on api.forecast to ai_user;
grant all on api.nowcast to ai_user;
grant all on api.soil to ai_user;
grant all on api.weather to ai_user;
grant all on api.weather_stations to ai_user;
[...]
grant select on api.user_info to ai_user;
```
Note: tables not exposed to the anonymous user `web_anon` will not be visible in postgREST.
To add a postgREST user, connect to the db and run:

```sql
insert into basic_auth.users (email, pass, role) values ('ai@qtrees.ai', 'qweasd', 'ai_user');
```

Of course, adapt `email`, `pass` and `role` as needed. Currently, we have two roles: `ai_user` and `ui_user`.
- Compute solar exposure maps using `scripts/script_qgis_solar_exposure_maps.py`. This script needs to be run in the Python console (code editor) of QGIS. The necessary elevation map tiles can be downloaded from the Berlin Open Data Portal. A guide on the subject can be found here.
- Smoothen the calculated solar exposure maps with `scripts/script_map_filtering.py`. This script applies a box filter to the given files; the default kernel size is 5 (see the sketch after this list).
- Calculate shading indices with the script `scripts/script_shading_index.py`. It takes the solar exposure maps and the tree dataset as input and calculates four seasonal shading indices for each tree in the dataset.
In the documentation, the `jwt_secret` is set via:

```sql
ALTER DATABASE qtrees SET "app.jwt_secret" TO 'veryveryveryverysafesafesafesafe';
```

That doesn't work on RDS. RDS uses `rds_superuser` instead of `superuser`. Therefore, the installation of PostGIS differs a bit: `GRANT rds_superuser TO gis_admin;` vs. `ALTER ROLE gis_admin SUPERUSER;`
The PostGIS setup is inspired by https://postgis.net/install/.
The setup of postgREST is based on the official postgREST documentation.
The setup of JWT in postgres is taken from https://github.com/michelp/pgjwt.
We use Miniconda for providing the needed Python packages. To install conda, follow these steps:
- download conda via `wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.12.0-Linux-x86_64.sh`
- run the installation via `sh Miniconda3-py39_4.12.0-Linux-x86_64.sh`
- remove the download: `rm Miniconda3-py39_4.12.0-Linux-x86_64.sh`
- to use conda, re-login or run `source .bashrc`

Create the environment via `conda env create --file=requirements.yaml`.
If conda is slow, try this:

```sh
conda update -n base conda
conda install -n base conda-libmamba-solver
conda config --set experimental_solver libmamba
```

Alternatively, try to set `conda config --set channel_priority false`.

Run `conda update --all --yes` to update packages.
If conda is stuck, install the conda environment manually by creating an empty env: create the environment via `conda create -n qtrees python==3.10.6` and then install the packages from `requirements.yaml` individually.
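If you want to automate that last step, the dependency list can be read out of `requirements.yaml`. A rough sketch, assuming the usual conda environment-file layout (a top-level `dependencies` list with an optional nested `pip` section) and PyYAML being available:

```python
# Sketch: install the conda dependencies from requirements.yaml one by one.
import subprocess

import yaml  # PyYAML

with open("requirements.yaml") as fh:
    env = yaml.safe_load(fh)

for dep in env.get("dependencies", []):
    if isinstance(dep, dict):  # nested pip section, handle separately if needed
        continue
    subprocess.run(["conda", "install", "-n", "qtrees", "--yes", dep], check=True)
```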