
FCS SRU Aggregator

CLARIN Federated Content Search Aggregator – Augmenting your Search Engine

Introduction

The CLARIN Federated Content Search (CLARIN-FCS) introduces an interface specification that decouples search engine functionality from its exploitation by user interfaces and third-party applications, so that services can access heterogeneous search engines in a uniform way.

How does it work?

  1. A Client submits a query to an Endpoint (an example request is shown below the list).
  2. The Endpoint translates the query from CQL or FCS-QL into the query dialect used by the Search Engine and submits the translated query to the Search Engine.
  3. The Search Engine processes the query and generates a result set, i.e. it compiles a set of hits that match the search criteria.
  4. The Endpoint then translates the results from the Search Engine-specific result set format into the CLARIN-FCS result format and sends them to the Client.
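For illustration, an Endpoint is queried via the SRU searchRetrieve operation. The following request is only a sketch: the endpoint URL is hypothetical and the query is a simple CQL term, as used for basic full-text search in FCS 2.0.

# a minimal searchRetrieve request against a hypothetical FCS endpoint
# (queryType is "cql" for basic search or "fcs" for advanced FCS-QL search)
curl -s 'https://example.org/fcs-endpoint/sru?version=2.0&queryType=cql&query=apple&maximumRecords=10'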

The Aggregator

The FCS Aggregator is a search engine for language resources hosted at a variety of institutions. The user interface allows search queries to access heterogeneous data sources in a uniform way. It is called Aggregator because it sends out search queries to remote search engines and then collects the search results.

The FCS Aggregator is running at CLARIN.

The Specification

The Specification for Federated Content Search Core 2.0 can be found as a PDF document. Sources for the various specification documents can be found at the FCS Misc Webpage.

For more details, visit the CLARIN FCS – Technical Details page.

Changelog

For a detailed list of changes, please take a look at CHANGELOG.md.

Building

Requirements

The backend Java web server requires Java 11 or newer and Maven 3.6.3.

The frontend React web application requires Node 22 (with NPM 10). Node is only required for re-compiling the frontend; you can also use the bundled, pre-compiled files if no further customization is required.

Docker is only required if you want to run everything containerized. In that case the Java and Node requirements do not apply, as the multi-stage build process handles everything.
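To check that the toolchain is in place before building, the usual version commands can be used (only the tools relevant to your setup are needed):

# check build tool versions
java -version      # should report 11 or newer
mvn -version       # should report 3.6.3
node --version     # v22.x, only needed to rebuild the frontend
npm --version      # 10.x, only needed to rebuild the frontend
docker --version   # only needed for the containerized setup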

Source Code

# fetch the source code
git clone https://github.com/clarin-eric/fcs-sru-aggregator.git && cd fcs-sru-aggregator/

# initialize the git submodules
git submodule update --init --recursive aggregator-webui/
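To confirm that the submodule checkout succeeded, you can inspect its status (an uninitialized submodule is marked with a leading "-"):

# verify the web UI submodule is initialized
git submodule status aggregator-webui/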

Dropwizard Server Application (Backend)

You can build the Java Dropwizard server application by running the scripts/build.sh script. This will generate an aggregator-app-*.jar binary that bundles the backend, the frontend and all relevant dependencies.

./scripts/build.sh
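After the build finishes, the bundled jar should be available; the exact output path below assumes a standard Maven target/ layout:

# check that the bundled application jar was produced
ls -lh aggregator-app/target/aggregator-app-*.jar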

React Web Frontend

The React web frontend is already included as assets in aggregator-app/src/main/resources/assets/webapp with the index page aggregator-app/src/main/resources/eu/clarin/sru/fcs/aggregator/app/index.mustache.

If you want to modify or update the web application, you can use the scripts/update-webui.sh script. It uses the aggregator-webui/ git submodule containing the sources, rebuilds the frontend, and places the generated assets back into the Java asset folder.

# optionally update the git submodule code
#git submodule update --remote aggregator-webui/

# rebuild and update web UI files
./scripts/update-webui.sh
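To confirm that the assets were regenerated, list the asset folder mentioned above:

# the rebuilt frontend files should appear here
ls aggregator-app/src/main/resources/assets/webapp/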

The React web frontend can also be run as a standalone web application. However, it requires the FCS Aggregator REST API to get data and process search requests. You can find the source code at https://github.com/clarin-eric/fcs-sru-aggregator-ui/.
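A typical standalone development setup might look like the following sketch; the npm script name and how the API base URL is configured are assumptions, so check the fcs-sru-aggregator-ui README for the actual commands:

# fetch and run the standalone web UI (script name is an assumption)
git clone https://github.com/clarin-eric/fcs-sru-aggregator-ui.git && cd fcs-sru-aggregator-ui/
npm install
npm run dev   # point it at a running Aggregator REST API, e.g. http://localhost:4019/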

Using Docker

To create the Docker image, simply build the Dockerfile. This also allows you to configure the frontend a bit by modifying the webui.env (template) file. The Aggregator also uses the default configuration in aggregator-app/aggregator.yml, which can be overridden by mounting another file over it.

docker build --tag=fcs-sru-aggregator .

If you do not want to change anything, you can also use Dockerfile.simple, which uses the prebuilt web frontend assets and only recompiles the Java sources. The aggregator-app/aggregator.yml configuration will also be used.
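A build with the simpler Dockerfile only differs in the file passed to Docker:

docker build --tag=fcs-sru-aggregator --file Dockerfile.simple .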

Running

Configuration

See DEPLOYMENT.md for example deployment configurations and descriptions about settings.

Natively

You can run the application in development and production mode. Please look at the scripts/run-(devel|prod).sh files for more details.

In development mode the aggregator_devel.yml configuration file is used; in production mode, aggregator.yml.

# Run in development mode
# Also allows to attach a debugger
./scripts/run-devel.sh

# Run in production mode (uses the built aggregator-app-*.jar file)
./scripts/run-prod.sh

You can then access the locally running Aggregator at http://localhost:4019/.
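To quickly check that the server came up, request the start page at the URL given above:

# should return an HTTP 200 status once the Aggregator has started
curl -sI http://localhost:4019/ | head -n 1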

Using Docker

Use docker mounts to allow access to generated data and logs, and to override default configuration files.

# copy of resources for faster startups
touch fcsAggregatorResources.json fcsAggregatorResources.backup.json
# statistics output directory
mkdir stats

# run in background (detached), and restart (unless stopped by operator)
#   -d --restart unless-stopped
# configure public port (internal 4019, external 4020)
#   -p 4020:4019
# aggregator configuration
#   -v $(pwd)/aggregator-app/aggregator.yml:/work/aggregator.yml:ro
# (cached) aggregator resources
#   -v $(pwd)/fcsAggregatorResources.json:/var/lib/aggregator/fcsAggregatorResources.json
#   -v $(pwd)/fcsAggregatorResources.backup.json:/var/lib/aggregator/fcsAggregatorResources.backup.json
# if statistics logging (fcsstats logger) is enabled, mount the output directory
#   -v $(pwd)/stats:/var/log/aggregator/stats
# provide a container name with `--name`
# and specify the docker image (see build step above)
docker run \
    -d \
    --restart unless-stopped \
    -p 4020:4019 \
    -v $(pwd)/aggregator-app/aggregator.yml:/work/aggregator.yml:ro \
    -v $(pwd)/fcsAggregatorResources.json:/var/lib/aggregator/fcsAggregatorResources.json \
    -v $(pwd)/fcsAggregatorResources.backup.json:/var/lib/aggregator/fcsAggregatorResources.backup.json \
    -v $(pwd)/stats:/var/log/aggregator/stats \
    --name fcs-sru-aggregator \
    fcs-sru-aggregator

# since the container was started with `--restart` (and without `--rm`), you have
# to stop and remove it manually to shut down the server and free the name for future runs
docker stop fcs-sru-aggregator
docker rm fcs-sru-aggregator

You can then access the locally running Aggregator at http://localhost:4020/.
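To inspect the running container, the usual Docker commands apply:

# check that the container is up and follow its logs
docker ps --filter name=fcs-sru-aggregator
docker logs -f fcs-sru-aggregator

# quick check of the externally mapped port
curl -sI http://localhost:4020/ | head -n 1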

FCS Endpoints

Endpoint Reference Implementations

If you have any kind of RESTful API for your Search Engine, using the Korp Endpoint Reference Implementation as a starting point should be the way to go. If you are using Korp itself, only a simple adaptation of corpora and tagsets should be needed. In any case, do not forget to look at the tests.

For lexical resources, an FCS Endpoint with the Solr search engine might be a good starting point. It allows indexing XML-structured data in Solr and handles search query conversion. It can also be used as a template if you want to connect other search engines by modifying the query translation and API handling.

An older CQI Bridge endpoint also exists but only supports simple full-text search.

For more endpoint implementations, please take a look at the Awesome FCS list.

Endpoint Validation

To test your Endpoint you can point the FCS Endpoint Validator (code) to your Endpoint.
