Installing Crayfish
Needs Maintenance
The manual installation documentation is in need of attention. We are aware that some components no longer work as documented here. If you are interested in helping us improve the documentation, please see Contributing.
In this section, we will install:
- FITS Web Service, a webservice for identifying file metadata
- Islandora/Crayfish, the suite of microservices that power the backend of Islandora 2.0
- Individual microservices underneath Crayfish
FITS Web Service
The FITS Web Service is used to extract file metadata from files. The Crayfish microservice CrayFits will use this service to push FITS metadata back to Drupal. It comes in two pieces: the actual FITS tool, and the FITS Web Service, which runs in Tomcat.
FITS itself wraps other file identification and metadata tools which may require installing additional libraries. On Ubuntu 20.04, the version this guide is using, we will install a few:
To set up the FITS application, first find the latest FITS version on GitHub to replace the [FITS_VERSION_NUMBER] and then run the following commands:
cd /opt
sudo wget https://github.com/harvard-lts/fits/releases/download/[FITS_VERSION_NUMBER]/fits-[FITS_VERSION_NUMBER].zip
sudo unzip /opt/fits-[FITS_VERSION_NUMBER].zip -d /opt/fits
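One way to avoid retyping the version number is to hold it in a shell variable. The sketch below assumes a hypothetical helper, fits_zip_url, and a placeholder version X.Y.Z; neither comes from the FITS project, and the real release number still has to be looked up on GitHub.

```shell
# Hypothetical helper: build the FITS release URL from a version string.
# The URL pattern is the one used in the wget command above.
fits_zip_url() {
  printf 'https://github.com/harvard-lts/fits/releases/download/%s/fits-%s.zip\n' "$1" "$1"
}

# Example usage (replace X.Y.Z with the real release number first):
#   FITS_VERSION=X.Y.Z
#   cd /opt
#   sudo wget "$(fits_zip_url "$FITS_VERSION")"
#   sudo unzip "/opt/fits-$FITS_VERSION.zip" -d /opt/fits
```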
Similarly with the FITS webservice, get the current service version number to replace [FITS_SERVICE_WAR_VERSION_NUMBER]:
Download the FITS webservice:
sudo -u tomcat wget -O /opt/tomcat/webapps/fits.war https://github.com/harvard-lts/FITSservlet/releases/download/[FITS_SERVICE_WAR_VERSION_NUMBER]/fits-service-[FITS_SERVICE_WAR_VERSION_NUMBER].war
Configure the webservice by adding the following lines to the bottom of /opt/tomcat/conf/catalina.properties:
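As a hedged sketch only: the FITS servlet documentation describes two properties, fits.home and shared.loader, that point Tomcat at the unpacked FITS tool. The helper name below is hypothetical, and the /opt/fits path is an assumption based on the unzip target used earlier; check the values against the FITS servlet README before appending them.

```shell
# Hypothetical helper: print the two properties for a given FITS directory.
# Property names are taken from the FITS servlet README as best recalled;
# verify them against the release you downloaded.
fits_catalina_properties() {
  printf 'fits.home=%s\n' "$1"
  printf 'shared.loader=%s/lib/*.jar\n' "$1"
}

# Example usage (appends to the end of the file):
#   fits_catalina_properties /opt/fits | sudo tee -a /opt/tomcat/conf/catalina.properties
```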
Restart Tomcat:
Wait a few minutes for the service to start up the first time, then visit http://localhost:8080/fits/ to ensure it is working. You can also follow the Catalina logs to see how Tomcat is progressing in setting up each service it is running: sudo tail -f /opt/tomcat/logs/catalina.out. To stop following the logs, press Ctrl-C.
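Instead of reloading the page by hand, the wait can be scripted. The following is a minimal sketch of a generic retry helper (the name retry and its argument order are assumptions made for illustration); it reruns any command until it succeeds or the attempts run out.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only pause if another attempt is still to come.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    n=$((n + 1))
  done
  return 1
}

# Example: poll the FITS page every 10 seconds, up to 30 times.
#   retry 30 10 curl --silent --fail --output /dev/null http://localhost:8080/fits/
```

Because the probe command is passed in as arguments, the same helper can be pointed at any other endpoint in this guide.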
Crayfish 2.0
Installing Prerequisites
Some packages need to be installed before we can proceed with installing Crayfish; these packages are used by the microservices within Crayfish. These include:
- ImageMagick, which will be used for image processing. We'll be using the LYRASIS build of ImageMagick here, which supports JP2 files.
- Tesseract, which will be used for optical character recognition. By default Tesseract only understands English; additional language packs can be installed using apt-get, and a list of available packs can be obtained with sudo apt-cache search tesseract-ocr
- FFmpeg, which will be used for video processing
- Poppler, which will be used for generating PDFs
sudo apt-get install software-properties-common
sudo add-apt-repository -y ppa:lyrasis/imagemagick-jp2
sudo apt-get update
sudo apt-get -y install imagemagick tesseract-ocr ffmpeg poppler-utils
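After the packages are installed, it can be worth confirming that the expected binaries are on the PATH. This is a sketch with a hypothetical helper, check_tools; the binary names in the usage comment are assumptions about what the packages provide (pdftotext is one of the tools shipped in poppler-utils).

```shell
# Hypothetical helper: report which of the named commands are missing.
# Returns non-zero if at least one is not found on PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   check_tools convert tesseract ffmpeg pdftotext
```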
Cloning and Installing Crayfish
We’re going to clone Crayfish to /opt, and individually run composer install against each of the microservice subdirectories.
cd /opt
sudo git clone https://github.com/Islandora/Crayfish.git crayfish
sudo chown -R www-data:www-data crayfish
sudo -u www-data composer install -d crayfish/Homarus
sudo -u www-data composer install -d crayfish/Houdini
sudo -u www-data composer install -d crayfish/Hypercube
sudo -u www-data composer install -d crayfish/Milliner
sudo -u www-data composer install -d crayfish/Recast
sudo -u www-data composer install -d crayfish/CrayFits
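The six composer commands differ only in the directory name, so they can also be written as a loop. The function name and the RUN dry-run variable below are illustrative assumptions, not part of Crayfish.

```shell
# Run `composer install` for each Crayfish microservice listed above.
# Set RUN=echo to print the commands instead of executing them (dry run).
install_crayfish_services() {
  for svc in Homarus Houdini Hypercube Milliner Recast CrayFits; do
    ${RUN:-} sudo -u www-data composer install -d "crayfish/$svc" || return 1
  done
}

# Example usage, from /opt after the clone and chown steps:
#   install_crayfish_services
```

Stopping at the first failure keeps a broken install from being hidden by the output of the services that follow it.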
Preparing Logging
Not much needs to happen here; Crayfish opts for a simple logging approach, with one .log file for each component. We’ll create a folder where each logfile can live.
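A minimal sketch of creating that folder follows. The path /var/log/islandora is the one the Houdini monolog configuration in this guide writes to; giving ownership to www-data is an assumption based on the Crayfish files being owned by that user. The helper name is hypothetical.

```shell
# Hypothetical helper: create a log directory and, optionally, hand it to a user.
# Usage: make_log_dir DIR [OWNER]
make_log_dir() {
  mkdir -p "$1" || return 1
  if [ -n "${2:-}" ]; then
    chown "$2:$2" "$1"
  fi
}

# Example usage, run as root (for instance from a `sudo -s` shell):
#   make_log_dir /var/log/islandora www-data
```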
Configuring Crayfish Components
Each Crayfish component requires one or more .yaml file(s) to ensure everything is wired up correctly.
Update the defaults to meet your needs
The following configuration files represent sensible defaults; consider the logging levels in use, since the appropriate level varies from installation to installation. Also note that http URLs are used in all cases, as this guide does not cover setting up https support. A production installation should not use plain http. These files also assume a connection to a PostgreSQL database; use the pdo_mysql driver and port 3306 if using MySQL.
Using JWT for Crayfish Authentication
The Crayfish microservices use the lexik_jwt_authentication package. They are configured to use the JWT_PUBLIC_KEY environment variable to find the public key we created earlier (/opt/keys/syn_public.key). Later in this guide we will add the environment variable to the Apache configs, but you may alternatively write the path to the key in the lexik_jwt_authentication.yaml file that resides alongside the security.yaml files we edit in this section.
Homarus (Audio/Video derivatives)
Enable JSON Web Token (JWT) based access to the service by updating the security settings. Edit /opt/crayfish/Homarus/config/packages/security.yaml to set firewalls: main: anonymous to false and uncomment the provider and jwt lines further down in that section.
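After editing, a quick text check can confirm the three changes were made. This is a rough sketch only: it greps for the lines rather than parsing YAML, it assumes the jwt line reads `jwt: ~` as in the Houdini listing in this guide, and the function name is hypothetical.

```shell
# Hypothetical check: succeed only if the file has `anonymous: false`
# plus uncommented `provider:` and `jwt: ~` lines.
# Lines starting with `#` do not match, so commented-out settings fail the check.
jwt_enabled() {
  grep -Eq '^[[:space:]]*anonymous:[[:space:]]*false' "$1" &&
    grep -Eq '^[[:space:]]*provider:' "$1" &&
    grep -Eq '^[[:space:]]*jwt:[[:space:]]*~' "$1"
}

# Example usage:
#   jwt_enabled /opt/crayfish/Homarus/config/packages/security.yaml && echo "JWT enabled"
```

The same check can be run against the security.yaml of the other microservices by changing the path.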
Edit /opt/crayfish/Homarus/config/packages/monolog.yaml to point to the new logging directory:
Edit the commons config to update it with Fedora's location (if necessary) and enable the apix middleware in /opt/crayfish/Homarus/config/packages/crayfish_commons.yaml:
crayfish_commons:
  fedora_base_uri: 'http://localhost:8080/fcrepo/rest'
  apix_middleware_enabled: true
Houdini (Image derivatives)
Currently the Houdini microservice uses a different system (Symfony) than the other microservices, so it requires different configuration.
/opt/crayfish/Houdini/config/services.yaml | www-data:www-data/644
# This file is the entry point to configure your own services.
# Files in the packages/ subdirectory configure your dependencies.
# Put parameters here that don't need to change on each machine where the app is deployed
# https://symfony.com/doc/current/best_practices/configuration.html#application-related-configuration
parameters:
  app.executable: /usr/bin/convert
  app.formats.valid:
    - image/jpeg
    - image/png
    - image/tiff
    - image/jp2
  app.formats.default: image/jpeg
services:
  # default configuration for services in *this* file
  _defaults:
    autowire: true      # Automatically injects dependencies in your services.
    autoconfigure: true # Automatically registers your services as commands, event subscribers, etc.
  # makes classes in src/ available to be used as services
  # this creates a service per class whose id is the fully-qualified class name
  App\Islandora\Houdini\:
    resource: '../src/*'
    exclude: '../src/{DependencyInjection,Entity,Migrations,Tests,Kernel.php}'
  # controllers are imported separately to make sure services can be injected
  # as action arguments even if you don't extend any base controller class
  App\Islandora\Houdini\Controller\HoudiniController:
    public: false
    bind:
      $formats: '%app.formats.valid%'
      $default_format: '%app.formats.default%'
      $executable: '%app.executable%'
    tags: ['controller.service_arguments']
  # add more service definitions when explicit configuration is needed
  # please note that last definitions always *replace* previous ones
/opt/crayfish/Houdini/config/packages/crayfish_commons.yaml | www-data:www-data/644
crayfish_commons:
  fedora_base_uri: 'http://localhost:8080/fcrepo/rest'
  syn_config: /opt/fcrepo/config/syn-settings.xml
  syn_enabled: True
/opt/crayfish/Houdini/config/packages/monolog.yaml | www-data:www-data/644
monolog:
  handlers:
    houdini:
      type: rotating_file
      path: /var/log/islandora/Houdini.log
      level: DEBUG
      max_files: 1
The two listings below are alternative versions of the same file: the first enables JWT authentication and the second disables it.
/opt/crayfish/Houdini/config/packages/security.yaml | www-data:www-data/644
Enabled JWT token authentication:
# To disable Syn checking, set syn_enabled=false in crayfish_commons.yaml and remove this configuration file.
security:
  # https://symfony.com/doc/current/security.html#where-do-users-come-from-user-providers
  providers:
    users_in_memory: { memory: null }
    jwt:
      lexik_jwt: ~
  firewalls:
    dev:
      pattern: ^/(_(profiler|wdt)|css|images|js)/
      security: false
    main:
      # To enable Syn, change anonymous to false and uncomment the lines further below
      anonymous: false
      # Need stateless or it reloads the User based on a token.
      stateless: true
      # To enable JWT authentication, uncomment the below 2 lines and change anonymous to false above.
      provider: jwt
      jwt: ~
      # activate different ways to authenticate
      # https://symfony.com/doc/5.4/security.html#firewalls-authentication
      # https://symfony.com/doc/5.4/security/impersonating_user.html
      # switch_user: true
  # Easy way to control access for large sections of your site
  # Note: Only the *first* access control that matches will be used
  access_control:
    # - { path: ^/admin, roles: ROLE_ADMIN }
    # - { path: ^/profile, roles: ROLE_USER }
Disabled JWT token authentication:
security:
  # https://symfony.com/doc/current/security.html#where-do-users-come-from-user-providers
  providers:
    jwt_user_provider:
      id: Islandora\Crayfish\Commons\Syn\JwtUserProvider
  firewalls:
    dev:
      pattern: ^/(_(profiler|wdt)|css|images|js)/
      security: false
    main:
      anonymous: true
      # Need stateless or it reloads the User based on a token.
      stateless: true
Hypercube (OCR)
Enable JSON Web Token (JWT) based access to the service by updating the security settings. Edit /opt/crayfish/Hypercube/config/packages/security.yaml to set firewalls: main: anonymous to false and uncomment the provider and jwt lines further down in that section.
Edit /opt/crayfish/Hypercube/config/packages/monolog.yaml to point to the new logging directory:
Edit the commons config to update it with Fedora's location (if necessary) and enable the apix middleware in /opt/crayfish/Hypercube/config/packages/crayfish_commons.yaml:
crayfish_commons:
  fedora_base_uri: 'http://localhost:8080/fcrepo/rest'
  apix_middleware_enabled: true
Milliner (Fedora indexing)
Enable JSON Web Token (JWT) based access to the service by updating the security settings. Edit /opt/crayfish/Milliner/config/packages/security.yaml to set firewalls: main: anonymous to false and uncomment the provider and jwt lines further down in that section.
Edit /opt/crayfish/Milliner/config/packages/monolog.yaml to point to the new logging directory:
Edit the commons config to update it with Fedora's location (if necessary) and enable the apix middleware in /opt/crayfish/Milliner/config/packages/crayfish_commons.yaml:
Creating Apache Configurations for Crayfish Components
Finally, we need appropriate Apache configurations for Crayfish; these will allow other services to connect to Crayfish components via their HTTP endpoints.
Each endpoint we need to be able to connect to will get its own .conf file, which we will then enable.
Possible Route Collisions
These configurations could collide with Drupal routes if any are created in Drupal with the same name. If this is a concern, it would likely be better to reserve a subdomain or another port specifically for Crayfish. For the purposes of this installation guide, these endpoints will suffice.
/etc/apache2/conf-available/Homarus.conf | root:root/644
Alias "/homarus" "/opt/crayfish/Homarus/public"
<Directory "/opt/crayfish/Homarus/public">
FallbackResource /homarus/index.php
Require all granted
DirectoryIndex index.php
SetEnv JWT_PUBLIC_KEY /opt/keys/syn_public.key
SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>
/etc/apache2/conf-available/Houdini.conf | root:root/644
Alias "/houdini" "/opt/crayfish/Houdini/public"
<Directory "/opt/crayfish/Houdini/public">
FallbackResource /houdini/index.php
Require all granted
DirectoryIndex index.php
SetEnv JWT_PUBLIC_KEY /opt/keys/syn_public.key
SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>
/etc/apache2/conf-available/Hypercube.conf | root:root/644
Alias "/hypercube" "/opt/crayfish/Hypercube/public"
<Directory "/opt/crayfish/Hypercube/public">
FallbackResource /hypercube/index.php
Require all granted
DirectoryIndex index.php
SetEnv JWT_PUBLIC_KEY /opt/keys/syn_public.key
SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>
/etc/apache2/conf-available/Milliner.conf | root:root/644
Alias "/milliner" "/opt/crayfish/Milliner/public"
<Directory "/opt/crayfish/Milliner/public">
FallbackResource /milliner/index.php
Require all granted
DirectoryIndex index.php
SetEnv JWT_PUBLIC_KEY /opt/keys/syn_public.key
SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>
/etc/apache2/conf-available/CrayFits.conf | root:root/644
Alias "/crayfits" "/opt/crayfish/CrayFits/public"
<Directory "/opt/crayfish/CrayFits/public">
FallbackResource /crayfits/index.php
Require all granted
DirectoryIndex index.php
SetEnv JWT_PUBLIC_KEY /opt/keys/syn_public.key
SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
</Directory>
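The five files above are identical apart from the service name, so they can also be generated from one template. The sketch below uses a hypothetical function, crayfish_conf, which prints a configuration to standard output; the indentation inside the Directory block is cosmetic.

```shell
# Hypothetical generator: print the Apache config for one Crayfish service.
# The URL path is the service name in lower case, as in the files above.
crayfish_conf() {
  name=$1
  path=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
  cat <<EOF
Alias "/$path" "/opt/crayfish/$name/public"
<Directory "/opt/crayfish/$name/public">
  FallbackResource /$path/index.php
  Require all granted
  DirectoryIndex index.php
  SetEnv JWT_PUBLIC_KEY /opt/keys/syn_public.key
  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=\$1
</Directory>
EOF
}

# Example usage: write all five files.
#   for svc in Homarus Houdini Hypercube Milliner CrayFits; do
#     crayfish_conf "$svc" | sudo tee "/etc/apache2/conf-available/$svc.conf" >/dev/null
#   done
```

The `\$1` in the template is escaped so that the shell leaves the Apache back-reference intact rather than expanding it.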
Enabling Each Crayfish Component Apache Configuration
Enabling each of these configurations involves creating a symlink to it in the conf-enabled directory; the standard way of doing this in Apache is with a2enconf.
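A sketch of enabling the five configurations created above, one a2enconf call per file. The function name and the RUN dry-run variable are assumptions made for illustration.

```shell
# Enable each Crayfish configuration by name (the .conf suffix is omitted).
# Set RUN=echo to print the commands instead of executing them (dry run).
enable_crayfish_confs() {
  for conf in Homarus Houdini Hypercube Milliner CrayFits; do
    ${RUN:-} sudo a2enconf "$conf" || return 1
  done
}

# Example usage:
#   enable_crayfish_confs
```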
Restarting the Apache Service
Finally, to get these new endpoints up and running, we need to restart the Apache service.
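A minimal sketch of the restart, assuming the service is named apache2 as on a stock Ubuntu install. Checking the configuration syntax first is an added precaution rather than a step from this guide, and the function name and RUN variable are hypothetical.

```shell
# Check the configuration syntax, then restart Apache only if it passes.
# Set RUN=echo to print the commands instead of executing them (dry run).
restart_apache() {
  ${RUN:-} sudo apachectl configtest &&
    ${RUN:-} sudo systemctl restart apache2
}

# Example usage:
#   restart_apache
```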