page://unlinkedIn

Lead Engineer @ Platinum Informatics

May 2023 — Apr 2024
Full Time / Employee

note: the company has re-branded as Platinum Discovery

Design, development, implementation and maintenance of the software product suite:
* architecture reworks to cut infra costs (~40% cost reduction)
* rewriting product front-end UI / UX
* removing redundant complexity within the products
* planning migrations to third party systems
* migrating core product services from DigitalOcean to AWS

Internal evaluation of the product stack with executives:
* highlighting ongoing issues and explaining how they had come about
* presenting multiple options for improvements
* providing recommendations on the best strategic action for the business

Supporting the data science and lab science teams:
* setting up build and release processes for the data science team
* simplifying lab science data workflows for end-users

Mentoring developers and data scientists:
* workshops on storing/processing data at scale: data warehousing/data lakes/data lakehouses; parquet/delta tables/change data feed; parallelised functional programming in Python (a minimal sketch follows this list)
* code review across engineering and data science teams
* teaching data scientists tooling usage: containers; New Relic monitoring; conda packaging
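
For flavour, a minimal sketch of the parallelised functional programming style those workshops covered: mapping a pure function over partitioned parquet files with a process pool. The directory layout and column names are hypothetical, not taken from real workshop material.

```python
# Illustrative sketch: map a pure function over partitioned parquet files in
# parallel, then combine the per-partition results. The directory layout and
# column names ("assay_id", "result") are hypothetical.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

import pandas as pd


def summarise_partition(path: Path) -> pd.DataFrame:
    # pure function: one parquet partition in, one small aggregate out
    df = pd.read_parquet(path)
    return df.groupby("assay_id", as_index=False)["result"].mean()


if __name__ == "__main__":
    partitions = sorted(Path("data/results/").glob("*.parquet"))
    with ProcessPoolExecutor() as pool:
        summaries = list(pool.map(summarise_partition, partitions))
    print(pd.concat(summaries, ignore_index=True).head())
```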

System administration for internal systems:
* AWS: root account setup and management; AWS Organizations; AWS IAM Identity Center SSO; billing (incl. reports and budgets)
* GitLab (self-hosted): reorganisation of groups/projects; secure membership permissions with suitable policies for roles; CI/CD runner management.
* JumpCloud: SSO management (new users/groups & external services like AWS/GitLab)


Data/DevOps/Infra Eng. @ Platinum Informatics

Aug 2021 — Apr 2023
Part-Time / Contractor

note: the company has re-branded as Platinum Discovery

Advising and providing recommendations on:
* general Dev/Ops good practices
* data strategy
* data application architecture
* performance optimisations
* standardisation across the tech stack

Developing and maintaining CI/CD pipelines for new and existing applications/projects:
* reducing manual build/deploy workloads
* reducing failures and breakage in both staging and production environments
* improving trust in CI/CD process
* identifying bad releases and performing production rollbacks

Supporting data science team:
* creating and maintaining new backend services for data science pipelines
* creating and maintaining additional data science pipelines
* building and/or sourcing custom bioinformatics application container images
* teaching the team how to use these images within data science pipelines

Adding continuous monitoring to the stack:
* deploying customised New Relic monitoring agents (a minimal setup sketch follows this list)
* standardising logging approaches across the tech stack
* alert notifications for infrastructure, service and site downtime/issues
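
A minimal sketch of what the agent setup looked like in spirit, using the New Relic Python agent; the metric and task names (and the newrelic.ini config) are illustrative assumptions rather than what was actually deployed.

```python
# Minimal sketch using the New Relic Python agent: initialise from a config
# file and record a custom metric from an instrumented background task. The
# metric/task names and newrelic.ini contents are illustrative assumptions.
import newrelic.agent

newrelic.agent.initialize("newrelic.ini")  # licence key, app name etc. live here
application = newrelic.agent.register_application(timeout=10.0)


@newrelic.agent.background_task(name="pipeline-heartbeat")
def heartbeat(records_processed: int) -> None:
    # reported alongside the standard transaction data for this task
    newrelic.agent.record_custom_metric(
        "Custom/Pipeline/RecordsProcessed", records_processed, application
    )


if __name__ == "__main__":
    heartbeat(42)
```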

Cloud cost optimisations:
* redesigning product architecture to switch off unnecessary infrastructure
* working around third party solutions to minimise worker costs
* generally advising on best practices when deploying to cloud services (scale it in / turn it off)

General emergency fire extinguisher. Some things we had to deal with:
* CI/CD runners going offline periodically
* broken DigitalOcean servers
* bad config files in production
* broken GlusterFS volumes and DNS troubles
* automated GitLab version upgrades and backups after a cryptomining script took over the server (this happened in my first month or so, way before I became GitLab admin)


PhD Research Student @ University of Dundee

Oct 2018 — May 2022
Full Time / Student

Studying the security of automatic speech recognition models trained with Connectionist Temporal Classification (CTC) and their sensitivity to exploratory evasion attacks (aka audio adversarial examples) under a perfect knowledge threat model. Mostly worked in Python with TensorFlow v1 to evade the Mozilla DeepSpeech Speech-to-Text model.

The code repository is here: github://dijksterhuis/cleverSpeech

The most promising results came from combining the CTC algorithm’s natural bias towards the “blank” token (used to separate repeated characters during decoding) and additional pre-processing to find low-energy audio frames prior to MFCC feature extraction.
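
As a rough illustration of the pre-processing step only (not the full attack), low-energy frame selection can be sketched as follows; the frame length, hop size and percentile threshold are illustrative assumptions, not the values used in the thesis.

```python
# Rough sketch of the pre-processing idea: frame the raw audio, measure
# per-frame energy, and flag the quietest frames as targets for perturbation
# before MFCC extraction. Frame/hop sizes and the percentile threshold are
# illustrative assumptions.
import numpy as np


def low_energy_frame_mask(audio, frame_len=512, hop=320, percentile=25.0):
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop: i * hop + frame_len] for i in range(n_frames)])
    energy = np.sum(frames ** 2, axis=1)           # per-frame signal energy
    threshold = np.percentile(energy, percentile)  # "low energy" cut-off
    return energy <= threshold                     # True => frame is a target


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000).astype(np.float32)  # ~1s of dummy audio
    mask = low_energy_frame_mask(audio)
    print(f"{mask.sum()} of {mask.size} frames flagged as low energy")
```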

This novel attack evaded the Mozilla DeepSpeech model with significant improvements compared to other state-of-the-art attacks at the time:
* reduced perturbation sizes under an l2-norm
* higher signal-to-noise ratio
* fewer iterations required to evade

I improved on the original Carlini & Wagner attack codebase in a number of ways:
* fully containerised implementation for portable execution on various servers (on-prem / AWS / GCP)
* fixed various bugs and ‘approximated’ code paths in the original work
* generation of adversarial examples in batches of up to 100
* constructed a framework/standard model for generating computational graphs for attack methods

Some other things I explored:
* the effects of heuristic path selection for efficient attacks, i.e. attacks that make the fewest modifications to the input data
* an “efficient” attack for CTC modified beam search decoders which did not require maximum likelihood estimation
* additional size constraints and evaluation methodologies, e.g. simulating an attack on silence to gauge the effectiveness of an attack method without any input signal

Sysadmin of the CVIP working group’s Deep Learning GPU servers, which involved:
* implementing, developing and maintaining JupyterHub server instances across servers using a forked version of dockerspawner, with the aim of letting other students develop and run models/experiments without needing to know anything about SSH or Linux (a minimal config sketch follows this list)
* developing and maintaining GitHub Actions build pipelines for standardised GPU enabled deep learning container images (available here: https://hub.docker.com/u/uodcvip and code here: https://github.com/UoD-CVIP/ServerImageBuilder)
* maintaining NVIDIA driver versions on heterogeneous servers used with the NVIDIA container runtime
* scoping out and discussing V2 implementations utilising tools such as Kubeflow or Determined.AI
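
As a rough sketch of the kind of configuration this involved (using stock dockerspawner rather than the fork), something like the following; the image tag and volume paths are placeholders.

```python
# jupyterhub_config.py -- a minimal sketch of this kind of setup using stock
# dockerspawner (not the forked spawner actually used). The image tag and
# volume paths are placeholders.
c = get_config()  # noqa: F821 -- injected by JupyterHub when loading the config

c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.JupyterHub.hub_ip = "0.0.0.0"  # listen on all interfaces so containers can reach the hub

c.DockerSpawner.image = "uodcvip/example-gpu-notebook:latest"  # hypothetical tag
c.DockerSpawner.remove = True  # clean up containers when servers stop
c.DockerSpawner.extra_host_config = {"runtime": "nvidia"}  # GPUs via the NVIDIA container runtime
c.DockerSpawner.volumes = {"/srv/jupyterhub/users/{username}": "/home/jovyan/work"}
```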


Data/DevOps Eng. @ Merkle | Aquila Insight

Feb 2018 — Oct 2018
Full Time / Employee
note — the Aquila Insight brand has since been “retired”

The Aquila Insight Engineering team was generally split into two work streams: Product, for the ‘Discovery’ analytics platform, and Project, for marketing analytics project support.

Product:
* POC project for V2 release of Discovery platform: migrating existing applications to new architecture; simplifying deployment pipelines; improving functionality & usability
* POC functionality included: multiple cloud provider deployments (AWS, GCP, Azure); auto-scaling Spark clusters within Kubernetes (& AWS Fargate); existing Discovery V1 services migrated into Kubernetes, linking into existing AWS infrastructure.
* Back-end product development integrating new applications & features into the existing Discovery platform on AWS, including: the Hive Metastore across EC2 & EMR Spark clusters (a config sketch follows this list); Apache Zeppelin & Apache Airflow.
* Daily bug fixes across entire tech stack, rapidly deploying onto all affected platforms.
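
As an illustration of the metastore integration mentioned in the list above, a minimal sketch of pointing a SparkSession at a shared external Hive metastore; the thrift URI and warehouse path are placeholders.

```python
# Minimal sketch of pointing a SparkSession at a shared external Hive
# metastore so EC2- and EMR-hosted Spark clusters resolve the same table
# catalogue. The thrift URI and warehouse path are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("discovery-shared-metastore")
    .config("hive.metastore.uris", "thrift://metastore.example.internal:9083")
    .config("spark.sql.warehouse.dir", "s3a://example-bucket/warehouse/")
    .enableHiveSupport()
    .getOrCreate()
)

# Any cluster configured this way sees the same databases and table metadata.
spark.sql("SHOW DATABASES").show()
```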

Project:
* Developing & maintaining client ETLs (external marketing API streams, files ~100GB) with scalable and performant code (some Scala, mostly PySpark), managing data access according to GDPR & internal rules (a job skeleton follows this list).
* Deploying new client platforms with customised additional tooling for specific projects (e.g. additional database services w/ AWS RDS, custom services).
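
A skeleton of the kind of PySpark ETL job described above; paths, column names and the partition key are illustrative placeholders, not real client details.

```python
# Skeleton of this kind of PySpark ETL job: read a large raw extract, apply
# transformations, and write partitioned parquet. Paths, column names and the
# partition key are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("client-marketing-etl").getOrCreate()

raw = spark.read.json("s3a://example-client-raw/api-stream/")  # raw API dumps

cleaned = (
    raw
    .dropDuplicates(["event_id"])
    .withColumn("event_date", F.to_date("event_timestamp"))
    .filter(F.col("consent_given"))  # drop records without a consent flag set
)

(
    cleaned
    .repartition("event_date")
    .write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-client-curated/events/")
)
```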

Misc:
* Training & supporting data scientists/analysts through workshops on Big Data technologies, distributed computing and ETL best practices.


Data Engineering MSc @ University of Dundee

Oct 2016 — Sep 2017
Full Time / Student
Grade: Distinction

Modules:
* Big Data Technologies — Spark; Hadoop; NoSQL Databases (Redis/Mongo etc.); Erlang
* Business Intelligence Systems — Kimball vs. Inmon Data Warehousing; Sun vs. Star Schema; Microsoft SQL Server SSIS
* Introduction to Data Mining & Machine Learning — Bayesian probability modelling; MCMC; SVMs; neural networks; model training/validation/evaluation
* Programming Languages for Data Engineering — JavaScript; Python; MATLAB; R
* DevOps & Microservices — CI/CD; DevOps methodology; Docker; Azure; Jenkins
* Research Methods — Statistics; research studies; etc.

Awarded Class Medal for Best Overall Computing Student 16/17
Class Representative for Data Engineering MSc Students

Dissertation: Teaching Machines to Break Symmetric Key Encryption with Neural Networks


Various Roles @ PRS For Music

Sep 2012 — Oct 2016
Full Time / Employee

Rather than listing detailed descriptions, each role is summarised below with its dates of employment.

Policy Analyst | Apr 2015 – Oct 2016
* Developed, analysed and realised changes in multi-domain royalty distribution policies
* Performed high value/priority analyses for management teams/committees
* Analysed and modelled large and complex datasets, clearly communicating results to stakeholders
* Developed and maintained new analysis models across departments
* Built a broad yet detailed knowledge of policies and processes, becoming a subject matter expert
* Acted as a business expert in existing BI tool implementations, providing business support for the BI team
* Coordinated and managed minor projects

Service Delivery Co-Ordinator | May 2014 – Apr 2015
* Managed change requests for stakeholders and service providers
* Maintained and improved process and reference documentation
* Recommended process improvements to support company-wide strategic objectives
* Tracked KPIs against SLAs, appropriately reporting and escalating all non-compliance
* Developed tools to improve day-to-day management
* Deputised for Service Delivery Managers with stakeholders and service providers
* Continuous improvement of systems and business knowledge

Distribution Intelligence Analyst | Sep 2013 – May 2014
* Created quarterly KPI reports for the Operations Management Team, leading to the development and maintenance of replacement models
* Regular reporting of quarterly royalty distributions for internal / external stakeholders
* Continuous improvement of analysis methods, systems and business knowledge
* Ad-hoc business analysis for internal / external stakeholders
* Tracked and reported on KPIs for internal operations

Distribution Intelligence Officer | Sep 2012 – Sep 2013
* Regular reporting of quarterly royalty distributions for internal and external stakeholders
* Developed new reporting models, simplifying and expanding processes
* Ad-hoc business analysis for internal / external stakeholders