Categories
notes tech

rust-pinger service status slack notifications

https://doc.rust-lang.org/std/thread/fn.sleep.html
https://doc.rust-lang.org/std/time/struct.Instant.html#method.now
https://rust-lang-nursery.github.io/rust-cookbook/file/read-write.html
https://doc.rust-lang.org/std/fs/struct.File.html#method.open
https://doc.rust-lang.org/std/io/trait.Read.html
https://docs.rs/chrono/latest/chrono/
https://stackoverflow.com/questions/27312069/how-can-i-iterate-over-a-vector-of-functions-and-call-each-of-them
https://doc.rust-lang.org/book/ch12-02-reading-a-file.html
https://stackoverflow.com/questions/26643688/how-do-i-split-a-string-in-rust
https://stackoverflow.com/questions/37888042/remove-single-trailing-newline-from-string-without-cloning
https://stackoverflow.com/questions/14154753/how-do-i-make-an-http-request-from-rust
https://docs.rs/reqwest/0.11.9/reqwest/
https://docs.rs/reqwest/0.11.9/reqwest/blocking/index.html
https://docs.rs/reqwest/0.11.9/reqwest/blocking/struct.Request.html
https://api.slack.com/apps/A03404BKQ1G/incoming-webhooks?success=1
https://doc.rust-lang.org/std/macro.format.html
https://docs.rs/toml/0.5.8/toml/
https://toml.io/en/
https://doc.rust-lang.org/std/net/struct.TcpStream.html
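
Pulling those links together, a minimal sketch of the pinger loop. It assumes the reqwest crate with the blocking feature enabled; the webhook URL and service list are placeholders (the real service would read these from a TOML config file and do proper JSON escaping):

// Minimal pinger sketch. Assumes reqwest = { version = "0.11", features = ["blocking"] }.
// Webhook URL and service list are placeholders standing in for the TOML config.
use std::net::TcpStream;
use std::thread::sleep;
use std::time::Duration;

fn notify_slack(webhook: &str, text: &str) -> Result<(), reqwest::Error> {
    // Slack incoming webhooks accept a JSON body like {"text": "..."}.
    // NB: no JSON escaping here; use serde_json in the real thing.
    let body = format!("{{\"text\": \"{}\"}}", text);
    reqwest::blocking::Client::new()
        .post(webhook)
        .header("Content-Type", "application/json")
        .body(body)
        .send()?;
    Ok(())
}

fn main() {
    let webhook = "https://hooks.slack.com/services/XXX/YYY/ZZZ"; // placeholder
    let services = [("web", "example.com:443"), ("db", "example.com:5432")];
    loop {
        for (name, addr) in &services {
            // a failed TCP connect is treated as "service down"
            if TcpStream::connect(*addr).is_err() {
                let msg = format!("service `{}` at {} is unreachable", name, addr);
                let _ = notify_slack(webhook, &msg);
            }
        }
        sleep(Duration::from_secs(60));
    }
}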

Categories
notes tech

[aws] docs

Need these for some part-time work

  • Launch Templates https://docs.aws.amazon.com/autoscaling/ec2/userguide/LaunchTemplates.html
  • Spot Fleet Scaling https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet-step-scaling.html
  • EC2 Auto Scaling Limits https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-capacity-limits.html
  • EC2 Spot Instance Requests https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html?icmpid=docs_ec2_console
  • Launch template -> AMI User data MIME scripts https://docs.aws.amazon.com/batch/latest/userguide/launch-templates.html#example-mount-an-existing-amazon-efs-file-system
  • Static (Elastic) IPs https://aws.amazon.com/premiumsupport/knowledge-center/ec2-associate-static-public-ip/
  • AWS EC2 autoscaling (external blog guide) https://www.cloudsavvyit.com/2043/getting-started-with-aws-autoscaling/
  • [StackOverflow] literally the point of Spot Instances https://stackoverflow.com/a/11996798/5945794
  • AWS SLA https://aws.amazon.com/compute/sla/
  • AWS EC2 SSH Connect https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-methods.html

Pricing

  • EC2 On Demand https://aws.amazon.com/ec2/pricing/on-demand/
  • EC2 Spot Instance pricing history https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances-history.html
  • EC2 Spot current pricing https://aws.amazon.com/ec2/spot/pricing/
  • Reserved EC2 instance https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/
  • Cloudwatch https://aws.amazon.com/cloudwatch/pricing/
  • EC2 Autoscaling (no charge)
  • EBS Storage https://aws.amazon.com/ebs/pricing/

Categories
notes tech

[scala] slack client api

https://github.com/slack-scala-client/slack-scala-client/issues

Categories
notes random tech

[hn] front end redesigns are stupid

https://news.ycombinator.com/item?id=30382520

Categories
notes random

[hn] the columbo method

https://news.ycombinator.com/item?id=30362856

Categories
notes tech

[hn] block stackoverflow clone for DDG

https://github.com/quenhus/uBlock-Origin-dev-filter

Categories
music notes random

[hn] music theory for nerds

https://news.ycombinator.com/item?id=30358903

Categories
notes tech

remarkable

Folks keep telling me I probably have some form of ADHD, and the wealth of random pieces of paper that I scribbled on 14 months ago, yet which still occupy space in my living room, may attest to that.

So this might be a worthy £400 expenditure to solve the “paper” problem without buying a full blown tablet.

https://remarkable.com

Categories
games notes sampling

prairie fire

Been heavily into the Arma 3 CDLC recently and an excerpt of this harrowing recording is played at the end of the co-op missions.

https://www.professionalsoldiers.com/forums/showthread.php?t=15653

This is a recording of two Recon Teams (RTs) who are in dire straits. Both RTs are losing a battle whereby death is imminent.

RT Colorado is the team that is running for its life. RT Hawaii is holding their own. Both RTs have called out a “Prairie Fire” in Laos near the Ho Chi Minh Trail and are approximately 10 miles apart as the crow flies. Colorado has just been hit by a North Vietnamese platoon of 40 men who desire no more than to wipe this team completely off the face of the Earth.

What exactly does a “Prairie Fire” mean? It means at least three things: 1) You are in contact with a force much superior to yours. 2) You are either completely surrounded or will be. 3) Death is imminent.

All pilots that flew gunships, helicopters, attack and fighter aircraft were given a “briefing” before flying in country. That briefing entailed what to do if a FAC has called out a “Prairie Fire” over the radio. By the rules in Vietnam everyone listening was to stop what they were doing and come to the aid of the FAC/Recon Team(s).

When you hear Platster call on the radio: “I have your smoke, where do you want the firepower brought in?” you will hear Pat Mitchel’s voice stating that “[There are] only two of us left and Charlie is dead on our ass!”. Mixter was killed a few minutes before this and the Indigenous troops are nowhere to be seen. Also, it is during this time that Mitchel is carrying Lyn St Laurent as he is seriously wounded himself. They are fighting for their lives.

Categories
notes tech

auth0 jupyterhub authenticator

i’m sysadmin of some linux gpu servers in the phd working group and having to manually set up new users is a massive PITA.

a few options for replacements

  • Auth0 Authentication
  • KeyCloak OIDC / OAuth2 Authentication
    • https://www.keycloak.org/
    • functional+tested implementation up and running
    • only problem is existing user data — the user’s UID inside the spawned container depends on the order in which users start logging in, rather than ownership of local files.
    • could bind mount /home/{user} into container at /mnt/server then change $NB_UID for container user during usermod to match the UID of the user’s /home/{user}/.bashrc file — means another change to jupyterhub’s start.sh script in the Docker images.
    • Could also symlink /mnt/server to /home/{user}/server at end of start.sh with everything else in /home/{user} mounted in a docker volume? although docker volumes are NOT stored on the SSD for some servers because of disk space issues, so user data wouldn’t be on the SSD :[
    • Better idea would probably be mount /home/{user} as before and change $NB_UID of $NB_USER to the uid of .bashrc (without chown-ing user files); see the sketch after this list
    • If no /home/{user} exists… then…. what happens? JupyterHub server does create the user account… so should be a simple case of look at uid of .bashrc … see it’s the same as $NB_UID… do nothing…?
    • BUT … JupyterHub will only create the /home/{user} directory inside its container (if using the completely containerised version) … which means that /home would still need to be mounted in the jupyterhub-{type} containers…. This will probably cause an absolute mess for user data though, as there will be multiple users that have files with UID=1000 etc. And because students can still SSH onto the servers it means they might be able to access other users’ data… :[
  • Native Authenticator
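
On the uid-of-.bashrc idea: the actual change would live in the image’s start.sh, but here is a sketch of the lookup itself (Rust for consistency with the notes above; the path is illustrative):

// uid of a file the host bind mount already owns, e.g. /home/{user}/.bashrc.
use std::os::unix::fs::MetadataExt;

fn desired_uid(user: &str) -> std::io::Result<u32> {
    std::fs::metadata(format!("/home/{user}/.bashrc")).map(|m| m.uid())
}

start.sh would then usermod the container user to that uid whenever it differs from $NB_UID, without chown-ing anything.
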
Categories
notes tech

sig-mlops

https://lists.cd.foundation/g/sig-mlops

This is a public list for the CDF MLOps SIG. All meetings and discussions are held in the open, and everyone is welcome to join. The current membership, calendar, and meeting documents can be found at https://github.com/cdfoundation/sig-mlops.

the Sig-MLOps 2021 roadmap speaks volumes to me and my experience as a machine learning phd student…

At this point in the development of the practice, it perhaps helps to understand that much of ML and AI research and development activity has been driven by Data Science rather than Computer Science teams. This specialisation has enabled great leaps in the ML field but at the same time means that a significant proportion of ML practitioners have never been exposed to the lessons of the past seventy years of managing software assets in commercial environments.

As we shall see, this can result in large conceptual gaps between what is involved in creating a viable proof of concept of a trained ML model on a Data Scientist’s laptop vs what it subsequently takes to be able to safely transition that asset into a commercial product in production environments. It is not therefore unfair to describe the current state of MLOps in 2020 as still on the early path towards maturity and to consider that much of the early challenge for adoption will be one of education and communication rather than purely technical refinements to tooling.

most other phd/professor type folks were not the least bit interested in dealing with these sorts of problems during development — basic codebase documentation often seems to be considered a waste of time based on the hundreds of github repos I’ve looked over.

it’s likely these are a subset of SEP fields and i’ve settled on calling the phenomenon “Someone Else Will Clean Up the Mess” fields (or SEWCUM fields for short).

Categories
notes tech

“hacker laws”

https://github.com/dwmkerr/hacker-laws

the robustness principle is pretty apt for adversarial example research….

https://github.com/dwmkerr/hacker-laws#the-robustness-principle-postels-law

Categories
notes

linux find

Wrote this up as a response to someone on hackernews but didn’t submit it. Dumping here for reference…


search for exact pattern in all basenames

find . -name "<pattern>"

search for pattern at the start of basenames

find . -name "<pattern>*"

search for pattern at the end of basenames

find . -name "*<pattern>"

search for pattern anywhere in the basename

find . -name "*<pattern>*"

search for multiple patterns anywhere in the basename

find . -name "*<p1>*<p2>*"

path based patterns (same wildcards as above)

find . -path "*<p1>/*/<p2>*"

directories using basenames

find . -type d -name "*<pattern>*"

non directories using basenames (technically directories are files too!)

find . -type f -name "*<pattern>*"

only recurse 3 directories deep

find . -maxdepth 3 -name "*<pattern>*"

start 2 directories deep

find . -mindepth 2 -name "*<pattern>*"

search only in directories 2 levels deep

find . -maxdepth 2 -mindepth 2 -name "*<pattern>*"

execute a command on each result (force remove files)

find . -type f -name "*<pattern>*" -exec rm -f {} \;

execute two commands in order on each result

find . -type f -name "*<pattern>*" -exec cp -f {} ./archive/ \; -exec rm -f {} \;

use find results with GNU parallel to speed up smaller tasks

(so long as they’re safe in parallel that is)

find . -name "*<pattern>*" | parallel ./some-script.sh {}

Categories
notes

github/coqui-ai: mitigating tts misuse discussion

original posted here:

https://github.com/coqui-ai/TTS/discussions/1036


I’ve spent most of my PhD figuring out ways to attack Mozilla DeepSpeech but my PhD supervisor and I spent some time discussing this topic for generative image models so I’m gonna chip in here…

I’ll heavily caveat:

  • I’m not up to speed on TTS in the slightest (abusing properties of CTC has basically been most of my PhD)
  • detecting / preventing deepfakes / generative model misuse is an active research problem (and will be for a long time to come)

I think that any technical solutions will be easily worked around, or someone will just reproduce the code; at some point it will be as trivial as running a jupyter notebook. The cat is out of the bag, so to speak. What is needed are societal and legal approaches.

Like [above post] I don’t think there’s a technological fix here, and I agree that we need societal and legal measures.

I disagree that this is solely a legal / societal issue.

Unfortunately this is the cat and mouse game of the adaptive security cycle (see Biggio and Roli). Someone designs mitigations, someone breaks them, someone fixes them, someone breaks them… repeat ad infinitum.

No system is ever going to be 100% secure — or 100% unable to generate malicious data in the generative model case.

In security we aim to make it as hard as possible to perform feasible attacks instead of aiming for completely impossible. So the job is to make the “as trivial as running a jupyter notebook” case as unlikely as possible.

The real trick is to make a small change that has a big impact and I think coqui could do something to help in that regard by initially focussing on the tts-server application… which would be my first port of call for doing nefarious things with TTS.

Script kiddies vs. ML developers vs. APTs

A bit of a breakdown of potential adversaries is always helpful when discussing things like this.

  • Script Kiddies: These folks don’t particularly have much know-how and are looking for the “run a jupyter notebook” approach.
  • Developer: Knows how to clone the repo and modify source code, but does not have the resources to alter or retrain the model.
  • ML Developer: Has the know-how and resources to modify and completely retrain from scratch.

Script kiddies

Mitigations being enabled by default within tts-server would mean script kiddies are no longer given the option to do nefarious things (those covered by mitigations at least) as they’re no longer a pip install TTS && tts-server ... away from doing bad things.

As a simple example: what if users want to turn off the mitigations? Then tts-server adds an audible watermark to try and stop them from doing nefarious things. Want to remove the watermark? Turn the mitigations back on.
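
To make the watermark example concrete, here is a toy version (very much not a robust watermarking scheme, just mixing a quiet tone into the generated samples; the frequency and amplitude are arbitrary):

// Toy "audible watermark": mix a quiet 440 Hz tone into generated PCM samples.
// Real watermarking schemes are far more involved than this.
fn watermark(samples: &mut [f32], sample_rate: f32) {
    for (i, s) in samples.iter_mut().enumerate() {
        let t = i as f32 / sample_rate;
        *s += 0.02 * (2.0 * std::f32::consts::PI * 440.0 * t).sin();
    }
}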

Script Kiddie mitigations are a pretty good starter for 10. A positive side effect is that anyone who deploys an unmitigated instance of tts-server in the wild (or applications derived from the coqui code) won’t be able to let any random user on the web generate speech without an audible watermark, and mitigated instances would (ideally) block the generation of malicious voice data requested by users.

Developer

The developer level adversaries are harder to mitigate against but the tts-server modifications would require them to clone/modify the source code. The effort required might scale proportionally to the number of people willing to go through and actually find out how to modify the backend code for tts-server and/or TTS. Chances are one of this set will be able to set up an unmitigated tts-server in the wild by disabling the audible watermarking example above in the source code.

Depending on how far you want to go with this, you could start to mitigate against some of the more determined attackers. Data and/or Model level mitigations could make it prohibitive in terms of cost and expertise to the more determined Developer level adversaries.

These developer level ones can be dealt with later on (if deemed worth the effort).

ML Developer

Not much you can do here, as the project is open source, except make their life very difficult (but then you’ll be making everyone’s life difficult in the process).


An existing product: Descript’s Overdub

Descript have a couple of interesting points in their Ethics page regarding their voice cloning product Overdub (previously lyrebird.ai):

Content Authenticity Initiative

Will mostly leave this for future reading, but CAI essentially aims to provide attribution verification which is similar to the watermarking / NFT / digital signatures threads above but also addresses the “Certificate Authority” problem.

Unfortunately the CAI only seems to support image files for attribution verification so far. But it seems like at least one step in the right direction, with Descript being a member (or it could just be Descript signing up for the kudos, who knows).

Registering content with CAI can be done anonymously and it might be possible to bake the registration into the tts-server application by default with an option to disable content registration (I have no idea how their API actually works, but the website says it can be submitted anonymously).

If it’s turned on by default, most generated speech created with tts-server would be registered, but I appreciate there will probably be concern regarding “opt out” things like this in FOSS.

Using the adversaries list above, Script Kiddies would be mitigated here (assuming they know nothing about command line arguments) but most Developers (+all ML Developers) would be able to disable it.

Verbal Consent Verification

Note: this assumes the YourTTS + Speaking in Tongues demo is slated to end up as part of the tts-server application, and I’m not 100% sure that it is (I just pip installed it and it wasn’t the first thing on my screen using the README instructions).

Without signing up for an Overdub account to check how verbal consent verification works, it reads like they initially pass the training audio to an STT model to verify that a specific transcription exists in the recording. If the transcription doesn’t include the required phrase(s) then a TTS voice is not generated (it’s likely a recording of someone else).

This would be somewhat effective at mitigating against some simple replay attack examples which aim to make:

  1. David Bowie say things about the recent transition of his estate to Warner Chappell (David Bowie is deceased)
  2. some celebrity say kinky things about another celebrity
  3. my boss’ bank think he’s telling them to send $5000 to my bank account

For coqui specifically, it could be possible to implement a consent verification scheme as part of the tts-server application where the user must say 5 randomly generated keywords at some point during their recorded audio. coqui already has the STT models to perform this verification. This would probably require something like downloading the STT models and changing the server API for tts-server.

This could be expanded on by randomly rotating when the user must speak the keywords and only providing a prompt on the tts-server front end with an N second countdown timer. Or, even more simply, require the user to record themselves speaking completely randomly generated transcripts (with an equal mix of random words and complete sentences).
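
To illustrate the verification step itself, here is a sketch of the keyword check (hypothetical function names; the STT transcription is assumed to have been produced elsewhere):

// Hypothetical consent check: all N challenge keywords must appear somewhere
// in the STT transcription of the user's recording.
fn consent_given(transcript: &str, keywords: &[&str]) -> bool {
    let lower = transcript.to_lowercase();
    keywords.iter().all(|kw| lower.contains(kw.to_lowercase().as_str()))
}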

Using the adversaries list above, Script Kiddies would be mitigated here but some Developers (+ all ML Developers) would probably work out a way to disable it within tts-server.

Categories
notes

aws lightsail – wordpress site

Raison D’Etre

I used to use GitHub pages to host a very minimal (only HTML and CSS) personal webpage. But I’ve always struggled to keep track of interesting links and/or music, so have consistently been on the lookout for a suitable application / methodology to keep track of such things.

Turns out an application/methodology already exists — blogging and/or posts on personal websites.

Whilst the majority of this site is predominantly geared towards acting as a “central hub” for career type things etc., the page://posts section is basically anything and everything I want to keep a note of.

And it means I can do that without the mental load of “wait, did I store this on my workstation machine in Linux? Or the Windows partition? Did I use the bookmark script I wrote 2 years ago? Or did I write it to the yacy directory full of text files? Is it one of the tabs in Firefox on my Android phone or in Safari on my iPod?”

Below is basically my reference guide in case I ever forget how I set this up. This isn’t meant to be helpful for other people — it’s for when I completely forget how I put this together and can’t find the links below (once again).

Setting Up

I basically used the following guides to get this working pretty quickly and easily for circa £2.50 pcm (which is a lot cheaper than spending 20 hours pcm searching around for that thing I once saw somewhere on some device at some point).

A few notes below on where my set up differs from the Launch guide above:

  • Because I already have an AWS Route 53 hosted domain for dijksterhuis.co.uk, the name servers step [6.c] in the “Launch” guide can be ignored (the name servers are already set up for AWS services)
  • Instead of pointing at the apex domain in step [6.d] I use an A record for wordpress.dijksterhuis.co.uk pointed to the static instance IP (this is done in the Lightsail UI)
  • Back in normal AWS Route 53 I set up another A record for @.dijksterhuis.co.uk pointing at the same static IP address.
  • Then I can set up CNAME records in AWS Route 53 for both www.dijksterhuis.co.uk and whoami.dijksterhuis.co.uk (the old GitHub pages subdomain) pointing at wordpress.dijksterhuis.co.uk

Now I can manage subdomains within Route 53, meaning I can keep subdomains that point at servers outside of AWS without having to migrate everything over to Lightsail.

For the certificates guide, the changes are to run bncert-tool with:

sudo /opt/bitnami/bncert-tool --domains dijksterhuis.co.uk,www.dijksterhuis.co.uk,wordpress.dijksterhuis.co.uk,whoami.dijksterhuis.co.uk

Then choose No when it asks about non-www to www redirections.

Then it’s a simple case of running the wordpress import wizard, setting up the plugins correctly, making changes to the site template, and then bob’s your auntie.