https://soundcloud.com/loose-lips123/loose-lips-mix-series-367-john-sellekaers
Month: January 2022
not-overbridge
A very hacky method to get per track recordings out of the elektron octatrack into ableton live — https://emilsmith.pro/blog/octabridge-it-s-not-overbridge-for-the-octatrack-but-it-s-close-enough/
atlas – database schema migration cli
i’m sysadmin of some linux gpu servers in the phd working group and having to manually set up new users is a massive PITA.
a few options for replacements
- Auth0 Authentication
- https://github.com/jupyterhub/oauthenticator/blob/master/oauthenticator/auth0.py
- account required
- will it be able to communicate with servers behind firewalled school of computing network?
- KeyCloak OIDC / OAuth2 Authentication
- https://www.keycloak.org/
- functional+tested implementation up and running
- only problem is existing user data — the user’s UID inside the spawned container depends on the order in which users start logging in, rather than ownership of local files.
- could bind mount `/home/{user}` into the container at `/mnt/server`, then change `$NB_UID` for the container user during `usermod` to match the UID of the user’s `/home/{user}/.bashrc` file — means another change to jupyterhub’s `start.sh` script in the Docker images.
- Could also symlink `/mnt/server` to `/home/{user}/server` at the end of `start.sh`, with everything else in `/home/{user}` mounted in a docker volume? although docker volumes are NOT stored on the SSD for some servers because of a disk space issue, so user data wouldn’t be on the SSD :[
- Better idea would probably be to mount `/home/{user}` as before and change `$NB_UID` of `$NB_USER` to the uid of `.bashrc` (without `chown`-ing user files) — see the sketch after this list.
- If no `/home/{user}` exists… then… what happens? The JupyterHub server does create the user account… so it should be a simple case of: look at the uid of `.bashrc`… see it’s the same as `$NB_UID`… do nothing…?
- BUT… JupyterHub will only create the `/home/{user}` directory inside its container (if using the completely containerised version)… which means that `/home` would still need to be mounted in the `jupyterhub-{type}` containers…. This will probably cause an absolute mess for user data though, as there will be multiple users that have files with `UID=1000` etc. And because students can still SSH onto the servers, it means they might be able to access other users’ data… :[
- Native Authenticator
- https://github.com/jupyterhub/nativeauthenticator
- functional+tested implementation up and running
- Same problems as keycloak, same potential “solution”, i.e. hacky workaround.
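For reference, a minimal sketch of the `start.sh` workaround mentioned above, assuming the usual jupyter-docker-stacks setup where `$NB_USER` / `$NB_UID` are already set and the script runs as root (untested, so a sketch rather than a drop-in fix):

# hypothetical addition to start.sh in the Docker images:
# re-map the container user's UID to match the owner of the bind-mounted
# home directory, without chown-ing any of the user's files
if [ -f /mnt/server/.bashrc ]; then
    EXISTING_UID=$(stat -c '%u' /mnt/server/.bashrc)
    if [ "${EXISTING_UID}" != "${NB_UID}" ]; then
        usermod -u "${EXISTING_UID}" "${NB_USER}"
        export NB_UID="${EXISTING_UID}"
    fi
fi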
https://lists.cd.foundation/g/sig-mlops
This is a public list for the CDF MLOps SIG. All meetings and discussions are held in the open, and everyone is welcome to join. The current membership, calendar, and meeting documents can be found at https://github.com/cdfoundation/sig-mlops.
the Sig-MLOps 2021 roadmap speaks volumes to me and my experience as a machine learning phd student…
At this point in the development of the practice, it perhaps helps to understand that much of ML and AI research and development activity has been driven by Data Science rather than Computer Science teams. This specialisation has enabled great leaps in the ML field but at the same time means that a significant proportion of ML practitioners have never been exposed to the lessons of the past seventy years of managing software assets in commercial environments.
As we shall see, this can result in large conceptual gaps between what is involved in creating a viable proof of concept of a trained ML model on a Data Scientist’s laptop vs what it subsequently takes to be able to safely transition that asset into a commercial product in production environments. It is not therefore unfair to describe the current state of MLOps in 2020 as still on the early path towards maturity and to consider that much of the early challenge for adoption will be one of education and communication rather than purely technical refinements to tooling.
most other phd/professor type folks were not the least bit interested in dealing with these sorts of problems during development — basic codebase documentation often seems to be considered a waste of time based on the hundreds of github repos I’ve looked over.
it’s likely these are a subset of SEP fields and i’ve settled on calling the phenomenon “Someone Else Will Clean Up the Mess” fields (or SEWCUM fields for short).
emergency alert systems
apparently there’s also a part 2….
david nutt – ecstasy vs. horse riding
ekin fil – feelings
hoavi – invarient
https://github.com/dwmkerr/hacker-laws
the robustness principle is pretty apt for adversarial example research….
https://github.com/dwmkerr/hacker-laws#the-robustness-principle-postels-law
macv-sog quotes
i never completed rainbow six 3
it was hard for a 14-year-old brain who wanted to shooty shoot half life aliens
https://store.steampowered.com/app/19830/Tom_Clancys_Rainbow_Six_3_Gold/
black mesa is genius
SoupEmporium’s in depth exploration of Black Mesa ticks all the right boxes
films (to watch)
The Andromeda Strain [watched] — A team of top scientists work feverishly in a secret, state-of-the-art laboratory to discover what has killed the citizens of a small town and learn how this deadly contagion can be stopped.
https://www.imdb.com/title/tt0066769/
the musical score by Gil Mellé for this is amazing (sounds like an ARP 2600 score?)
https://www.youtube.com/watch?v=j-R0kuHyBb8
Silent Running — In a future where all flora is extinct on Earth, an astronaut is given orders to destroy the last of Earth’s botany, kept in a greenhouse aboard a spacecraft.
https://www.imdb.com/title/tt0067756/
Three Billboards Outside Ebbing, Missouri — A mother personally challenges the local authorities to solve her daughter’s murder when they fail to catch the culprit.
https://www.imdb.com/title/tt5027774/
Choke — A sex-addicted con-man pays for his mother’s hospital bills by playing on the sympathies of those who rescue him from choking to death.
https://www.imdb.com/title/tt1024715/
Vice — The story of Dick Cheney, an unassuming bureaucratic Washington insider, who quietly wielded immense power as Vice President to George W. Bush, reshaping the country and the globe in ways that we still feel today.
https://www.imdb.com/title/tt6266538/
A Boy and His Dog — A young man and his telepathic dog wander a post-apocalyptic wasteland.
https://www.imdb.com/title/tt0072730/
The Quiet Earth — A man named Zac Hobson awakens to find himself alone in the world. In a desperate attempt to search for others, he finds only two who have their own agenda.
https://www.imdb.com/title/tt0089869/
THX 1138 — In the 25th century, a time when people have designations instead of names, a man, THX 1138, and a woman, LUH 3417, rebel against their rigidly-controlled society.
https://www.imdb.com/title/tt0066434/
Rollerball — In a corporate-controlled future, an ultra-violent sport known as Rollerball represents the world, and one of its powerful athletes is out to defy those who want him out of the game.
https://www.imdb.com/title/tt0073631/
Soylent Green — A nightmarish futuristic fantasy about the controlling power of big corporations and an innocent cop who stumbles on the truth.
tds – ml orchestration for startups
linux find
Wrote this up as a response to someone on hackernews but didn’t submit it. Dumping it here for reference…
search for exact pattern in all basenames
find . -name "<pattern>"
search for pattern at the start of basenames
find . -name "<pattern>*"
search for pattern at the end of basenames
find . -name "*<pattern>"
search for pattern anywhere in the basename
find . -name "*<pattern>*"
search for multiple patterns anywhere in the basename
find . -name "*<p1>*<p2>*"
path based patterns (same wildcards as above)
find . -path "*<p1>/*/<p2>*"
directories using basenames
find . -type d -name "*<pattern>*"
non directories using basenames (technically directories are files too!)
find . -type f -name "*<pattern>*"
only recurse 3 directories deep
find . -maxdepth 3 -name "*<pattern>*"
start 2 directories deep
find . -mindepth 2 -name "*<pattern>*"
search only in directories 2 levels deep
find . -maxdepth 2 -mindepth 2 -name "*<pattern>*"
execute a command on each result (force remove files)
find . -type f -name "*<pattern>*" -exec rm -f {} \;
execute two commands in order on each result
find . -type f -name "*<pattern>*" -exec cp -f {} ./archive/ \; -exec rm -f {} \;
use find results with GNU parallel to speed up smaller tasks
(so long as they’re safe in parallel that is)
find . -name "*<pattern>*" | parallel ./some-script.sh {}
originally posted here:
https://github.com/coqui-ai/TTS/discussions/1036
I’ve spent most of my PhD figuring out ways to attack Mozilla DeepSpeech, but my PhD supervisor and I spent some time discussing this topic for generative image models, so I’m gonna chip in here…
I’ll heavily caveat:
- I’m not up to speed on TTS in the slightest (abusing properties of CTC has basically been most of my PhD)
- detecting / preventing deepfakes / generative model misuse is an active research problem (and will be for a long time to come)
I think that any technical solutions will be easily worked around, or someone will just reproduce the code, at some point it will be as trivial as running a jupyter notebook. The cat is out of the bag so to speak. What is needed are societal and legal approaches.
Like [above post] I don’t think there’s a technological fix here, and I agree that we need societal and legal measures.
I disagree that this is solely a legal / societal issue.
Unfortunately this is the cat and mouse game of the adaptive security cycle (see Biggio and Roli). Someone designs mitigations, someone breaks them, someone fixes them, someone breaks them… repeat ad infinitum.
No system is ever going to be 100% secure — or 100% unable to generate malicious data in the generative model case.
In security we aim to make it as hard as possible to perform feasible attacks instead of aiming for completely impossible. So the job is to make the “as trivial as running a jupyter notebook” case as unlikely as possible.
The real trick is to make a small change that has a big impact and I think coqui could do something to help in that regard by initially focussing on the `tts-server` application… which would be my first port of call for doing nefarious things with TTS.
Script kiddies vs. ML developers vs. APTs
A bit of a breakdown of potential adversaries is always helpful when discussing things like this.
- Script Kiddies: These folks don’t particularly have much know-how and are looking for the “run a jupyter notebook” approach.
- Developer: Know how to clone the repo and modify source code, but do not have resources to alter or retrain the model.
- ML Developer: Has the know how and resources to modify and completely retrain from scratch.
Script kiddies
Mitigations being enabled by default within `tts-server` would mean script kiddies are no longer given the option to do nefarious things (those covered by mitigations at least) as they’re no longer a `pip install TTS && tts-server ...` away from doing bad things.
As a simple example: users want to turn off the mitigations? Then `tts-server` adds an audible watermark to try and stop you from doing nefarious things. Want to remove the watermark? Turn the mitigations back on.
Script Kiddie mitigations are a pretty good starter for 10. A positive side effect is that anyone who deploys an unmitigated instance of `tts-server` in the wild (or applications derived from the coqui code) won’t be able to let any random user on the web generate speech without an audible watermark, and mitigated instances would (ideally) block the generation of malicious voice data requested by users.
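purely to illustrate the audible watermark idea (not how coqui would actually implement it), one crude approach is mixing a quiet tone over the generated speech with sox; the 440 Hz frequency and 0.05 volume here are arbitrary

# generate a quiet tone matching the speech clip's duration, then mix it in
sox -n tone.wav synth $(soxi -D speech.wav) sine 440 vol 0.05
sox -m speech.wav tone.wav watermarked.wav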
Developer
The developer level adversaries are harder to mitigate against, but the `tts-server` modifications would require them to clone/modify the source code. The effort required might scale proportionally with the number of people willing to go through and actually find out how to modify the backend code for `tts-server` and/or TTS. Chances are one of this set will be able to set up an unmitigated `tts-server` in the wild by disabling the audible watermarking example above in the source code.
Depending on how far you want to go with this, you could start to mitigate against some of the more determined attackers. Data and/or model level mitigations could make it prohibitive in terms of cost and expertise for the more determined Developer level adversaries.
These developer level ones can be dealt with later on (if deemed worth the effort).
ML Developer
Not much you can do here as the project is open source, except make their life very difficult (but then you’ll be making everyone’s life difficult in the process).
An existing product: Descript’s Overdub
Descript have a couple of interesting points in their Ethics page regarding their voice cloning product Overdub (previously lyrebird.ai):
- Membership of the Content Authenticity Initiative
- verbal consent verification
Content Authenticity Initiative
Will mostly leave this for future reading, but CAI essentially aims to provide attribution verification which is similar to the watermarking / NFT / digital signatures threads above but also addresses the “Certificate Authority” problem.
Unfortunately the CAI only seems to support image files for attribution verification so far. But it seems like at least one step in the right direction with Descript being a member (or it could just be Descript signing up for the kudos, who knows).
Registering content with CAI can be done anonymously and it might be possible to bake the registration into the `tts-server` application by default with an option to disable content registration (I have no idea how their API actually works, but the website says it can be submitted anonymously).
If it’s turned on by default, most generated speech created with `tts-server` would be registered, but I appreciate there will probably be concerns regarding “opt out” things like this in FOSS.
Using the adversaries list above, Script Kiddies would be mitigated here (assuming they know nothing about command line arguments) but most Developers (+all ML Developers) would be able to disable it.
Verbal Consent Verification
Note: this assumes that the YourTTS + Speaking in Tongues demo is slated to end up as part of the `tts-server` application and I’m not 100% sure if it is (I just `pip install`ed it and it wasn’t the first thing on my screen using the README instructions).
Without signing up for an Overdub account to check how verbal consent verification works, it reads like they initially pass the training audio to an STT model to verify that a specific transcription exists in the recording. If the transcription doesn’t include the required phrase(s) then a TTS voice is not generated (the audio is likely a recording of someone else).
This would be somewhat effective at mitigating against some simple replay attack examples which aim to make:
- David Bowie say things about the recent transition of his estate to Warner Chappell (David Bowie is deceased)
- some celebrity say kinky things about another celebrity
- my boss’ bank think he’s telling them to send $5000 to my bank account
For coqui specifically, it could be possible to implement a consent verification scheme as part of the `tts-server` application where the user must say 5 randomly generated keywords at some point during their recorded audio. coqui already has the STT models to perform this verification. This would probably require something like downloading the STT models and changing the server API for `tts-server`.
This could be expanded on by randomly rotating when the user must speak the keywords and only providing a prompt on the `tts-server` front end with an `N` second countdown timer. Or, even more simply, require the user to record themselves speaking completely randomly generated transcripts (with an equal mix of random words and complete sentences).
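a rough sketch of that verification step, assuming an STT tool that prints a transcript to stdout (the `stt` invocation below is a placeholder, not the actual coqui CLI):

# placeholder consent check: 5 keywords randomly generated per request
KEYWORDS=("copper" "violet" "thunder" "marble" "quartz")
TRANSCRIPT=$(stt --model model.tflite --audio consent.wav)  # placeholder STT invocation
for word in "${KEYWORDS[@]}"; do
    # every challenge keyword must appear somewhere in the transcript
    if ! grep -qiw "${word}" <<< "${TRANSCRIPT}"; then
        echo "consent check failed: keyword '${word}' not spoken" >&2
        exit 1
    fi
done
echo "consent verified: all keywords spoken"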
Using the adversaries list above, Script Kiddies would be mitigated here but some Developers (+ all ML Developers) would probably work out a way to disable it within `tts-server`.
french tragedies in colour
boards of canada – basefree
autechre – warp tapes 89-93
via maris – lekky/hiblrr
Boomkat Link: https://boomkat.com/products/lekky-hiblrr
seefeel – succour (redux)
aws lightsail – wordpress site
Raison D’Etre
I used to use GitHub pages to host a very minimal (only HTML and CSS) personal webpage. But I’ve always struggled to keep track of interesting links and/or music, so have consistently been on the lookout for a suitable application / methodology to keep track of such things.
Turns out an application/methodology already exists — blogging and/or posts on personal websites.
Whilst the majority of this site is geared towards acting as a “central hub” for career type things etc., the /posts section is basically anything and everything I want to keep a note of.
And it means I can do that without the mental load of “wait, did I store this on my workstation machine in Linux? Or the Windows partition? Did I use the `bookmark` script I wrote 2 years ago? Or did I write it to the `yacy` directory full of text files? Is it one of the tabs in Firefox on my Android phone or in Safari on my iPod?”
Below is basically my reference guide in case I ever forget how I set this up. This isn’t meant to be helpful for other people — it’s for when I completely forget how I put this together and can’t find the links below (once again).
Setting Up
I basically used the following guides to get this working pretty quickly and easily for circa £2.50 pcm (which is a lot cheaper than spending 20 hours pcm searching around for that thing I once saw somewhere on some device at some point).
A few notes below on the differences between my set up and the Launch guide above:
- Because I already have an AWS Route 53 hosted domain for `dijksterhuis.co.uk`, the name servers step [6.c] in the “Launch” guide can be ignored (they’re already set up for AWS services)
- Instead of pointing at the apex domain in step [6.d], I use an `A` record for `wordpress.dijksterhuis.co.uk` pointed at the static instance IP (this is done in the Lightsail UI)
- Back in normal AWS Route 53, I set up another `A` record for `@.dijksterhuis.co.uk` pointing at the same static IP address.
- Then I can set up `CNAME` records in AWS Route 53 for both `www.dijksterhuis.co.uk` and `whoami.dijksterhuis.co.uk` (the old GitHub pages subdomain) pointing at `wordpress.dijksterhuis.co.uk`
Now I can manage sub domains within Route 53, meaning I can keep my sub-domains that point at servers outside of AWS without having to migrate everything over to Lightsail.
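the resulting records end up looking roughly like this (zone-file style, with the static instance IP redacted):

dijksterhuis.co.uk.            A      <static-ip>
wordpress.dijksterhuis.co.uk.  A      <static-ip>
www.dijksterhuis.co.uk.        CNAME  wordpress.dijksterhuis.co.uk.
whoami.dijksterhuis.co.uk.     CNAME  wordpress.dijksterhuis.co.uk.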
For the certificates guide, the changes are to run `bncert-tool` with:
sudo /opt/bitnami/bncert-tool --domains dijksterhuis.co.uk,www.dijksterhuis.co.uk,wordpress.dijksterhuis.co.uk,whoami.dijksterhuis.co.uk
Then choose `No` when it asks about non-`www` to `www` redirections.
Then it’s a simple case of running the wordpress import wizard, setting up the plugins correctly, making changes to the site template, and then bob’s your auntie.