Dec. 10th, 2013

hirez: (Armalite rifle)
It's the works all-hands tomorrow.

I say 'all-hands' because we collectively seem to understand that sort of US/MBA originated quasi-military managerialist shitery better than we understand the other terms for what $employer calls 'Forum'. It may as well be called Carousel for all the joy it brings the greyer of beard in the orifice.

Mind, when I started it was all happy and clappy and the bloke from the lottery (probably. I dunno. Maybe it was the chap from that Cilla Black programme with the people on it that did the things. Maybe it was the thing where Forsyth operates those two northern chaps like glove-puppets while Fred Trueman swills beer and provides surly commentary. You'll not see owt like that t'other side of t'Pennines.) doing the voiceover and shit disco lights bouncing around to shit eurotechno. I used to wish for some boggle-eyed sort to bung on some Stakker so we could see the suits 'ave it large and live on the stage in front of us before the ritual sacrifice of someone younger and fitter in order to propitiate the spirits before sweeping the blood away to Beltramesque Alpha-Juno noises and some tasty breakbeat while they gnawed on the bones of the fallen. There'd be a parade of blinking and grinning young people up for awards for, er, doing their fecking jobs and regular offhand comments from the CEO downwards about those scruffy buggers in IT who did nothing to help the bottom line but never mind just look at them, look at them with their black t-shirts and their anger and their ideas, it is probably a kindness that we keep them off the street and if we gave them money they would only waste it on servers, rather than wide shirts and ornithological shoes like the sensible people who have won things.

Then there would be a funfair and a disco and free beer and a punch-up.

These days a bloke in a suit leaves out the red bits from the second annual report and those who can still stand the excitement beetle down to the bar below the office for one free pint before catching the bus home. I understand the exploits of the scruffy buggers in IT will loom large and no doubt one of us will be picked for sacrifice since there are no more fit and young people and yet the spirits are still uneasy.

So anyway. You know how there used to be occasional posts from people along the lines of 'Mails never sent'?

I sent this one:


Subject: What's wrong with black-boxes
To: $people


So we have a section on the wiki called 'on-call resources'. It's a
set of useful (and sometimes less so) troubleshooting pointers for
when things go wrong at 3AM.

You will notice the extent of the $subsystem troubleshooting documentation.
(NB: _sarcasm_)

This isn't really a new problem. It has been $employer business as usual
to buy/develop a subsystem/website and point Ops at it with little in
the way of warning or training since I've been working here.

However, I've never been a great one for tradition, so I shall call it
by its proper name, which is 'You're having a right laugh, you are.'

The absolutely minimal least-effort first step toward fixing this
would be for the dev-team to provide a theory of operation for the
subsystem in question, preferably with a list of problems encountered
during load-testing and how the troubleshooting/mitigation for those
was achieved.

A significant move in the right direction would be for the various
elements in the chain to be able to distinguish success from failure
and report on that failure in a scalable and human-readable way. This
would likely require the sort of defensive coding one has come to
expect from mature subsystems/products - Postfix, for instance.
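
By way of illustration only, here is a minimal Python sketch of an
element that knows whether it succeeded and says so in words a human
can read at 3AM. Every name in it (Outcome, run_stage, parse_count) is
made up for the example and has nothing to do with how $subsystem is
actually built.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """What one stage reports back: who, whether it worked, and why."""
    stage: str
    ok: bool
    message: str


def run_stage(stage, func, item):
    """Run one (hypothetical) stage and always return an explicit Outcome."""
    try:
        func(item)
    except Exception as exc:
        # Say which stage failed, on what input, and what the error was.
        text = f"{stage}: FAILED on {item!r}: {type(exc).__name__}: {exc}"
        return Outcome(stage, False, text)
    return Outcome(stage, True, f"{stage}: ok for {item!r}")


def parse_count(text):
    """Hypothetical stage: defensive about its input rather than hopeful."""
    value = int(text)
    if value < 0:
        raise ValueError("count must not be negative")
    return value


if __name__ == "__main__":
    for sample in ("12", "-3", "twelve"):
        print(run_stage("parse_count", parse_count, sample).message)
```

The point is not the dataclass. The point is that a failure names the
stage, the input and the reason, instead of vanishing quietly.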

The two models I have in my head are indeed Postfix, from the software
viewpoint, and the Philips KT3 television chassis as a general
approach.

Postfix:

For those not quite as au fait with large-scale mailsystems as self,
Postfix is one of the more useful and reliable mail-transports
currently available. A thing you can do with Postfix specifically, and
MTAs in general, is work out the path a given message took across the
internet by inspecting the list of Received: headers in that message.
Every (RFC-2822) mail message should carry a globally unique ID in its
Message-ID header. Meanwhile, each Postfix instance will also give that
message a unique-enough-to-be-useful queue ID while it transits that
particular system. This gives the interested admin the ability to
accurately track the message across all the systems to which they have
access.
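
A rough Python sketch of that header inspection follows, using only
the standard-library email parser. The sample message, hostnames and
queue IDs are invented, and the regular expressions assume the usual
'by HOST ... id QUEUEID' layout of a Postfix Received: line. Other
MTAs format theirs differently.

```python
import re
from email import message_from_string

# Invented sample: two hops, newest Received: header first, as on the wire.
RAW = """\
Received: from mx2.example.net (mx2.example.net [192.0.2.2])
\tby mail.example.org (Postfix) with ESMTP id 4B7C21F0A3
\tfor <ops@example.org>; Tue, 10 Dec 2013 03:00:07 +0000 (GMT)
Received: from app1.example.net (app1.example.net [192.0.2.1])
\tby mx2.example.net (Postfix) with ESMTP id 9D2E440C11
\tfor <ops@example.org>; Tue, 10 Dec 2013 03:00:05 +0000 (GMT)
Message-ID: <20131210030004.12345@app1.example.net>
From: alerts@example.net
To: ops@example.org
Subject: test

body
"""


def trace(raw):
    """Return (message_id, [(host, queue_id), ...]) in travel order."""
    msg = message_from_string(raw)
    hops = []
    # Each MTA prepends its header, so reverse to get oldest hop first.
    for received in reversed(msg.get_all("Received", [])):
        flat = " ".join(received.split())  # undo header folding
        host = re.search(r"\bby (\S+)", flat)
        qid = re.search(r"\bid (\S+)", flat)
        hops.append((host.group(1) if host else None,
                     qid.group(1).rstrip(";") if qid else None))
    return msg["Message-ID"], hops


if __name__ == "__main__":
    message_id, hops = trace(RAW)
    print(message_id)
    for host, queue_id in hops:
        print(f"  {host}  queue id {queue_id}")
```

With the per-host queue ID in hand, the admin can go and grep that
host's mail log for it.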

We've borrowed that concept wholesale for the way Eventbot works.

In the specific case of $subsystem, and indeed any other $employer system where
messages (or other object types) are processed across a number of
systems, it would be Really Quite Useful (aka 'Vital') to be able to
track the progress of a message/object either for the purposes of
troubleshooting or audit.
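
The same trick, sketched in Python for non-mail objects: one global ID
that travels with the object, plus a local ID minted at each hop, both
written to the log. This is an assumption about how one might do it,
not a description of Eventbot or $subsystem, and the function and
field names are hypothetical.

```python
import json
import uuid


def new_object_id():
    """Mint the global ID once, where the object enters the chain."""
    return uuid.uuid4().hex


def hop_record(object_id, system, status, detail=""):
    """One log line per hop: global ID, per-system local ID, outcome."""
    return json.dumps({
        "object_id": object_id,
        "local_id": uuid.uuid4().hex[:10],  # unique enough to be useful
        "system": system,
        "status": status,
        "detail": detail,
    }, sort_keys=True)


def history(log_lines, object_id):
    """Follow one object through everything that logged it, in log order."""
    records = (json.loads(line) for line in log_lines)
    return [(r["system"], r["status"])
            for r in records if r["object_id"] == object_id]


if __name__ == "__main__":
    wanted, other = new_object_id(), new_object_id()
    log = [
        hop_record(wanted, "intake", "ok"),
        hop_record(other, "intake", "ok"),
        hop_record(wanted, "transform", "failed", "field 'total' missing"),
    ]
    print(history(log, wanted))
```

Given logs like that, 'where did object X get to?' becomes a search
rather than a seance.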

Since one is currently unable to do that, the only action that can
usefully be taken is 'Have you tried turning it off and on again?'

Which, no.


The Philips KT3:

The KT3 was a popular (CRT) television chassis from the 70s/80s, and
was badged under a myriad names. The technical documentation came in
an A4 binder circa three inches thick.

Right at the back were the circuit diagrams, which folded out quite
large and usually came with scorch marks from dropped soldering irons
or blobs of solder from someone re-making the line-output subsection.

At the front was a general introduction to the Philips organisation
and a quick refresher on the principles of analogue television
transmission.

The thick section was a _complete_ theory of operation for all the
relevant panels - PSU, tuner, IF strip, Line-output &c.

After that you found the troubleshooting guide - a list of common
faults and solutions, and if that didn't cover your problem you'd find
the list of test-points across the various panels with the expected
voltages and/or waveforms for the usual selection of broadcast
test-cards and/or bench test-gear.

It was the single most useful item of technical documentation I have
had the pleasure to discover.

Acorn produced something similarly helpful for the BBC Micro, and the
original IBM PC (& AT) technical documentation sets were nearly as
good.

There's no particular reason _not_ to attempt something that good,
although obviously without a paid team of technical writers, you've no
chance.

Or continue to treat the Ops team like bastards. Whatevs.


--
Julia Hawkes-Reed. Unix admin. $employer. x2526
