1. Clipping

I should document my approach to clipping. Hard clipping is not the nicest; it's a good idea to allow for some headroom and use a rather smooth damping function. Examples are for 16 bit integer samples. We think in terms of the absolute value.
Also, this is supposed to be configurable (runtime or build-time) ... I might prefer hard clipping to avoid damping the signal when that's not needed.

Oh, that's a possibility, of course... scan every buffer to see if clipping is actually needed. That would preserve the values that are already in range.
That scan could also trigger a message to the client, so that it knows that something is too loud.
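A minimal sketch of that scan, for 16 bit buffers; the function name and the way the threshold m-w is passed in are my choice for illustration, not dermixd code:

```c
#include <stdint.h>
#include <stddef.h>

/* Return nonzero if any sample in the buffer reaches the damping
   zone, i.e. |v| > limit (limit being m-w). Only then would the
   smooth clipping pass, or a "too loud" message to the client,
   be needed at all. */
static int buffer_needs_clipping(const int16_t *buf, size_t n, int16_t limit)
{
    for (size_t i = 0; i < n; ++i)
        if (buf[i] > limit || buf[i] < -limit)
            return 1;
    return 0;
}
```
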

So we define:
m	maximum value of samples (32767)
w	width of the damping area, for example 700 for starting damping at 32067 of 32767
t	the smallest value that gets mapped exactly to m; values above it are cut down to m without any smoothness.

The kind of smoothness? It can be a linearly increasing damping: the value t is reduced to m (so the damping there is t-m), while the value m-w is still mapped to m-w unchanged. Linear interpolation between those two points yields a function like

c = m-w + w/(t-(m-w)) * (v-(m-w))
  = m-w + w*(v-m+w)/(t-m+w)

... with v >= m-w and v <= t; the slope w/(t-m+w) is less than 1.
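For illustration, the linear variant with exactly those boundary conditions (v = m-w stays put, v = t lands on m, anything beyond t is cut hard to m, negative values mirrored) could be sketched like this; this is my reconstruction, not tested dermixd code:

```c
/* Linear damping sketch: identity up to m-w, linear interpolation
   from (m-w, m-w) to (t, m), hard cut to m above t. Works on the
   absolute value and restores the sign at the end. */
static double lindamp(double v, double m, double w, double t)
{
    double a = v < 0 ? -v : v;  /* absolute value */
    double c;
    if (a <= m - w)
        c = a;                  /* untouched range */
    else if (a <= t)
        c = m - w + w * (a - (m - w)) / (t - (m - w));
    else
        c = m;                  /* hard cut above t */
    return v < 0 ? -c : c;
}
```

With m = 32767, w = 700, t = 33467 this leaves 32067 unchanged and maps 33467 to 32767.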

I suspect I need to add a safety measure for overshoot? Not sure.
What does the clipping derived from mixplayd look like, in these terms?

m -> m-1 (a minor detail at this stage of my musings)
CLIP_A = (m-(m-w)) ** 2 = w**2
CLIP_B = m-2*(m-w) = m-2m+2w = 2w - m < 0
c = m - (CLIP_A/(CLIP_B+v))
  = m - w**2/(2w-m+v)
  = m - w**2/(2w+(v-m))

... with v > m-w

and the values v < -(m-w) = w-m

c = -m + w**2/(2w-(v+m))

Well, it seems that the original form still is the best...

d = 2w-m; the denominator offset

c =  m - w**2/(d + v)   for  v >   m-w
c = -m + w**2/(d - v)   for  v < -(m-w)
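The two branches above, as a sketch in C; floating point just for readability, the m -> m-1 detail and the integer arithmetic of the real code are left out:

```c
/* Smooth clipping derived from mixplayd, in the terms defined
   above: d = 2w - m, then c = m - w^2/(d+v) above m-w and the
   mirrored branch below -(m-w). Values inside [-(m-w), m-w]
   pass through unchanged. */
static double softclip(double v, double m, double w)
{
    double d = 2.0 * w - m;  /* the denominator offset */
    if (v > m - w)
        return m - w * w / (d + v);
    if (v < -(m - w))
        return -m + w * w / (d - v);
    return v;
}
```

With w = 700 this maps full-scale 32767 to 32417 (= m - w/2), is continuous with slope 1 at v = m-w, and only approaches m at infinity.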

That would be something like my next guess: a function that approaches the limit only at infinity. But should the damping be that strong? With w = 700, 32767 is mapped to about 32400 (m - w/2, to be exact: 32417). Of course, one can make the damping width smaller.
Or is there a function that I like more? Well, that one looks rather logical and it's tested. I'll use that, but with tunable width.

Of course, one could read up on compressors and limiters and use some standard approach. But then, this here is the last resort to avoid ugly ticking noise. The upcoming filter chain of dermixd, for inputs and outputs (see how I need a separate set of threads for the outputs, for the processing?), will let you put in a proper compressor or limiter instead, and that'll be it. You can turn off the internal clipping then.

Footnote about integer samples... do we have 2^(n-1) loudness levels or 2^(n-1)-1 ? Since zero level doesn't really matter... constant value is silence, anywhere.
Heck, am I going to do separate clipping for positive and negative values? Well, one can do that, no biggie. And I guess I should... since there is not much point in limiting the dynamic range, even by that teensy little bit.
But then, coming from floating point, is there a point in adding that teensy little bit? Bah, there's a fixed factor, (2^(n-1)-1) and that's it.

Footnote about hard clipping and integer samples: I did it now in a way that utilizes the one extra step of headroom at the bottom. That's mainly a thing to keep input data intact that has 16 bit samples that are fully filled. The same doesn't really work out for 32 bit samples, where the precision of float mandates that the dynamic range is a bit compressed.
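The hard-clip conversion with that extra bottom step could look like this; a sketch assuming v has already been scaled to 16 bit sample units (names are mine, not the actual dermixd code):

```c
#include <stdint.h>

/* Hard clipping from float to 16 bit, keeping the extra step of
   headroom at the bottom: the clamp range is the full asymmetric
   [-32768, 32767], so fully-filled 16 bit input that passed the
   float stage unchanged comes back out intact. */
static int16_t hardclip16(float v)
{
    if (v > 32767.0f)
        return 32767;
    if (v < -32768.0f)
        return -32768;
    return (int16_t)v;
}
```
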

2. Scaling / conversion between sample encodings

I fix DerMixD to 32 bit float (or in future integer) precision internally. On entry, integer samples have to be converted -- I don't really expect double precision input. The float samples are normalized to [-1:1] (that's a constant, AUDIO_SCALE, in the code that might change ... for 32 bit integer, for example).
I have a factor to scale the input samples: division by the ratio of external scale to internal scale (so multiplication by scale/iscale).

scale	the internal scale (AUDIO_SCALE, 1.0 in case of float)
iscale	the scale of input samples (usually some integer, so the maximum possible value)
oscale	the output scale (max value of output samples)
i	input sample
m	mixer sample
o	output sample
iratio	factor applied when converting input to mixer
oratio	factor applied when converting mixer to output
l	clipping limit in mixer format

I want the scaling to ideally leave the initial input sample unchanged if the input and output formats are the same. That works for samples with less precision than the mixer format (or matching it), but there is a compromise with 32 bit integer, which has a different kind of precision. There, the big aim is to prevent clipping in the end; a 1-to-1 translation is not possible.
Though it would be nice to keep the overall dynamic range, this does not work with my approach to clipping -- not even with hard clipping. There, I need to represent the maximum possible amplitude as a float value that is not larger than the largest 32 bit integer -- and that means the maximum amplitude will be _less_ than that by a margin.
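The ratio definitions from above, spelled out as a sketch; I'm assuming the factor semantics mixer = i * iratio and o = mixer * oratio, with names following the list above:

```c
/* iratio converts input samples to the mixer scale, oratio
   converts mixer samples to the output scale. When iscale and
   oscale agree and nothing happens in between, the round trip
   multiplies by oscale/iscale = 1, so the sample comes back
   unchanged (up to float roundoff). */
static double iratio(double scale, double iscale) { return scale / iscale; }
static double oratio(double scale, double oscale) { return oscale / scale; }
```

For float mixing, scale is 1.0 (AUDIO_SCALE); for 16 bit integer input, iscale is the maximum sample value.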

What I should ensure, though, is that I do not introduce additional damping with the sample conversion as such. But then, you know it better, pal: You cannot easily avoid numerical roundoff errors and their accumulation. For the clipping, I can work with the error margin to ensure that integer wrapping does not occur. I have to be fine with that for now.
