README for DerMixD v2; date: 2014-06-15
also see web page at http://dermixd.de
*******************************

	Copyright (C) 2004-2014 Thomas Orgis <thomas@orgis.org> and others
	See AUTHORS file for contributors.

	This program is free software; you can redistribute it and/or modify
	it under the terms of the GNU General Public License version 2 as
	published by the Free Software Foundation.

	This program is distributed in the hope that it will be useful,
	but WITHOUT ANY WARRANTY; without even the implied warranty of
	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
	GNU General Public License for more details.

	You should have received a copy of the GNU General Public License
	along with this program; if not, write to the Free Software
	Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

For building and install notes see INSTALL. For the impatient:
	make && make install

See CHANGES for main, user-concerning changes.

***************************

UPDATE THIS for v2!!!

So, this is DerMixD, an audio mixing and network-listening daemon in the tradition of mixplayd (http://mixplayd.sourceforge.net) but done quite differently. It does basically what mixplayd does:

- play music files on a configurable number of input channels
- mix them together
- be controlled over a TCP socket or, locally, over a UNIX domain socket

But it does not do fading on its own - you are supposed to define the fading volume and equalizer ramps via script commands (that's actually a _good_ thing!).
There's no explicit support for named pipes as output, but thinking about it, there is no special support necessary.
You do mkfifo, open that named pipe and let dermixd write raw audio to that filename. It doesn't need to _know_ that there is a named pipe involved (UNIX ftw!;-).

I wrote this mostly from scratch with some flexibility in mind, using multithreading (which doesn't matter in terms of flexibility, but matters nevertheless) and C++ with general input and output classes. These make it possible to implement special classes for different audio formats or devices on both the input and the output side - hopefully - quite easily.
Concerning the mp3 decoder input, DerMixD started out as a complex frontend to mpg123's remote control interface. This resulted in me also getting involved in mpg123 development and finally creating libmpg123 as a library to use the efficient MPEG decoder code in other programs. Also, DerMixD now features a generic prebuffer that serves to remove leading/trailing silence. This was a major issue back when I had not yet added gapless playback support to mpg123, but it is still useful for actual silence you want to skip.

The mixing itself is rather flexible. Every input channel gives a fixed sound but this can be directed to output devices as you wish. For example this allows for having a master mix output and some other outputs for cueing or just partial mixes. You have arbitrary n-to-m relationships between input and output channels.
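
As a small sketch of such n-to-m routing: the following Python helper turns a routing table into a list of `bind` commands (the command itself is taken from the API listing further down). The function name `bind_commands` and the dictionary layout are just assumptions for illustration, not anything DerMixD ships.

```python
def bind_commands(routing):
    """Build 'bind <inchannel> <outchannel>' lines from a routing table.

    routing maps an input channel number to the output channels it feeds.
    (Hypothetical helper; only the 'bind' command comes from the API listing.)
    """
    commands = []
    for inchannel in sorted(routing):
        for outchannel in sorted(routing[inchannel]):
            commands.append("bind %d %d" % (inchannel, outchannel))
    return commands
```

For example, `bind_commands({0: [0], 1: [0, 1]})` sends input 0 to the master output 0 only, while input 1 goes to both the master and a cue output 1.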

Additionally, the multithreading makes _much_ sense concerning the network code: There is one thread for the main server watching the port and one for every connected client. I really think that this is more appropriate than querying all sockets and the port in a loop from time to time (though I must admit that I didn't measure performance differences on that part... if you don't have many connections, the simple loop of mixplayd may still work well). I did focus on ensuring playback continuity even when the system is under load. On Linux, there is some playing with thread priorities to help ensure that the important output playback threads are indeed given priority. Tasks are separated into a multitude of threads to keep the critical path free of unnecessary hindrance, in addition to output buffering.

If you really don't care at all about the stuff that made me code this, then feel free to consider the "original" mixplayd. On the outside it should behave quite similarly and may be just what you were searching for, and it's a smaller static binary. One part of the gapless story (called "autocue") already went into mixplayd, too (version 0.60). But bear in mind: I also have this thingy at the end to catch mp3 frame / audio CD sector padding, not to forget what the "follow" command does.

In the end, I don't want to bash mixplayd. It inspired me to waste my study time with writing this ah-so-flexible mixing engine, and here it is;-)


Starting the daemon
===================

See `dermixd -h` for a list of command-line parameters and default values (note: in version 2.0, a UNIX domain socket is active by default and I am considering command access restriction to TCP).


The control interface
=====================

The control interface was inspired by that of mixplayd, but has accumulated quite a few differences over time.

The communication consists of text sent through a socket, or via standard input when starting with daemon=n. The easiest way to open such a connection on your very own machine is

telnet localhost 8888

(8888 is the default TCP port that mixplayd and dermixd use)

Then you type "something" and dermixd does something, giving a response (normally preceded by [something]) after that.
DerMixD uses UNIX line ends for responses and accepts command lines ended by UNIX, DOS or MAC convention (a line is terminated by the first \r or \n).

A command is specified as its name followed by its parameters -- in order -- separated by spaces. Commands accepting filenames/URLs, which may themselves contain spaces, take these as the last parameter, so that with the fixed number of parameters there is no need for quotes or the like.
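
To illustrate why no quoting is needed, here is a minimal Python sketch of how a line with a fixed number of parameters can be split. It is not DerMixD's actual parser; the function name `split_command` is made up for this example, and optional parameters (the optarg/optone flags) are not handled.

```python
import re

def split_command(line, nparams):
    """Split a command line into (name, [parameters]).

    The last parameter receives the rest of the line, spaces included.
    (Illustrative sketch only, not the parser DerMixD uses.)
    """
    # A line stops at the first \r or \n (UNIX, DOS or MAC convention).
    line = re.split(r"[\r\n]", line, maxsplit=1)[0]
    # Split on whitespace at most nparams times; the remainder stays intact.
    parts = line.split(None, nparams)
    if len(parts) != nparams + 1:
        raise ValueError("expected %d parameters" % nparams)
    return parts[0], parts[1:]
```

With this, `split_command("load 0 My Favourite Song.mp3\r\n", 2)` yields the command name `load`, the channel `0` and the complete file name as one parameter.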

A command generally returns

[<command>] success

or

[<command>] success: <value>

when some single value was set (seeking counts as setting the position in seconds)

Also, there are commands that return some information in (possibly) multiple lines:

[<command>] +begin
first line
second line
...
[<command>] -end
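
A client might collect such a multi-line answer roughly like the following Python sketch. The helper name `read_block` is hypothetical; it works on any iterable of lines (for example a file object wrapped around the socket) and tolerates extra whitespace in front of the markers.

```python
def read_block(lines, command):
    """Return the payload lines between '[command] +begin' and '[command] -end'.

    Lines before the +begin marker are skipped.
    (Hypothetical client-side helper, not part of DerMixD.)
    """
    tag = "[%s]" % command
    payload = []
    inside = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(tag):
            marker = line[len(tag):].strip()
            if marker == "+begin":
                inside = True
                continue
            if marker == "-end" and inside:
                return payload
        if inside:
            payload.append(line)
    raise ValueError("no complete block found for %s" % command)
```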

An error is indicated by

[<command>] error: <some possible explanation / hint about what and why>

A client should be flexible with whitespace, both in amount and in type (spaces, tabs), and not insist on a ":" after error/success. For example

[load]error
[load]         					  error:

Both should be interpreted to mean the same thing. I'm not saying that dermixd will produce such extreme responses, but some small change in spacing for readability should not break existing parsers.
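
One way to write a parser with that tolerance is sketched below in Python. The name `parse_response` and the returned tuple layout are assumptions of this example; the status words are the ones described above.

```python
import re

# "[command]" followed by any amount of whitespace and the rest of the line.
RESPONSE = re.compile(r"^\[(?P<command>[^\]]*)\]\s*(?P<rest>.*)$")

def parse_response(line):
    """Return (command, status, detail), or None for a line without [tag].

    status is "success", "error", "+begin", "-end" or None (e.g. for [saying]).
    (Illustrative client-side sketch.)
    """
    match = RESPONSE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    command = match.group("command")
    rest = match.group("rest").strip()
    for status in ("success", "error", "+begin", "-end"):
        if rest.startswith(status):
            # The ":" is optional, as is any whitespace around it.
            detail = rest[len(status):].lstrip(": \t").strip()
            return command, status, detail
    return command, None, rest
```

Both `[load]error` and the heavily padded `[load] ... error:` variant come out as `("load", "error", "")`.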

The actual form of <command> depends a bit on what was issued. Normally it is the command name you used in the request, but script and say are special cases:

script 1 32 bass 1 0
[script/bass] success

The response of a script contains the scripted subaction name, if the _initial_ parsing was successful.

script 1 32 laod 
[script/load] error: parsing of subaction failed <- unknown command

Otherwise:

script asf 
[script] error: unable to get (all) needed arguments

Then, the say command gives a 

[say] success

immediately and then

[saying] <the stuff it should say>

(You may say that that is without sense, but it is not when you use it in a script as notification.)

You can get the full listing of commands from `dermixd api=yes` (this here might be a bit out of date):

Full API listing:

<name> parm1(type) parm2(type) ...; [ flags ]:	<description>
meaning of flags:
	container -- this action can contain a subaction (given as parameter); read: this is the script command
	nosub -- this action cannot be a subaction in a container; read: not allowed in script
	optarg -- parameters are optional: either specify all or none
	optone -- all but first parameter are optional: specify any number of them
	in -- action on input channel
	out -- action on output channel

volume channel(size-int) volume(left/both)(float) volume(right)(float); [ optone in ]:	set inchannel volume factor(s)
preamp channel(size-int) dB(float); [ in ]:	set inchannel preamplification
bass channel(size-int) value(float); [ in ]:	set inchannel bass eq factor
mid channel(size-int) value(float); [ in ]:	set inchannel mid eq factor
treble channel(size-int) value(float); [ in ]:	set inchannel treble eq factor
eq channel(size-int) bass(float) mid(float) treble(float); [ in ]:	set all three eq factors
pause channel(size-int); [ in ]:	pause inchannel
speed channel(size-int) value(float); [ in ]:	set playback speed factor (1=normal)
pitch channel(size-int) value(float); [ in ]:	increase/decrease playback speed factor
start channel(size-int); [ in ]:	start playback on inchannel
seek channel(size-int) position in seconds(float); [ in ]:	absolute seek on inchannel
rseek channel(size-int) offset in seconds(float); [ in ]:	relative (from current position) seek on inchannel
bind inchannel(size-int) outchannel(size-int); [ ]:	bind input channel to output channel
unbind inchannel(size-int) outchannel(size-int); [ ]:	release bond between input channel and output channel
outeject channel(size-int); [ out ]:	Un-load resource on outchannel (eject the tape).
outstart channel(size-int); [ out ]:	start playback on outchannel
script channel(size-int) time in seconds(float) command(string); [ container nosub in ]:	script an action triggered by an inchannel passing certain time/position, command is just any valid (and allowed) command with arguments
nscript channel(size-int) n(integer) time in seconds(float) command(string); [ container nosub in ]:	script an action triggered by an inchannel passing certain time/position, script will be executed n times (one for each triggering)
showscript channel(size-int); [ nosub in ]:	show the script commands for programmed actions on an inchannel
delscript channel(size-int); [ in ]:	remove scripting actions from inchannel
follow leader(size-int) follower(size-int); [ ]:	let one inchannel follow another (start folower gaplessy after leader stopped)
nofollow leader(size-int); [ in ]:	release the followership on a leader
fullstat; [ nosub ]:	give full status info on all in- and outchannels (several lines)
say the message(string); [ ]:	just say something (I put [saying] in front); useful when timed somehow
watch channel(size-int); [ in ]:	watch inchannel (get messages on events and ongoing playback)
unwatch channel(size-int); [ in ]:	stop watching inchannel (see watch)
load channel(size-int) track(string); [ nosub in ]:	load track on inchannel (be prepared for starting it)
inload channel(size-int) driver(string) track(string); [ nosub in ]:	load track on inchannel (be prepared for starting it)
ineject channel(size-int); [ nosub in ]:	Eject track on input channel (stop playback, release resources).
eject channel(size-int); [ nosub in ]:	Alias for ineject.
scan channel(size-int) list of properties(string); [ nosub in ]:	scan input properties
effect channel(size-int) effect name(string) position (0 = end of chain)(size-int); [ nosub in ]:	add an effect to an input channel's effect chain
effect-remove channel(size-int) position (0 = end of chain)(size-int); [ nosub in ]:	remove a effect from an input channel's effect chain
effect-bypass channel(size-int) position (0 = end of chain)(size-int) bypass value (0/1 for false/true)(integer); [ nosub in ]:	enable/disable bypassing of a effet (so, disable/enable the effect;-)
effect-param channel(size-int) position (0 = end of chain)(size-int) parameter string (format to be decided, maybe depending on effect) TODO: make the parameter string optional to give current effect settings(string); [ in ]:	change input effect parameters
effect-list channel(size-int); [ in ]:	show the effect chain of given input channel
effect-query name(string); [ nosub optarg ]:	query list of available audio effects, or information about a specific one
effect-help channel(size-int) position (0 = end of chain)(size-int); [ in ]:	help info for certain input effect
outload channel(size-int) driver(string) resource(string); [ nosub out ]:	load resource on outchannel with specified driver/device (active right away), if driver == default, the resource is ignored, just the internal defaults used
length channel(size-int); [ in ]:	determine exact length of track (stops channel, decodes track till end, seeks back to where it left off)
preread file(string); [ nosub ]:	read through a file once (to have it cached by file system/kernel)
addin nick name(string); [ nosub optarg ]:	add input channel (with optional nick name)
remin id(size-int); [ nosub ]:	remove input channel
addout nick name(string); [ nosub optarg ]:	add output channel (with optional nick name)
remout id(size-int); [ nosub ]:	remove output channel
id; [ nosub ]:	print my full identification
close; [ ]:	close current connection
fadeout; [ nosub optarg ]:	dummy to remind you to do custom fading via script actions
sleep; [ ]:	let me sleep until any client wants something
shutdown; [ ]:	lay down and die gracefully
help the command(string); [ optarg ]:	give some info / usage pattern on a command
showapi; [ nosub ]:	show the whole API (list help for all commands)
threadstat; [ nosub ]:	list spawned threads
ls dir/file(string); [ nosub optarg ]:	list file/directory
cd dir(string); [ nosub ]:	change directory (for client actions)
pwd; [ nosub ]:	print current working directory
spy; [ nosub ]:	spy on client communication
unspy; [ nosub ]:	stop spying on client communication
showid 1(on) or 0(off)(integer); [ nosub ]:	toggle showing of channel id in responses to channel commands
peer your peer name(string) peer's peer name(string) message(string); [ nosub ]:	send a message to another peer (client).
addpeer peer name(string) description(string); [ nosub ]:	add a peer entry, registering for messages
rempeer peer name(string); [ nosub ]:	remove a peer entry
showpeers; [ nosub ]:	show a list of peers with optional description
feedback value (0/1)(integer); [ nosub ]:	disable/enable waiting for mixer feedback before submitting the next action (applies only to actions that do fine without feedback)

Scripting
=========

There are some special commands:

	script <ch> <time> <command> <parameters>
	nscript <ch> <n> <time> <command> <parameters>

	showscript <ch>

	clearscript <ch>

These are used to manage a list of commands each to be executed when input channel <ch> reaches position <time> (see descriptions above for individual function).
Negative times have the special meaning of execution right after track end; the execution order of several scripts with negative times is according to the absolute value: The bigger the absolute value, the closer to the end (the most negative time is the first one).
Of course you can manipulate a different channel than the one giving the time; in fact, you're free to use all commands except the ones marked with the nosub flag in the full listing above.
DerMixD doesn't do fadeout on its own as mixplayd does - you have to draw the line somewhere... instead you can program your (cross)fading curve like

volume 1 0
script 0 40 start 1
script 1 0 volume 1 0.1
script 1 0.1 volume 1 0.2
script 1 0.2 volume 1 0.3
script 1 0.3 volume 1 0.4
script 1 0.4 volume 1 0.5
script 1 0.5 volume 1 0.6
script 1 0.6 volume 1 0.7
script 1 0.7 volume 1 0.8
script 1 0.8 volume 1 0.9
script 1 0.9 volume 1 1
script 1 1 volume 0 0.9
script 1 1.1 volume 0 0.7
script 1 1.2 volume 0 0.4
script 1 1.5 volume 0 0.2
script 1 1.8 volume 0 0.1
script 1 2 volume 0 0
script 1 2 stop 0

You have the power - use it!
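
Typing such ramps by hand gets tedious, so a client may want to generate them. The following Python sketch produces a linear fade-in as a list of script commands; the function name `fade_in_script` and its parameters are made up for illustration, and a linear ramp is just one possible curve.

```python
def fade_in_script(trigger_channel, target_channel, start, duration, steps):
    """Return script commands that ramp the target volume linearly up to 1.

    The commands are triggered by the position of trigger_channel, beginning
    at 'start' seconds and spread evenly over 'duration' seconds.
    (Hypothetical helper; only 'script' and 'volume' are DerMixD commands.)
    """
    commands = []
    for i in range(1, steps + 1):
        time = start + duration * i / steps
        volume = i / steps
        commands.append("script %d %g volume %d %g"
                        % (trigger_channel, time, target_channel, volume))
    return commands
```

Calling `fade_in_script(1, 1, 0.0, 2.0, 4)` gives four commands, from `script 1 0.5 volume 1 0.25` up to `script 1 2 volume 1 1`; send them one per line after setting the initial volume to 0.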

The time argument has a special meaning when being smaller than zero: Then it means "do it after the track ended".
Also, smaller negative values put the action in front of others with bigger negative time values (in the mathematical sense of smaller and bigger):

script 0 -1 say you!
script 0 -2 say Hello 

will result in "Hello " being said first, then "you!".
The actions will all happen in the same time frame, but in the order specified by their negative times.
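
The ordering rule can be modelled in a few lines of Python. This is only a sketch of the rule as described here, not DerMixD's code; the function name `end_of_track_order` is made up, and how DerMixD orders two scripts with the same negative time is not specified in this README (the sketch simply keeps their input order).

```python
def end_of_track_order(scripts):
    """Return the commands of negative-time scripts in execution order.

    scripts is a list of (time, command) pairs; entries with time >= 0 are
    triggered by position and therefore ignored here.
    (Model of the documented rule, not the actual implementation.)
    """
    after_end = [(time, command) for time, command in scripts if time < 0]
    # The most negative time comes first; sorted() is stable for equal times.
    return [command for time, command in sorted(after_end, key=lambda s: s[0])]
```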


Available input devices
=======================

mpg123 - play mpeg audio files/urls (mp3, mp2) through mpg123 decoder
sndfile - play any file libsndfile can handle
vorbisfile - OGG/Vorbis through libvorbisfile
sine - generate sine tone, load with url scheme sine://<freq> or sine://<freq>@<sampling rate>
raw_s16 - raw stereo audio files, signed short (assumed to be 16 bits on your box! that should be handled more cleanly in the future), host byte order
dummy - could be called silence... because that is what it does, also good as basic template

The sndfile and vorbisfile inputs have to be enabled at compile time (via make VORBISFILE=yes / make SNDFILE=yes).

The raw input can be given parameters via the inload (and inplay) commands:

	inload 0 raw:1ch:48000Hz file.raw

This will play a mono file with a sample rate of 48000 Hz appropriately.
Without any parameters the files are assumed to match the main mixer settings of mono/stereo and sample rate.
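
If you need a test file for this input, something like the following Python sketch produces suitable data: mono, signed 16 bit, host byte order. The function name `raw_s16_mono_sine` and the half-scale amplitude are arbitrary choices of this example.

```python
import array
import math

def raw_s16_mono_sine(freq=440.0, rate=48000, seconds=1.0):
    """Return a mono sine tone as raw signed 16-bit samples in host byte order.

    (Hypothetical test-data generator; array type "h" is a signed short,
    which is 16 bits on common platforms.)
    """
    count = int(rate * seconds)
    samples = array.array("h", (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq * n / rate))
        for n in range(count)
    ))
    return samples.tobytes()
```

Write the returned bytes to file.raw in binary mode and load the result with the inload line shown above.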


Available output devices
========================

There are some in various states of usefulness (stable not meaning rock-solid and foolproof but generally working without problems for me).

dermixd name (for outload)  description                                         state
oss                         audio hardware output via OSS API,                  stable 
                            e.g. /dev/dsp; works with Alsa OSS emulation, too   
alsa                        audio hardware output via Alsa API                  stable (better latency than OSS)
mme                         audio hardware output via MME API of Tru64 Unix     semi-stable**
                            (written for a Compaq XP1000)
text                        write ASCII text (TAB-separated numbers) to STDOUT  stable
raw_s16                     write RAW Signed 16Bit data to file                 stable

** not able to be _removed_ properly; I only tested this on a XP1000 box that plays fine except for some outage once or twice every minute. Could be some system service stopping the mmeserver, dunno.
One could use OSS on Tru64, though - perhaps that's better. And in fact, I haven't tested that for quite some time now ... the code is subject to bitrot until I get the Alpha box back into operation, if ever.

Version Policy
==============

I use a version scheme with three parts:

<major>.<minor>.<bugfix>

The major number stands for the basic feature set and interface. All 1.x versions shall (in a perfect world...) be compatible with any 1.y with y < x. Minor versions may add but not remove features and shall (...) not change default behaviour. This does not necessarily include the internal API for input/output modules, since it still has to prove and possibly improve itself to actually host some more of them.

The last number is just for bugfixes that don't affect the nominal features (but may repair broken ones).

I don't discriminate even and odd numbers for stable/development releases or the like. If m > n, m should be better.

A client gets the interface version on connection init:

	shell$ telnet localhost 8888
	Trying 127.0.0.1...
	Connected to localhost.
	Escape character is '^]'.
	[connect] DerMixD v1.0

The full version specification including bugfix number and possible suffixes (-dev, -test2,...) is available via the id command:

	id
	[id] DerMixD v1.0.0

So, any client program that wants to ensure that some functionality introduced in version 1.3 is available should parse the initial [connect] string and check for a minor version >= 3.
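
Such a check could look like this Python sketch. The function names are made up; the greeting format is the one shown above, and treating a different major version as "not supported" is an assumption of this example that follows from the policy described here.

```python
import re

def interface_version(greeting):
    """Extract (major, minor) from a line like '[connect] DerMixD v1.0'."""
    match = re.search(r"DerMixD\s+v(\d+)\.(\d+)", greeting)
    if match is None:
        raise ValueError("not a DerMixD greeting: %r" % greeting)
    return int(match.group(1)), int(match.group(2))

def supports(greeting, major, minor):
    """True if the server has the given major version and at least 'minor'.

    A different major version may be incompatible, so it counts as a mismatch.
    (Illustrative client-side check.)
    """
    got_major, got_minor = interface_version(greeting)
    return got_major == major and got_minor >= minor
```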

Major releases are allowed to change things in an incompatible manner if it makes sense. After all, you learn some things over the years.
