Explanation For SoX Filter Controls

TLDR:

  • 95.0% filter bandwidth (the default might be a little too high according to the technical detail below)
  • 30,000 filter length
  • anti-aliasing 100.0%
  • filter phase: 0 (minimum phase to prevent pre-ringing)

Super happy up-sampling to 1411.2 kHz with a Denafrips XD.
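If you want to try roughly the same thing with the sox command line, here is a minimal sketch. It only prints the command so you can check it before running it. The file names and the helper name `tldr_sox_cmd` are made up, and the `-v` quality level is my own pick. I am assuming "filter phase 0" corresponds to `-p 0` and "anti-aliasing 100%" to simply not passing `-a`. The filter length setting has no counterpart among the rate options quoted below, so it is left out.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a sox command approximating the TLDR settings.
# Assumptions: -v is my choice of quality level; phase 0 maps to -p 0 (minimum phase);
# "anti-aliasing 100%" is taken to mean "do not pass -a"; filter length is not mapped.
tldr_sox_cmd() {
    in_file=$1
    out_file=$2
    # 1411.2 kHz written out in Hz; 95 is the band-width percentage.
    printf 'sox %s %s rate -v -p 0 -b 95 1411200\n' "$in_file" "$out_file"
}

# Placeholder file names; copy the printed line into a terminal to actually resample.
tldr_sox_cmd input.flac output.wav
```

Whether your output format and DAC accept that rate is something to check on your own setup.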

From the man page for sox. First the “presets” just to set some baselines:

rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
Change the audio sampling rate (i.e. resample the audio) to any given RATE (even non-integer if this is supported by the output file format) using
a quality level defined as follows:

          Quality    Band-width  Rej dB        Typical Use
   -q     quick      n/a         ~=30 @ Fs/4   playback on ancient hardware
   -l     low        80%         100           playback on old hardware
   -m     medium     95%         100           audio playback
   -h     high       95%         125           16-bit mastering (use with dither)
   -v     very high  95%         175           24-bit mastering

   where Band-width is the percentage of the audio frequency band that is preserved and Rej dB is the level of noise rejection. Increasing levels of
   resampling quality come at the expense of increasing amounts of time to process the audio. If no quality option is given, the quality level used
   is 'high' (but see `Playing & Recording Audio' above regarding playback).

   The 'quick' algorithm uses cubic interpolation; all others use band-limited interpolation. By default, all algorithms have a 'linear' phase
   response; for 'medium', 'high' and 'very high', the phase response is configurable (see below).

   The rate effect is invoked automatically if SoX's -r option specifies a rate that is different to that of the input file(s). Alternatively, if
   this effect is given explicitly, then SoX's -r option need not be given. For example, the following two commands are equivalent:

      sox input.wav -r 48k output.wav bass -b 24
      sox input.wav        output.wav bass -b 24 rate 48k

   though the second command is more flexible as it allows rate options to be given, and allows the effects to be ordered arbitrarily.

Then the documentation goes more into what we are interested in:

Warning: technically detailed discussion follows.

The simple quality selection described above provides settings that satisfy the needs of the vast majority of resampling tasks. Occasionally,
however, it may be desirable to fine-tune the resampler’s filter response; this can be achieved using override options, as detailed in the
following table:

   -M/-I/-L     Phase response = minimum/intermediate/linear
   -s           Steep filter (band-width = 99%)
   -a           Allow aliasing/imaging above the pass-band
   -b 74-99.7   Any band-width %
   -p 0-100     Any phase response (0 = minimum, 25 = intermediate,
                50 = linear, 100 = maximum)

N.B. Override options cannot be used with the ‘quick’ or ‘low’ quality algorithms.

All resamplers use filters that can sometimes create ‘echo’ (a.k.a. ‘ringing’) artefacts with transient signals such as those that occur with
‘finger snaps’ or other highly percussive sounds. Such artefacts are much more noticeable to the human ear if they occur before the transient
(‘pre-echo’) than if they occur after it (‘post-echo’). Note that frequency of any such artefacts is related to the smaller of the original and
new sampling rates but that if this is at least 44.1kHz, then the artefacts will lie outside the range of human hearing.

A phase response setting may be used to control the distribution of any transient echo between ‘pre’ and ‘post’: with minimum phase, there is no
pre-echo but the longest post-echo; with linear phase, pre and post echo are in equal amounts (in signal terms, but not audibility terms); the
intermediate phase setting attempts to find the best compromise by selecting a small length (and level) of pre-echo and a medium lengthed
post-echo.

Minimum, intermediate, or linear phase response is selected using the -M, -I, or -L option; a custom phase response can be created with the -p
option. Note that phase responses between ‘linear’ and ‘maximum’ (greater than 50) are rarely useful.
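To make the phase options a bit more concrete, here is a small sketch (my addition, not part of the man page) that turns a phase name or a 0-100 number into the matching override option from the table above. The function name `phase_flag` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a phase name or a 0-100 number to a rate override option.
# Per the table above: -M is minimum (0), -I is intermediate (25), -L is linear (50).
phase_flag() {
    case $1 in
        minimum)      echo "-M" ;;
        intermediate) echo "-I" ;;
        linear)       echo "-L" ;;
        # Empty, or anything containing a non-digit, is rejected.
        ''|*[!0-9]*)  echo "unknown phase: $1" >&2; return 1 ;;
        *)
            if [ "$1" -le 100 ]; then
                echo "-p $1"
            else
                echo "phase out of range: $1" >&2
                return 1
            fi
            ;;
    esac
}

phase_flag minimum   # prints -M
phase_flag 30        # prints -p 30
```

You could then splice it into a command, e.g. `sox input.wav output.wav rate -v $(phase_flag intermediate) 96k` (placeholder file names).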

A resampler’s band-width setting determines how much of the frequency content of the original signal (w.r.t. the original sample rate when
up-sampling, or the new sample rate when down-sampling) is preserved during conversion. The term ‘pass-band’ is used to refer to all frequencies up to
the band-width point (e.g. for 44.1kHz sampling rate, and a resampling band-width of 95%, the pass-band represents frequencies from 0Hz (D.C.) to
circa 21kHz). Increasing the resampler’s band-width results in a slower conversion and can increase transient echo artefacts (and vice versa).
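The "circa 21kHz" figure is easy to reproduce. This is a rough sketch of the arithmetic as I read the paragraph above: take the smaller of the two sample rates, halve it, and apply the band-width percentage. The helper name `passband_hz` is made up, and it assumes `awk` is available.

```shell
#!/bin/sh
# Hypothetical helper: estimate the top of the pass-band in Hz.
# Arguments: input rate (Hz), output rate (Hz), band-width (%).
# Assumption: the reference rate is the smaller of the two rates, i.e. the
# original rate when up-sampling and the new rate when down-sampling.
passband_hz() {
    awk -v a="$1" -v b="$2" -v bw="$3" 'BEGIN {
        ref = (a < b) ? a : b
        # Half the reference rate, scaled by the band-width percentage.
        printf "%.1f\n", ref * bw / 200
    }'
}

# 44.1 kHz material up-sampled to 1411.2 kHz with 95% band-width.
passband_hz 44100 1411200 95   # prints 20947.5
```

So at 95% the filter's 3dB point sits just under 21 kHz for 44.1 kHz material, whatever rate you up-sample to.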

The -s ‘steep filter’ option changes resampling band-width from the default 95% (based on the 3dB point), to 99%. The -b option allows the
band-width to be set to any value in the range 74-99.7 %, but note that band-width values greater than 99% are not recommended for normal use as they
can cause excessive transient echo.

If the -a option is given, then aliasing/imaging above the pass-band is allowed. For example, with 44.1kHz sampling rate, and a resampling
band-width of 95%, this means that frequency content above 21kHz can be distorted; however, since this is above the pass-band (i.e. above the highest
frequency of interest/audibility), this may not be a problem. The benefits of allowing aliasing/imaging are reduced processing time, and reduced
(by almost half) transient echo artefacts. Note that if this option is given, then the minimum band-width allowable with -b increases to 85%.
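The band-width limits quoted above (74-99.7% normally, 85-99.7% once -a is given) can be checked before building a command. This is only a sketch of that rule as stated in the man page text; the helper name `bandwidth_ok` is hypothetical and it assumes `awk` is available.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a -b value is inside the documented range.
# Arguments: band-width (%), then "yes" if -a (allow aliasing) will be passed.
bandwidth_ok() {
    awk -v bw="$1" -v a="${2:-no}" 'BEGIN {
        bw = bw + 0                      # force a numeric comparison
        min = (a == "yes") ? 85 : 74     # -a raises the minimum to 85%
        exit !(bw >= min && bw <= 99.7)
    }'
}

bandwidth_ok 80     && echo "80% is fine without -a"
bandwidth_ok 80 yes || echo "80% is too low once -a is given"
```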

Hopefully that helps… But it kind of depends on the type of music you are listening to and your ears’ preferences (e.g. you might lean more towards a linear-phase filter for classical). Also don’t forget:

Note that phase responses between ‘linear’ and ‘maximum’ (greater than 50) are rarely useful.
