Explanation For SoX Filter Controls

Hello all,

Could someone help me understand in non-technical layman’s terms what the upsampling controls mean, what they’re supposed to do and what effect they should have on the sound quality?

I’m not looking for suggested settings because, after all, only I can really determine that. But to know what the controls do would be a great help, thank you.

P.S., I’m using Audirvana 3.5 on a Macbook.

This is a great question, especially in regard to the SoX filter settings.

Damien once told me the software comes pre-configured with the recommended best settings. That being said, each of our systems is unique, with its own strengths and weaknesses, so it would be nice to know the cause and effect of changing these settings and how they might be adjusted to suit a particular system.

Yes, it would be good to know and understand without having an audio engineering degree.

Even with a Ph.D. in electrical engineering and particular expertise in physical acoustics, this isn’t an easy question. SoX is just an open source tool for digital audio signal processing. Unfortunately, I suspect that even most people who use it don’t know exactly what it’s doing under the hood. For example, upsampling can be done in a number of ways, including linear or quadratic (parabolic) interpolation. There are also more sophisticated methods of upsampling that might be used. I have no idea which method or methods SoX uses. Upsampling increases the sample rate of the data stream. For example, 2X upsampling of a signal sampled at 96 kHz turns it into a signal sampled at 192 kHz, but the added samples are only estimates, as it’s impossible to recreate data points that aren’t there. The advantage of upsampling is that it increases the Nyquist frequency. The drawback is that it introduces a degree of quantization error, which can result in distortion.
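
To make the "only an estimate" point concrete, here is a minimal sketch of 2X upsampling by linear interpolation, the simplest of the methods mentioned above. It is an illustration only and is not meant to represent what SoX or Audirvana actually does; the function name `upsample_2x_linear` is made up for this example.

```python
def upsample_2x_linear(samples):
    """Return the samples with one interpolated value between each neighbouring pair.

    Illustrative only: each new value is just the average of its two
    neighbours, i.e. an estimate, not recovered data.
    """
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)            # keep the original sample
        out.append((a + b) / 2)  # estimated sample halfway between a and b
    if samples:
        out.append(samples[-1])  # the last sample has no right-hand neighbour
    return out


original = [0.0, 1.0, 0.0, -1.0]
print(upsample_2x_linear(original))
# -> [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0]
```

Every second value in the output is one that was never measured, which is why the choice of interpolation method matters.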

The Nyquist frequency is the maximum hypothetical frequency of the audio signal that can be encoded at a given sample rate. Per the sampling theorem, the Nyquist frequency is exactly half the sample rate, and with a perfect filter, one could in theory recover signals that are 100% of the Nyquist frequency. Digital filters can be designed to mimic an ideal frequency response curve, which is flat right up to the Nyquist frequency, and zero above it. The only limitation on that is the length of the filter, but longer filters result in a lag between the input and output signals, and require more processing power.

Aliasing refers to the phenomenon in which frequency components above the Nyquist frequency sneak through folded back below it: a component between the Nyquist frequency and the sample rate shows up at the difference between the sample rate and its actual frequency. This is similar to a beat frequency in AM radio. Ideally, aliasing can be prevented by filtering out such frequencies before the audio is sampled. Antialiasing is a strategy to detect and filter out aliased signal components, but there is the potential to reject some of the signal too. Phase refers to how the filter’s phase shift varies across its bandwidth. Ideally, the phase should be completely linear.
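
The folding arithmetic behind aliasing can be sketched in a few lines. This assumes an idealised sampler with no anti-aliasing filter in front of it; `alias_frequency_hz` is a hypothetical helper name, not part of any library.

```python
def alias_frequency_hz(tone_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after ideal sampling with no filter."""
    folded = tone_hz % sample_rate_hz            # sampling cannot tell f from f + k*fs
    return min(folded, sample_rate_hz - folded)  # fold back into 0..Nyquist


# A 30 kHz tone sampled at 44.1 kHz lands inside the audible band.
print(alias_frequency_hz(30_000, 44_100))  # -> 14100
# A 10 kHz tone is below Nyquist (22.05 kHz) and is left where it is.
print(alias_frequency_hz(10_000, 44_100))  # -> 10000
```

This is why content above the Nyquist frequency has to be removed before sampling or resampling: once folded, it cannot be told apart from genuine signal.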

So in layperson’s terms, a 95% bandwidth, even for a signal sampled at only 44.1 kHz, gives a passband reaching about 20.9 kHz, which is at or above the upper limit of hearing for most listeners. If you happen to have superhuman hearing, you can increase the bandwidth up to 100% of the Nyquist frequency, but at the expense of needing a longer filter length and additional processing power.
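
For anyone who wants to check the numbers, the bandwidth setting is just a percentage of the Nyquist frequency, which in turn is half the sample rate. A minimal sketch, with `passband_edge_hz` as a made-up helper name:

```python
def passband_edge_hz(sample_rate_hz, bandwidth_percent):
    """Upper edge of the passband for a bandwidth given as % of Nyquist."""
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz * bandwidth_percent / 100


for rate, percent in [(44_100, 95), (44_100, 100), (96_000, 95)]:
    edge_khz = passband_edge_hz(rate, percent) / 1000
    print(f"{rate} Hz at {percent}%: passband up to {edge_khz:.2f} kHz")
```

At 44.1 kHz and 95% this works out to 20.9475 kHz.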

My thinking is that % Nyquist should be max unless you are trying to make an apodizing filter. If you don’t know what this is, set bandwidth to max. :slightly_smiling_face:

Filter max length: Longer filters allow your settings to operate longer on the input (more “taps” as some people put it). This helps eliminate aliasing. On the other hand, a filter can “ring” (a misnomer, it’s actually Gibbs effect) only for as long as it is applied, so a very short filter rings less. In practice I have found I care very little about this setting. I have it on max at the moment.

Anti-Aliasing: How much the filter “cuts.” More cut = less ultrasonic aliasing and imaging, which is good, but more ultrasonic “ringing” (Gibbs effect), not considered good. Many people, including me formerly, have this set rather low. It can give a more “live,” exciting sound. But these days I don’t think it’s correct to set it low and allow aliasing, so I have it set to max.

Filter phase: Lower (toward minimum) phase pushes the Gibbs effect “ringing” to the end of impulses, which some folks feel makes the initial attack sound sharper. Higher (toward linear) phase allows “ringing” to occur near the beginning of the impulse. However, I have linear phase speakers (Vandersteen), so I use linear phase because I think it sounds better with them. Experiment with this setting and see what you think.

Thanks Steve and Jud. I’m experimenting with the settings and already leaning towards your recommendations, which happen to be the default settings! I’m finding the phase setting is the most noticeable to me: the higher I go, the drier and less live the sound. This is helpful depending on my preference for the recording I’m listening to at the time. Of course this is as it applies to my room and system. Thanks again.

TLDR:

  • 95.0% filter bandwidth (the default might be a little too high according to the technical detail below)
  • 30,000 filter length
  • anti-aliasing 100.0%
  • filter phase: 0 (minimum phase to prevent pre-ringing)

Super happy up-sampling to 1411.2 kHz with a Denafrips XD.

From the man page for sox. First the “presets” just to set some baselines:

rate [-q|-l|-m|-h|-v] [override-options] RATE[k]
Change the audio sampling rate (i.e. resample the audio) to any given RATE (even non-integer if this is supported by the output file format) using
a quality level defined as follows:

          Quality     Band-width   Rej dB        Typical Use
   -q     quick       n/a          ~=30 @ Fs/4   playback on ancient hardware
   -l     low         80%          100           playback on old hardware
   -m     medium      95%          100           audio playback
   -h     high        95%          125           16-bit mastering (use with dither)
   -v     very high   95%          175           24-bit mastering

   where Band-width is the percentage of the audio frequency band that is preserved and Rej dB is the level of noise rejection.  Increasing levels of
   resampling quality come at the expense of increasing amounts of time to process the audio.  If no quality option is given, the quality level  used
   is 'high' (but see `Playing & Recording Audio' above regarding playback).

   The  'quick'  algorithm  uses  cubic  interpolation;  all others use band-limited interpolation.  By default, all algorithms have a 'linear' phase
   response; for 'medium', 'high' and 'very high', the phase response is configurable (see below).

   The rate effect is invoked automatically if SoX's -r option specifies a rate that is different to that of the input  file(s).   Alternatively,  if
   this effect is given explicitly, then SoX's -r option need not be given.  For example, the following two commands are equivalent:

      sox input.wav -r 48k output.wav bass -b 24
      sox input.wav        output.wav bass -b 24 rate 48k

   though the second command is more flexible as it allows rate options to be given, and allows the effects to be ordered arbitrarily.

Then the documentation goes more into what we are interested in:

Warning: technically detailed discussion follows.

The simple quality selection described above provides settings that satisfy the needs of the vast majority of resampling tasks. Occasionally,
however, it may be desirable to fine-tune the resampler’s filter response; this can be achieved using override options, as detailed in the
following table:

   -M/-I/-L     Phase response = minimum/intermediate/linear
   -s           Steep filter (band-width = 99%)
   -a           Allow aliasing/imaging above the pass-band
   -b 74-99.7   Any band-width %
   -p 0-100     Any phase response (0 = minimum, 25 = intermediate,
                50 = linear, 100 = maximum)

N.B. Override options cannot be used with the ‘quick’ or ‘low’ quality algorithms.

All resamplers use filters that can sometimes create ‘echo’ (a.k.a. ‘ringing’) artefacts with transient signals such as those that occur with
‘finger snaps’ or other highly percussive sounds. Such artefacts are much more noticeable to the human ear if they occur before the transient
(‘pre-echo’) than if they occur after it (‘post-echo’). Note that frequency of any such artefacts is related to the smaller of the original and
new sampling rates but that if this is at least 44.1kHz, then the artefacts will lie outside the range of human hearing.

A phase response setting may be used to control the distribution of any transient echo between ‘pre’ and ‘post’: with minimum phase, there is no
pre-echo but the longest post-echo; with linear phase, pre and post echo are in equal amounts (in signal terms, but not audibility terms); the
intermediate phase setting attempts to find the best compromise by selecting a small length (and level) of pre-echo and a medium lengthed post-
echo.

Minimum, intermediate, or linear phase response is selected using the -M, -I, or -L option; a custom phase response can be created with the -p
option. Note that phase responses between ‘linear’ and ‘maximum’ (greater than 50) are rarely useful.

A resampler’s band-width setting determines how much of the frequency content of the original signal (w.r.t. the original sample rate when
up-sampling, or the new sample rate when down-sampling) is preserved during conversion. The term ‘pass-band’ is used to refer to all frequencies up to
the band-width point (e.g. for 44.1kHz sampling rate, and a resampling band-width of 95%, the pass-band represents frequencies from 0Hz (D.C.) to
circa 21kHz). Increasing the resampler’s band-width results in a slower conversion and can increase transient echo artefacts (and vice versa).

The -s ‘steep filter’ option changes resampling band-width from the default 95% (based on the 3dB point), to 99%. The -b option allows the band-
width to be set to any value in the range 74-99.7 %, but note that band-width values greater than 99% are not recommended for normal use as they
can cause excessive transient echo.

If the -a option is given, then aliasing/imaging above the pass-band is allowed. For example, with 44.1kHz sampling rate, and a resampling band-
width of 95%, this means that frequency content above 21kHz can be distorted; however, since this is above the pass-band (i.e. above the highest
frequency of interest/audibility), this may not be a problem. The benefits of allowing aliasing/imaging are reduced processing time, and reduced
(by almost half) transient echo artefacts. Note that if this option is given, then the minimum band-width allowable with -b increases to 85%.
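
As a rough sketch of how the override options quoted above fit together, the following builds the argument list for SoX's `rate` effect and enforces the limits stated in the man page text (band-width 74 to 99.7, raised to a minimum of 85 with `-a`; no overrides with `-q` or `-l`). The helper `sox_rate_args` is hypothetical. Whether Audirvana's sliders map directly onto these command-line flags is an assumption, and its filter length setting has no counterpart in the excerpt above, so it is left out.

```python
def sox_rate_args(rate, quality="-v", bandwidth=None, phase=None, allow_aliasing=False):
    """Build the argument list for SoX's `rate` effect (hypothetical helper).

    Only the flags and limits quoted from the man page are used:
    -q/-l/-m/-h/-v, -a, -b 74-99.7 and -p 0-100.
    """
    if quality not in ("-q", "-l", "-m", "-h", "-v"):
        raise ValueError(f"unknown quality option: {quality}")
    has_override = bandwidth is not None or phase is not None or allow_aliasing
    if quality in ("-q", "-l") and has_override:
        raise ValueError("override options cannot be used with 'quick' or 'low'")

    args = ["rate", quality]
    if allow_aliasing:
        args.append("-a")
    if bandwidth is not None:
        minimum = 85 if allow_aliasing else 74  # -a raises the lower limit to 85%
        if not minimum <= bandwidth <= 99.7:
            raise ValueError(f"band-width must be between {minimum} and 99.7")
        args += ["-b", str(bandwidth)]
    if phase is not None:
        if not 0 <= phase <= 100:
            raise ValueError("phase must be between 0 and 100")
        args += ["-p", str(phase)]  # 0 = minimum, 25 = intermediate, 50 = linear
    args.append(rate)
    return args


# Example: 95% band-width, minimum phase, very high quality, output at 192 kHz.
print(" ".join(["sox", "input.wav", "output.wav"] + sox_rate_args("192k", bandwidth=95, phase=0)))
```

The function only assembles and checks the arguments; it does not run SoX, so it can be tried without SoX installed.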

Hopefully that helps… But it kind of depends on the type of music you are listening to and your ears’ preferences (e.g. you might lean more towards a linear phase filter for classical). Also don’t forget:

Note that phase responses between ‘linear’ and ‘maximum’ (greater than 50) are rarely useful.

PS - I have been experimenting with lowering the bandwidth below 95%… I ran an informal hearing test yesterday and I can only hear up to about 16k or 17k… Using a bandwidth of something like 82% has really cleaned up the sound in surprising ways… :blush: i.e. I am resolving details I have not heard before in music I know well.

Thanks for your input. Were your trials at the native sampling rate of the music file or upsampled?

My experience based on Ares II, THX 887, Audeze LCD2:

  • Good: NOS - fun and energetic but feels a little uncontrolled in my cans.
  • Better: OS - there is a reason this is the default… Denafrips implementation is very good (I would suggest slow filter for the same reason you don’t want to use the steep filter in sox).
  • Best: SOX - as good as the built-in OS is… and it is very good… but it is not as good as a beefy CPU with 30000 taps… this is my poor man’s Chord. :stuck_out_tongue:
  • Best DSD - so the amazing thing about the Denafrips is how many DACs you get in one box. This is my poor man’s PS Audio DirectStream. Use your same SoX settings but upsample to DSD. I am anxious to try DSD 1024 :stuck_out_tongue:

Depending on the song I go back and forth between PCM and DSD… but this is a tweaker’s dream - so many choices! I am mostly listening to SoX upsampled to 1411.2 or 1536 kHz, but if I get bored, I just switch to a different “DAC” and use the NOS, built-in OS, or switch to DSD just to get a different experience. They all allow you to appreciate the song in a slightly different way.
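
For reference, here is the arithmetic behind those two target rates. The post does not say how the target is chosen for a given file, so this sketch only shows the ratios; `upsampling_ratio` is a made-up helper name.

```python
from fractions import Fraction


def upsampling_ratio(source_hz, target_hz):
    """Exact ratio between the target and source sample rates."""
    return Fraction(target_hz, source_hz)


print(upsampling_ratio(44_100, 1_411_200))  # -> 32
print(upsampling_ratio(48_000, 1_536_000))  # -> 32
print(upsampling_ratio(44_100, 1_536_000))  # -> 5120/147
```

1411.2 kHz is a whole-number multiple of 44.1 kHz and 1536 kHz is a whole-number multiple of 48 kHz, while crossing between the two families gives a non-integer ratio.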

PS - when I am upsampling on the PC, I put the DAC into NOS mode since the Denafrips wants to oversample everything to max sample rate.

I have a Denafrips Ares II as well. My setup is a MacBook Pro with Audirvana connected to the Ares via USB, then to a Denon PMA-777 and JBL4312 speakers. I agree about the Ares. Plenty to experiment with!

Okay with the MBP I can recommend the iFi iPurifier3 + good USB cable. It does not cost a ton and you can prove out whether you can hear a difference or not. There are other options but they tend to go up in price pretty quickly. I like the convenience of the iFi iPurifier3 and I can hear a difference - especially on DACs that do not handle jitter well. Laptops produce a lot of RFI so you should be able to hear some noise reduction from a device like this straight away.

I have a couple of Pangea Audio - Premier SE USB cables and those also produced a nice little bump in sound quality. :slight_smile:

What a great bunch of posts; thanks for the ideas. I made small changes to the Upsampling:
SoX filter bandwidth (% Nyquist) from 95 to 88
SoX Filter Phase from Linear to Min. Phase
I have a reasonably good set-up and these changes have improved my experience. The bass seems more enhanced and detailed, and the slight sibilance I was hearing has disappeared.
Thanks, everyone

Nice settings. I like them

Since we have great recommendations in this thread, what are your thoughts on DSD up sampling in conjunction with these settings? I’ve tried A 7th order and B 7th order, I seem to like A better.

I set the upsampling to the DAC’s maximum frequency, left the other SoX settings unchanged, and I find a clear improvement in the sound compared with when I had it off. But I’ve noticed that sometimes, on some tracks, playback stops toward the end of the track. Why is that? Do I need to change a setting? I have 4 GB of RAM in my PC; is that insufficient?

This issue with stopping before the end of the song is a known issue with some DACs. It’s DAC dependent, not memory dependent. The only workaround at the moment is to disable the upsampling.

How much memory you need depends on what is the max sampling frequency of your DAC. More is for sure better, 4GB is on the low side. Still, if you don’t have glitches or stuttering it means it’s enough.
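
To get a feel for why the maximum sample rate matters for memory, here is a back-of-the-envelope sketch. It assumes stereo audio held as 32-bit samples with a whole track buffered at once; those are illustrative assumptions, not a description of how Audirvana actually manages memory, and `pcm_buffer_bytes` is a made-up helper name.

```python
def pcm_buffer_bytes(sample_rate_hz, seconds, channels=2, bytes_per_sample=4):
    """Bytes needed to hold uncompressed PCM (assumed stereo, 32-bit by default)."""
    return sample_rate_hz * channels * bytes_per_sample * seconds


# Size of a 5-minute track at a few output rates.
for rate in (44_100, 192_000, 1_411_200):
    mib = pcm_buffer_bytes(rate, 300) / 2**20
    print(f"{rate} Hz: about {mib:.0f} MiB")
```

The requirement scales linearly with the output rate, so a DAC that accepts very high rates needs far more buffer space than one limited to 192 kHz.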

Thank you very much for the reply. I use the Beresford Caiman Seg DAC, and with upsampling the sound seems clearly improved to me. The problem I described is minor and usually happens in the last seconds of the last track of an album, which is very strange. I think I’ll keep upsampling enabled.

Hi!

In which way? More natural sound? More clarity in the bass, etc.?

Cheers,