Stardust Film - Peter V. Meiselmann Production Sound & Design

48 kHz or 96 kHz Sampling Frequency?
That’s the Question (Part II)

by Jean-Claude Schlup

Reminder
When we last spoke, we tried to determine whether an audio system working at 96 kHz was better than, or simply different from, one operating at 48 kHz. The listening tests showed clearly that a difference existed, but we were unable to conclude that the sound from the speakers of the first system was qualitatively better than that from the second.

An experiment in the frequency domain showed that above 20 kHz the loudspeaker, or more precisely the tweeter, creates intermodulation. The speaker is therefore no longer a faithful transducer but rather an interpreter of the sound; the intermodulation modifies the content of the audio message and therefore diminishes the fidelity of the system.

In the temporal domain
We could have left our demonstration there, but we were in the mood to try some further experiments. Just as the eye has difficulty discerning two points placed close to one another, on the order of a thousandth of a radian apart, the ear has a limit of discrimination with respect to time. The idea for this new experiment came to us when recalling a visit to an ear specialist. This university hospital practitioner was looking to digitize a clinical procedure usually performed in the analog domain. The test consisted of having a patient listen to two brief impulses separated in time by a duration adjusted at will by the operator.

According to this doctor, the digital systems then in use (in the mid-'80s) were unable to go beyond a 48 kHz sampling frequency, whereas it seemed that the separation ability of our ears could reach 20 microseconds. Since A/D and D/A converters at the time operated only up to 48 kHz and without oversampling, we were unable to satisfy the practitioner. Remembering this, it seems evident that the temporal domain merits a little more exploration, and the answer could be the justification for recording at 96 kHz rather than 48 kHz.

First experiment
The procedure was kept simple. An ad hoc signal generator was used to generate two brief pulses separated by a variable duration. In our case, the duration of each pulse was 15 microseconds, and the separation delay could be adjusted from coincidence up to 20 milliseconds.
The repetition period was set to about half a second, which experimentally seemed optimal for evaluating the time discrimination of the ear.
The first setup to come to mind is simple: it is sufficient to amplify this double pulse with an amplifier fast enough to drive a tweeter of the kind normally used in quality speakers.
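
As an illustration only, the double-pulse stimulus described above can be sketched numerically. The function name, the rectangular pulse shape and the 1 MHz time grid (one sample per microsecond) are assumptions made for this sketch; the article does not describe the generator beyond the 15 microsecond width and the adjustable delay.

```python
def double_pulse(delay_us, pulse_us=15.0, fs_hz=1_000_000):
    """Return samples (0.0 or 1.0) holding two rectangular pulses.

    delay_us is measured between the two rising edges.
    The rectangular shape and the 1 MHz grid are assumptions of this sketch.
    """
    samples_per_us = fs_hz / 1_000_000
    width = round(pulse_us * samples_per_us)    # pulse width in samples
    offset = round(delay_us * samples_per_us)   # rising-edge spacing in samples
    signal = [0.0] * (offset + width)
    for start in (0, offset):                   # pulse A, then pulse B
        for i in range(start, start + width):
            signal[i] = 1.0
    return signal


stimulus = double_pulse(20.0)
# Samples between the falling edge of A and the rising edge of B
gap_samples = stimulus.count(0.0)
print(len(stimulus), gap_samples)
```

With 15 microsecond pulses and 20 microseconds between rising edges, only a 5 microsecond gap separates the two pulses, and for delays shorter than the pulse width they overlap and merge.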

The listening was done by placing the listener at a distance of about 1.5 m from the tweeter, with one ear facing the source. The distance of 1.5 m is a compromise: any closer, we would find ourselves in the near field (pressure/velocity balance not established); any further, bearing in mind the dimensions of our listening room, we could be disturbed by reflections from the walls.
The results of this experiment can be described as follows: from 20 milliseconds down to 5 milliseconds, the ear can separate the two pulses perfectly. Below 5 milliseconds there is confusion; the signal appears to be a single impulse. Around 20 microseconds (measured between the rising edges of impulses A and B) (Fig. 3), an unexpected phenomenon appeared: the sound became muffled. This phenomenon becomes more apparent if we increase the drive level of the tweeter.

At this point in the tests, it seemed that we had shown the advantage to be gained from a system capable of reproducing sound with a temporal precision better than 20 microseconds, which is the limit for a system operating at a sampling frequency of 48 kHz (a sampling period of about 20.8 microseconds).
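
The 20 microsecond figure follows from the sampling period, which is simply the reciprocal of the sampling frequency. A minimal sketch of that arithmetic (the function name is ours, chosen for illustration):

```python
def sample_period_us(fs_hz):
    """Sampling period in microseconds for a sampling frequency in hertz."""
    return 1_000_000 / fs_hz


for fs in (48_000, 96_000):
    print(f"{fs} Hz -> {sample_period_us(fs):.2f} microseconds")
# 48000 Hz -> 20.83 microseconds
# 96000 Hz -> 10.42 microseconds
```

Doubling the sampling frequency halves the spacing between samples, which is why 96 kHz looked like a candidate for resolving events closer together than 20 microseconds.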

Second experiment
To be sure that this phenomenon was not a side effect of our amplifier, we modified the experiment by feeding impulses A and B to independent amplifiers, with the outputs of both amplifiers driving the same tweeter. The results of the tests with this new setup led us to the same conclusion: the two pulses can be discriminated down to 5 milliseconds, and the character of the sound changes once we drop below the 20 microsecond barrier.

Third experiment
The conclusion could have seemed final, but while re-reading our notes on the measurements made in the frequency domain, it occurred to us that the speaker itself might have caused this phenomenon. A new experiment was devised, which consisted of using only one pulse, amplified by a single amplifier, driving two speakers placed side by side.

It was therefore possible to create two delayed pulses simply by moving one of the speakers, 7.5 mm corresponding to a delay of about 20 microseconds.
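
The relation between speaker offset and delay is just distance divided by the speed of sound. The sketch below assumes 343 m/s (air at roughly 20 °C); the article does not state the value it used, so its 7.5 mm figure should be read as a rounded one.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at about 20 degrees C


def offset_to_delay_us(offset_mm, c=SPEED_OF_SOUND_M_S):
    """Acoustic delay in microseconds for a path difference in millimetres."""
    return (offset_mm / 1000.0) / c * 1_000_000


def delay_to_offset_mm(delay_us, c=SPEED_OF_SOUND_M_S):
    """Path difference in millimetres producing a given delay in microseconds."""
    return (delay_us / 1_000_000) * c * 1000.0


print(f"{offset_to_delay_us(7.5):.1f} microseconds")  # about 21.9
print(f"{delay_to_offset_mm(20.0):.2f} mm")           # about 6.86
```

Under this assumption 7.5 mm gives a little under 22 microseconds, the same order as the 20 microseconds quoted in the text.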

For relative offsets of the tweeters below 50 mm, the difference in level between the two pulses heard at a distance of 1.5 m should not disturb the experiment.
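
To get a feel for how small that level difference is, one can assume each tweeter behaves as a point source whose sound pressure falls off as the inverse of distance. This is a simplifying assumption for illustration, not something the article states.

```python
import math


def level_difference_db(near_m, far_m):
    """Level difference in dB between two identical point sources at
    distances near_m and far_m, assuming pressure falls as 1/distance."""
    return 20.0 * math.log10(far_m / near_m)


# One tweeter at 1.5 m, the other moved back by the maximum 50 mm offset
print(f"{level_difference_db(1.5, 1.55):.2f} dB")  # about 0.28 dB
```

At the largest offset used, the two pulses differ by roughly 0.3 dB at the listening position under this model.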

As soon as it was set up, it was tested and our suppositions were confirmed: the tweeter was the source of the phenomenon. In this last experiment, progressively changing the relative position of the tweeters from 50 mm to 0 mm made no difference to the sound of the pulses. Once again, the explanation of the phenomenon observed in the first and second experiments lies in the non-linearity of the materials used to build the tweeter.

Placing a measurement microphone (B+K) in the position of the listener, we can clearly see the phenomenon that appears as the pulses approach coincidence in the first two experiments (Figure 6A, delay approx. 120 microseconds; Figure 6B, delay approx. 50 microseconds). However, we do not see it in the third.

Because of the non-linearity of the tweeter, Figures 6A and 6B show that low frequencies not present in the pulses appear during T2. It is this trail that makes the "clap" sound muffled.

By integrating the signal and performing an FFT (Fast Fourier Transform) on it, the signal being the impulse response (or rather the double impulse response), we should obtain a correlation with the observations made in the frequency domain, as explained in the previous article (NAGRA NEWS #19).
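
As a point of comparison for such an analysis, the spectrum that a perfectly linear system would produce from two ideal, equal impulses is easy to write down: its magnitude is |1 + exp(-j 2 pi f tau)|, a comb with nulls at odd multiples of 1/(2 tau). The sketch below evaluates that idealized baseline only; it says nothing about the measured tweeter, and the function names are ours.

```python
import cmath
import math


def double_impulse_magnitude(freq_hz, delay_s):
    """Spectrum magnitude of two ideal unit impulses delay_s apart
    (idealized linear baseline, not a model of the tweeter)."""
    return abs(1 + cmath.exp(-2j * math.pi * freq_hz * delay_s))


def first_null_hz(delay_s):
    """Lowest frequency at which the two impulses cancel."""
    return 1.0 / (2.0 * delay_s)


for delay_us in (120.0, 50.0, 20.0):
    null = first_null_hz(delay_us * 1e-6)
    print(f"delay {delay_us:.0f} microseconds -> first null near {null:.0f} Hz")
```

In this linear picture the energy at low frequencies simply adds; any extra low-frequency trail beyond that, as seen in the microphone traces, would have to come from the transducer rather than from the stimulus.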

Conclusion
Remember that the prosaic aim of these experiments was to demonstrate that, to obtain optimal acoustic quality, it is preferable to record our works at a 96 kHz sampling frequency rather than 48 kHz. However, the experiments described above show that the only thing we managed to do was to bring to light the faults of the weakest element in the listening chain, in this case the loudspeaker.

It seems, therefore, that we should wait for an improvement in transducers in order to benefit fully from the performance of a recording system operating at 96 kHz. As an addition to this conclusion, I invite you to re-read the article «96 kHz Recording, a door to the future», which we published in NAGRA NEWS #14 (Dec. 1996).
