Abstract
This paper describes a new interface for interactive sound spatialisation by the
public over three-dimensional (3-D) loudspeaker arrays. The interface uses ultrasound
to implement an adaptation of F. R. Moore's General Model for Spatial Processing
of Sounds and offers clear ergonomic and computational advantages over existing
strategies. It is designed for use by untrained individuals but could be equally
useful in a performance context, especially where the performer is remote from
the projected sound field.
1 Introduction
The design of expressive public interactive music
installations demands an approach centred on guiding the public through specific
sets of activities. An activity-based approach is necessary to elicit appropriate
music-generating behaviour from participants. Physical restrictions on the actions
of participants are generally needed to keep behaviour within functional bounds.
In addition, where behaviour cannot be adequately restrained, designers may also
need to limit the sensitivity of sound-generating mechanisms.
First-time users of interactive installations may not fully recognise the relationship
between their own actions and the resulting sound. Hence, designers of public
interactive works must establish effective techniques that help participants grasp
this relationship quickly. It is important that participants are aware of the
consequences of their actions, as it is through informed behaviour that composers
can best communicate aesthetic material in an interactive context.
This paper will discuss a new method for interactive sound spatialisation by
the public. The system has been devised for The Talking Chair [Mott, 1995],
an interactive sound installation that enables non-trained individuals to manipulate
compositional algorithms and control the 3-D motion of a virtual sound object
around their body. We have developed a design with strong spatialised visual
reference points to aid the first-time user in interactive sound spatialisation.
While the interface is designed to control both music generating algorithms
and the spatialisation of sound objects, discussion of the compositional element
is beyond the scope of this paper.
2 Earlier Design
Our current work is part of an ongoing
investigation into interactive music. In collaboration with designers and animators,
we have produced works exploring spatial sound manipulation by the public. Squeezebox
followed the 1994 Talking Chair and was produced in part to examine the use of
restricted movement in public interaction. Unlike The Talking Chair, which allows
unhindered spatial gestures within a given region, the Squeezebox interface uses
four pneumatic pistons to ensure smooth gestures from the public.
The impetus for designing a new spatial controller came from a need to revisit
human factors associated with The Talking Chair. The first (1994) version of the
installation (Figure 1) implemented John Chowning's [1971] theories of moving
sound simulation and used an ultrasound wand as its interface. A single seated
participant moved the wand through a cubic zone above the lap to control the
motion of sound. Ultrasound pulses transmitted from the wand were picked up by
three fixed receivers, and gestures were mapped as Cartesian coordinates. A
mirrored display cabinet was situated in front of the listener as a visual aid
to the spatial navigation of sound. The viewing cabinet, using a magic box
technique [Popular Mechanics Co., 1913], displayed the illuminated wand tip as a
glowing ball moving about a reflected image of the participant's head. The
position of the ball relative to the head corresponded to the perceived position
of sound in the space surrounding the listener.

Regrettably, while the magic box system worked effectively, its use required
detailed written explanation and some practice. We found that few people in museum
environments fully read the detailed diagrams; as a result, the cabinet often proved
confusing and distracting. The complexity of the interface also led a few individuals
to believe that the cabinet itself was the object of the interaction! The magic
box display, in addition, requires a degree of control over ambient lighting
that is not always practical in public spaces. A simpler approach was needed.
3 Methodology
In attempting to remedy shortcomings in our original
human-machine interface, our two main aims were to a) efficiently map the spatial
gestures of participants to factors simulating the spatial motion of sound, and
b) provide an interface design that explicitly informs the participant of the
spatial relationship between their body and the virtual sound object.
We chose F. R. Moore's [1983] General Model for Spatial Processing of Sounds
as the basis for our interface design. The model is readily adaptable to a user
interface using ultrasound technology and has qualities which are highly instructive
in the interactive positioning of sound relative to the body.
The model was originally produced to address problems of listener perspective
in multi-speaker systems by generating audio delays and amplitudes with respect
to the loudspeakers rather than the listener. In this way, listeners in different
locations within a sound field each receive a different perspective of the same
spatial event. Loudspeakers in spatial arrays are modelled as windows in a room
whose size and shape are determined by the number and placement of the speakers.
This inner room is surrounded by a larger outer room in which illusory sonic
objects may move freely. The audience is contained within the inner room and
perceives virtual sound objects through the windows. Spatialisation is simulated
by measuring the lengths of the direct and reflected sound paths from the virtual
object to each window. These distances are used to calculate individual amplitudes
and time delays for each loudspeaker channel.
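The direct-path part of this calculation can be sketched in Python. This is a minimal illustration, not Moore's full model: it ignores reflected paths and wall obstruction, assumes a speed of sound of 343 m/s and a 44.1 kHz sample rate, and uses a simple inverse-distance amplitude law clamped at 1 m. The function name `direct_path_params` is hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def direct_path_params(source, windows, sample_rate=44100):
    """Return (delay_in_samples, amplitude) for each window (loudspeaker).

    source:  (x, y, z) position of the virtual sound object, in metres.
    windows: list of (x, y, z) window/loudspeaker positions, in metres.
    Only direct paths are considered; reflections and obstruction are ignored.
    """
    params = []
    for window in windows:
        distance = math.dist(source, window)              # direct path length
        delay = distance / SPEED_OF_SOUND_M_S * sample_rate
        amplitude = 1.0 / max(distance, 1.0)              # inverse distance, clamped at 1 m
        params.append((delay, amplitude))
    return params


# A source 3 m in front of the centre, heard through a front and a rear window.
for delay, amp in direct_path_params((0.0, 3.0, 0.0), [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]):
    print(f"delay {delay:.1f} samples, amplitude {amp:.2f}")
```

The nearer window receives the earlier and louder signal, which is how the model conveys the direction and distance of the virtual object through the loudspeakers.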
We were attracted to the General Model not because of issues of listener perspective
but because of its potential to produce an interface design of great ergonomic
clarity. By constructing a user interface as a physical analogue of the inner
room, we at once provide an efficient mechanism for real-time spatial control
and a strong visual tool to communicate the position of sound with respect to
the listener.
4 Implementation Overview
The Sound Sphere
The interface design incorporates a small sphere
suspended from a stalk in front of the listener. Six ultrasound receivers are
embedded in the surface of the sound sphere, each in a position directly corresponding
to the position of one of the speakers of The Talking Chair. An alignment tube is
included in the interface so that participants can adjust the seat height to position
their head at the centre of the sound field. An ultrasound-emitting wand is moved
around the sound sphere to position the sound relative to the speakers of the
sculpture. A simple four-stage instruction chart (Figure 2) will show participants
how to interact with the sculpture. Instruction covers a) the use of the alignment
device, b) how to change sounds, and c) how to position sound objects relative to
the sculpture and consequently the body of the participant. The discovery of the
remaining nuances of interaction is left to the participants' own investigation.
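The correspondence between receivers and loudspeakers can be illustrated with a short Python sketch. It assumes the loudspeaker positions are known relative to the centre of the listener's head and that the receivers lie on a sphere of radius 40 mm (half the "perhaps 80 mm" diameter mentioned in Section 5); the six-speaker layout itself is an input, and the function name `receiver_positions` is hypothetical.

```python
import math

SPHERE_RADIUS_MM = 40.0  # assumed: half of an 80 mm sphere diameter


def receiver_positions(speaker_positions, radius_mm=SPHERE_RADIUS_MM):
    """Project loudspeaker positions (relative to the listener's head) onto the sphere.

    Each receiver sits on the sphere surface in the same direction from the
    sphere centre as its loudspeaker is from the head, so the sphere acts as a
    scaled-down analogue of the inner room.
    """
    receivers = []
    for x, y, z in speaker_positions:
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0:
            raise ValueError("a loudspeaker cannot sit at the listening position")
        receivers.append((radius_mm * x / norm, radius_mm * y / norm, radius_mm * z / norm))
    return receivers


# Hypothetical example: one speaker 2 m to the side, one 1.5 m below the head.
print(receiver_positions([(2.0, 0.0, 0.0), (0.0, 0.0, -1.5)]))
```

Because only direction is preserved, moving the wand towards a receiver corresponds directly to moving the sound towards the matching loudspeaker.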
Amplitudes and Delays
The interface uses ultrasound to measure the distance
from the wand to each receiver. Distance measurements are used to determine
both the amplitude and the delay time of sound for each loudspeaker channel. Audio
amplitudes are to be attenuated linearly with distance, rather than with an inverse
cubic relationship [Moore, 1989], as linear attenuation is more useful for
interactive control [Ballan et al., 1994].
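The difference between the two attenuation strategies can be illustrated with a small Python sketch. Both functions are hypothetical; `max_distance` (the wand distance at which a channel falls silent) and the 1.0 reference distance are assumed parameters, and the inverse law is written with a general exponent so that it can be compared against the linear curve.

```python
def linear_gain(distance, max_distance):
    """Gain falls linearly from 1 at distance 0 to 0 at max_distance."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    return max(0.0, 1.0 - distance / max_distance)


def inverse_power_gain(distance, exponent=1.0, ref_distance=1.0):
    """Gain proportional to (ref_distance / distance) ** exponent, clamped to 1."""
    if distance <= ref_distance:
        return 1.0
    return (ref_distance / distance) ** exponent


# Compare the curves over a hypothetical 0.3 m control range.
for d in (0.05, 0.1, 0.15, 0.2, 0.25):
    print(d, round(linear_gain(d, 0.3), 3), round(inverse_power_gain(d, 3.0, 0.05), 3))
```

Over a short control range the linear curve changes at a constant rate, whereas an inverse power law changes steeply just beyond the reference distance and very little further out, which makes it harder for a participant to control.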
With our implementation of the General Model, only the direct (not reflected)
paths from the wand to each receiver are measured. While the omission of reflected
paths represents a simplification of the model, a further complication arises
from the distortion of the sound field caused by the sound sphere.
In the General Model, windows with direct sound paths obstructed by walls receive
signal levels of zero. In cases where a window is suddenly shadowed by an obstructive
wall, signal levels decline sharply, and it is necessary to interpolate between
amplitude values in order to avoid audible clicks [Moore, 1989]. With our interface
design, part of the ultrasound signal is diffracted around the surface of the
sound sphere, so smooth amplitude transitions occur as windows (receivers) become
gradually shadowed. Distance measurements from the wand to shadowed receivers
already include the extra path lengths resulting from the sound wave following
the curve of the surface, and are thus a truer representation in this context.
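For comparison, the interpolation that Moore's model needs when a window is suddenly obstructed can be sketched as a per-sample linear gain ramp. This is a minimal Python illustration; the ramp length and the helper names `ramp_gains` and `apply_ramp` are assumptions, not part of the original implementation.

```python
def ramp_gains(current, target, steps):
    """Linearly interpolate from current to target gain over `steps` samples.

    The last value equals target, so an abrupt change (e.g. a window becoming
    obstructed) is spread over several samples instead of producing a click.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [current + (target - current) * (i + 1) / steps for i in range(steps)]


def apply_ramp(samples, current, target):
    """Scale a block of samples by a gain ramp spanning the whole block."""
    gains = ramp_gains(current, target, len(samples))
    return [s * g for s, g in zip(samples, gains)]


print(apply_ramp([1.0, 1.0, 1.0, 1.0], 1.0, 0.0))
```

In the sphere-based interface this explicit smoothing is largely unnecessary, because diffraction already makes the measured distances change gradually as a receiver becomes shadowed.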
Although we can obtain stable amplitude measurements at each receiver, we will
not map these readings to loudspeaker amplitudes. Our transmitter is not fully
omnidirectional and consequently models a sound object with a directional
radiation pattern. Together with the inverse square nature of the response, such
directionality would make the interaction unsuitable for untrained users. A
mechanism of this type is, however, worthy of continued investigation in a
performance context.
Hardware
Reverberation is to be implemented using off-the-shelf
effects units, and its level will be determined by the distance of the wand from
the sound sphere. As in the 1994 version of The Talking Chair, we will implement
John Chowning's local and global reverberation techniques [1971], which we have
found to be highly efficient and effective in simulating changes in distance.
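The local/global split can be sketched in Python following Chowning's technique as it is commonly summarised: the direct signal falls as 1/D, the total reverberant signal as 1/sqrt(D), and the reverberation is divided into a global share (1/D) sent to all loudspeakers and a local share (1 - 1/D) sent to the loudspeakers nearest the source direction. The function name is hypothetical, and how the wand-to-sphere distance would be scaled to the normalised distance D is left open here.

```python
import math


def chowning_levels(distance):
    """Return (direct, global_reverb, local_reverb) levels for normalised distance D.

    The formulas assume D >= 1, so smaller values are clamped to 1.
    """
    d = max(distance, 1.0)
    direct = 1.0 / d                       # direct signal ~ 1/D
    reverb = 1.0 / math.sqrt(d)            # total reverberation ~ 1/sqrt(D)
    global_reverb = reverb * (1.0 / d)     # share heard equally from all loudspeakers
    local_reverb = reverb * (1.0 - 1.0 / d)  # share heard from the source direction
    return direct, global_reverb, local_reverb


for D in (1.0, 2.0, 4.0, 8.0):
    print(D, [round(x, 3) for x in chowning_levels(D)])
```

As the source recedes, the direct signal drops faster than the reverberation and more of the reverberation becomes localised, which is the cue the technique relies on.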
The attenuation of individual audio channels is to be performed by a custom-built,
MIDI-controlled mixer. The device, which uses analog circuitry, can control four
independent sound objects in a six-loudspeaker array. It controls the signal
level of audio sent to each loudspeaker as well as the levels of both local and
global reverberation. The mixer also contains insert points for the six individually
tapped delays of one virtual object.
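The mixer's MIDI implementation is not specified here, so the following Python sketch only illustrates the general idea of driving such a device with standard 3-byte MIDI Control Change messages. The controller assignment (one MIDI channel per sound object, loudspeaker gains on CC 20 to 25, local and global reverberation on CC 26 and 27) is entirely hypothetical; those controller numbers were chosen because they are undefined in the MIDI specification.

```python
SPEAKER_CC_BASE = 20   # hypothetical: CC 20-25 carry loudspeaker 1-6 gains
LOCAL_REVERB_CC = 26   # hypothetical
GLOBAL_REVERB_CC = 27  # hypothetical
NUM_OBJECTS = 4
NUM_SPEAKERS = 6


def gain_to_7bit(gain):
    """Map a gain in [0, 1] to a 7-bit MIDI data value (0-127)."""
    return round(min(max(gain, 0.0), 1.0) * 127)


def control_change(channel, controller, value):
    """Build a standard 3-byte MIDI Control Change message."""
    if not (0 <= channel <= 15 and 0 <= controller <= 127 and 0 <= value <= 127):
        raise ValueError("MIDI field out of range")
    return bytes([0xB0 | channel, controller, value])


def object_messages(obj, speaker_gains, local_reverb, global_reverb):
    """Messages setting all levels for one sound object (0-3) on its own channel."""
    if not 0 <= obj < NUM_OBJECTS:
        raise ValueError("the mixer handles four sound objects")
    if len(speaker_gains) != NUM_SPEAKERS:
        raise ValueError("expected one gain per loudspeaker")
    messages = [
        control_change(obj, SPEAKER_CC_BASE + i, gain_to_7bit(g))
        for i, g in enumerate(speaker_gains)
    ]
    messages.append(control_change(obj, LOCAL_REVERB_CC, gain_to_7bit(local_reverb)))
    messages.append(control_change(obj, GLOBAL_REVERB_CC, gain_to_7bit(global_reverb)))
    return messages


for msg in object_messages(1, [1.0, 0.5, 0.0, 0.0, 0.0, 0.0], 0.2, 0.1):
    print(msg.hex())
```

With 7-bit data values, each level has 128 steps, matching the resolution of the MIDI data the rest of the system already uses.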
At the time of writing, the audio delay software required for the spatialisation
is being implemented on a DSP56002 processor [Motorola Inc., 1993]. Although the
processor is overkill for the simple delay processing required (essentially
memory lookup and interpolation), we decided to pursue this method because we
plan, eventually, to shift all of the spatial processing required for The Talking
Chair and other projects onto the DSP and to eliminate most of the analog circuitry
currently in use.
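The "memory lookup and interpolation" mentioned above amounts to a fractional delay line. The Python class below is a minimal sketch using a circular buffer and linear interpolation; it is not the DSP56002 code, and the class name `FractionalDelay` is hypothetical. Six such lines, one per loudspeaker, would supply the individually tapped delays for one virtual object.

```python
class FractionalDelay:
    """Circular-buffer delay line read with linear interpolation."""

    def __init__(self, max_delay_samples):
        self.size = int(max_delay_samples) + 2  # headroom for interpolation
        self.buf = [0.0] * self.size
        self.write = 0

    def process(self, x, delay):
        """Store sample x and return the signal delayed by `delay` samples.

        `delay` may be fractional; the output interpolates linearly between
        the two stored samples that straddle the read position.
        """
        if not 0.0 <= delay <= self.size - 2:
            raise ValueError("delay out of range")
        self.buf[self.write] = x
        read = (self.write - delay) % self.size   # memory lookup position
        i = int(read)
        frac = read - i
        a = self.buf[i]
        b = self.buf[(i + 1) % self.size]
        self.write = (self.write + 1) % self.size
        return a + frac * (b - a)                  # linear interpolation


line = FractionalDelay(10)
print([line.process(x, 1.5) for x in (1.0, 0.0, 0.0, 0.0)])
```

Changing `delay` smoothly from sample to sample lets the delay follow the wand without the jumps that an integer-only delay would produce.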
Ultrasound distance measurements are performed by a PIC16C84 microcontroller
[Edwards, 1994] [Microchip Technology Inc., 1994]. The unit produces its output
as MIDI messages, which will be received by a Macintosh computer running the
FORMULA music language. In addition to controlling music-generating hardware,
the Macintosh will control the DSP56002 and the custom mixer via MIDI.
5 Ultrasound Specification
Choice of Frequency
Ultrasound transducers of the type readily available
to us are highly directional devices. For them to provide useful distance
information for interactive spatialisation, it is necessary to improve their
omnidirectional response. This ensures that, at all angles, the direct-path signal
to a given receiver is stronger than any reflected signal. The 1994 ultrasound
implementation used 40 kHz transducers. The wand housed the transmitting transducer
(Tx) and was moved through a zone monitored by three receivers. We found that by
fixing a 16 mm diameter marble in front of the Tx, sufficient reflection occurred
to allow distance (i.e. timing) measurements to work no matter which direction
the wand was pointed in.
The original wand would be impractical in the new implementation, as the sonar
wave must be diffracted, at least partially, around a sound sphere of perhaps
80 mm diameter. The new wand uses 25 kHz, which has a wavelength of around
13.5 mm. Although this is still much smaller than 80 mm, our tests showed that
there was just enough diffraction around the sound sphere for usable distance
(i.e. timing) measurements at all but the most distant receivers. A 40 kHz
signal is far less usable, as its shorter wavelength (8.5 mm) results in far
greater reflection.
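The wavelengths quoted above follow from wavelength = c / f. A minimal Python check, assuming a speed of sound of 343 m/s (the helper name is hypothetical):

```python
def wavelength_mm(frequency_hz, speed_of_sound_m_s=343.0):
    """Wavelength in millimetres: c / f, converted from metres."""
    return speed_of_sound_m_s / frequency_hz * 1000.0


SPHERE_DIAMETER_MM = 80.0  # the "perhaps 80 mm" sphere from the text

for f in (25_000.0, 40_000.0):
    lam = wavelength_mm(f)
    print(f"{f / 1000:.0f} kHz: wavelength {lam:.1f} mm, "
          f"sphere = {SPHERE_DIAMETER_MM / lam:.1f} wavelengths")
```

This gives about 13.7 mm and 8.6 mm; the small difference from the quoted 13.5 mm and 8.5 mm reflects the assumed speed of sound, which varies with air temperature. The 80 mm sphere spans roughly six wavelengths at 25 kHz but more than nine at 40 kHz, which is why diffraction around it is so much weaker at the higher frequency.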
This lower operating frequency also allows the use of small (6 mm diameter) electret
mics as the sonar receivers. Their small size gives them a near-omnidirectional
response pattern in free air (the mic diaphragm diameter is smaller than the
wavelength), and thus a hemispherical response when they are mounted flush on the
sound sphere surface. 25 kHz is somewhat above the intended operating frequency of
the electret mics, which are designed for audio applications, but with sufficient
amplification and appropriate bandpass filtering they are quite usable.
Modifications to 25 kHz Transducers
Attempts to improve the omnidirectional response
of 25 kHz transducers by using a marble to scatter the beam have been unsuccessful.
This may be due partly to design differences between the two transducers as well
as to differences in wavelength. The 25 kHz Tx radiates directly from the
piezoelectric surface via a grille, the centre 9 mm of which is blocked off,
allowing sound through the surrounding annulus only. An improved omnidirectional
response has been achieved with a conical device (Figure 3) attached to the front
of the Tx. In our design, the inner concentric cone directs the annular wavefront
to the tip of an outer cone, where it is finally released to the surrounding air
via a 4 mm diameter hole. This hole is much smaller than the 13.5 mm wavelength,
so the radiation is nearly omnidirectional. With the ultrasound cone attached,
the radiation in any direction is uniform within 6 dB, except for a shallow null
of 9 dB at 165 degrees. This compares favourably with the radiation pattern
originally measured on the bare transducer, which showed a variation of 26 dB
(Figure 4).
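To put these figures in perspective, a level difference in decibels converts to an amplitude (pressure) ratio of 10^(dB/20). A one-line Python helper (hypothetical name) makes the comparison explicit:

```python
def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude (pressure) ratio."""
    return 10 ** (db / 20.0)


for label, db in (("cone, typical spread", 6.0), ("cone, null", 9.0), ("bare transducer", 26.0)):
    print(f"{label}: {db:.0f} dB = {db_to_amplitude_ratio(db):.1f}:1")
```

A 6 dB spread is roughly a 2:1 amplitude range and the 9 dB null about 2.8:1, whereas the bare transducer's 26 dB variation is roughly 20:1.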
Timing Measurements
The current implementation of the hardware uses a
PIC16C84 microcontroller to generate the blips and process the received signals.
It incorporates a manual adjustment of blip rate (30 to 60 ms between blips)
to allow use in spaces with different reverberation characteristics. It also
measures all six received signals near-simultaneously, so that a complete set of
measurements is made at each blip rather than cycling through six blips (and
waiting six times as long before a system response to a wand movement can be
perceived).
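One way to picture the near-simultaneous measurement is a polling loop that samples all six receiver detector outputs on each tick and records the first tick at which each becomes active. The Python sketch below models that idea; the 6-bit sample format, the tick period, the fixed `latency_us` offset and the function names are all assumptions rather than details of the PIC firmware. Arrival times are converted to one-way distances using an assumed 343 m/s (0.343 mm per microsecond).

```python
SPEED_OF_SOUND_MM_PER_US = 0.343  # assumed: 343 m/s in air
NUM_RECEIVERS = 6


def first_arrivals(samples, tick_us):
    """Return the first detection time (in microseconds) for each receiver.

    samples: one integer per polling tick, where bit i is set when receiver i
             has detected the blip (assumed format).
    Receivers that never fire are reported as None.
    """
    arrivals = [None] * NUM_RECEIVERS
    for tick, word in enumerate(samples):
        for i in range(NUM_RECEIVERS):
            if arrivals[i] is None and word & (1 << i):
                arrivals[i] = tick * tick_us
    return arrivals


def tof_to_distances_mm(arrival_times_us, latency_us=0.0):
    """Convert arrival times to one-way wand-to-receiver distances in mm."""
    return [
        None if t is None else max(0.0, (t - latency_us) * SPEED_OF_SOUND_MM_PER_US)
        for t in arrival_times_us
    ]


# 17.5 us per tick is about 6 mm of travel at 343 m/s (hypothetical tick rate).
arrivals = first_arrivals([0b000000, 0b000001, 0b000011, 0b111111], tick_us=17.5)
print(tof_to_distances_mm(arrivals))
```

Because every receiver is checked on every tick, a single blip yields all six distances, which is the advantage described above over cycling through one receiver per blip.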
The MIDI data bytes output by the PIC are currently 7-bit, and the resolution
of distance measurements is 6 mm. Data filtering techniques are to be applied
in the Macintosh computer to smooth and interpolate between values.
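With 7-bit data bytes and 6 mm steps, the reported distance spans 0 to 127 × 6 = 762 mm. The Python sketch below decodes such values and applies one possible smoothing filter, a one-pole exponential smoother; the paper does not say which filtering technique will be used, so this choice, the `alpha` value and the names are assumptions.

```python
RESOLUTION_MM = 6.0  # distance step per MIDI data unit, from the text


def decode_distance_mm(data_byte):
    """Convert a 7-bit MIDI data byte into a distance in millimetres."""
    if not 0 <= data_byte <= 127:
        raise ValueError("MIDI data bytes are 7-bit")
    return data_byte * RESOLUTION_MM


class OnePoleSmoother:
    """Exponential smoothing: y += alpha * (x - y), with 0 < alpha <= 1."""

    def __init__(self, alpha, initial=0.0):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.y = initial

    def __call__(self, x):
        self.y += self.alpha * (x - self.y)
        return self.y


smooth = OnePoleSmoother(alpha=0.3, initial=decode_distance_mm(20))
for raw in (20, 21, 20, 22, 23, 23):
    print(raw, round(smooth(decode_distance_mm(raw)), 1))
```

A smaller `alpha` hides more of the 6 mm quantisation steps but makes the sound respond more slowly to the wand, so the value would be a compromise tuned by ear.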
The Tx is pulsed with bursts of 8 cycles of the 25 kHz signal (each burst lasts
0.32 ms). This signal is generated in software and drives the Tx from two PIC
data lines via a differential summing amp. The final waveform is symmetrical,
swinging ±20 V, with zero average DC. Lower-frequency clicks from the Tx, which
had earlier been a problem, are thus almost inaudible, despite the higher power
used to compensate for losses in the cone structure.
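The burst timing and the zero-DC property can be checked with a simple model. The Python sketch below represents each half-cycle of the burst as a single level and models the differential summing amp as peak_v × (A - B), with the two PIC lines A and B driven in antiphase; this idealised square-wave model and the function name are assumptions.

```python
def burst_halfcycles(cycles=8, freq_hz=25_000.0, peak_v=20.0):
    """Return (half_cycle_us, levels) for an idealised symmetrical square burst.

    Lines A and B are driven in antiphase (A high then B high in each cycle);
    the differential amp output is modelled as peak_v * (A - B).
    """
    half_cycle_us = 1e6 / freq_hz / 2
    levels = []
    for _ in range(cycles):
        for a, b in ((1, 0), (0, 1)):
            levels.append(peak_v * (a - b))
    return half_cycle_us, levels


half_us, levels = burst_halfcycles()
print(f"{len(levels)} half-cycles of {half_us:.0f} us = {half_us * len(levels) / 1000:.2f} ms")
print("average DC:", sum(levels) / len(levels))
```

Eight cycles at 25 kHz give sixteen 20 µs half-cycles, i.e. the 0.32 ms burst length stated above, and the positive and negative half-cycles cancel, giving zero average DC.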
6 Summary
The new Talking Chair user interface has significant
advantages over the original magic box device. The most notable advance
is the shift from a 2-D frame of reference to a true 3-D method. Freedom of movement
is more limited with the new interface because of partial obstruction by the
attachment stalk and the need to navigate the wand close to a solid surface.
We believe, however, that the device's strong ability to communicate notions
of spatial location far outweighs these obstruction concerns in a context of
public interaction.
References
[Ballan et al., 1994] Oscar Ballan, Luca Mozzoni
and Davide Rocchesso. Sound Spatialisation in Real Time by First-Reflection
Simulation. Proceedings of the 1994 International Computer Music Conference,
pp.475-476, 1994.
[Chowning, 1971] John Chowning. The Simulation of Moving Sound Sources, Journal
of the Audio Engineering Society 19(1): pp.1-6, 1971.
[Edwards, 1994] Scott Edwards. The PIC Source Book, Scott Edwards, 1994.
[Microchip Technology Inc., 1994] PIC16C84 Data Book, DS30081C (Preliminary),
Microchip Technology Inc., 1994.
[Moore, 1983] F. Richard Moore. A General Model for Spatial Processing of Sounds,
Computer Music Journal 7(3): pp.6-15, 1983.
[Moore, 1989] F. Richard Moore. Spatialization of Sounds over Loudspeakers.
In Max V. Mathews and John R. Pierce (Eds.): Current Directions in Computer
Music Research, MIT Press, Cambridge, Massachusetts, pp.89-103, 1989.
[Motorola Inc., 1993] DSP56002 Digital Signal Processor User's Manual, Motorola
Inc., 1993.
[Mott, 1995] Iain Mott. The Talking Chair: Notes on a Sound Sculpture, Leonardo
28(1): pp.69-70, 1995.
[Popular Mechanics Co., 1913] An Electric Illusion Box in The Boy Mechanic,
Volume I, Popular Mechanics Co., Chicago, pp.130-131, 1913.