We are setting up an archive for the Elektor Vocoder. Together with Dirk VandenBerghe, a fanatic Formant synthesizer builder from Belgium, we have collected most of the original articles. At the moment our collection consists of Elektor articles in Dutch, English, French and German.
We will extend this collection with all documents on this Vocoder. In this part 2 we’ve collected the original articles in Dutch, but more parts will follow soon. Also see part 1 with the basic text on the Elektor Vocoder.
If you have any additional pictures, information, experiences etc. on the Elektor Vocoder, please share them with us so that we can share them with the world.
This post contains several of the text articles published in Elektor Magazine about the Elektor Vocoder. We are also building a collection of the original articles, including the building instructions. Please note that all copyrights remain with Elektor Magazine and its authors.
VOCODERS (1) Elektor April 1978
An orchestra suddenly begins to recite a passage of Shakespeare, an electric guitar reads the news, the voice of a talker unexpectedly changes sex, a single voice sounds like a chorus: these are just a few of the amazing effects which can be obtained with a new electronic instrument, the vocoder.
This article explains the ins and outs of this fascinating new development in the field of electronic ‘music’.
A vocoder (VOice CODER) is an instrument designed to analyse and electronically recreate the sound of the human voice. Although vocoders are in fact a far from recent invention, and have been used for a number of years in such fields as telecommunications and data processing, it is only within the last couple of years that a serious attempt has been made to exploit their enormous potential for musical and sound effect applications.
The term ‘vocoder’ was first coined in 1936 by an American called Homer Dudley, who invented a machine to compress the bandwidth of speech for transmission purposes. There was also a certain amount of interest in vocoders in Germany during the thirties, stimulated by the realisation that they had an obvious military potential: the encoding of secret messages.
By the middle of the sixties Siemens possessed a vocoder which was occasionally used for recordings. Similarly, the BBC Radiophonic Workshop and a number of other experimental studios used vocoders for special effects on records, radio and television. However, all these early prototypes suffered from the drawback of being extremely large and unwieldy, and as such were quite unsuited to anything other than specialised applications.
The real breakthrough came in 1975 with the appearance of a vocoder which, by virtue of its compact and ergonomic design, was suitable for use in a conventional studio, where it could be interfaced with other equipment, thus allowing its full potential to be realised. This was the EMS (Electronic Music Studios) Vocoder (see photo 4), developed by Tim Orr: a self-contained portable instrument that can not only synthesise speech at constant and varying pitch but, by using a second non-speech input signal, can encode literally any recorded sound with any speech sound.
The machine can thus produce the effect of ‘talking’ musical instruments. Since the EMS Vocoder, Sennheiser have capitalised upon their experience of using vocoders in the field of communications and, with the assistance of Heinz Funk of the Hamburg Radio Studio, have brought out the Sennheiser Sound Effect Vocoder VSM 201 (see photo 5). The latest development is a smaller version of the EMS Vocoder, called the EMS 2000 (see photo 6), which, by virtue of its size and extreme portability, is particularly suited to live work.
Speech-synthesis and Vocoding
As mentioned above, a fundamental feature of vocoders is their ability to analyse and electronically simulate the sound of speech. Thus before going on to examine the operating principles of a vocoder it is first necessary to take a look at the basic characteristics of human speech.
At the moment it is virtually impossible to create a realistic replica of the human voice, since not only do speech sounds have a very irregular intensity, but they are also extremely rich in harmonics. Synthesised speech is always ‘too clean’, too free from natural imperfections. Speech itself is composed of two main kinds of sound:
a. Air from the lungs can be forced between the vocal cords, situated in the larynx at the top of the windpipe, causing these cords to vibrate and a pulsating air column to enter the mouth and nasal cavities. The fundamental frequency of the resultant note is determined by the length, thickness and tension of the vocal cords. Sounds produced in this fashion, e.g. the vowels, are known as VOICED sounds.
b. Alternatively, if the air from the lungs is not forced through the vocal cords but simply expelled through the mouth, then so-called UNVOICED sounds are produced, such as ‘f’ or ‘h’. These are basically similar to the type of sounds which can be produced by a noise generator.
In the case of both voiced and unvoiced sounds, the shape of the mouth and nasal cavities determines the character or timbre of the sounds. Variation of cavity RESONANCES by movement of the tongue and lips controls the harmonic content of the voice and enables us to form separate vowels and consonants (see figures 2a and 2b). The lips play a particularly important role in sounds which are distinguished by their dynamic amplitude characteristics, such as the percussive attack transient of the ‘p’ in ‘paper’.
Thus the voice can be seen as a complex sound-generating instrument, consisting of a frequency- and amplitude-controlled oscillator (the vocal cords and lungs), a noise generator (the lungs) and a set of tone filters (the mouth and nasal cavities).
Viewing the voice in this way naturally leads one to speculate whether it might be possible to synthesise speech using techniques similar to those employed in a music synthesiser. The vocal cords could be replaced by an oscillator whose output waveform is sufficiently rich in higher harmonics to allow differentiated filtering, whilst a noise generator could provide the unvoiced sounds. A switching circuit would cut back and forth between these two sound sources, depending upon which mode of voice was required.
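The oscillator, noise generator and filter model of the voice described above is essentially a source-filter synthesiser. As a minimal sketch, assuming a digital implementation that the article itself does not describe, the Python below switches between a pulse train (standing in for the vocal cords) and noise (standing in for unvoiced sounds), then passes the result through a cascade of Klatt-style two-pole resonators (standing in for the mouth and nasal cavities). The function names and the formant frequencies and bandwidths are hypothetical, illustrative values roughly suited to an open vowel, not figures from the article.

```python
import math
import random

def resonator(x, freq, bandwidth, fs):
    """Klatt-style two-pole resonator: one formant at `freq` Hz, `bandwidth` Hz wide, unity gain at DC."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        y2, y1 = y1, yn
        y.append(yn)
    return y

def synthesise(voiced, n, fs=8000, f0=120.0,
               formants=((700, 110), (1220, 120), (2600, 160)), seed=0):
    """Source-filter sketch: pulse train or noise source, then a cascade of formant filters."""
    if voiced:
        period = round(fs / f0)  # 'vocal cords': one pulse per pitch period
        source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    else:
        rng = random.Random(seed)  # 'unvoiced': noise instead of pulses
        source = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    out = source
    for freq, bw in formants:  # 'mouth and nasal cavities'
        out = resonator(out, freq, bw, fs)
    return out
```

The `voiced` flag plays the role of the switching circuit. Varying the `formants` tuple continuously over time is precisely the control problem raised in the next paragraph.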
However, problems begin to arise when one considers the type of filters that would be needed for a speech synthesiser of this type. Since the continual variation of both the static harmonic content and the dynamic characteristics of the sound is crucial to the formation of articulate speech, an equaliser-type filter would be necessary to simulate all the nuances in the tonal character of human speech. At this point it becomes clear that an analogue speech synthesiser of this kind would require an enormous amount of hardware, for how does one generate the extremely complex pattern of voltages needed to control the filter?
One way to simplify the process is a hybrid system, using a memory to store the control voltages. The quality of modern speech synthesisers which use such a system is fairly good. Doubtless many readers will have seen or heard of so-called ‘talking’ computers, which use synthetically generated speech to express the results of their calculations, and the ‘talking’ calculator shown in photo 1 proves that it does not require an enormous amount of hardware to synthesise speech digitally. Photo 2 shows that the digital speech synthesiser consists of just two ICs mounted on a single board. The speech components are stored digitally in a ROM, where they can be scanned by a speech-synthesiser micro-controller. A D/A converter in the micro-controller then generates the analogue speech components from their digital equivalents.
Although storing the speech components digitally is by far the simplest solution for systems designed to generate speech (assuming the desired vocabulary is not too large), this is not the case with vocoders, and here we come to the basic difference between vocoders and speech synthesisers.
A vocoder is basically designed to superimpose the pattern of spoken words onto a recorded non-speech signal (such as music, or the sound of wind or surf) so that the resultant effect is that of, for instance, a talking orchestra. The articulation of the output signal is extremely good, being distinguished by remarkable clarity and distinctiveness. This quality of articulation, among other things, is what distinguishes the vocoder from other, less sophisticated special-effect devices such as the well-known wah-wah pedal, or the more recent MOUTH BAG or MOUTH TUBE (see photo 3).
The latter is basically a crude acoustic-mechanical vocoder. The signal from an electric guitar or similar source is fed to a powerful amplifier, which drives a loudspeaker housed in a closed box. The amplified sound from the guitar is then fed via a plastic tube to the mouth of the musician. Without using his vocal cords, but simply by altering the shape of his mouth cavity, he can then articulate the guitar signal so that the guitar appears to be talking. This signal is picked up by a microphone in front of the musician’s mouth and fed through the PA system in the usual fashion. The sounds produced by the mouth tube are essentially similar to those produced by a vocoder.
However, not only is the mouth tube fairly limited in the number of possible applications, but, compared with vocoders, the quality of articulation is considerably inferior. In particular, it is extremely difficult to produce unvoiced and explosive sounds.
By now the reader should have gained a good idea of the basic principle of vocoding: the vocoder modulates the articulation of speech onto a second ‘excitation’ signal. This is done by converting the input speech signal into data which can be used to vary the output signal. Although in principle there are various ways of analysing and synthesising speech, the three vocoders described above are all ‘channel vocoders’. Figure 3 shows the functional block diagram of this type of vocoder.
The speech signal (from the microphone) is fed to a bank of bandpass filters, which split the signal into a number of separate, very narrow frequency bands. By rectifying these signals and feeding them through lowpass filters, a series of DC voltages which follow the envelopes of the filter output signals is obtained. These are the control voltages which will control the synthesiser filter bank; together they represent a real-time spectrum analysis of the speech signal.
The input speech signal is also fed to a second circuit, the voiced/unvoiced detector. This continuously samples the speech signal to decide whether it is a voiced or unvoiced sound, and indicates the result by switching to one of two voltage levels (e.g. 0 V and +5 V).
The outputs of the voiced/unvoiced detector and the envelope followers control the synthesiser section of the vocoder. This contains the same number of filters as the analyser section, so that the excitation signal (be it simply the synthesiser oscillators and noise generator, or these two sound sources plus an external input) is split into the same frequency bands as the speech signal. Via a series of voltage-controlled amplifiers (VCAs), the outputs of the filter sections are then varied by the control voltages derived from the envelope followers, with the result that the spectrum of the speech signal is imposed upon the excitation signal. The separate channels are summed and fed to the output stage.
The resultant signal possesses the ‘voice’ of the excitation signal (e.g. a violin), but has the articulation of the passage of speech. Furthermore, both the typical character of the excitation signal and all the nuances of articulation in the speech signal (dialect, emphasis, etc.) are completely preserved. That is to say, the human voice is simply replaced by that of whatever instrument is used for the excitation signal.
In theory, therefore, the voiced/unvoiced detector should be superfluous. However, most excitation signals do not have a sufficiently wide spectrum to synthesise sibilants and other unvoiced sounds (‘s’, ‘h’, etc.). For this reason the voiced/unvoiced detector ensures that the noise generator provides the synthesiser section with the appropriate ‘raw material’ whenever the excitation signal cannot do so.
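The channel-vocoder signal flow of figure 3 can be sketched in a few lines of Python. This is a minimal digital illustration under stated assumptions, not a model of any of the three commercial units: the bandpass filters are second-order biquads from Robert Bristow-Johnson's ‘Audio EQ Cookbook’ rather than the analogue filters the article describes, each envelope follower is a rectifier plus a one-pole lowpass, and the voiced/unvoiced detector and noise generator are left out. The function names and parameter values (`q`, `env_cutoff`, the channel centres) are hypothetical choices for the example.

```python
import math

def bandpass_coeffs(fc, q, fs):
    """RBJ cookbook band-pass biquad (0 dB peak gain), normalised by a0: (b0, b1, b2, a1, a2)."""
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return (alpha / a0, 0.0, -alpha / a0, -2 * math.cos(w0) / a0, (1 - alpha) / a0)

def biquad(x, coeffs):
    """Direct-form I biquad filter."""
    b0, b1, b2, a1, a2 = coeffs
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def envelope(x, cutoff, fs):
    """Envelope follower: full-wave rectifier followed by a one-pole lowpass."""
    k = 1 - math.exp(-2 * math.pi * cutoff / fs)
    out, s = [], 0.0
    for xn in x:
        s += k * (abs(xn) - s)
        out.append(s)
    return out

def channel_vocoder(speech, excitation, fs, centres, q=4.0, env_cutoff=50.0):
    """Impose the band-by-band envelope of `speech` on `excitation` (no voiced/unvoiced switching)."""
    n = min(len(speech), len(excitation))
    out = [0.0] * n
    for fc in centres:
        coeffs = bandpass_coeffs(fc, q, fs)            # same filter in analyser and synthesiser
        cv = envelope(biquad(speech[:n], coeffs), env_cutoff, fs)  # analyser: control voltage
        band = biquad(excitation[:n], coeffs)          # synthesiser: same band of the excitation
        for i in range(n):
            out[i] += band[i] * cv[i]                  # VCA, then sum all channels
    return out
```

For instance, `channel_vocoder(speech, sawtooth, 8000, [200 * 2 ** (i / 2) for i in range(9)])` uses nine half-octave channels from 200 Hz to 3.2 kHz. With a silent speech input every control voltage stays at zero, and so does the output.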
Photos 7a and 7b show examples of typical signals which appear at the test points numbered in figure 3. The progression of signals in photo 7a illustrates how the input speech signal is converted in the analyser section into the control voltages which command the VCAs. Photo 7b shows how the output signal is synthesised, using a pulse generator as the excitation signal.
The second part of this article will contain a more detailed description of how a vocoder works, and will also take a look at the various applications of vocoders.
Acknowledgements:
- Figures 1, 2 and 3, photos 5 and 7: Sennheiser Electronic, Wedemark, Hannover, West Germany.
- Photos 1 and 2: Silicon Systems Inc., Irvine, California.
- Photo 3: Electro-Harmonix, New York.
- Photos 4 and 6: EMS, London.
VOCODERS (2) Elektor May 1978
As was mentioned in the first part of the article, the input speech signal is first converted into a set of data which will be used to control the synthesis of the output signal. The first stage in this process is to feed the speech signal to a bank of filters.
The channel filters split the signal to be analysed into a number of frequency bands which are spaced evenly over the audio spectrum. An identical bank of filters in the synthesiser section of the vocoder also divides the excitation signal up into the same number of frequency bands.
The filter stages of all currently available vocoders are in principle very similar. The filters themselves are of the bandpass type, and the main differences lie in the number of filters used and their slopes. Figure 1 shows the frequency response curves for the filter bank of the Sennheiser VSM 201 Vocoder. In this vocoder the frequency range of 100 Hz to 10 kHz is analysed into 20 separate channels using third-order bandpass filters. The same frequency response curves are valid for the filter bank in the synthesiser section.
In the case of the ‘full-size’ EMS vocoder, the filter bank consists of 20 fourth-order bandpass filters plus one highpass and one lowpass filter, which cover a spectrum of 200 Hz to 8 kHz (the centre frequencies are spaced at intervals of 1/4 octave). In the simpler EMS 2000 vocoder there are 18 filter channels, the roll-off slope of each filter being 18 dB per octave.
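A spacing of 1/4 octave means that the centre frequencies are evenly spaced on a logarithmic scale: each one is 2^(1/4), or about 1.19, times the previous one. The helper below is a small sketch of that arithmetic. The article gives only the overall range and the step, so the first centre frequency of 200 Hz used in the example is a hypothetical choice, as is the function name.

```python
def octave_step_centres(f_start, step_octaves, n):
    """n centre frequencies starting at f_start, each step_octaves above the previous one."""
    return [f_start * 2 ** (i * step_octaves) for i in range(n)]

ems = octave_step_centres(200.0, 0.25, 20)  # hypothetical first centre at 200 Hz
print(round(ems[-1]))                        # 5382 (Hz)
```

With a first centre at 200 Hz, nineteen quarter-octave steps span 4.75 octaves and end near 5.4 kHz. In this hypothetical layout the extra highpass filter would therefore have to cover the rest of the range up to 8 kHz.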
The voiced/unvoiced detector, which is present in all three models already discussed, has the job of deciding whether the speech signal is composed of voiced or unvoiced sounds, and hence whether, at any given instant, the oscillator or the noise generator should be used for the excitation signal.
The way this circuit works is interesting. In the case of voiced sounds the low-frequency components of the signal predominate, whilst in the case of unvoiced sibilants the reverse is true and there is a greater proportion of high-frequency components in the speech signal. These differences can be detected by means of the circuit shown in figure 2 (this is the type of circuit used in the EMS vocoder), which consists of a highpass and a lowpass filter feeding two envelope followers (filters preceded by a rectifier). The speech signal is therefore split into a higher and a lower frequency component, the amplitude characteristics of which are represented by the output voltages of the envelope followers. These are then compared and, depending on whether the speech signal contains a greater proportion of higher or lower frequencies, the output of the comparator will swing high or low respectively. In the case of unvoiced sounds an LED also lights up to indicate the switch from the oscillator to the noise generator.
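The decision logic of figure 2 can be sketched digitally as follows. This is an illustration under assumptions of its own: the 2 kHz split frequency, the second-order RBJ cookbook lowpass/highpass pair, the 30 Hz envelope smoothing and all function names are hypothetical, not taken from the EMS circuit. A `True` result corresponds to the comparator output swinging high (unvoiced, noise generator selected).

```python
import math

def rbj_biquad(kind, fc, fs, q=0.7071):
    """RBJ cookbook 2nd-order 'low' or 'high' pass; returns (b0, b1, b2, a1, a2) normalised by a0."""
    w0 = 2 * math.pi * fc / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    if kind == "low":
        b = ((1 - cw) / 2, 1 - cw, (1 - cw) / 2)
    else:
        b = ((1 + cw) / 2, -(1 + cw), (1 + cw) / 2)
    a0 = 1 + alpha
    return tuple(v / a0 for v in b) + (-2 * cw / a0, (1 - alpha) / a0)

def run_biquad(x, c):
    """Direct-form I biquad filter."""
    b0, b1, b2, a1, a2 = c
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def envelope(x, cutoff, fs):
    """Rectifier followed by a one-pole lowpass."""
    k = 1 - math.exp(-2 * math.pi * cutoff / fs)
    out, s = [], 0.0
    for xn in x:
        s += k * (abs(xn) - s)
        out.append(s)
    return out

def voiced_unvoiced(speech, fs, split=2000.0, env_cutoff=30.0):
    """Per-sample decision: True (unvoiced) when the high band's envelope exceeds the low band's."""
    env_low = envelope(run_biquad(speech, rbj_biquad("low", split, fs)), env_cutoff, fs)
    env_high = envelope(run_biquad(speech, rbj_biquad("high", split, fs)), env_cutoff, fs)
    return [h > l for h, l in zip(env_high, env_low)]  # the comparator
```

A steady low tone gives `False` throughout once the envelopes have settled, whereas broadband hiss, with most of its energy above the split frequency, gives `True`.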
An envelope follower is present in each channel of the analyser section. As already explained, their function is to derive the control voltages which will be used to modulate the excitation signal. The output voltages of the envelope followers correspond to the varying amplitude levels of each channel of the input signal, and thus represent a real-time spectrum analysis of the speech.
An example of a typical envelope follower circuit is shown in figure 3. An active full-wave rectifier is followed by a 6 dB/octave (first-order) lowpass filter. The break frequency is determined by the time constant R1·C1, and is in the region of 100 to 200 Hz.
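Since the filter is first order, its break frequency follows directly from the time constant: f = 1/(2π·R1·C1). The Python below works this through with hypothetical component values (R1 = 10 kΩ, C1 = 100 nF, not taken from figure 3), which give a time constant of 1 ms and a break frequency of about 159 Hz, inside the quoted range. The digital follower is a sketch of the same rectify-then-smooth idea, using a one-pole lowpass with time constant R1·C1. The function names are hypothetical.

```python
import math

def break_frequency(r_ohms, c_farads):
    """-3 dB frequency of a first-order RC lowpass: f = 1 / (2 * pi * R * C)."""
    return 1.0 / (2 * math.pi * r_ohms * c_farads)

def envelope_follower(x, r_ohms, c_farads, fs):
    """Digital stand-in: full-wave rectifier, then a one-pole lowpass with time constant R*C."""
    k = 1.0 - math.exp(-1.0 / (r_ohms * c_farads * fs))
    out, state = [], 0.0
    for sample in x:
        state += k * (abs(sample) - state)  # rectify, then smooth
        out.append(state)
    return out

R1, C1 = 10e3, 100e-9                # hypothetical values, not from the article
f_break = break_frequency(R1, C1)
print(f"{f_break:.1f} Hz")           # 159.2 Hz
```

For a steady sine wave the follower settles at the average of the rectified signal, 2/π (about 0.64) times the peak amplitude, with a small residual ripple at twice the input frequency.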
Once again, all the above vocoders incorporate a facility for dealing with pauses. If no speech signal is presented to the vocoder input, as is the case during pauses in speech, then, naturally enough, in the absence of any control voltages there can be no output signal.
In order to prevent unpleasant staccato effects, silence bridging (sometimes known as ‘pause stuffing’!) must be used. Depending upon the vocoder, a bridging signal derived either from the original speech signal or from the excitation signal is mixed into the pauses, thereby providing an audible output signal. The amplitude, harmonic content, and attack and decay times of this bridging signal can be varied.
In the case of the large EMS vocoder, the connections between the outputs of the envelope followers and the VCAs are not fixed, but can be transposed at will, thus affording the possibility of producing some highly unusual and ‘weird’ sounds.
In both EMS vocoders nearly all the control voltages can be varied by externally derived command signals. The slew limiter shown in figure 4 (this corresponds to the portamento control in a music synthesiser) smooths out the changes in control voltage, so that, instead of the pitch of the output signal varying in a series of discrete steps, it can be made to slide continuously up and down the scale in the fashion of a slide trombone. The same circuit also provides a freeze control, which, when activated by a switch, will sample the control voltage at any given moment and hold it constant.
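The slew limiter and freeze functions can be illustrated with a minimal sketch, assuming a sampled control voltage and a linear (rate-limited) slew. An analogue portamento circuit may instead produce an exponential glide, so treat this as an approximation of the behaviour rather than of the EMS circuit; the function name and parameters are hypothetical.

```python
def slew_limit(cv, max_step, freeze=None):
    """Rate-limit a control-voltage sequence (portamento); hold it while freeze[i] is True."""
    out = []
    y = cv[0] if cv else 0.0
    for i, target in enumerate(cv):
        if freeze is not None and freeze[i]:
            out.append(y)  # freeze: keep the sampled value constant
            continue
        step = max(-max_step, min(max_step, target - y))  # clamp the change per sample
        y += step
        out.append(y)
    return out
```

A sudden step from 0 to 1 with `max_step=0.1` becomes a ten-sample ramp, the ‘slide trombone’ effect. Setting `freeze` from some sample onwards holds whatever value the ramp had reached at that moment.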
The large EMS vocoder in particular contains a number of interesting additional facilities.
Mention has already been made of the two VCOs which can be played via an external keyboard, and these can also be used in conjunction with the ‘pitch extractor’. The latter is basically a pitch-to-voltage converter which functions by reading the glottal pulses of the speech signal. The control voltages from the output of the pitch extractor are fed to one or both of the VCOs, so that these follow the cadences of the speech signal, whilst there is also a ‘quality’ control which allows the pitch voltage to be exaggerated for special effects.
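The article does not say how the EMS pitch extractor reads the glottal pulses. As a stand-in, the sketch below estimates the pitch period from the peak of the autocorrelation function (a different, commonly used technique) and converts the result to a control voltage. The 1 V/octave scale, the reference frequency `f_ref`, the way `quality` scales the voltage and the function names are all hypothetical interpretations, not details from the article.

```python
import math

def estimate_pitch(x, fs, f_min=60.0, f_max=500.0):
    """Return fs / lag, where lag maximises the autocorrelation within [fs/f_max, fs/f_min]."""
    n = len(x)
    best_lag, best_r = None, float("-inf")
    for lag in range(int(fs / f_max), int(fs / f_min) + 1):
        # Unnormalised sum: longer lags have fewer terms, which discourages picking
        # a multiple of the true period on a steady tone.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

def pitch_to_cv(f_hz, f_ref=110.0, quality=1.0):
    """Control voltage on a 1 V/octave scale relative to f_ref; quality > 1 exaggerates pitch movements."""
    return quality * math.log2(f_hz / f_ref)
```

A voice rising by one octave above `f_ref` would then produce 1 V with `quality=1.0` and 2 V with `quality=2.0`, so VCOs tracking this voltage would follow, or exaggerate, the speech cadences.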
In addition, the large EMS vocoder includes a frequency shifter which can shift the frequency of the input signal over a wide range (0.05 Hz to 1000 Hz). In the Sennheiser VSM 201 the frequency shifter is available as an optional extra, and can be connected to either the speech or the excitation signal input.
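A frequency shifter moves every component of a signal by the same number of hertz, unlike a pitch transposer, which multiplies all frequencies by the same factor; harmonic relationships are therefore generally not preserved. The sketch below shows the single-sideband principle digitally, assuming a block of samples: form the analytic signal with a DFT-based Hilbert transform, multiply it by a complex exponential at the shift frequency, and keep the real part. A plain O(N²) DFT keeps the example dependency-free. Analogue shifters typically use phase-difference networks instead, and this block-based version treats the input as periodic. The function names are hypothetical.

```python
import cmath
import math

def dft(x, sign=-1):
    """Plain O(N^2) discrete Fourier transform (sign=+1 gives the unnormalised inverse)."""
    n = len(x)
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    return [sum(x[k] * twiddle[(k * m) % n] for k in range(n)) for m in range(n)]

def analytic_signal(x):
    """x + j*Hilbert(x): keep DC (and Nyquist), double positive frequencies, drop negative ones."""
    n = len(x)
    spectrum = dft(x)
    weights = [0.0] * n
    weights[0] = 1.0
    for m in range(1, (n + 1) // 2):
        weights[m] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    return [v / n for v in dft([s * w for s, w in zip(spectrum, weights)], sign=+1)]

def frequency_shift(x, shift_hz, fs):
    """Single-sideband shift: every component moves by shift_hz (negative values shift down)."""
    z = analytic_signal(x)
    return [(zk * cmath.exp(2j * math.pi * shift_hz * k / fs)).real for k, zk in enumerate(z)]
```

Shifting a 440 Hz tone by 100 Hz gives 540 Hz. A 440/880 Hz octave pair shifted the same way becomes 540/980 Hz, which is no longer an octave.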
Detailed block diagram of the VSM 201 Vocoder
By taking a detailed look at the block diagram of one particular vocoder, i.e. the Sennheiser VSM 201 , it should be possible to see just how the various functional units described above actually work together in practice.
Although at first sight the block diagram published in the first part of this article may not be easy to recognise in figure 5, this drastically simplified (!) diagram of the VSM 201 does at least make the channel structure of the vocoder apparent. The main difference between this and the earlier diagram is the presence of the additional blocks labelled ‘Filter Controls’, ‘Silence-Bridging Controls’ and ‘Channel Level Controls’, plus the fact that in the VSM 201 the relative positions of the modulators (VCAs) and filters in the synthesiser section are reversed. The function of the filter controls is simple enough to explain: the output level of each of the 20 analyser filters can be varied by means of potentiometers PM1 ... PM20, and the resulting signals can then be summed and fed directly to the vocoder output via switch SM. Thus, by opening switch SV and closing switch SM, the vocoder functions as a 20-channel equaliser, a useful facility for studio work. In addition, the filter controls and switch SM also allow an ‘equalised’ version of the speech signal (i.e. one in which the level of each channel can be varied independently) to be added to the output of the vocoder (speech addition).
The controls PA1 ... PA10 enable the control voltage from the silence-bridging detector to be varied. There is one PA control for every two analyser channels. The silence-bridging control voltage is fed to the envelope followers, where it is added to whatever control voltages are derived from the input speech signal. In this way a control voltage is still presented to the modulators in the synthesiser section even when there is a gap in the speech signal, so that these pauses are filled out by the excitation signal.
The 20 control voltages produced by the envelope followers are individually accessible via external sockets, whilst their level is indicated by a row of LEDs – two facilities which prove extremely valuable when operating the vocoder.
The reversed order of the modulators and filters in the synthesiser section is for developmental reasons and does not affect the synthesis of speech by the excitation signal. Photo 1 shows the traces of a control voltage and the ensuing signals along the synthesiser channel, and it can be clearly seen that there is no difference between this photo and that shown in the first part of this article (photo 7) where the modulators followed the synthesiser filter bank.
The signal level of each synthesiser filter output can be varied by means of the channel level controls PV1 ... PV20, whilst switch SV allows the vocoding section to be cut out completely. The control PG determines the output level, whilst the bypass signal path, which is controlled by PB, allows some or all of the signal from the input variable-gain amplifier to bypass the entire vocoder and be fed directly to the output amplifier.
Inputs and internal signal sources
Line and microphone inputs are available for both the speech and excitation signals. In addition, there are two extra line inputs for unvoiced excitation signals which can be used in place of the internal noise generator.
As far as built-in sound sources are concerned, the VSM 201 includes a pulse generator with a frequency of approx. 150 Hz, which supplies an ‘internal’ excitation signal for test purposes.
The noise source which is used to synthesise the unvoiced portions of the excitation signal consists of a digital pseudo-random noise generator.
The voiced/unvoiced detector in the VSM 201 analyses the input speech signal by feeding the control voltages from channel 0 (a separate lowpass filter and envelope follower) and channel 19 (centre frequency of the filter 5.8 kHz) to a comparator. The output of the comparator triggers the switch between the voiced and unvoiced excitation signal (VCOs or noise generator).
The process used to generate the unvoiced portions of the excitation signal deserves some attention, since the amplitude and spectral composition of this signal must be matched to the voiced portions. To ensure the correct amplitude characteristics, an envelope follower derives a control voltage from the voiced portions of the excitation signal, and this is used to suitably modulate the noise signal. A ‘pink’ filter, which can be switched in and out of circuit, is also included in the signal path of the unvoiced excitation signal, thereby allowing a ‘colouration’ of the noise.
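The article says only that the noise source is a digital pseudo-random generator. A maximal-length linear-feedback shift register (LFSR) is one common way to build such a generator, so the sketch below assumes a 16-bit Galois LFSR with taps 0xB400 (period 65535). It then scales the noise by the envelope of the voiced excitation, as the paragraph above describes. The switchable ‘pink’ filter is not modelled, and the 50 Hz envelope cutoff and the function names are hypothetical choices.

```python
import math

def lfsr_noise(n, seed=0xACE1):
    """Bipolar pseudo-random noise from a 16-bit maximal-length Galois LFSR (taps 0xB400)."""
    state, out = seed, []
    for _ in range(n):
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400  # feedback polynomial x^16 + x^14 + x^13 + x^11 + 1
        out.append(1.0 if lsb else -1.0)
    return out

def envelope(x, cutoff, fs):
    """Rectifier followed by a one-pole lowpass."""
    k = 1 - math.exp(-2 * math.pi * cutoff / fs)
    out, s = [], 0.0
    for v in x:
        s += k * (abs(v) - s)
        out.append(s)
    return out

def unvoiced_excitation(voiced, fs, env_cutoff=50.0):
    """Noise whose amplitude follows the envelope of the voiced excitation signal."""
    env = envelope(voiced, env_cutoff, fs)
    return [e * nz for e, nz in zip(env, lfsr_noise(len(voiced)))]
```

Over one full period the register passes through every non-zero 16-bit state once, so the output contains 32768 positive and 32767 negative samples: a practically balanced, flat-spectrum noise source.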
Pause-detection and -bridging
In the VSM 201 pauses in the input speech signal are detected by comparing the amplitude of the speech envelope with a variable reference level, the speech/pause threshold. An envelope follower monitors the peak amplitude of the speech signal, the resultant control voltage being fed to a comparator where it is compared against the preset speech/pause threshold voltage. The output of the comparator gates an analogue inverter which in turn provides the silence-bridging control voltage. The latter consists of the envelope voltage of the speech signal fed through a logarithmic amplifier.
Thus as soon as the comparator detects a pause in the speech signal, its output changes state and the full silence-bridging voltage takes over. The fact that the bridging control voltage is derived from the envelope voltage of the speech signal ensures that the level of the bridging signal corresponds to that of the speech signal, thereby preventing obvious jumps in the output level.
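One possible digital reading of the VSM 201 pause detector and bridging path is sketched below. The threshold, the 0.2 s release of the peak follower, the mapping used for the ‘logarithmic amplifier’ and all function names are hypothetical. The bridging voltage is zero while speech is present and takes over when the comparator detects a pause. `add_bridging` then adds it to the channel control voltages with one gain per pair of channels, mirroring the PA1 ... PA10 controls.

```python
import math

def peak_envelope(x, release_s, fs):
    """Peak follower: instant attack, exponential release with time constant release_s."""
    decay = math.exp(-1.0 / (release_s * fs))
    out, level = [], 0.0
    for sample in x:
        level = max(abs(sample), level * decay)
        out.append(level)
    return out

def bridging_cv(speech, fs, threshold=0.05, release_s=0.2, floor=1e-4):
    """Bridging control voltage: zero while speech is present, log-scaled envelope during pauses."""
    cv = []
    for env in peak_envelope(speech, release_s, fs):
        is_pause = env < threshold  # speech/pause comparator
        # 'logarithmic amplifier': map floor ... 1.0 onto 0 ... 1 (hypothetical scaling)
        log_level = 1.0 + math.log10(max(env, floor)) / -math.log10(floor)
        cv.append(max(0.0, log_level) if is_pause else 0.0)
    return cv

def add_bridging(channel_cvs, bridge, pa_gains):
    """Add the bridging CV to every channel; pa_gains[k] serves channels 2k and 2k+1."""
    return [[cv + pa_gains[ch // 2] * b for cv, b in zip(cvs, bridge)]
            for ch, cvs in enumerate(channel_cvs)]
```

Because the bridging voltage is derived from the decaying speech envelope, it is larger shortly after loud speech than after quiet speech, which is the level-matching property described above.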
The silence-bridging circuit can be switched in and out by means of SA, whilst the inverted and non-inverted waveforms from the output of the speech/pause comparator are available at external sockets. The presence of the latter waveform is indicated by an LED. Similarly, the envelope voltage of the speech signal is brought out to a socket for other control purposes.
It is clear that the range of possible applications for the vocoder goes far beyond the synthesis of speech; its musical potential, however, is only now beginning to be fully appreciated. The most obvious application of vocoders is in the field of modern electronic music, and indeed a number of well-known artists and groups (e.g. Pink Floyd, Tangerine Dream, The Who) have already recognised the enormous musical potential of vocoders. The versatility of the vocoder stems largely from the wide variety of musical instruments with which it can be interfaced, and it is the ability of the vocoder to modulate the sound of ‘conventional’ instruments such as organs, guitars and drums, thereby providing totally new tonal possibilities, which lends the vocoder its unique character. It therefore seems likely that, in years to come, the vocoder will play a permanent role in the production of electronic music, especially when used in conjunction with a music synthesiser.
Vocoder and music synthesiser
When a vocoder is linked to a synthesiser, the tonal possibilities are virtually endless, since in a sense the two instruments are complementary. Despite the considerable versatility of a synthesiser, many musicians feel that it would be nice to have more control of the synthesised sound, e.g. be able to modulate the synthesiser signal with the variety of sounds which can be obtained from conventional musical instruments.
To achieve this, the synthesiser requires additional circuitry to analyse the external signal and convey its musical parameters to the synthesiser, i.e. a pitch-to-voltage converter to extract the melodic content, a vocoder to determine tone colour, and an envelope follower to control the amplitude characteristics of the synthesised signal.
The pitch-to-voltage converter, which can be viewed as the reverse of a VCO, enables the VCOs in the synthesiser to follow the frequency of an external input signal, such as that of an electric guitar. One is therefore no longer restricted to the compass of the keyboard, and the synthesiser can be ‘played’ by other musical instruments, and even by the human voice.
The vocoder tailors the harmonics of the synthesiser VCOs according to the harmonic content of the instrumental or speech signal, so that feeding the output of the synthesiser VCOs to the excitation input of the vocoder gives the synthesiser signal a tone colour similar to that of the signal fed to the speech input. VCO waveforms which are rich in harmonics, e.g. the sawtooth and the square wave, are particularly suitable excitation signals for the vocoder, since their spectrum is sufficiently broad to reproduce most of the changes in harmonic content of the speech signal. The vocoder can be incorporated into the synthesiser as a module, taking the place of the VCFs in the signal path.
Finally, envelope followers can be used to vary the amplitude characteristics of the synthesiser signal in accordance with those of the external speech or guitar signal, so that the two have a similar attack, decay, etc.
The combination of a large synthesiser and the above three devices opens up a world of virtually limitless musical possibilities. For example, by restricting the synthesiser to the frequency range of the human voice, conventional instruments can be made to sound as if they are being played by a synthesiser, a particularly impressive effect if the sequence from the synthesiser is very fast. Another idea is to let the pitch of certain synthesiser VCOs follow the chords of an electric guitar, for example at intervals of an octave, whilst others produce a continuous choral effect which is made to ‘sing’ a spoken text presented to the speech input of the vocoder.
Although these are only examples, they appear to justify the conclusion that the combination of synthesiser and vocoder finally offers what many synthesiser manufacturers have claimed: namely the ability to produce a virtually infinite variety of different sounds.
General artistic applications of vocoders
The applications for a vocoder are, however, by no means limited to the sphere of the recording studio and its use, in conjunction with a synthesiser, for the creation of electronic music. It also represents a versatile special effects unit which can be employed in radio and live drama as well as films to produce the impression of ‘talking’ objects, for instance, or simply to vary the sound of the human voice.
The non-realistic and slightly ‘otherworldly’ nature of vocoded speech lends itself particularly to applications such as sci-fi and children’s films or plays, where the elements of fantasy and imagination predominate. Indeed, it may even prove to be in this area of artistic use that the vocoder finds its most important application.
To summarise briefly, therefore: as a result of the efforts of Sennheiser and EMS, the vocoder, which has been used for a number of years in the field of telecommunications, has been developed into a highly versatile and sophisticated instrument for the production of electronic music and special effects. Its basic mode of operation is to analyse any signal within the frequency range of the human voice (normally a speech signal) and impose the most important parameters of that signal (amplitude, changes in the harmonic content, and variations in pitch) upon a second (excitation) signal. In this way it is possible to make the excitation signal ‘speak’ or ‘sing’ with remarkably clear and differentiated articulation.
From a technical point of view (noise performance, distortion etc.), the above vocoder models all satisfy the requirements for studio work, and together form a comprehensive range suitable for all possible applications. Particularly attractive features are their relatively compact size (with respect to the amount of circuitry they contain) and their extremely ergonomic layout, so that the prospective user is not deterred by a confusion of controls which take an age to master.
The vocoder allows the user to mix music, speech and sounds together in a totally new way, the resultant effects being characterised by their highly original and ‘fantastical’ nature.
Acknowledgements: Photo 1, Figures 1 and 5: Sennheiser Electronic, Wedemark, Hannover.
VOCODER TODAY F. Visser (Elektor December 1979)
When we first discussed vocoders in Elektor, a few years ago, they were still relatively unknown. Since then, interest in this type of sound-effect system has grown at an astonishing rate. Especially where the popular music vocoder is concerned, the number of different manufacturers and types seems to be increasing exponentially and the end is nowhere near in sight.
There is every reason, therefore, to take another look at the vocoder phenomenon – especially since we have now reached the point where we can describe a vocoder circuit specifically designed for the home constructor! More on that next month; first, we will recap the background and basic principles of vocoders briefly, so that everyone knows what we’re talking about.
It’s not surprising that vocoders have become so popular in such a short time, certainly in the popular music field, where interest in all kinds of artificial effects has increased rapidly over the last few years. Add to this the undeniable fascination of anything associated with artificial speech production (nothing new: this has been going on for centuries!) and you have two solid foundations for the vocoder’s success.
Although artificial speech production is not really a job for a vocoder, the first experiments in that direction can still be seen as the earliest stage of vocoder history.
A Mr. von Kempelen was the first to experiment successfully in this field. Around 1790, he produced a complicated machine consisting of an amazing array of bellows, membranes, resonators and pipes. Believe it or not, it produced ‘human speech’ sounds!
At the beginning of this century, Stewart succeeded in constructing the first electrical synthesiser of simple speech sounds. This speech synthesiser inspired Homer Dudley at the Bell Labs in the United States, whose invention was patented in 1936. He called his speech analyser/synthesiser a ‘Vocoder’, from VOice enCODER-decoder. This vocoder was intended for transmitting speech over a transmission link with the smallest possible bandwidth: purely for telecommunications, in other words. Inevitably, the military showed great interest in the vocoder. Not only did it have the advantage of requiring only a narrow transmission bandwidth; it also offered the possibility of speech coding (‘scrambling’).
Around 1950 one of the first musical applications of the vocoder, the ‘talking piano’, appeared on a gramophone record (‘Sparky’). The effect was exceptionally striking, certainly considering the state of the art at that time, but it was accepted without a stir: it was merely another by-product of the ‘mysterious art of electronics’. The same casual, if mystified, acceptance was widespread when Radio Luxembourg first introduced its well-known jingle, and again when the Beatles used an EMI vocoder to produce some extremely sophisticated effects.
It wasn’t until 1975 that the mystery surrounding the vocoder started to dissolve. Until then, it had been used only in a few large laboratories (Bell, Siemens, EMI, Philips, Sennheiser). With good reason: those vocoders were so big that some of them filled a whole room.
It is interesting to compare the development of the vocoder with that of the computer. The latter was initially seen as a rather frightening and very powerful machine. Only 25 years ago, it was thought that two computers would suffice for the whole of the United States: one on the East coast and one on the West coast. In fact, we are now rapidly approaching the point where there will be a computer in every home! It is unlikely that the popularity of vocoders will go quite that far. However, like earlier ‘revolutionary’ inventions (railways, cars, computers, electronic music synthesisers), the vocoder is likely to become far more commonplace than was originally expected. Speech analysis, speech synthesis, speech recognition, speech input and output for computer systems, and – last but not least – applications in (electronic) music: vocoders are used in all these fields, and the end is nowhere near in sight.
What’s on the market?
1975 can be considered a turning-point in the history of the vocoder. In that year EMS, a British manufacturer of music synthesisers and similar specialised equipment, introduced a vocoder designed by Tim Orr. EMS was already known as a company with ‘vision’; it was one of the leaders in the field of electronic music. In this case, they were again the first to launch a completely new instrument: the vocoder.
It is outside the scope of this article to analyse the marketing philosophy of all present-day manufacturers of vocoders, but a single example may serve to illustrate the confusion and hesitation, both on the part of the manufacturers and on the part of musicians, that have become apparent since the EMS Vocoder first appeared. Dr. Robert A. Moog, the ‘father’ of the music synthesiser, first built a channel vocoder in 1970. It consisted of a multitude of filters, envelope followers and voltage controlled amplifiers, and it was used for an adaptation of a Beethoven chorale by Walter Carlos for the film ‘A Clockwork Orange’. At the time, Moog apparently failed to see any commercial future for a more practical version of this device. It wasn’t until the fearfully expensive EMS vocoder appeared that a few other manufacturers suddenly showed interest (Sennheiser, Synton, Bode). This forced Moog to face facts: his extensive range of products was incomplete without a vocoder. However, the presently available Moog vocoder is not his own design: it is manufactured under licence. The rights belong to Harald Bode, who has had his own (patented) vocoder on the market for some time. This patent will be discussed later.
The growing competition and falling prices since 1975 are clearly illustrated in figure 1. The last two years in particular have seen a new manufacturer, or at least a new type, every few months! For those who are more interested in price than in date of introduction, the available types with approximate prices are listed in table 1.
The first large vocoder systems on the market (EMS Vocoder, Sennheiser VSM 201, Syntovox 221) were aimed at the ‘high end’ of the market. They were expensive, well beyond the means of musicians or even small sound studios, and so complicated to operate that it was difficult to attain high levels of artistic achievement . . . Their use was limited to large studios, radio stations, film studios and a very few well-known pop groups or composers with their own studio. Furthermore, a system that offered good intelligibility and speech precision was useful for speech research.
A large potential market remained unexploited: the musicians and groups who are always on the look-out for new effects, a new ‘sound’. It was to be expected that Japan would be the first to introduce a vocoder at a price that the average musician could afford. It was to be expected . . . but it didn’t happen!
In November 1978, at an Audio Engineering Society exhibition in New York, the American manufacturer Electro Harmonix introduced a vocoder system priced at about 800 dollars. Admittedly, a Japanese manufacturer (Korg) also had a vocoder on show, but it was much more expensive. Both of these vocoders were quite obviously rush jobs, and the commercial departments were unexpectedly faced with the task of explaining this highly complex unit to a very broad group of potential customers. To make matters worse, the few people who did know anything about it by and large failed to realise its full potential: they were interested mainly in the ‘talking music’ effect. There is, however, a completely different field of applications for the vocoder: speech training for the handicapped. Speech sounds, or even complete words, can be produced by a vocoder. These can serve as an example for the learner, and his own attempts can be compared with the original.
A further, possibly highly important, application of vocoders is in ‘expression training’. Modifying sounds by making other (vocal) sounds often proves to have a most beneficial effect on those who join in this kind of (group) therapy. The most interesting – and funniest – effects are obtained once participants overcome their initial inhibitions in front of the group.
A vocoder offers the possibility of superimposing speech characteristics onto the sound of a musical instrument (Electric Light Orchestra, Herbie Hancock) or any other basic sound. But there is more. It is also an ideal aid for modifying the timbre of a sound, for instance by superimposing vocal ‘colouration’.
There are a few restrictions that must be considered. Two points in particular limit the choice of sound sources. In the first place, it is essential that the two sounds occur simultaneously – vocoding is a ‘live’ process – and furthermore the spectra of the two sound sources must overlap as much as possible. Some examples are given in figures 2 and 3. Colouration of the sound from a musical instrument is not the only possibility. The loudness of the final output is also determined by the loudness of the speech signal. This can be extremely useful in itself. The attack and decay of the musical sound can be varied by singing louder or softer; instruments that would normally have a relatively slow ‘attack’ can be made more percussive by vocalising the desired ‘explosive’ effect; chords played on an organ, polyphonic synthesiser or string ensemble can be coloured and rhythmically articulated by singing short tones at the desired pitch.
Obviously, all this calls for some practice. The musical effects that can be obtained by means of a vocoder depend entirely on the vocal capabilities (and the long wind!) of the vocoder player.
One of the most important characteristics of the vocoder in musical applications is that it is a kind of interface between the musician and the musical instrument. A vocoder is an ideal aid to musicians who wish to achieve a personal ‘sound’, a unique ‘signature’, in their performance. The musician has a ‘real time’ tool that he can use to modify the complete tonal structure immediately, while he is playing. He can make the sound harsher, fuller, softer, more percussive. The results are immediately obvious, so that a kind of feedback mechanism occurs: the musician can hear exactly what he is doing and modify his vocal control accordingly.
The result, as far as ‘playing’ the instrument is concerned, is similar to playing a conventional instrument; compare the light touch on a keyboard instrument, or the precise lip control and embouchure of a wind player. In these cases, too, the final result is determined by a similar ‘feedback’ mechanism. It is worth noting that this effect is almost absent when playing other electronic instruments, since the programming, presets and so on can only be modified by means of a separate hand or foot control. Such controls do not lend themselves to immediate and precise control of the total sound, with the result that it is extremely difficult for the musician to produce exactly the desired effect.
Designing a vocoder
It is no easy matter to design a vocoder that is suitable for (mass) production. Before going into the problems, however, it is essential to take a closer look at the basic principles involved. For a more extensive discussion, readers are referred to the two articles on vocoders in the April and May 1978 issues of Elektor. In this article, we will keep the explanations as brief as possible.
Basically, then, a vocoder consists of two groups of identical filters; one of these is used to divide the speech spectrum into narrow bands, from each of which a voltage is derived that can be used to control the other group of filters, which reconstruct the speech spectrum. This would seem rather pointless – using speech to make speech – but the difference is that the second group of filters receives a completely different input signal as a basis for the reconstructed speech. The first group of filters is the ‘analyser’ section, the second is the ‘synthesiser’. The input signal to the synthesiser section is called the ‘carrier’, ‘excitation’ or ‘replacement’ signal.
As the block diagram in figure 4 shows, the analyser section is basically similar to a graphic equaliser, with one major difference: the outputs of the various filters are not summed. Each is followed by its own rectifier and low-pass filter; together, these form an envelope follower. In this way, an audio signal can be converted into a set of control voltages (Vc) for driving the synthesiser section.
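To make the envelope-follower idea concrete, here is a minimal digital sketch in Python. It is not the Elektor circuit: the precision rectifier is modelled with abs() and the low-pass filter with a one-pole smoother, and the function name and the 30 Hz cut-off in the example are illustrative assumptions.

```python
import math

def envelope_follower(samples, sample_rate, cutoff_hz):
    """Rectifier plus one-pole low-pass: turns one filter band into a control voltage."""
    # Smoothing coefficient of a one-pole (RC-like) low-pass filter.
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    level = 0.0
    control = []
    for x in samples:
        rectified = abs(x)                # full-wave (precision) rectifier
        level += a * (rectified - level)  # low-pass smoothing of the rectified signal
        control.append(level)
    return control

# Example: a 1 kHz tone of peak amplitude 0.5, sampled at 48 kHz for half a second.
fs = 48000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 2)]
vc = envelope_follower(tone, fs, cutoff_hz=30.0)
```

For a steady sine wave the control voltage settles close to the average of the rectified signal, which is 2/π times the peak (about 0.32 here), so it is proportional to the amplitude in that band.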
The second group of filters, the synthesiser section, could also consist of a graphic equaliser (figure 5). In this case, each of the filters is followed by a voltage controlled amplifier; the outputs of these VCAs are summed to produce the final output. In its simplest form, this system would seem to fulfil the requirements for a vocoder. In all probability, the results obtained would indeed be faintly reminiscent of the real thing . . . However, intelligibility and dynamics would leave a lot to be desired.
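Putting the analyser and synthesiser together, the following Python sketch is a bare-bones channel vocoder: for each band, a band-pass filter and envelope follower derive a control voltage from the speech, and an ideal VCA (a multiplication) applies it to the same band of the carrier before all bands are summed. The second-order RBJ "Audio EQ Cookbook" band-pass, the Q of 4.3 (roughly one third of an octave), the 50 Hz envelope cut-off and all function names are assumptions for illustration; a practical design needs much steeper filters than these.

```python
import math

def bandpass(samples, fs, f0, q):
    """Second-order band-pass (RBJ cookbook, 0 dB peak gain), direct form I."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this band-pass
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(samples, fs, cutoff_hz):
    """Full-wave rectifier followed by a one-pole low-pass filter."""
    a = 1 - math.exp(-2 * math.pi * cutoff_hz / fs)
    level, out = 0.0, []
    for x in samples:
        level += a * (abs(x) - level)
        out.append(level)
    return out

def vocode(speech, carrier, fs, centres, q=4.3, env_cutoff_hz=50.0):
    """Channel vocoder: analyser envelopes drive ideal VCAs on matching carrier bands."""
    out = [0.0] * len(carrier)
    for f0 in centres:
        control = envelope(bandpass(speech, fs, f0, q), fs, env_cutoff_hz)  # analyser channel
        band = bandpass(carrier, fs, f0, q)                                  # synthesiser filter
        for i, (s, vc) in enumerate(zip(band, control)):
            out[i] += s * vc                                                 # VCA, then summing
    return out
```

For example, vocode(speech, carrier, 8000, [500.0, 1000.0]) returns silence whenever the speech input is silent, however loud the carrier is, which is exactly the behaviour the VCAs are meant to provide.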
Numerous tests and intensive investigation have led to a list of requirements, relating to the various sections of the block diagrams discussed above. The exact requirements depend to some extent on the application for which the vocoder is intended.
In general, if vocal sounds are to be superimposed on some other sound, filters covering the range from 300 Hz to 3 kHz will usually suffice. Obviously, using more filters and covering a larger total bandwidth will lead to better ‘definition’. The large EMS, Sennheiser and Synton vocoders use about twenty filters, covering a range from approximately 200 Hz to 8 kHz. Within this range, bandpass filters are used for both analysis and synthesis. Frequencies below 200 Hz and above 8 kHz are covered by a low-pass and a high-pass filter, respectively, so that the complete audio band from 30 Hz to 16 kHz is processed by the vocoder.
When a large number of filters are used, deciding how to subdivide the audio band is no real problem. However, in this case the design of the filters is critical: a fairly narrow and well-defined pass-band is required, and the centre frequencies must be accurate. In large vocoders, like those mentioned above, it is customary to use third-octave filters (or an approximate equivalent). Vocoders that use fewer filters must obviously use a wider spacing of the centre frequencies, since the same total range must be subdivided into fewer pass-bands. Furthermore, the filters may cover different bandwidths, giving more precise analysis and synthesis in the frequency range that is important for speech intelligibility.
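The band layout can be worked out directly. This Python sketch assumes strict third-octave spacing starting at 200 Hz, which is only an approximation of what the commercial units use, and also computes the Q (centre frequency divided by bandwidth) of a third-octave band. The function names are hypothetical.

```python
import math

def third_octave_centres(f_low=200.0, f_high=8000.0):
    """Centre frequencies one third of an octave apart, from f_low up to about f_high."""
    steps = round(3 * math.log2(f_high / f_low))  # whole 1/3-octave steps in the range
    return [f_low * 2 ** (k / 3) for k in range(steps + 1)]

def third_octave_q():
    """Q = centre / bandwidth for a band whose edges lie 1/6 octave either side of centre."""
    return 1.0 / (2 ** (1 / 6) - 2 ** (-1 / 6))

centres = third_octave_centres()
print(len(centres), [round(f) for f in centres])
print(round(third_octave_q(), 2))
```

This gives 17 band-pass channels between 200 Hz and about 8 kHz; adding the low-pass and high-pass channels at either end brings the total to 19, in line with the "about twenty filters" of the large vocoders, each band having a Q of about 4.3.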
The number of filters used (and their spacing) determines the required bandwidth and the filter steepness outside the band. If filters are set close together but with an insufficiently steep cut-off, there will be a large frequency overlap. The result is that the speech becomes indistinct and ‘woolly’. This will almost invariably happen if two graphic equalisers are used, as suggested in the basic example given earlier. Equaliser filters are just not good enough for this application.
The easiest and cheapest way to obtain a filter with a sharp cut-off is to use a gyrator, but this has other drawbacks. This type of circuit tends to ‘ring’ noticeably, and unwanted frequencies leak through; both of these effects severely impair intelligibility. We could go on like this, crossing off the various types of filter, but there is little to be gained by beating about the bush: in practice, there is really only one filter type that is suitable. As you would expect, it is by no means the cheapest.
For optimum intelligibility, the initial slope of the filter should be in the order of 50 . . . 54 dB/octave. This type of filter is used in the Synton Syntovox 221. Regrettably, the large number of close-tolerance components required precludes its use in low-cost vocoders. The Sennheiser VSM 201, for instance, uses 36 dB/octave filters; in the large EMS vocoder, about 30 dB/octave is used. The high price of professional vocoder systems is a direct result of the high component and assembly costs involved in the large number of high-precision filters.
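A rough way to see why the slope matters: assume an idealised filter that is flat across its band and then falls in a straight line (in dB) at the stated slope beyond the band edges. With third-octave spacing, a neighbouring channel's centre frequency lies 1/6 octave beyond the band edge. The Python sketch below uses only the slopes quoted above; real filters have rounded skirts, so the figures are indicative only, and the function name is hypothetical.

```python
def skirt_attenuation_db(slope_db_per_oct, offset_oct, half_width_oct=1 / 6):
    """Idealised straight-line skirt: 0 dB inside the band, slope x octaves beyond the edge."""
    beyond_edge = max(0.0, abs(offset_oct) - half_width_oct)
    return slope_db_per_oct * beyond_edge

slopes = {"Syntovox 221": 54, "Sennheiser VSM 201": 36, "EMS": 30}
for name, slope in slopes.items():
    next_band = skirt_attenuation_db(slope, 1 / 3)   # neighbour's centre, 1/3 octave away
    two_away = skirt_attenuation_db(slope, 2 / 3)    # two channels away
    print(f"{name:20s} {next_band:5.1f} dB  {two_away:5.1f} dB")
```

Under these assumptions even a 54 dB/octave filter attenuates its immediate neighbour's centre frequency by only about 9 dB (6 dB at 36 dB/octave, 5 dB at 30 dB/octave), which shows how easily adjacent channels overlap when the slopes are shallower.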
But good filters aren’t the only problem. In the analyser section each filter must be followed by an envelope follower, consisting of a precision rectifier and a low-pass filter. Output offset voltages are the headache here: they can ruin the dynamics of the whole system. There are only two alternatives: either use very carefully selected components or else include a calibration facility. Another point to watch is the cut-off frequency of the low-pass filter. It’s not a good idea to use identical filters: the cut-off frequency should be related to the centre frequency of the corresponding analyser filter.
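One way to relate the envelope low-pass to its band, sketched in Python under stated assumptions: full-wave rectification of a band centred on fc produces ripple mainly at 2·fc, so a one-pole smoothing filter can be given whatever cut-off attenuates that ripple by a chosen amount. The 24 dB ripple target and the simple offset subtraction (standing in for a calibration trimmer) are illustrative assumptions, not values from the Elektor design, and the function names are hypothetical.

```python
import math

def envelope_cutoff_hz(centre_hz, ripple_atten_db=24.0):
    """Cut-off of a one-pole low-pass that attenuates the 2*fc rectifier ripple by ripple_atten_db."""
    ripple_hz = 2.0 * centre_hz  # full-wave rectification puts the main ripple at twice fc
    return ripple_hz / math.sqrt(10 ** (ripple_atten_db / 10) - 1)

def calibrated(control_v, offset_v):
    """Remove an offset measured with no input signal; a control voltage cannot go negative."""
    return max(0.0, control_v - offset_v)

for fc in (200, 1000, 8000):
    print(fc, round(envelope_cutoff_hz(fc), 1))
```

Low bands end up with a slow smoothing filter (about 25 Hz for a 200 Hz band), while high bands can use a much faster one that follows consonant transients more closely; with identical filters in every channel, one of the two would be compromised.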
Hold on: we’re not out of the woods yet. Things get worse before they get better; the synthesiser section poses even more problems. Each filter in the synthesiser section must be followed by a voltage (or current) controlled amplifier. If you draw up a list of all the ways to make a voltage controlled amplifier (VCA), the OTA (operational transconductance amplifier) turns out to be the best bet. This is not to say that it is ideal – it most definitely is not. The transconductance (gm) tolerance is bad enough, but there are two more problems. In the first place, OTAs are noisy. They hiss. This is not quite fair, perhaps – there are other noisy opamps – but the problem is that only very low signal levels can be used if the distortion is to be kept within reasonable limits, so the signal-to-noise ratio suffers. Furthermore, the signal leakage from control input to signal output is often considerable. Not that you can blame the manufacturer of the OTA (CA3080): this leakage is not included in the specifications, and in most applications it is relatively unimportant. For a vocoder, however, it is essential that this leakage is minimal; otherwise the control signals from the analyser can break through to the output, even in the absence of a ‘carrier’ signal. This is a nuisance, to put it mildly . . .
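The distortion problem follows from the transfer characteristic of the OTA's bipolar differential input pair. The Python sketch below uses the textbook ideal model, output current = I_ABC·tanh(V_in / 2V_T), to estimate how far the peak output falls short of linear behaviour at various input levels. It ignores offsets, control leakage and the gm tolerance mentioned above, and the function names and the 300 K thermal voltage are assumptions.

```python
import math

V_T = 0.02585  # thermal voltage kT/q at about 300 K, in volts

def ota_output_current(v_in, i_abc):
    """Ideal bipolar OTA (CA3080-style): I_out = I_ABC * tanh(V_in / (2 V_T))."""
    return i_abc * math.tanh(v_in / (2 * V_T))

def small_signal_gm(i_abc):
    """Transconductance for small inputs: gm = I_ABC / (2 V_T), roughly 19 mS per mA."""
    return i_abc / (2 * V_T)

def gain_compression(v_in_peak):
    """Fractional shortfall of the peak output relative to the linear (small-signal) prediction."""
    x = v_in_peak / (2 * V_T)
    return 1 - math.tanh(x) / x

for mv in (5, 10, 20, 50):
    print(f"{mv:3d} mV peak: {100 * gain_compression(mv / 1000):5.1f} % compression")
```

The compression, and with it the distortion, rises rapidly once the input exceeds a few tens of millivolts. That is why the signal level at the OTA input has to be kept so small, and why the OTA's own noise then becomes significant.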
As before, the solution is either to select the components carefully or to provide a calibration point. For really good results, you have to do both. In the constructional project that will be described next month, a large number of adjustments are included for this reason; even so, a test procedure to reject really ‘bad’ OTAs will improve the final performance.
So far, we have only considered the most essential parts of a vocoder system: the analyser and the synthesiser. Using these two, speech sounds can be superimposed on other signals. Some speech sounds, that is: the so-called ‘voiced’ sounds (vowels, for example). Complete speech synthesis, including ‘unvoiced’ sounds (s, f, p and so on), is not possible with this basic system. For this, a noise generator and a voiced/unvoiced detector are required; the latter, in particular, is quite a complex circuit. We intend to describe it in greater detail at a later date. However, if the vocoder is to be used for musical applications, the basic system discussed so far is perfectly adequate. For that matter, most low-cost vocoders presently available also lack a voiced/unvoiced detector, mainly for reasons of price.
If the vocoder is used in conjunction with musical instruments that produce a broad spectrum, with plenty of higher harmonics, a reasonable approximation of the unvoiced sounds will be obtained without a voiced/unvoiced detector and associated noise generator.
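The article postpones the voiced/unvoiced detector, so the following is not its circuit. As a rough illustration only, this Python sketch uses a common simple heuristic: hiss-like unvoiced sounds cross zero far more often than voiced ones, so a frame whose zero-crossing rate exceeds a threshold switches the synthesiser input over to noise. The 0.25 threshold, the frame length and the function names are hypothetical.

```python
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return changes / (len(frame) - 1)

def is_unvoiced(frame, threshold=0.25):
    """Hypothetical decision rule: a high zero-crossing rate suggests an unvoiced sound."""
    return zero_crossing_rate(frame) > threshold

def choose_excitation(speech_frame, carrier_frame, rng):
    """Feed noise to the synthesiser during unvoiced frames, the carrier otherwise."""
    if is_unvoiced(speech_frame):
        return [rng.uniform(-1.0, 1.0) for _ in carrier_frame]  # noise generator
    return carrier_frame

# Example with 256-sample frames (an arbitrary frame length).
rng = random.Random(0)
hiss = [rng.uniform(-1.0, 1.0) for _ in range(256)]
print(is_unvoiced(hiss))
```

A real detector would also weigh signal energy and the balance between low and high frequencies; the zero-crossing count is simply the easiest cue to demonstrate.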
A search through the files in the patent office shows that there are hundreds of patents directly related to the vocoder, and even more that have some bearing on it: patents in areas like speech recognition, detecting the fundamental speech frequency, etc.
The most recent patent relating to vocoders is in the name of Harald Bode, the manufacturer of the Bode vocoder (which is also manufactured under licence by Moog). The main point in this patent is a clever little trick that Bode uses in his vocoders to increase the intelligibility of speech, even though the filters used in the vocoder have a slope of only 24 dB/octave.
As explained earlier, the intelligibility of synthesised speech depends on the type of filter used: its general performance, and the slope outside the passband. If a vocoder is not intended for speech synthesis in the full sense (where external control voltages can be used to create intelligible speech), then the intelligibility for musical applications can be improved by adding the high-frequency portion of the speech signal (above 3 kHz) to the output signal from the vocoder. This high-frequency signal contains only the noise and transients of consonants like s, p and t.
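The high-frequency bypass can be sketched in a few lines of Python: high-pass filter the original speech at 3 kHz and mix it into the vocoder output. The second-order RBJ-cookbook high-pass (12 dB/octave), the 0.5 mixing gain and the function names are assumptions for illustration; Bode's actual circuit is not described here.

```python
import math

def highpass(samples, fs, fc=3000.0, q=0.7071):
    """Second-order high-pass (RBJ cookbook), direct form I."""
    w0 = 2 * math.pi * fc / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0 = b2 = (1 + cw) / 2 / a0
    b1 = -(1 + cw) / a0
    a1, a2 = -2 * cw / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def add_hf_bypass(vocoder_out, speech, fs, bypass_gain=0.5):
    """Mix the part of the original speech above about 3 kHz into the vocoder output."""
    hf = highpass(speech, fs)
    return [v + bypass_gain * h for v, h in zip(vocoder_out, hf)]
```

Because the bypass signal comes straight from the microphone, this sketch also shows the drawback mentioned next: with artificial control voltages there is no speech input, so nothing is added.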
The main disadvantage of this system is that a real voice must be used to drive the vocoder: if artificial control signals are used, the high-frequency content will be missing from the output. Furthermore, this ‘high-frequency bypass’ system produces an effect similar to ‘signal breakthrough’ in the vocoder. Despite these disadvantages, the effect is interesting enough to be worth experimenting with when you are building your own vocoder.
It is difficult to predict future developments in vocoders. At present, it seems unlikely that a digital version will be produced. The conventional analog vocoder has the unique feature that it works in ‘real time’. The incoming signal is analysed immediately, and the output from the analyser can be used for simultaneous synthesis. In spite of the problems involved in using sharp analog filters (phase shift), it seems unlikely that a digital alternative with a reasonable price will be found in the near future. Synthesising speech artificially is another matter, of course. There are several digital approaches to this. The problem facing the would-be digital vocoder constructor is to analyse complex signals, like speech, sufficiently rapidly and accurately to make a workable vocoder.
The popular music vocoder has a bright future. The number of manufacturers and types will increase rapidly, and this is bound to lead to falling prices. However, it is unlikely that the near future will see vocoders in the same price range as ‘effect boxes’. A vocoder is too complex for that, using large numbers of close-tolerance components if optimum performance is required. That, and the number of man-hours required to build one unit, preclude the appearance of a mass-produced low-cost vocoder for some time to come.
It is to be expected that vocoders will be incorporated in electronic organs in the not-too-distant future. In a few years’ time, most organs should have a ‘vocoder’ button – offering one of the most intriguing and creatively inspiring effects of our time at the touch of a finger!
What of the near future? Next month? That, at least, can be foreseen with great certainty: for the first time, as far as we know, a vocoder designed specifically with the constructor in mind. Build your own vocoder!
- Elektor, April and May 1978: Vocoders.
- Elektor, January 1978: Elektor Equaliser.