Dissertation Proposal, Composition DMA, Columbia University
Timothy D. Polashek
Speakings IV: Composition for 16 Voices
The dissertation is divided into two related parts:
1) A vocal composition for an ensemble of 16 vocalists (about 20 minutes in
duration).
2) A written document describing (a) original software developed for the
synthesis of text, (b) the theoretical foundations of its algorithms, and
(c) a brief discussion of the composition.
The music written for these vocalists will call for the speech-like production
of nonsensical phrases and sentences in English. Natural speaking rhythms will
be set against precise utterances written in traditional music notation. In
performance, a conductor will direct the overall balance of the score, but each
of the 16 vocalists will perform an independent part guided by an individual
pre-recorded click track heard through headphones.
Musically, there are two broad areas I will pursue with this composition. The
first is to explore the textural and aesthetic continuum made possible by
assembling a large ensemble of vocalists who use their natural speaking voices
to produce sound. One extreme of this continuum, for example, could be the
unison chanting of a text by all voices. The opposite of this monophonic
texture might be the aggregate sound of many unrelated simultaneous
conversations, like the sound of an excited audience recorded on tape before
the start of a concert. From this
point in the continuum, movement along another axis could lead to a texture
with the same independent conversational rhythms, but in which the sentences
gradually contain fewer words with more syllables each, or in which the vowel
content of the words is gradually filtered, leaving only sentences whose vowels
are all, for instance, the “ee” phoneme, such as “Leave me please see receipt.”
Another orchestrational possibility could be to have eight vocalists speak
slowly in a sustained artificial monotone while seven others answer, with loud
short words, the call of a leader. The leader, the remaining voice of the
sixteen, would perform traditionally notated rhythms and pitches, evoking the
sound and theatrical image of a livestock auction at a county fair. These
theatrical emulations of speech
interactions between people in various social contexts will be contrasted with
textures possessing more abstract musical structures, resulting in vocalisms
that are perceived less as traditional conversational speech and more as
music written for percussion instruments whose reports and timbres are
actually words and syllables.
The other musical objective is to define text synthesis algorithms, to
implement them in software (written in C and compiled on a Linux workstation),
and to explore the musical utility of these algorithms as the composition
progresses, modifying them and creating new ones as needed. These algorithms
will perform extensive searches and analyses of an electronic phonetic
dictionary of over 125,000 words, compiled and encoded by the Speech and
Computer Science Group at Carnegie Mellon University (CMU). Most of the text
synthesis algorithms developed will follow an “analysis and resynthesis”
approach, operating on the phonological content of syllables, words, and
phrases.
Because the CMU pronunciation dictionary breaks each word entry into phonemes
and assigns stress rankings to vowel phonemes, the algorithms will seek
hierarchical stress relationships between the syllables of words, and between
words and phrases, up to the sentence level. Text synthesis without a model
source text will also be possible, based solely on abstract hierarchical data
supplied as input. This parametric approach will allow the generation of words
and phrases tailored to specific local rhythmic and other musical needs of the
composition.
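A stress-pattern query of the kind described above might be sketched as
follows. This is only an illustration under stated assumptions, not the
dissertation software: it assumes each dictionary line holds a headword
followed by space-separated phonemes, with every vowel ending in a stress digit
(0 for unstressed, 1 for primary, 2 for secondary), and the function names
stress_pattern and matches_stress are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Extract the stress digits of one dictionary line into `out`.
 * Assumed line format (CMUdict style): HEADWORD, whitespace, then
 * phonemes separated by spaces, each vowel ending in a stress digit.
 * Returns the number of syllables found, or -1 if `out` is too small. */
int stress_pattern(const char *line, char *out, size_t out_size)
{
    size_t n = 0;
    assert(line != NULL && out != NULL && out_size > 0);

    /* Skip the headword: everything up to the first whitespace. */
    while (*line != '\0' && !isspace((unsigned char)*line))
        line++;

    /* In the phoneme field, each digit marks one vowel (one syllable). */
    for (; *line != '\0'; line++) {
        if (isdigit((unsigned char)*line)) {
            if (n + 1 >= out_size)
                return -1;
            out[n++] = *line;
        }
    }
    out[n] = '\0';
    return (int)n;
}

/* Return 1 if the entry's stress pattern equals `target`, where a '?'
 * in `target` matches any stress level; return 0 otherwise. */
int matches_stress(const char *line, const char *target)
{
    char pattern[32];
    int n, i;

    n = stress_pattern(line, pattern, sizeof pattern);
    if (n < 0 || (size_t)n != strlen(target))
        return 0;
    for (i = 0; i < n; i++) {
        if (target[i] != '?' && target[i] != pattern[i])
            return 0;
    }
    return 1;
}
```

For an illustrative entry such as "RECEIPT  R IH0 S IY1 T", stress_pattern
writes "01" and returns 2, and matches_stress accepts the targets "01" and
"?1". A generator could use such a test to select words that fit a rhythmic
template supplied as input.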
Generation algorithms will also allow specific features to be favored. For
example, one algorithm might be directed to favor “k” and “t” consonants in
the synthesis of a phrase while forcing every third syllable to contain a
diphthong.
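The “k”/“t” example could be expressed, very roughly, as a scoring function
applied to candidate phrases. The sketch below is a hypothetical illustration
rather than the proposed algorithm: it assumes a phrase is given as
space-separated CMUdict-style phonemes, counts each vowel as one syllable, and
treats AW, AY, EY, OW, and OY as the diphthongs. The name score_phrase is
invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed diphthong set for CMUdict-style (ARPAbet) transcriptions. */
static const char *const DIPHTHONGS[] = { "AW", "AY", "EY", "OW", "OY" };

/* A token is treated as a vowel if it ends in a stress digit 0, 1, or 2. */
static int is_vowel(const char *tok, size_t len)
{
    return len > 0 && tok[len - 1] >= '0' && tok[len - 1] <= '2';
}

/* A diphthong is a three-character vowel token whose two-letter
 * symbol appears in DIPHTHONGS. */
static int is_diphthong(const char *tok, size_t len)
{
    size_t i;
    if (len != 3 || !is_vowel(tok, len))
        return 0;
    for (i = 0; i < sizeof DIPHTHONGS / sizeof DIPHTHONGS[0]; i++) {
        if (strncmp(tok, DIPHTHONGS[i], 2) == 0)
            return 1;
    }
    return 0;
}

/* Score a candidate phrase given as space-separated phonemes.
 * Returns -1 if any third syllable (3rd, 6th, ...) lacks a diphthong;
 * otherwise returns the number of K and T consonants, so that a
 * higher score marks a candidate the generator should prefer. */
int score_phrase(const char *phones)
{
    int syllables = 0, score = 0;
    const char *p = phones;
    assert(phones != NULL);

    while (*p != '\0') {
        size_t len;
        p += strspn(p, " \t\n");    /* skip separators */
        len = strcspn(p, " \t\n");  /* length of the next token */
        if (len == 0)
            break;
        if (is_vowel(p, len)) {
            syllables++;
            if (syllables % 3 == 0 && !is_diphthong(p, len))
                return -1;          /* hard constraint violated */
        } else if (len == 1 && (p[0] == 'K' || p[0] == 'T')) {
            score++;                /* soft preference satisfied */
        }
        p += len;
    }
    return score;
}
```

Under these assumptions, the phoneme string "T AY1 K IH0 T OY1" scores 3, while
"T AY1 K IH0 T IY1" is rejected with -1 because its third syllable is not a
diphthong. Separating the hard constraint from the soft preference lets a
generator discard invalid candidates first and then rank the remainder.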
Throughout my studies at Columbia University, a great deal of my compositional
and theoretical work has been in the area of text/sound music and algorithmic
composition. Seven of my pieces used the spoken voice, exclusively or
primarily, as their sonic material: “Beyond Babble: The Synthesis of Three
Well-Known Texts,” “Speakings I” (score for eight voices), “Speakings II:
Kinetics” (recording for quadraphonic sound diffusion), “Speakings III:
Headlines” (for two voices), “Dry Weaves” (for stereo tape), and my interactive
compositions “Terminal Jamb” and “Tinkering with Speech.” Also, all of these
works were composed
with the aid of my original software, typically operating on the acoustical and
phonological properties of words and intended to facilitate the creation of
musical structures out of English text, independent of linguistic meaning. This
compositional paradigm enables the musical manipulation of the degree of
semantic content present in speech: in other words, the degree of perceived
meaning as opposed to the mere sound of speech. This ‘musical manipulation’ is
at the heart of my artistic expression, and my dissertation ultimately extends
these specific musical interests.