Dissertation Proposal, Composition DMA, Columbia University

Timothy D. Polashek

 

 

Speakings IV: Composition for 16 Voices

 

The dissertation is divided into two related parts:

 

1) A vocal composition for an ensemble of 16 vocalists (about 20 minutes in duration).

 

2) A written document that (a) describes the original software developed for text synthesis, (b) presents the theoretical foundations of its algorithms, and (c) briefly discusses the composition.

 

The music written for these vocalists will call for the speech-like production of nonsensical phrases and sentences in English. Natural speaking rhythms will be set against precise utterances written in traditional music notation. In performance, a conductor will direct the overall balance of the score, but each of the 16 vocalists will perform an independent part guided by an individual pre-recorded click track heard through headphones.

 

Musically, I will pursue two broad areas with this composition. The first is to explore the textural and aesthetic continuum made possible by assembling a large ensemble of vocalists who produce sound with their natural speaking voices. One extreme of this continuum could be, for example, the unison chanting of a text by all voices. The opposite of this monophonic texture might be the aggregate sound of many unrelated conversations occurring simultaneously, like the sound of an excited audience recorded on tape before the start of a concert. From this point in the continuum, movement along another axis could lead to a texture with the same independent conversational rhythms, but in which the sentences gradually contain fewer words, though with more syllables per word, or in which the vowel content of the words is gradually filtered, leaving only sentences whose vowels are restricted to, for instance, the “ee” phoneme, such as “Leave me please see receipt.” Another orchestrational possibility is to have eight vocalists speak slowly in a sustained artificial monotone while seven others respond with loud, short words to the calls of a leader, the remaining voice of the sixteen, who performs traditionally notated rhythms and pitches, evoking the sound and theatrical image of a livestock auction at a county fair. These theatrical emulations of spoken interaction in various social contexts will be contrasted with textures possessing more abstract musical structures, resulting in vocalisms that are perceived less as traditional conversational speech and more as music written for percussion instruments whose reports and timbres are actually words and syllables.

 

The other musical objective is to define text synthesis algorithms, to implement them in software (written in C and compiled on a Linux workstation), and to explore their musical utility as the composition progresses, modifying existing algorithms and creating new ones as needed. These algorithms will perform extensive searches and analyses of an electronic phonetic dictionary of over 125,000 words, compiled and encoded by the Speech and Computer Science Group at Carnegie Mellon University. Most of the text synthesis algorithms will follow an “analysis and resynthesis” approach, operating on the phonological content of syllables, words, and phrases. Because the CMU pronunciation dictionary breaks each word entry into phonemes and assigns stress rankings to vowel phonemes, the algorithms will seek hierarchical stress relationships between the syllables of words, and between words and phrases, up to the sentence level. Text synthesis will also be possible without a model source text, based on abstract hierarchical input data alone. This parametric approach will allow the generation of words and phrases tailored to specific local rhythmic and other musical needs of the composition. Generation algorithms will also allow specific features to be preferred. For example, one algorithm might be directed to favor “k” and “t” consonants in the synthesis of a phrase while forcing every third syllable to contain a diphthong.
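
As a rough illustration of the kind of dictionary search described above, the following sketch selects words by stress pattern and then ranks them by a preference for “k” and “t” consonants. It is written in Python for brevity and is not the C software proposed here. The toy entries are hand-typed in the dictionary's line format, and every function name is hypothetical. The diphthong constraint is left out to keep the example short.

```python
# Toy entries hand-typed in the CMU dictionary's line format (WORD  PHONEMES).
# They are illustrative only and are not read from the real dictionary file.
TOY_LINES = [
    "TICKET  T IH1 K AH0 T",
    "KITTEN  K IH1 T AH0 N",
    "MELLOW  M EH1 L OW0",
    "ATTACK  AH0 T AE1 K",
    "BALLOON  B AH0 L UW1 N",
]


def parse_entry(line):
    """Split one dictionary line into (word, list of phoneme symbols)."""
    word, *phonemes = line.split()
    return word, phonemes


def stress_pattern(phonemes):
    """Return the stress digits of the vowel phonemes, e.g. '10'.

    In this format only vowels carry a trailing digit:
    0 = unstressed, 1 = primary stress, 2 = secondary stress.
    """
    return "".join(p[-1] for p in phonemes if p[-1].isdigit())


def preference_score(phonemes, favored):
    """Count how many phonemes belong to the favored set."""
    return sum(1 for p in phonemes if p in favored)


def choose_words(lines, pattern, favored):
    """Return words matching a stress pattern, most favored phonemes first."""
    matches = []
    for line in lines:
        word, phonemes = parse_entry(line)
        if stress_pattern(phonemes) == pattern:
            matches.append((preference_score(phonemes, favored), word))
    # Sort by descending score, then alphabetically for a stable result.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [word for _, word in matches]


# Hypothetical request: two-syllable words, favoring "k" and "t".
strong_weak = choose_words(TOY_LINES, "10", favored={"K", "T"})
weak_strong = choose_words(TOY_LINES, "01", favored={"K", "T"})
print(strong_weak)
print(weak_strong)
```

Here the abstract input is only a stress string such as "10" (strong-weak) together with a set of favored phonemes. A fuller version would presumably extend the same idea from single words to phrases and sentences, as the hierarchical stress relationships above suggest.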

 

Throughout my studies at Columbia University, a great deal of my compositional and theoretical work has been in the areas of text/sound music and algorithmic composition. Seven of my pieces, “Beyond Babble: The Synthesis of Three Well-Known Texts,” “Speakings I” (score for eight voices), “Speakings II: Kinetics” (recording for quadraphonic sound diffusion), “Speakings III: Headlines” (for two voices), “Dry Weaves” (for stereo tape), and my interactive compositions “Terminal Jamb” and “Tinkering with Speech,” all use the spoken voice, exclusively or primarily, as their sonic material. All of these works were composed with the aid of my original software, which typically operates on the acoustical and phonological properties of words and is intended to facilitate the creation of musical structures out of English text, independent of linguistic meaning. This compositional paradigm enables the musical manipulation of the degree of semantic content present in speech: in other words, the degree of perceived meaning as opposed to the mere sound of speech. This ‘musical manipulation’ is at the heart of my artistic expression, and my dissertation ultimately furthers these specific musical interests.