Every hair on my body stands on end as the sound rushes into me, tingling like soda water towards the crown of my head, then impatiently spreading out towards the end of my spine, tugging on my hair, coaxing me to sway with the sound waves. My perception and concentration suddenly sharpen, my body magnified yet shrunk — I seem to have become one with the vibrating particles in space, merging into space itself. This was how I felt the first time I heard “Semicircle” by Alvin Lucier, on his album Works for the Ever Present Orchestra.
Rather than “music,” the piece is better described as a practice session that amplifies the senses and condenses time. Upon listening closer, I especially notice that, when the orchestra’s instruments of different timbres simultaneously play the theoretically identical musical notes, as instructed by the composition’s concept, there is a variation in the depths of resonance and their lingering, emergent overtones. Through their overlapping and merging, dynamic oscillations fill the room.
They remind me of different protagonists in a scene who repeat the same words but in different tones, scrambling to share the closest definitions under the same denotation. The back-and-forth movement of these slight aural differences evokes visual associations: a wavering halo of light that appears when someone with astigmatism squints to identify an object; a negative afterimage that remains in your eyes after staring at a light source for a long time. Imagine walking a tightrope while holding a shallow dish filled with water, swinging from side to side, like the sounds which sway between a certain kind of correctness and incorrectness. As when tuning an instrument, the same note is repeatedly played and adjusted (tightened, loosened, then tightened again) just to reach the perfect middle line in the swinging of the Hertz meter on the tuner: an extreme game of “learning” over and over again. When playing together, the different voices become monotone.
Accents, the National Language Policy and Centralization
Such detailed color grading in brightness, contrast, and shading is also present in speech: for example, in the German language, there exist not only accents from Hamburg and Berlin, but also Turkish, Korean, Taiwanese and other accents. They are like different color temperatures painting the flames of a burning fire. Speech is the foundational tool for integration into a society, and also functions as a mobile recording device through which knowledge is gained and shared, shaped and standardized. Beginning with the oral tradition of storytelling, the foremost requirement to enter a social group is understanding and responding to sound. Gilles Deleuze has given the example of how infants first learning to speak are expected to make the sounds “dada” and “mama,” illustrating how “vocalization” is about establishing a domain of relations. To speak, or not to speak, is an order and demand, and this sheds light on processes of language acquisition, socialization, and power relations.
Growing up in Taipei in the mid-1980s, I was not directly affected by the National Language Policy, nor by the earlier colonial period of Japanization. It was not until I was living in a foreign country, lacking language ability, that I instinctively came to realize that the voice is a critical determining factor in life abroad. I suddenly became aware of the connotations that an “accent” carried, the social judgment that it “needed to be corrected,” and the speaker’s hope to reach the “standard,” which represents not only their desire to communicate efficiently, but also to abandon their background in order to possess a “pure mother tongue” and appear as an “upper class intellectual.”
Before the era of global mobility, topography limited the movement of different ethnic groups, with geographical boundaries often determining the borders of nations and language families. However, once international movement became a daily reality, the coinciding of national boundaries with the homogeneity of language and ethnicity gradually shaped the nationalist imagination and became a strategy for control by those in power. The voice’s standardization became an aggressive form of governance. Out of the ascent of the term “nation” in Paris in the 16th century, the concept of the Nationalsprache (national language) was adopted in many nation-states by the 18th and 19th centuries. Enforced under the banner of establishing a clear system of communication, it was in fact a construct designed to intensify domination.
This concept differs from that of the “sound community” or “sonic territory” which emerge from natural geographical relations; in contrast, a National Language Policy aims to homogenize the linguistic norms, expectations and values of its national subjects by establishing a completely integrated speech community/linguistic community. This imaginary conflation of culture and nation is also the work of colonialism, in its marginalization and suppression of minorities.
Similarly, in 1955, the International Organization for Standardization (ISO) fixed the note “A” at the absolute frequency of 440 Hz. This process of standardization, which overreaches the objective of orchestral harmony, can perhaps reflect the desire for the ideal of total control. When the national language is considered as the absolute pitch, and accents as discordant sounds, then tuning instruments — that is, removing accents — can be described as physical and social domestication, the naturalization of an absolutist system.
This centralization of the voice, which restricts the potential of plural accents, is a form of control in which systems of power exploit vocal and auditory inertia to homogenize human expression. Another common tactic of governance is the use of sound as a tool for affective manipulation. Specific sounds that evoke emotions, memories, and values are deployed to penetrate listeners with political ideologies and market economy values. TikTok, for instance, provides its clip creators with a selection of soundtrack music templates (recently trending are Miley Cyrus’s “Flowers,” Coi Leray’s “Twinnem,” or Meghan Trainor’s “Made You Look”). This results in the repetitious linking of pop culture trends with memory, a brainwashing which reaps real economic profit for the entertainment industry (and also benefits the surreptitious data collection of national security intelligence agencies).
When Taiwanese people hear the song “Effort is the Only Road to Success” (愛拼才會贏), they will associate it with their school days, filled with the education system’s disciplinary phrases that upheld physical labor, such as “Man will triumph over nature” and “Hard work pays off.” Often played during election campaigns, the song will move crowds to shout about the “Taiwan Miracle” at the top of their lungs.
Another example is the Kuomintang’s propaganda broadcasts during the Cold War, where their army hoped that Teresa Teng’s sweet songs would arouse the people on the opposite side of the Taiwan Strait to yearn for a free and affluent life of comfort, thus convincing them to defect to the Republic of China (ROC). Correspondingly, this strategic manipulation of sound to evoke contextual associations and projections was also mirrored by the Communist Party’s own propaganda broadcasting, as imprinted in the memories of the residents of Kinmen (a group of islands governed by the ROC located off the coast of the People’s Republic of China). I once asked an elderly Kinmen resident, “It’s strange: when the other side’s announcer made broadcasts, were they speaking in the ‘National Language’ (Mandarin)? Was it in the Beijing dialect? Did you understand them?” The local nodded and replied, “At first, I didn’t understand, but after hearing them many times, I basically knew what they were trying to do, and I was scared.” Here, the efficacy of the sound seems to lie not in its language content or the audio material’s references, but rather in the specific atmosphere produced by the sound and how this becomes embedded in the temporal and spatial context of its reception, eliciting in people both fear as well as hope for better days.
Simulation and Synchronization
In fact, there are many scenarios in which the clarity of language and communication does not depend on the accuracy of vocalization, for instance, in the Alpine yodel, or a whistle: these are ambiguous linguistic states, inexplicit sound simulations. With these sounds, one is able to gauge one’s spatial surroundings, indicate one’s own location, call other people to gather, or command them to disperse. By means of pitch and tempo, such imprecise utterances often conceal further codes of communication and value signals for their communities.
The mimicry of voice as well as the imitation of speech is common in singing, especially in vocal music. As a matter of fact, in a chat with a vocalist, I learnt that many vocalists do not speak Italian, German, French, or the other common languages of opera; that is, when they sing La Bohème, The Magic Flute, or La Traviata, they do not really understand the literal meaning of the words that they are vocalizing in that moment.
This is an intriguing situation: you have to express a grammatical form of intellectual connotations using a series of fixed rhythms and vocal tones; at the same time, with these parts that don’t have any “comprehensible meaning” for you, you have to describe and act out a particular emotional state, carry out “dialogue”, drive the plot development, and resonate with the audience. To be truthful, it is often the case in opera, that in order to comply with the musical composition, the natural syllabic rhythm of the language must be ignored and overwritten, abstracted into overlaid sounds and onomatopoeia. Regardless of whether one understands the opera’s original language or not, its words are disassembled into blocks of sound: its meaning is stripped away, leaving only an imitation of the voice.
It is perhaps out of this peculiarity that the “meow opera” by Gioachino Rossini, Duetto Buffo Di Due Gatti (Cat Duet), came into being: for the entire song, the composer only gives the two sopranos the word “meow,” which they have to sing in tune and rhythm to simulate a dialogue in “cat language,” elaborating a humorous scene of cat grooming. Although the audience does not speak “cat language” either, judging by their warm reception during its performance, they certainly do “get it.” As the “lyrics” repeat, the song gets stuck in the listener’s head. Funnily enough, nearly two centuries later, a virtually unknown singer and songwriter released the Chinese pop song “Learning to Meow,” which, with its catchy beat and the assistance of algorithms, came to dominate a new internet generation in 2010.
Potent Notation and Virtual Spaces
If the persistent simulation of sound and context is a strategy of standardization, formalization, and domination, then disruption and discordance are perhaps its counterpart. Writing in Mille plateaux (A Thousand Plateaus) in 1980, Deleuze and Félix Guattari propose “stammering” as a potential technique of deterritorialization: “That is the same as stammering, making language stammer rather than stammering in speech. To be a foreigner, but in one’s own tongue, not only when speaking a language other than one’s own. To be bilingual, multilingual, but in one and the same language, without even a dialect or patois … That is when style becomes a language. That is when language becomes intensive, a pure continuum of values and intensities.” Through stammering, the restrictive space is shattered, unraveling the potential of the immanent, or “virtual.”
On the other hand, our hyper-connected, globally mobile society is also an ever-sharpening, double-edged sword, where mass media and algorithms serve as the means to control subjectivities through datafication and homogenization. Under the twin pillars of contemporary geopolitics and global economics, those in power have always had the thirst to convert all the dynamic cacophony of the world into notation, combined into musical scores and transformed into a unified language system. This cacophony ought to consist of incongruous texts from completely different cultures and geographies, each with its own emotional resonances that I should not necessarily be able to relate to. Yet, through repeated transmissions, they are drilled and replayed to me in all forms and ways, until they become familiar and affective.
In derivative videos on YouTube, films and cultural documentaries in different languages are simplified into similar storylines: the diverse names and personalities of protagonists seem to be fused into a generalized identity who is merely playing out a programmed script. When I open a musical score, heterogeneous sounds outside of my direct experience have been reduced to marks signaling the beat, changes in speed, and emphasis on the musical staff. Retranslating the data to generate music, the senses of my body become the simulated movements of a robot: obeying the system of this linguistic syntax, it lifts its hands to play, takes a breath, then plays again, in a calculation of the potential resonance of the body’s movements.
Can we escape this paradigm of networked uniformity through a new logic of fragmentation? Can individuality and interiority (the virtual) still exist? Fortunately, sound does not only arouse an emotional register roped to collective memory, but also elicits multiple and vastly different, unique emotions. When I hear the ROC’s national anthem, I remember always being late to my senior high school’s flag-raising ceremony; on the other hand, someone else may fondly recall that time on a movie date when they secretly held their date’s hand before the film began. And for still others, it may evoke the deeply distressing memory of their family’s experience during the martial law period.
In our society of hyper-connection and global mobility that has developed over the last half-century, the emotions associated with sound have become even more complex and have no simple interpretation. The boundaries demarcating “sound communities” and “sonic territories” are no longer tied to geographic constraints, but are distributed, multidimensional pathways. The kind of emotional memories that are not wholly determined by geopolitical region, race or ethnicity, and that embody the spatially disconnected notions of “impurity” and “heterogeneity,” appear to make decentralization and the multiplicity of voices possible.