Robot opera: Robert Gyorgyi interviews Ron Chrisley

Robert Gyorgyi, a Music student here at Sussex, recently interviewed me for his dissertation on robot opera. He asked me about my recent collaborations, in which I programmed Nao robots to perform in operas composed for them. Below is the transcript.

Interview with Dr Ron Chrisley, 20 April 2018, 12:00, University of Sussex

Bold text: Interviewer (Robert Gyorgyi), [R]: Dr Ron Chrisley

NB: The names ‘Ed’ and ‘Evelyn’ often come up within the interview. ‘Ed’ refers to Ed Hughes, the composer of Opposite of Familiarity (2017), and ‘Evelyn’ to Evelyn Ficarra, the composer of O, One (2017).

How did you hear about the project? Was it a sort of group brainstorming or was the idea proposed to you?

[R] -Evelyn approached me, then we had a meeting when she explained her vision to me.

These NAO robots are social robots designed to speak, not to sing. Was the assignment of their new task your main challenge? How did you do that?

[R] -Surprisingly, you might actually think that they were designed to sing too, in a sense. If you [use online browsing software to find some information regarding] it, you can find videos supposedly featuring singing NAO robots. However, as far as I can tell, they are all fake. That is, these robots can produce speech: you just have to type in a word and the NAO can pronounce it reasonably well. They can also play arbitrary digital audio files, like WAV files. You can make the robot gesture while it is playing back a recording of, say, a human trying to sound like a NAO robot. There are also some videos out there in which, somehow, they got people who can sound like a NAO robot (or just had their voices processed), so the person is singing, and the NAO is just playing back a recording of the person. That’s the closest thing to robot singing I found. I did not want to do that. I have even given lectures about ‘what would it mean for a robot to sing?’ Is it just playing back an audio file of a human singing? Most of us would not call that singing. So what would it have to do? I was trying to figure out some way to get the robots to make sounds in a way that we would call ‘singing’. It is a bit tricky to put your finger on it, and it takes a little bit of time to figure out, because there are all kinds of sounds a robot can make…but why call it ‘singing’, per se? We wanted singing because that is a core part of opera, so I wanted to figure out whether a robot could do that intuitively, but not by slavishly copying human singers just by playing back a recording of them. So that was a difficult balance to find. Nevertheless, we quickly settled on the idea that we would use the speech production capabilities of the robot. Not all singing has to have words, right? You can sing without words, but there seems to be some connection between what we call singing and speaking: singing should come from the same process that we use to speak.

If I go like this [clapping], it is not singing, even though I am making sounds with my body. It is something about using the voice that is used for speaking to make music, and that’s what I was trying to do: taking the speaking abilities of the robots and trying to make them sing.

There are two crucial dimensions: pitch and rhythm. If we get a pitch that is not determined by the content of the lyrics, maybe it is not even meaningful. So somehow, I had to have a selection of arbitrary choices about what the pitch would be, according to the librettos. I also had to control the rhythm. So those were the main variables I manipulated, pitch and rhythm, all starting from the ‘natural’ speech of the robots. Also, the robot designers do realise that some might want to override the programming they have done. They have spent a lot of time trying to make the intonation and rhythm [of the NAO robots] natural. For instance, all you have to do is program in ‘how are you feeling today’ and it will figure out by itself how to make the sentence sound human-like, instead of having random relations between the sounds [demonstrating]. So even though the robots have learnt the natural pitch and rhythm of human speech, the designers offer some possibilities to alter them. If you want to add an emphasis or change the pitch or the speed, then, within some limited ways, you can modify them. It is not easy to do, and the controls usually operate on a full-sentence level, that is, you can shift the whole sentence up in pitch [demonstrating]. Regarding rhythm, because it is not a MIDI language with a complete rhythmic specification, you can either make them speak this fast, or this slow. So that is pretty limited, but if you want to get them to sing, you have to take those few abilities that the designers have given the end user, and somehow apply them at levels they were not meant to be applied to, such as the level of an individual syllable: have this syllable spoken very high and slow, but the next one very low and very fast.
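
[Editorial sketch, not part of the interview: as a rough illustration of applying sentence-level controls to single syllables, the Python below builds a text-to-speech string in which every syllable carries its own pitch and speed tag. It assumes the inline tags `\vct=N\` (pitch, in percent) and `\rspd=N\` (speaking rate, in percent) described in the NAOqi text-to-speech documentation. The function name and the example values are hypothetical, and this is not the code used for the operas.]

```python
def tag_syllables(syllables):
    # Each item is (text, pitch_percent, speed_percent).
    # Assumption: the speech engine accepts inline tags of the form
    # \vct=N\ (pitch) and \rspd=N\ (rate) placed before the text they affect.
    parts = []
    for text, pitch, speed in syllables:
        parts.append("\\vct={}\\ \\rspd={}\\ {}".format(pitch, speed, text))
    return " ".join(parts)


# Hypothetical values: one syllable high and slow, the next low and fast.
phrase = tag_syllables([("la", 150, 60), ("la", 80, 180)])
print(phrase)
```

[On a robot, a string like this would presumably be handed to the text-to-speech service; that step is left out so the sketch runs without a robot.]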

So, essentially, you could only work with approximate pitches?

[R] -What I ended up doing was writing routines that approximated the pitches desired by the composers. Tempo worked the same way: if they said to me, ‘this section is at 120 BPM’, then I would be able to change the low-level code and the variables to get output that was roughly 120 BPM.
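
[Editorial sketch, not part of the interview: the arithmetic behind this kind of approximation can be illustrated as follows. The Python below converts an interval in semitones into a pitch-shift percentage using the equal-temperament ratio 2^(n/12), and a tempo in BPM into seconds per beat. The assumption that pitch is set as a percentage of the natural voice, limited to the range 50 to 200, is ours, as are the function names; the actual routines are not shown in the source and, as the next answer makes clear, the real results depended on the syllables being sung.]

```python
def semitones_to_pitch_percent(semitones, low=50, high=200):
    # Equal temperament: each semitone multiplies frequency by 2**(1/12).
    ratio = 2.0 ** (semitones / 12.0)
    percent = int(round(100 * ratio))
    # Assumed limits of the pitch control; values outside are clamped.
    return max(low, min(high, percent))


def seconds_per_beat(bpm):
    # 120 BPM means 120 beats in 60 seconds, i.e. 0.5 s per beat.
    return 60.0 / bpm


print(semitones_to_pitch_percent(0))   # 100 (unchanged)
print(semitones_to_pitch_percent(7))   # 150 (a perfect fifth up)
print(semitones_to_pitch_percent(12))  # 200 (an octave up)
print(seconds_per_beat(120))           # 0.5
```

[A nominal mapping like this would only be a starting point; the values would still need tuning by ear for each syllable.]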

Still, the ‘achieved’ pitches are pretty accurate in both pieces. Judging by the way you describe this process, it must have been a very complicated and meticulous job, with a very impressive upshot.

[R] -Thank you, but there are a lot of dark arts there [laughing], in so many different ways. One is that it is not a systematic process at all: the kinds of tempos and pitches I got depended on the material I was asking the robot to sing. So if I asked it to sing the syllable ‘la’, it might have just given the right intervals in the right tempo and on the right rhythm. If I gave it a different syllable, like ‘ta’ maybe, it would produce different pitches. If you tell it, in quotation marks, ‘say the syllables “la-la-la”’, the robot might produce a different pitch and tempo than if you just said ‘sing “la” for a long amount of time’. It is very complex. Also, one thing that confuses matters, or rather made it possible for us to succeed, is that Ed has a history of composing for performers: working with a set of performers for several years, composing pieces for them, knowing what they can do, what their limits are. And so he took the same approach with the robots. For example, Ed didn’t just say ‘this is my vision, Ron, you figure out how to make the robots do it’; it was more like ‘let me listen to what these robots can do, let me listen to the aspects’. I generated, almost randomly, points that had the robots cycle through different combinations of syllables and rhythms, pitches and patterns like arpeggios and non-arpeggios, then Ed listened to them and found the ones that were of the most musical interest to him, and then wrote around those. He basically said ‘okay, that arpeggio sounds pretty good so I’m going to put some arpeggios in my piece because we know that we can get some good output from that’. Altogether, it was an interactive and not one-directional process.

You worked with Python, is that correct?

[R] -Yeah, I was using that. It is a programming language that is very widely used. I chose it because it was the easiest documented way for me to manipulate the variables that I had available to me to make different sounds, so in theory, it probably could have been done in other languages, but I just found it to be easy. It is a very simple and straightforward language.

You’re also a musician. There was singing, actors, lights, a stage, a set, and everything, so it is, as Evelyn called it, a ‘mini opera’ [see Appendix 6], but maybe the general public would be wary of this term. Would you personally consider it an opera?

[R] -I do think that we succeeded in producing something that could count as the genre of opera, and particularly maybe in the new sub-genre, robot opera.

We have several robot operas, even though they use recorded voices, such as Tod Machover’s piece. Were you inspired by any of them?

[R] -Here’s what is different about what we did. I do think, as far as I can tell, that this might be the only case of a robot opera that meets the following conditions. First of all, it isn’t a robot opera in the sense that it’s about robots. There were some robot operas in which humans have dressed up as robots [i.e. Frederic Hard’s The Romance of Robot (1937), Chapter 1]. Then there have been some others with robots on stage but mainly as part of the set, and only humans do the singing. Now we are getting close to what we did: I think there have been robot operas where there are both robots and humans doing operatic scenes, but then again, with robots making sounds you would not necessarily call ‘singing’. I have not heard of any robot opera, other than ours, that has exclusively robot singers, in which the robots are humanoid robots and the identifiable actors of the piece, and in which the robots are autonomous, not merely remote-controlled devices. Autonomy is kind of a combination of being interactive and responding to human commands. Even remote-controlled cars are responsive to human commands, but that’s all they do. There are other robots, like the ones on assembly lines, which do a very complex sequence of tasks but are not at all responsive to human commands, except when the humans push the button at the beginning and the end of their tasks. What is interesting about autonomous robots is that they live somewhere ‘between’, in a modest way. Yes, there is a long sequence of behaviour that the robots are programmed to initiate, but the exact nature of the behaviour is not fixed by me; it is entirely dependent on runtime conditions. It is something I would like to further explore. In the piece that we actually did perform, the robots would not proceed to the next part of the piece until they received some type of signal.

Were those signals the cue cards Evelyn was using during the performance?

[R] -Yes. These things are called ‘NAO Marks’ and they were designed to give the robots visual processing abilities. These marks are very easily identifiable for the NAO. They can be rotated, it does not matter, and yet the NAO can identify them in a wide range of conditions. They are very useful for getting the robot to respond to its visual environment. Having them meant that even though the robots were ‘launched’ by us when the opera began, the following parts of the piece depended upon when they would notice the NAO Marks.
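
[Editorial sketch, not part of the interview: the gating described here, where a robot only moves on to the next part once it notices a mark, can be illustrated with a small simulation in Python. The list of frames stands in for the robot's camera-based mark detection, and the mark numbers, section names, and function name are all hypothetical; this is not the code that ran in the performance.]

```python
def run_sections(sections, frames):
    """Start each section only after its cue mark has been seen.

    sections: list of (cue_mark_id, section_name) in performance order.
    frames:   sequence of sets of mark IDs visible at each time step
              (a simulated stand-in for the robot's mark detection).
    Returns the names of the sections that were started, in order.
    """
    started = []
    pending = list(sections)
    for visible in frames:
        if not pending:
            break
        cue, name = pending[0]
        # Marks for later sections are ignored until their turn comes.
        if cue in visible:
            started.append(name)
            pending.pop(0)
    return started


sections = [(64, "part 1"), (68, "part 2"), (80, "part 3")]
frames = [set(), {68}, {64}, set(), {68}, {80}]
print(run_sections(sections, frames))  # ['part 1', 'part 2', 'part 3']
```

[The order of the sections is fixed in advance, but when each one starts depends on when its mark happens to come into view.]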

Is it like a QR Code?

[R] -Yes, very much like a QR Code. The opera that we did actually had phases. For example, a robot is searching for another robot, and then they are coming together and moving apart…we could have that relatively autonomous. But imagine, you could also have a robot programmed to sing some kind of sad, plaintive song when it doesn’t see the NAO Mark of another robot, so it is kind of lonely, it is missing the other robot. Then, when it does see that NAO Mark, which obviously means it is facing it, then it starts singing a happier song, and it tries to move towards ‘its love’. Then, we could also have another robot — having a drama here —, that doesn’t like to see the NAO mark of the first robot, and the story would then become a love triangle. It would search for the NAO Mark of the third robot, so it would be looking for the NAO robot it would want to sing with. It can go on and on, of course, it essentially comes down to creativity.

It almost sounds like the Marriage of Figaro [W.A.Mozart, K.492].

[R] -Well, in terms of plot, it could be just as operatic. But with this kind of set, we would not know the outcome until runtime, we would not know what kind of music would be produced exactly. All we could do is to programme the songs. Then of course, commands like ‘when you see this character, do this, but then I want you to move this way’ and that would trigger the others. You can probably guess what would happen, but you are not specifying down to the lowest level detail like ‘at this time this event will occur and then this time this event will occur’. You are just letting it emerge.

That is almost the same with live actors. Every performance is different so there is always some space for surprises.

[R] -If you think about it, the performances in traditional opera are, in some sense, more robotic than the performance in the kind of opera that we are talking about: there is creativity and openness for the robot to do whatever it ‘wants’ to do, as nobody is telling it ‘you must see something exactly at this time’. In contrast, in traditional opera, the actors are trying to implement the will of the composer, precisely following the director’s wishes, exactly at the right time as they are told. That is very much like computers, just doing as programmed. Robot operas, at least of the sort I was imagining, are more open-ended and the outcome is not determined by such external orders. Don’t get me wrong, there have been lots of experiments, as in avant-garde or modern music, where the will of the composer and the director is less determinative and the performance is more organic. So even if this is nothing new, in having robots do that, we turned the tables and revealed the traditional performance to be more robotic than the robot performance, which, I personally think, is very interesting.

Were the instrumental parts notated before, during, or after the robots were programmed?

[R] -As for the details of the notation, they differed. Ed used traditional notation for the instruments and the robots, including the lyrics and some notes to guide me in setting the parameters for the robots. I was trying to keep to the traditional score when programming, but luckily, the human performers were responsive and not ‘slaves’ to the notation, so the timing was flexible in case it turned out that the robot was doing something different, like coming in a little bit earlier. With Evelyn, it was different. She had her own notation system that she used both for the robots and the cellist, Alice Eldridge. I only briefly looked at the notation for cello once or twice, so I can’t really recall the details, but it was not traditional notation for sure. It was somehow a representation of ‘play this kind of drone for this length of time’ and it just contained the information that the cellist needed to produce the kind of sound Evelyn wanted. Many things were left to the cellist though. The notation was often quite abstract in its construction, like ‘play something low and percussive here’. So then, it would be up to Alice’s interpretation to decide whether it was going to be completely rhythmic or halting. Then, during the rehearsals, Evelyn could maybe give further directions orally like ‘play it like you did before the last time’. Altogether, the vast amount of information on the sheet, and the common understanding between the cellist and Evelyn, would generate the final performance. For the piece itself, Evelyn gave me specifications like ‘here are the things I want the robot to say’, or ‘I want it to be broken up into these different sections.’ These instructions would give me some idea of relative tempo in order to get them there. She also had to give me some pitch information, so it was, and at the same time wasn’t, a traditional score.

You didn’t know what the outcome would be like until the performance itself, you say. How could the musicians practice then? Was each rehearsal different?

[R] -Not at all. We did rehearse. The kind of interactive scenario I was painting for you earlier was not what happened in this case. The only thing that really varied from performance to performance with the robots was timing: when they would come in and start their parts. They had, for example in Ed’s piece, three different parts, so when the robot would start the next section depended upon the particularities of that night…but it sounded roughly the same. And so, there were not any big surprises for the human performers on the night of the actual performance. They knew what they were probably going to hear from the robots, because most of it was predetermined, without any randomness or interactivity within the sections.

End of interview.


PAICS

Philosophy of AI and Cognitive Science
