Adam Brandt, Spencer Hazel and Kleopatra Sideridou
Newcastle University, UK
The emergence and ongoing proliferation of conversational technologies powered by Artificial Intelligence (AI) present some interesting theoretical and methodological challenges for Conversation Analysts. Previously taken-for-granted concepts such as ‘intersubjectivity’, ‘sociality’ and even ‘interaction’ itself are challenged when we consider how they might apply to engagement with machines which are designed to emulate social interaction: for example, a visitor to a museum using a robot guide, a person talking to their smart home device, or a bank customer having a telephone exchange with an automated voice assistant.
Naturally, such challenges do not deter our community, and there is an emerging body of Conversation Analysis (CA) research on conversational technologies such as voice assistants like Alexa (e.g. Albert et al., 2023) and Google Home (e.g. Due & Lüchow, forthcoming), call centre artificial agents (e.g. Avgustis et al., 2021; Korbut, 2023), and social robots (e.g. Majlesi et al., 2023; Pelikan, 2023; Pelikan & Hofstetter, 2023; Tuncer et al., 2023). Many of the researchers working in this space have formed the EMCAI network, the aim of which is to foster dialogue, and identify areas for collaboration, at the intersection of EMCA and AI.
At the same time, the development of such technologies presents a new avenue for the application of CA (cf. Antaki, 2011; Stokoe, 2014). It seems that parts of the tech world are becoming aware of the merits of CA research; for example, the launch page of GPT-4o makes reference to a seminal CA study on turn-taking and timing (Stivers et al., 2009). Conversation Analysts Elizabeth Stokoe and Saul Albert are working with Cathy Pearl (User Experience Lead at Google) on how to use insights from CA in the design of conversational AI systems (Stokoe et al., 2021). A forthcoming special issue of Discourse & Communication will also collect CA studies examining a range of conversational technologies, coupled with commentaries on those studies by experts from industry (Stokoe et al., 2024).
Given CA’s unique insight into human sociality, it is no surprise that many CA researchers are turning their attention to what has been termed ‘artificial sociality’ (Natale & Depounti, 2024), and that we have much to contribute. It is clear to us that CA researchers can feed directly into the design and product development of various types of these technologies, across a range of specific settings.
In this piece, we report on our work doing just that. Working in collaboration with industry, we have been using CA to support the development of an AI-powered voice assistant used for routine clinical telephone conversations. We explain our process and provide examples of some insights. Given the rapidly changing technological landscape, we also suggest some potential future ways that CA might be applied in this area.
The collaboration: CA for Conversation Design
Since spring 2022, we have been working with Ufonia, a HealthTech startup who have developed Dora, an AI system used for automated telephone consultations with patients. Dora is currently being used for several clinical care pathways across various areas of the National Health Service in the UK, including, for example, check-in calls for patients on a waiting list for surgery, and follow-up calls for patients who have had cataract surgery. Working in partnership with Ufonia’s AI Software and Product Engineers, we have been feeding into the conversation design of Dora; that is, assisting the design team to understand how particular sequences in Dora’s calls are organised, and how particular Dora turns at talk might be formatted. Our agreed approach was: to analyse calls between Dora and patients (as well as between Dora and test users) to identify areas in which the conversation could be improved; to use these analyses to propose, and implement, changes to the Dora system; and to further analyse the revised Dora so as to establish the extent to which the changes were successful.
The team initially came to us seeking insight into the management of conversational repair, and how this might be improved through redesigning Dora’s initiations of repair, or the system’s responses to patient repair-initiations. We began by exploring this topic, but our attention was soon redirected towards the sequential organisation of Dora’s call openings. As has been noted before (e.g. Stokoe, 2014), institutional call openings are essential points at which service providers may succeed or fail (for example, in this case, at getting user engagement). As we considered the various components of Dora’s call openings (including greetings, self-identification, and disclosure of the purpose of the call), it became apparent to us that the opening was not designed in a way that reflected how human clinicians perform these actions.
At the time (and up until very recently), as with all such conversational technologies, Dora’s talk was not produced by a generative AI model, but rather by a human conversation designer, who had produced a written script for Dora’s conversational turns. As we explored the conversation design industry, we began to realise that this design process does not involve the close examination of equivalent human social conduct in that particular activity. This leaves conversation designers necessarily relying on their own common-sense understandings of the mechanisms underpinning natural conversation, while aspiring to produce systems that are ‘natural and intuitive’, as Google puts it. While generative AI tools like ChatGPT have quickly become associated with the concept of ‘hallucinating’ (that is, referring to patterns or observations that are in fact nonexistent), we came to realise that the conversation design process is also susceptible to human hallucinations: conversation designers assuming that certain social actions are performed in certain ways, imagining features of these that are not present, or omitting from their designs features that actually are present. This seemed to us to be a major shortcoming, and our initial process was therefore adapted to incorporate an additional step, in which we performed a direct comparison between Dora’s conversational practices and those of human clinicians in equivalent sequences.
An example of a CA-informed intervention
We cover some of our work on call openings in our contribution to the upcoming Discourse & Communication special issue (Brandt et al., 2024). Here we will provide an alternative example from our process. This example illustrates one other observation we made about potential problems in the design process: when the written scripts are run through the system’s text-to-speech (TTS) synthesiser, the conversion inadvertently introduces elements into the talk that are unforeseen and at times unwelcome. For example, commas are converted to intra-turn pauses, full stops to longer pauses with falling intonation on the preceding lexical item, and question marks to pauses with rising intonation on the preceding lexical item.
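To make this mapping concrete, below is a minimal sketch of how a scripted sentence might be rendered as a rough Jefferson-style representation of the resulting synthesised talk. The punctuation-to-pause mapping, the pause lengths and the example question are illustrative assumptions on our part, not Ufonia’s actual TTS behaviour or the actual Dora script.

```python
# Minimal sketch (illustrative only): approximating how scripted punctuation
# might surface in synthesised talk, using the mapping described above.
# The pause lengths and symbols are assumptions, not Ufonia's actual TTS output.

import re

PUNCT_TO_TALK = {
    ",": " (0.4)",     # comma -> brief intra-turn pause
    ".": ". (0.8)",    # full stop -> falling intonation plus a longer pause
    "?": "? (0.8)",    # question mark -> rising intonation plus a pause
}

def script_to_talk(script: str) -> str:
    """Render a written script as a rough Jefferson-style representation
    of the talk a TTS synthesiser might produce from it."""
    rendered = "".join(PUNCT_TO_TALK.get(ch, ch) for ch in script)
    # collapse any double spaces introduced by the substitutions
    return re.sub(r"\s{2,}", " ", rendered).strip()

if __name__ == "__main__":
    # A hypothetical question, not the actual Dora script shown in Extract 1
    print(script_to_talk(
        "Since your operation, have you noticed any problems with your vision, "
        "for example, floaters or flashing lights?"
    ))
```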
At times, this can have unintended implications for how and when users respond to Dora. For example, below is the script for one of the questions put to patients during a post-cataract surgery consultation:
Extract 1: Script for Dora cataract question
Any Conversation Analyst will observe that this is not naturalistic formatting of an information-eliciting question. This becomes more evident when we see how the TTS synthesiser converts it into Dora’s speech output, represented here in a Jeffersonian transcript:
Extract 2: Cataract question script converted through TTS
You may note how the punctuation marks used in the script inadvertently introduce features such as marked intra-turn pauses and intonation contouring into the Dora automated speech output. We can further observe potential problems with this when we instead present the transcript with Transition Relevance Places (TRPs) in mind:
Extract 3: Cataract question with TRPs
When transcribed this way, it becomes clear that there are points (lines 03 and 06) at which a user may project that a response is required, and may then produce a response which the system has not been designed to recognise and is not able to process. Indeed, that is what we found in some calls:
Extract 4: Cataract question with patient response
In Extract 4, we can see that the patient attempts to produce a response at one of these TRPs, following completion of Dora’s question at line 05. This response (line 08) is produced in overlap with Dora’s turn increment (line 07) and so is not detected by the system (line 10). As a result, the patient repeats their response (line 11), and there follows a significant delay (line 12) before Dora produces a receipt token (line 13). This is so delayed that the patient produces a third attempted response at line 14, this time in overlap with the beginning of Dora’s next question (line 15).
Based on observations from sequences like these, we proposed reformatting this question so that the TTS did not introduce a TRP which was potentially problematic for the system:
Extract 5: Revised script for Dora cataract question
This revised script was reverse-engineered with an awareness of the TTS process and with the final Dora ‘talk’ output in mind. It reduced the likelihood of misalignment between the patient and Dora for this question, as can be seen in this final extract:
Extract 6: Revised cataract question with patient response
We then proposed approaching this aspect of the design process as ‘text-to-talk’: avoiding the writing of dramaturgical-style scripts and instead treating the process as a coding of talk. We have also discussed this in relation to call openings elsewhere (Brandt et al., 2023).
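As a rough illustration of what coding talk, rather than scripting dialogue, might look like, the sketch below starts from the turn we want Dora to produce (a single question with one turn-final TRP) and derives the written script from it, under the same assumed punctuation-to-speech mapping as above. It is a sketch of the idea, not Ufonia’s actual design process.

```python
# Illustrative 'text-to-talk' sketch: derive the written script from the talk
# we want the system to produce, so that completion is projected only once,
# at the intended transition relevance place. Assumptions as in the sketch above.

def talk_to_script(turn_parts: list[str]) -> str:
    """Join the parts of a single intended turn so that no TRP-inducing
    punctuation (full stop or question mark) occurs before the final part."""
    cleaned = [part.strip().rstrip(".?!,") for part in turn_parts]
    # mid-turn boundaries become commas (brief intra-turn pauses under the
    # assumed mapping); only the turn-final question mark projects completion
    return ", ".join(cleaned) + "?"

if __name__ == "__main__":
    # Hypothetical draft fragments, not the actual Dora script
    print(talk_to_script([
        "Since your operation",
        "have you noticed any problems with your vision",
        "for example floaters or flashing lights",
    ]))
```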
With insights such as these, drawing on knowledge of how people design such social actions and an awareness of how to manipulate the TTS process to produce more human-like talk, we developed a workflow for designers of conversational AI systems: CADENCE (Conversation Analytic Design for Enhanced Natural Conversation Experience; Hazel & Brandt, 2023). We have demonstrated this workflow in training workshops for conversation designers, delivered online and in person at Google in London.
Perhaps predictably, technological advancement means that some of our findings already need to be adapted to a new conversation design process. Since the end of 2023, Large Language Models (LLMs), such as those underpinning ChatGPT, have been incorporated into conversational AI systems. This means that rather than writing scripts for these systems, humans are now designing prompts for the LLMs to act upon. We are now in the process of translating our analytic observations into something relevant to this new world of LLM-powered conversational agents. This is a huge and rapidly developing priority, in industry and in academic research, and Conversation Analysts still have much to contribute in this space. In our closing section, we highlight some areas that we are beginning to explore next, recognising that there are many other potential future avenues for CA-based research.
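Before turning to those areas, a brief illustration of the shift from scripts to prompts: constraints of the kind we derived for the scripted Dora might now be expressed as prompt instructions for the LLM. The wording below is a hypothetical sketch, not Ufonia’s prompt nor a tested recipe.

```python
# Hypothetical sketch: expressing CA-informed turn-design constraints as part of
# an LLM prompt, now that turns are generated rather than scripted. The wording
# is illustrative only, not Ufonia's prompts or a validated design.

TURN_DESIGN_GUIDELINES = """\
When producing a turn for the patient:
- Ask one question per turn, and place it at the end of the turn.
- Avoid mid-turn full stops and question marks, so the synthesised speech does
  not project completion (a transition relevance place) before the turn ends.
- After asking, stop and wait for the patient's response; do not add increments.
"""

def build_system_prompt(clinical_task: str) -> str:
    """Combine a description of the clinical task with the turn-design guidelines."""
    return f"{clinical_task}\n\n{TURN_DESIGN_GUIDELINES}"

if __name__ == "__main__":
    print(build_system_prompt(
        "You are conducting a routine follow-up telephone call after cataract surgery."
    ))
```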
Possible future avenues for CA for artificial sociality
Despite the superficially impressive recent demos of GPT-4o, the question remains: can we teach ChatGPT to talk? When the role of humans in the conversation design process is changed, or even removed entirely, does this lead to a more naturalistic conversational AI system? The approach we explained above can also be applied to address this question, both in the healthcare context in which we are working and in any others in which human users engage with conversational AI. In our next project, we will therefore be drawing on a larger corpus of human clinician telephone calls to see what conversational practices might be missing from generative AI conversational systems, and how they might be effectively introduced.
We are also interested in the implications of conversational technologies for second language (L2) speakers, and have begun to explore this area too (Hazel & Brandt, 2024). Previous research has suggested that when things go wrong with conversational AI, first language users tend to blame the technology or the design, whereas L2 users blame their own inadequacies in that language (Wu et al., 2020). In other words, current conversational AI systems may be giving L2 speakers a poorer user experience, while also reinforcing their negative self-perception as L2 speakers. On the other hand, the hypothetical development of multilingually or interculturally adaptive conversational technologies may have huge positive implications for users, including enhanced accessibility (Brandt & Hazel, in press). What conversational AI means for how L2s are practised and assessed is also an area for exploration.
There are also questions around the limits of what conversational AI can, and should, do. In our view, the conversational AI industry appears in many cases to have prioritised developing products with personas (a dramaturgical construct akin to a ‘character’ in fiction), often aligned with an associated brand. This work appears to be emphasised over developing systems which closely approximate the specific conversational patterns to which users are accustomed, and which they find easy to use. This trend appears to be continuing, with aspirations of developing AI with emotional intelligence increasingly evident.
We have argued elsewhere that conversational AI technologies should go in the opposite direction, and be designed simply as tools to be used, foregoing any claims of sentience, cognition or emotion (Brandt & Hazel, 2023). This, we feel, is less of a risk, not only ethically, but also in relation to the experience of users (who may otherwise be led to assume the system is more capable than it is). However, the pursuit of persona and emotional intelligence appears to be a direction of travel within the tech industry that a few humble Conversation Analysts cannot halt. Instead, our field can contribute positively to this endeavour, showing conversation designers how interactional markers of, for example, empathy or rapport can, where appropriate, be embedded naturalistically and ethically into conversational systems.
Aside from the potential for applications of CA to the conversational technologies industry, it feels pertinent to close by acknowledging what will always remain a priority for Conversation Analysts: understanding human sociality. As it becomes more commonplace for people to interact with conversational technologies where previously they had interacted with another person, and as we increasingly recognise the broader relationship between changing technologies and changing social practices (e.g. Mlynář & Arminen, 2023), we will have much to consider as we explore the future of human sociality in the conversational AI landscape.
Acknowledgements
The project and our related presentations and publications were supported by a British Academy Innovation Fellowship for Adam Brandt (IF2223/230141).
Works cited
Albert, S., Hamann, M., & Stokoe, E. (2023). Conversational User Interfaces in Smart Homecare Interactions: A Conversation Analytic Case Study. Proceedings of the 5th International Conference on Conversational User Interfaces.
Antaki, C. (Ed.). (2011). Applied conversation analysis: Intervention and change in institutional talk. Springer.
Avgustis, I., Shirokov, A., & Iivari, N. (2021). “Please Connect Me to a Specialist”: Scrutinising ‘Recipient Design’ in Interaction with an Artificial Conversational Agent. In C. Ardito, R. Lanzilotti, A. Malizia, H. Petrie, A. Piccinno, G. Desolda, & K. Inkpen (Eds.), Human-Computer Interaction – INTERACT 2021 (pp. 155-176). Springer International Publishing. https://doi.org/10.1007/978-3-030-85610-6_10
Brandt, A. & Hazel, S. (in press). Towards interculturally adaptive conversational AI. In: Zhu Hua & David Wei Dai (Eds.), AI for Intercultural Communication. Special Issue of Applied Linguistics Forum. To be published spring/summer 2024.
Brandt, A. & Hazel, S. (2023). Ghosting the shell: exorcising the AI persona from voice user interfaces. Conference: Artificial intelligence, health care, and ethics. Newcastle University.
Brandt, A., Hazel, S., McKinnon, R., Sideridou, K., Tindale, J., & Ventoura, N. (2024). Educating Dora: Teaching a conversational agent to talk. Discourse and Communication 18(6): DOI tbc.
Brandt, A., Hazel, S., McKinnon, R., Sideridou, K., Tindale, J., & Ventoura, N. (2023). From Writing Dialogue to Designing Conversation: Considering the potential of Conversation Analysis for Voice User Interfaces. Proceedings of the 5th International Conference on Conversational User Interfaces.
Due, B. L., & Lüchow, L. (forthcoming). VUI-speak: There is nothing conversational about ‘conversational user interfaces’. In F. Muhle & I. Bock (Eds.), Social Robots in Institutional Interaction. Bielefeld University Press.
Hazel, S. & Brandt, A. (2024). A World of Automated Talk – L2 Speakers and the Challenge of Conversational AI. Proceedings of the 7th CAN-Asia Symposium on L2 Interaction. https://tim792.wixsite.com/can-asia/proceedings-2024
Hazel, S., & Brandt, A. (2023). Enhancing the Natural Conversation Experience Through Conversation Analysis – A Design Method. In HCI International 2023 – Late Breaking Papers (pp. 83-100). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-48038-6_6
Korbut, A. (2023). How Conversational are “Conversational Agents”? Evidence from the Study of Users’ Interaction with a Service Telephone Chatbot. Social Interaction. Video-Based Studies of Human Sociality, 6(1). https://doi.org/10.7146/si.v6i1.137249
Majlesi, A. R., Cumbal, R., Engwall, O., Gillet, S., Kunitz, S., Lymer, G., Norrby, C., & Tuncer, S. (2023). Managing Turn-Taking in Human-Robot Interactions: The Case of Projections and Overlaps, and the Anticipation of Turn Design by Human Participants. Social Interaction. Video-Based Studies of Human Sociality, 6(1). https://doi.org/10.7146/si.v6i1.137380
Mlynář, J., & Arminen, I. (2023). Respecifying social change: the obsolescence of practices and the transience of technology. Frontiers in Sociology, 8. https://doi.org/10.3389/fsoc.2023.1222734
Natale, S., & Depounti, I. (2024). Artificial sociality. Human-Machine Communication, 7, 83-98. https://doi.org/10.30658/hmc.7.5
Pelikan, H. (2023). Robot sound in interaction: Analyzing and designing sound for human-robot coordination. Linköping studies in Arts and Sciences 853. Linköping University. https://doi.org/10.3384/9789180751179
Pelikan, H., & Hofstetter, E. (2023). Managing Delays in Human-Robot Interaction. ACM Transactions on Computer-Human Interaction. https://doi.org/10.1145/3569890
Stivers, T., Enfield, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., Hoymann, G., Rossano, F., de Ruiter, J. P., Yoon, K.-E., & Levinson, S. C. (2009). Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences, 106(26), 10587-10592. https://doi.org/10.1073/pnas.0903616106
Stokoe, E. (2014). The Conversation Analytic Role-play Method (CARM): A Method for Training Communication Skills as an Alternative to Simulated Role-play. Research on Language and Social Interaction, 47(3), 255-265. https://doi.org/10.1080/08351813.2014.925663
Stokoe, E., Albert, S., Buschmeier, H., & Stommel, W. (2024). Conversation analysis and conversational technologies: Finding the common ground between academia and industry. Discourse and Communication 18(6): DOI tbc.
Stokoe, E., Albert, S., Parslow, S. & Pearl, C. (2021). Conversation design and conversation analysis: Where the moonshots are. https://elizabeth-stokoe.medium.com/conversation-design-and-conversation-analysis-c2a2836cb042
Tuncer, S., Licoppe, C., Luff, P., & Heath, C. (2023). Recipient design in human–robot interaction: the emergent assessment of a robot’s competence. AI & SOCIETY. https://doi.org/10.1007/s00146-022-01608-7
Wu, Y., Rough, D., Bleakley, A., Edwards, J., Cooney, O., Doyle, P. R., Clark, L., & Cowan, B. R. (2020). See What I’m Saying? Comparing Intelligent Personal Assistant Use for Native and Non-Native Language Speakers. 22nd International Conference on Human-Computer Interaction with Mobile Devices and Services.