First of all, some comments must be made about our representation of the context of a dialogue: our Context Models (indicated as CTXi in the figures) are rather different from the dialogue model described in [Lambert & Carberry1991], [Carberry et al. 1992] and [Lambert1993]. As discussed in [Ardissono et al. 1996], although both frameworks contain domain, communicative and metalevel plans, we make a different use of our (AM) metalevel and communicative plans. In Lambert's model, the problem solving plans relate communicative actions to the domain level plans pursued by the speakers; in our approach, instead, both domain level and linguistic actions are the objects of the metalevel actions and can be performed interchangeably. Moreover, our structures can be extended for answer production, by adding to them supplementary information about how to execute actions [Ardissono & Cohen1996].
As far as the management of dialogue is concerned, other frameworks have
exploited the view of dialogue as a collaborative activity among agents.
For example,
[Chu-Carroll & Carberry1995a], [Chu-Carroll & Carberry1995b]
describe a framework where a system and its
user collaborate to build a (correct) plan for achieving the user's goals:
in an interaction, the system and the user negotiate their beliefs, until
they agree on a solution. In that work, however, no negotiation of the
meaning of sentences is performed.
Dialogue has been modeled as a collaborative activity also in
[Heeman & Hirst1995], where a planning framework has been used to interpret
and recognize referential expressions. In this model, the
conversational turns are seen as actions that contribute to the interaction
by establishing some communicative goals, which become shared between the
interactants. Although our work is based on the same idea, it differs in
how it models negotiation among agents.
In particular, Heeman and Hirst introduce specific actions to plan the form of
a referential expression, and a number of actions which describe the reactions
which an agent can adopt when he receives a referential expression: e.g. the
agent can accept it, refuse it, correct it, and so on, depending on whether
he has been able to resolve the expression or has encountered problems in the
interpretation process. In each of these cases, he starts collaborating
with his partner on the resolution of the problem, by means of a negotiation
subdialogue.
In our work, the interpretation of referential expressions is not treated
separately from the other linguistic phenomena: we describe a generic framework of
interaction, where the various interpretation levels (syntactic, semantic, etc.)
are modeled in a uniform way, by means of the execution of object level
interpretation actions, under the control of the Agent Modeling actions which
manage the reaction to a possible failure.
The idea is that acceptance, refusal and other reactions may occur at different
levels, not only at the referential one: for example, an ambiguous word
may cause the hearer to build more than one semantic interpretation of a
sentence, so that he would have to start a clarification subdialogue with the
speaker
to establish the intended meaning of the utterance; in fact, this is quite
similar to the case of an underspecified referential expression. For this
reason, we adopt a single (metalevel) model for handling action
execution and failure: this model governs the execution of domain actions, as
well as speech acts and interpretation actions. During the interpretation of
an utterance, it supports the possible reactions to a failure (in terms of
notification to the partner) at the utterance level, as well as at the
locutionary and illocutionary ones.
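The uniform treatment of execution and failure described above can be sketched as follows. This is an illustrative Python fragment, not the actual implementation: the exception class, the level names and the notification procedure are our own simplifications, introduced only to show how one metalevel control loop can react to failures of object level actions at any interpretation level.

```python
# Sketch (our simplification, not the paper's system): a metalevel executor
# that runs object-level actions -- domain actions, speech acts and
# interpretation actions alike -- and reacts uniformly to any failure.

class ActionFailure(Exception):
    """Raised by an object-level action that cannot be completed."""
    def __init__(self, level, reason):
        super().__init__(f"{level}: {reason}")
        self.level = level    # e.g. "syntactic", "semantic", "referential"
        self.reason = reason

def notify_partner(failure):
    # Placeholder for starting a clarification subdialogue with the partner.
    return f"Clarification needed at the {failure.level} level: {failure.reason}"

def execute(action):
    """Agent Modeling control loop: try the object-level action and, on
    failure, produce a notification to the partner, whatever the level."""
    try:
        return action()
    except ActionFailure as failure:
        return notify_partner(failure)

# A semantic interpretation step that finds two readings of an ambiguous word.
def interpret_ambiguous_utterance():
    raise ActionFailure("semantic", "two readings of the word 'bank'")

print(execute(interpret_ambiguous_utterance))
```

Note that the same `execute` loop would wrap a domain action or a speech act unchanged; only the object level action raising the failure differs.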
Our representation of the utterance interpretation process as multiple object level actions offers not only a framework for treating interpretation failures, but also one for managing the phenomenon of grounding in dialogue [Traum & Hinkelman1992]: we treat the acknowledgements that the receiver of an utterance has interpreted it successfully and accepted it (e.g. ``Okay.'') in exactly the same way as the acknowledgements of successful execution of a domain action, as prescribed by our Agent Modeling plans;
similarly, requests for an acknowledgement by the speaker of an utterance (e.g. ``Right?'', ``Ok?'') are collapsed into the process by which an agent monitors that his partner has executed a requested action successfully:
in the case of linguistic actions, this corresponds to checking whether the partner has been able to interpret the utterance and to update his own beliefs with its communicative effects.
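This collapse of grounding into action monitoring can be illustrated with a minimal sketch (the function and the state representation are hypothetical, chosen only to make the point): the same monitoring step applies whether the requested action is a domain action or a linguistic one.

```python
# Sketch (our names, not the paper's): one monitoring step serves both domain
# and linguistic actions. "done" means, for a domain action, that it was
# carried out; for a linguistic action, that the partner interpreted the
# utterance and updated his beliefs with its communicative effects.

def monitor(partner_state, action):
    """Acknowledge a successfully executed action; otherwise request an
    acknowledgement from the partner."""
    if partner_state.get(action) == "done":
        return "Okay."    # acknowledgement of successful execution
    return "Right?"       # request for an acknowledgement

state = {"reserve-ticket": "done", "interpret-utterance-3": "pending"}
print(monitor(state, "reserve-ticket"))         # domain action: acknowledged
print(monitor(state, "interpret-utterance-3"))  # linguistic action: checked
```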
Some computational models of dialogue use a notion of coherence based on an
analysis of the expected behavior of agents in conversation. In those
approaches (for instance,
consider [McRoy & Hirst1995], [Traum & Hinkelman1992],
[Traum & Allen1994],
[Danieli1996]), the speech acts occurring
in the last conversational turn, together with the existing dialogue context,
are used to predict which
speech acts the interlocutor should perform if the interaction goes well; a
deviance from the expected behavior is taken as a sign that some interaction
problem is occurring and the presence of a misunderstanding is hypothesized.
Our work uses a deeper notion of context, where different types of intentions
(related to the interaction, as well as to the agent's domain activity and
the goals deriving from his participation in a conversation) are represented
explicitly and maintained as a context; they contribute to the identification
of more general relations that can exist with the new input. In particular, we
model linguistic expectations and underlying intentions uniformly as goals,
although they fall at different levels.
So, we provide a unified model of the dialogue context,
where the pending goals which happen to correspond to the linguistic
expectations fall at a lower level with respect to those pertaining to the
underlying activity of the agents. An immediate consequence of this is that,
when a turn satisfies some low-level pending goals, there is
no need to inspect the higher level ones (so, the effort spent in the
interpretation task is limited).
At the same time, however, when the low-level goals are not matched, the
dialogue context is rich enough to be analyzed, searching for the possible
relations between the turn and the previous part of the interaction.
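The level-ordered inspection of pending goals can be sketched as follows. The level names, the goal encoding and the matching criterion are our own illustrative choices, not the paper's actual representation; the point is only the search order: lower levels (linguistic expectations) are tried before the levels of the underlying activity, and the deeper analysis is triggered only when no pending goal matches.

```python
# Sketch (our encoding): pending goals kept per level, lowest first. A new
# turn is matched against the low levels before the high ones; None signals
# that the whole context must be searched for other relations with the turn.

LEVELS = ["linguistic", "problem-solving", "domain"]  # lowest to highest

def interpret_turn(turn_goal, pending):
    """Return the (level, goal) of the first pending goal the turn
    satisfies, inspecting the lower levels first."""
    for level in LEVELS:
        for goal in pending.get(level, []):
            if goal == turn_goal:
                return (level, goal)
    return None  # no expectation matched: analyze the dialogue context

pending = {
    "linguistic": ["answer(question-3)"],
    "domain": ["book(flight)"],
}
print(interpret_turn("answer(question-3)", pending))  # low-level match, stop
print(interpret_turn("pay(flight)", pending))         # no match: deeper analysis
```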
Our model also differs from the above mentioned systems because of the two-level organization of our plan libraries, where the agent modeling plans rule all other actions, including interpretation and repair actions. In particular, in [McRoy & Hirst1995], metalevel plans model the expected continuations of an interaction: following the ideas developed in [Litman & Allen1987], the strongest expectation is that the receiver of a turn accepts the turn and reacts by contributing to it, but he might also start another (sub)dialogue, by introducing a new communicative goal. McRoy and Hirst extend Litman's approach and introduce metaplans to diagnose misunderstandings and formulate repairs, when the expected behavior is violated. However, their metaplans only analyze the surface expectations introduced by the performance of a speech act; the absence of a deeper intentional analysis limits their approach to the treatment of misunderstandings on speech acts, while our model also treats misunderstandings on domain level actions (see the analysis of the example in section 4).