First of all, some comments must be made about our representation of the context of a dialogue: our Context Models (indicated as CTXi in the figures) are rather different from the dialogue model described in [Lambert & Carberry1991], [Carberry et al. 1992] and [Lambert1993]. As discussed in [Ardissono et al. 1996], although both frameworks contain domain, communicative and metalevel plans, we make a different use of our (AM) metalevel and communicative plans. In Lambert's model, the problem solving plans relate communicative actions to the domain level plans pursued by the speakers; in our approach, instead, both domain level and linguistic actions are the objects of the metalevel actions and can be performed interchangeably. Moreover, our structures can be extended for answer production, by adding to them supplementary information about how to execute actions [Ardissono & Cohen1996].
As far as the management of dialogue is concerned, other frameworks have
exploited the view of dialogue as a collaborative activity among agents.
For example,
[Chu-Carroll & Carberry1995a], [Chu-Carroll & Carberry1995b]
describe a framework where a system and its
user collaborate to build a (correct) plan for achieving the user's goals:
in an interaction, the system and the user negotiate their beliefs, until
they agree on a solution. In that work, however, no negotiation of the
meaning of sentences is performed.
Dialogue has been modeled as a collaborative activity also in
[Heeman & Hirst1995], where a planning framework has been used to interpret
and recognize referential expressions. In this model, the
conversational turns are seen as actions that contribute to the interaction
by establishing some communicative goals, which become shared between the
interactants. Although our work is based on the same idea, it differs in
how it models negotiation among agents.
In particular, Heeman and Hirst introduce specific actions to plan the form of
a referential expression, and a number of actions which describe the reactions
which an agent can adopt when he receives a referential expression: e.g. the
agent can accept it, refuse it, correct it, and so on, depending on whether
he has been able to resolve the expression or has encountered problems in the
interpretation process. In each of these cases, he starts collaborating
with his partner on the resolution of the problem, by means of a negotiation
subdialogue.
In our work, the interpretation of referential expressions is not treated
separately from the other linguistic phenomena: we describe a generic framework of
interaction, where the various interpretation levels (syntactic, semantic, etc.)
are modeled in a uniform way, by means of the execution of object level
interpretation actions, under the control of the Agent Modeling actions which
manage the reaction to a possible failure.
The idea is that acceptance, refusal and other reactions may occur at different
levels, not only at the referential one: for example, an ambiguous word
may cause the hearer to build more than one semantic interpretation of a
sentence, so that he would have to start a clarification subdialogue with the
speaker
to establish the intended meaning of the utterance; in fact, this is quite
similar to the case of an underspecified referential expression. For this
reason, we adopt a single (metalevel) model for handling action
execution and failure: this model governs the execution of domain actions, as
well as speech acts and interpretation actions. During the interpretation of
an utterance, it supports the possible reactions to a failure (in terms of
notification to the partner) at the utterance level, as well as at the
locutionary and illocutionary ones.
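The uniform treatment of execution and failure described above can be sketched as follows. This is an illustrative Python fragment, not the actual implementation: the exception class, the level names and the notification procedure are our own simplifications, introduced only to show how one metalevel control loop can react to failures of object level actions at any interpretation level.

```python
# Sketch (our simplification, not the paper's system): a metalevel executor
# that runs object-level actions -- domain actions, speech acts and
# interpretation actions alike -- and reacts uniformly to any failure.

class ActionFailure(Exception):
    """Raised by an object-level action that cannot be completed."""
    def __init__(self, level, reason):
        super().__init__(f"{level}: {reason}")
        self.level = level    # e.g. "syntactic", "semantic", "referential"
        self.reason = reason

def notify_partner(failure):
    # Placeholder for starting a clarification subdialogue with the partner.
    return f"Clarification needed at the {failure.level} level: {failure.reason}"

def execute(action):
    """Agent Modeling control loop: try the object-level action and, on
    failure, produce a notification to the partner, whatever the level."""
    try:
        return action()
    except ActionFailure as failure:
        return notify_partner(failure)

# A semantic interpretation step that finds two readings of an ambiguous word.
def interpret_ambiguous_utterance():
    raise ActionFailure("semantic", "two readings of the word 'bank'")

print(execute(interpret_ambiguous_utterance))
```

Note that the same `execute` loop would wrap a domain action or a speech act unchanged; only the object level action raising the failure differs.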
Our representation of the utterance interpretation process as multiple object level actions offers not only a framework for treating interpretation failures, but also one for managing the phenomenon of grounding in dialogue [Traum & Hinkelman1992]: we treat the acknowledgements that the receiver of an utterance has interpreted it successfully and accepted it (e.g. ``Okay.'') in exactly the same way as the acknowledgements of successful execution of a domain action, as prescribed by our Agent Modeling plans;
similarly, requests for an acknowledgement by the speaker of an utterance (e.g. ``Right?'', ``Ok?'') are collapsed into the process by which an agent monitors that his partner has executed a requested action successfully:
in the case of linguistic actions, this corresponds to checking whether the partner has been able to interpret the utterance and to update his own beliefs with its communicative effects.
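This collapse of grounding into action monitoring can be illustrated with a minimal sketch (the function and the state representation are hypothetical, chosen only to make the point): the same monitoring step applies whether the requested action is a domain action or a linguistic one.

```python
# Sketch (our names, not the paper's): one monitoring step serves both domain
# and linguistic actions. "done" means, for a domain action, that it was
# carried out; for a linguistic action, that the partner interpreted the
# utterance and updated his beliefs with its communicative effects.

def monitor(partner_state, action):
    """Acknowledge a successfully executed action; otherwise request an
    acknowledgement from the partner."""
    if partner_state.get(action) == "done":
        return "Okay."    # acknowledgement of successful execution
    return "Right?"       # request for an acknowledgement

state = {"reserve-ticket": "done", "interpret-utterance-3": "pending"}
print(monitor(state, "reserve-ticket"))         # domain action: acknowledged
print(monitor(state, "interpret-utterance-3"))  # linguistic action: checked
```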
Some computational models of dialogue use a notion of coherence based on an
analysis of the expected behavior of agents in conversation. In those
approaches (for instance,
consider [McRoy & Hirst1995], [Traum & Hinkelman1992],
[Traum & Allen1994],
[Danieli1996]), the speech acts occurring
in the last conversational turn, together with the existing dialogue context,
are used to predict which
speech acts the interlocutor should perform if the interaction goes well; a
deviance from the expected behavior is taken as a sign that some interaction
problem is occurring and the presence of a misunderstanding is hypothesized.
Our work uses a deeper notion of context, where different types of intentions
(related to the interaction, as well as to the agent's domain activity and
the goals deriving from his participation in a conversation) are represented
explicitly and maintained as a context; they contribute to the identification
of more general relations that can exist with the new input. In particular, we
model linguistic expectations and underlying intentions uniformly as goals,
although they fall at different levels.
So, we provide a unified model of the dialogue context,
where the pending goals which happen to correspond to the linguistic
expectations fall at a lower level with respect to those pertaining to the
underlying activity of the agents. An immediate consequence of this is that,
when a turn satisfies some low-level pending goals, there is
no need to inspect the higher level ones (so, the effort spent in the
interpretation task is limited).
At the same time, however, when the low-level goals are not matched, the
dialogue context is rich enough to be analyzed, searching for the possible
relations between the turn and the previous part of the interaction.
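The level-ordered inspection of pending goals can be sketched as follows. The level names, the goal encoding and the matching criterion are our own illustrative choices, not the paper's actual representation; the point is only the search order: lower levels (linguistic expectations) are tried before the levels of the underlying activity, and the deeper analysis is triggered only when no pending goal matches.

```python
# Sketch (our encoding): pending goals kept per level, lowest first. A new
# turn is matched against the low levels before the high ones; None signals
# that the whole context must be searched for other relations with the turn.

LEVELS = ["linguistic", "problem-solving", "domain"]  # lowest to highest

def interpret_turn(turn_goal, pending):
    """Return the (level, goal) of the first pending goal the turn
    satisfies, inspecting the lower levels first."""
    for level in LEVELS:
        for goal in pending.get(level, []):
            if goal == turn_goal:
                return (level, goal)
    return None  # no expectation matched: analyze the dialogue context

pending = {
    "linguistic": ["answer(question-3)"],
    "domain": ["book(flight)"],
}
print(interpret_turn("answer(question-3)", pending))  # low-level match, stop
print(interpret_turn("pay(flight)", pending))         # no match: deeper analysis
```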
Our model also differs from the above mentioned systems because of the two-level organization of our plan libraries, where the agent modeling plans rule all other actions, including interpretation and repair actions. In particular, in [McRoy & Hirst1995], metalevel plans model the expected continuations of an interaction: following the ideas developed in [Litman & Allen1987], the strongest expectation is that the receiver of a turn accepts the turn and reacts by contributing to it, but he might also start another (sub)dialogue, by introducing a new communicative goal. McRoy and Hirst extend Litman's approach and introduce metaplans to diagnose misunderstandings and formulate repairs, when the expected behavior is violated. However, their metaplans only analyze the surface expectations introduced by the performance of a speech act; the absence of a deeper intentional analysis limits their approach to the treatment of misunderstandings on speech acts, while our model also treats misunderstandings on domain level actions (see the analysis of the example in section 4).