Various researchers in plan recognition have dealt with the problem of choosing the best hypotheses from a set of alternatives for explaining an agent's behavior. In general, they have introduced various coherence criteria for checking the validity of the selected hypotheses (for a classification of some of these criteria, see [Stickel1988]). Kautz adopted a simplicity criterion and tried to explain an agent's behavior with a minimal set of higher-level hypotheses that justify the observed actions [Kautz1990]. Ng and Mooney [Ng and Mooney1990] observed that the simplicity criterion fails to capture the fact that, normally, most steps in a sequence of actions are motivated by closely related higher-level goals. They defined a coherence criterion for preferring the hypotheses that best link all the observed actions. So, instead of explaining the actions separately, the whole sequence of observations is interpreted by relating it to the same higher-level actions (the principle applies to the observation of actions as well as to the interpretation of the sentences in a written text or in a dialogue).
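As a rough illustration of the difference between the two criteria, consider the following Python sketch. It is ours, not drawn from the cited systems, and the observations, candidate explanations, and scoring functions are invented: simplicity counts the distinct top-level goals an explanation postulates, while coherence rewards explanations in which a single goal accounts for many observations.

```python
# Hypothetical sketch contrasting the simplicity and coherence criteria.
# An "explanation" maps each observed action to a postulated top-level goal.

observations = ["board_flight", "rent_car", "book_hotel"]

candidate_explanations = [
    # Explanation A: every action serves one trip-taking goal.
    {"board_flight": "take_trip", "rent_car": "take_trip", "book_hotel": "take_trip"},
    # Explanation B: each action is justified by an unrelated goal.
    {"board_flight": "take_trip", "rent_car": "impress_friend", "book_hotel": "host_party"},
]

def simplicity_score(explanation):
    """Kautz-style simplicity: fewer distinct top-level goals is better."""
    return len(set(explanation.values()))

def coherence_score(explanation):
    """Ng & Mooney-style coherence: prefer explanations whose most
    'connective' goal accounts for as many observations as possible."""
    goals = list(explanation.values())
    return max(goals.count(g) for g in set(goals))

best_by_simplicity = min(candidate_explanations, key=simplicity_score)
best_by_coherence = max(candidate_explanations, key=coherence_score)
print(best_by_simplicity == best_by_coherence)  # True: both pick explanation A
```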
Coherence criteria play a basic role in determining the most promising explanations of an agent's behavior. However, they are general principles, and should be considered in conjunction with other principles that take into account the specific knowledge and beliefs of the agents. For example, [Raskutti and Zukerman1991] rank the knowledge sources that can be used to make inferences on the speakers' statements according to their reliability, and use this information as one of the factors for identifying the most likely hypothesis on the users' plan in a Bayesian framework. Their approach, based on a measure of how well the actions to be performed under an interpretation are defined, is limited by the assumption that in cooperative interactions speakers communicate enough information for their plans to be understood by the listener. Hence, the better defined the plans in an interpretation, the more likely it is the interpretation intended by the speaker. While we think that Raskutti and Zukerman's disambiguation criterion is important in dialogue interpretation, our claim is that knowledge about the users of a system makes it possible to draw simple inferences to discard some interpretations, independently of the knowledge source from which they are derived. In particular, some of the actions hypothesized during plan inference may be less plausible than others because they are associated with higher-level goals that the agents are not supposed to be concerned with. For example, in an Italian CS Department, first-year students are not usually concerned with preparing their thesis or choosing their curriculum, since these problems are dealt with at the beginning of the third year.

In some plan recognition systems, the ambiguities among the alternative active hypotheses are managed by evaluating their plausibility on the basis of the idea that, although the same action may play a role in different plans, some plans are a priori far more likely than others to represent the agent's intentions. So, default inferences may be made in order to maintain a restricted number of alternative hypotheses [Carberry1990a], [Charniak and Goldman1991]. For example, if somebody asks how late a certain supermarket is open, people imagine that s/he intends to purchase some groceries, although there are other higher-level actions, such as cashing a cheque, that might motivate the question. This choice is based on the fact that cashing cheques in supermarkets is possible, but far less frequent than buying groceries. Although in this case the frequency of actions is sufficient for choosing the first hypothesis over the second, we think that this information is not always enough for the disambiguation task. Since many kinds of actions are not universally preferred over others, in deciding which explanation is preferable we must also consider the available information about the observed agent. For example, although most people do not cash cheques in supermarkets, the supermarket personnel probably do so quite often.
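The supermarket example suggests combining an a priori frequency with user-specific plausibility. The following Python sketch shows one way this could look; it is not taken from any of the cited systems, and the probabilities, stereotype table, and function names are all invented for illustration.

```python
# Hypothetical sketch: ranking candidate goal hypotheses by combining an
# a priori frequency with a user-stereotype adjustment (numbers invented).

prior = {"buy_groceries": 0.9, "cash_cheque": 0.1}

# How plausible each goal is for a user category, relative to the
# general population (1.0 = no adjustment).
stereotype_weight = {
    "generic_customer": {"buy_groceries": 1.0, "cash_cheque": 1.0},
    "supermarket_employee": {"buy_groceries": 1.0, "cash_cheque": 8.0},
}

def rank_hypotheses(goals, stereotype):
    """Return (normalized score, goal) pairs, most plausible first."""
    weights = stereotype_weight[stereotype]
    scores = {g: prior[g] * weights[g] for g in goals}
    total = sum(scores.values())
    return sorted(((s / total, g) for g, s in scores.items()), reverse=True)

# For a generic customer, buying groceries dominates; for an employee,
# cashing a cheque becomes a serious competitor.
print(rank_hypotheses(["buy_groceries", "cash_cheque"], "generic_customer"))
print(rank_hypotheses(["buy_groceries", "cash_cheque"], "supermarket_employee"))
```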
In dialogue systems, domain knowledge, general knowledge about users, and knowledge collected during a dialogue are all very important for building rich user models (UMs). Stereotypical information makes it possible to ascribe to users features common to certain categories of people (this is particularly important in our framework, because we do not permanently store specific knowledge about individual users; so, in principle, all such information must be collected during the dialogue). Prototypical knowledge has been widely used for representing general information about classes of users (see [Rich1979], [Chin1989], [Kobsa1990], [Morik1989], [Brajnik and Tasso1994]). In particular, in Chin's double-stereotype approach [Chin1989], each domain concept is associated with its expected difficulty for the typical user of the system.
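A minimal Python sketch of the double-stereotype idea follows; the concept difficulties, user levels, and lookup function are our own assumptions, not Chin's actual implementation. The point is that knowledge ascription depends on two stereotypes at once: one over concepts and one over users.

```python
# Hypothetical sketch of a double-stereotype lookup: concepts carry an
# expected difficulty, users carry an expertise level, and the system
# ascribes knowledge of a concept to users whose level covers it.

CONCEPT_DIFFICULTY = {"ls": "simple", "grep": "mundane", "awk": "complex"}
LEVELS = ["simple", "mundane", "complex"]  # ordered easiest to hardest
USER_EXPERTISE = {"novice": "simple", "beginner": "mundane", "expert": "complex"}

def likely_knows(user_category, concept):
    """Assume a user knows concepts no harder than their expertise level."""
    concept_rank = LEVELS.index(CONCEPT_DIFFICULTY[concept])
    user_rank = LEVELS.index(USER_EXPERTISE[user_category])
    return concept_rank <= user_rank

print(likely_knows("beginner", "grep"))  # True
print(likely_knows("beginner", "awk"))   # False
```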
Kass, Finin and Chin propose various UM acquisition rules for inferring the users' beliefs from their utterances (e.g. through the recognition of presuppositions in their sentences; for the identification of presuppositions in wh-questions, see also [Kaplan1982]) and from their reactions to the answers of the system (e.g. if the system uses a certain word and the user does not ask for clarification, it infers that the user understands its meaning [Chin1989]). Moreover, Kass and Finin define UM acquisition rules that exploit the structure of the domain: in [Kass and Finin1987], hypotheses about a user's acquaintance with certain domain concepts are used to hypothesize knowledge about other similar or related concepts.
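The following Python sketch illustrates the three kinds of acquisition rules just described. The user-model representation and all names are hypothetical; the cited systems use richer representations.

```python
# Hypothetical sketch of three UM acquisition rules of the kinds
# described above, over a toy user-model representation.

user_model = {"knows_term": set(), "believes": set()}

def from_presupposition(presupposed_facts):
    """E.g. 'Why did the backup fail?' presupposes that the backup failed."""
    user_model["believes"].update(presupposed_facts)

def from_no_clarification(system_terms, clarification_requested):
    """Chin-style rule: terms the system used without triggering a
    clarification request are assumed to be understood by the user."""
    if not clarification_requested:
        user_model["knows_term"].update(system_terms)

def from_related_concepts(concept, related):
    """Kass & Finin-style rule: acquaintance with a concept suggests
    acquaintance with structurally related concepts."""
    if concept in user_model["knows_term"]:
        user_model["knows_term"].update(related)

from_presupposition({"the backup failed"})
from_no_clarification({"incremental backup"}, clarification_requested=False)
from_related_concepts("incremental backup", {"full backup"})
print(user_model)
```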
Although UM acquisition rules are very important for collecting information about users, all of them are limited to the acquisition of beliefs about the users' beliefs and domain knowledge. For example, they make it possible to learn which concepts users are acquainted with, which facts they believe are true, which they are not aware of, and so on. On the other hand, it is difficult to define rules for acquiring intentions, apart from the simple cases where users state them explicitly (e.g. ``I would like to talk to professor Smith''). In order to add this kind of information to the UM, it is also necessary to infer new intentions from the system's hypotheses on the users' plans.
There has been debate about the relationship between the notions of UM and discourse model (DM), where the latter is concerned with information about the dialogue. While prototypical and specific information about a user's knowledge and beliefs belongs in the UM, it is not so clear whether the intentions expressed during a dialogue should belong to the UM or to a separate structure, the DM, which represents the structure of the dialogue and is useful for understanding subsequent utterances and establishing the coherence of the discourse. According to Chin [Chin1988] and Kobsa [Kobsa1988], the DM intersects the UM: the two structures share a common part concerning, among other things, the entities a discourse is about. Prototypical information about users is contained in the UM, while the linguistic goals expressed during dialogues fall into the DM, because they pertain to the structure of the discourse itself. In contrast, [Wahlster1988] claims that the UM and the DM are separate, with the DM containing all the information concerning the dialogue and the UM representing the users' knowledge and beliefs; however, the UM is updated with new information coming from what is actually said as a dialogue goes on.
In classifying the contents of the DM and the UM, the above authors focus mainly on the communicative intentions of a user. However, in information-seeking dialogues, the users' underlying domain plans and goals must also be considered, because they have a strong impact on the interpretation of sentences. Cohen [Cohen1988] notes that a user's goals and plans are related to the intentional structure of discourse [Grosz and Sidner1986], but they may contain information at different levels of the action-subaction decomposition with respect to what is explicitly said. For example, the system may infer that an action the user wants to execute is a step of a more complex action. Carberry [Carberry1988] and Lambert and Carberry [Lambert and Carberry1991] introduce a hierarchical representation, the Context Model (CM), to represent the goals and plans attributed to a user during a conversation, as well as the information about the expressed communicative intentions. Carberry claims that the former, together with what is presupposed by the sentences uttered during the dialogue, are part of the UM.
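To fix ideas, here is a minimal Python sketch of a CM as a hierarchical structure of goals and subactions ascribed to the user; the node type and action names are invented and much simpler than the representations in the cited work.

```python
# Hypothetical sketch of a Context Model as an action-subaction tree
# of goals and plans attributed to the user.

from dataclasses import dataclass, field

@dataclass
class CMNode:
    action: str
    children: list["CMNode"] = field(default_factory=list)

# 'earn_degree' decomposes into steps; suppose the user's utterance
# directly matched only the leaf 'register_for_course', and the rest
# was inferred during plan recognition.
cm = CMNode("earn_degree", [
    CMNode("complete_coursework", [CMNode("register_for_course")]),
    CMNode("write_thesis"),
])

def actions(node):
    """Yield all actions (goals and subactions) ascribed by this CM."""
    yield node.action
    for child in node.children:
        yield from actions(child)

print(list(actions(cm)))
```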
Although CMs represent the system's hypotheses on the user's intentions, we think that considering the whole set of CMs as part of the UM is problematic, because during a dialogue the system very frequently builds more than one hypothesis on the user's intentions. The alternative hypotheses would then have to be considered as belonging to different partitions of the UM, each one representing a different picture of the user. As the dialogue goes on, the partitions would have to be expanded with further beliefs about the user, leading to a combinatorial explosion of alternative possibilities. On the other hand, since CMs are activated because they represent a part of the domain knowledge relevant to the current situation, even when more than one hypothesis (CM) is active, the hypotheses may reasonably share a common part. In this paper, we propose a way to use this common part for selecting the intentions that can be ascribed to a user with a high degree of certainty. This strategy recalls the least-commitment approach: whenever alternative interpretations are identified, only the beliefs on which all of them agree are added to the UM, while the inclusion of the others is deferred until more information is available for establishing their plausibility. We think that this is a solution for maintaining a unique view of a user's beliefs and interests, and for isolating the part of the information carried by the interpretations that can be used for expanding the UM via further inferences on its contents. Such information plays a fundamental role in the expansion of the UM, because it is inferred from the user's utterances, as opposed to stereotypical knowledge, which reflects general assumptions about users.
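A minimal Python sketch of this least-commitment update follows. It is only an illustration of the strategy just described, with beliefs abstracted to strings and the CMs to sets; the actual representation is richer.

```python
# Hypothetical sketch of the least-commitment update: only the beliefs
# shared by all active Context Models are added to the UM; the others
# are deferred until further dialogue disambiguates among the CMs.

def update_user_model(user_model, active_cms):
    """active_cms: list of sets of candidate beliefs/intentions,
    one set per alternative Context Model."""
    shared = set.intersection(*active_cms)   # agreed upon by every CM
    deferred = set.union(*active_cms) - shared
    user_model |= shared                     # commit only to shared part
    return deferred                          # kept aside, not ascribed yet

um = {"user is a first-year student"}
cm1 = {"wants to pass the Programming exam", "will attend office hours"}
cm2 = {"wants to pass the Programming exam", "will email the professor"}
pending = update_user_model(um, [cm1, cm2])
print(um)       # gains only the intention shared by both hypotheses
print(pending)  # the two conflicting steps await more evidence
```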