DIPARTIMENTO   DI   INFORMATICA
Università di Torino

The Spoken Digits Learning Problem

The problem concerns the recognition of the ten digits spoken in Italian as isolated words, starting from the time evolution of two rough features, namely the zero-crossing and the total energy of the signal.

[Figure: typical graph of the total energy of a spoken "zero" digit.]

The features are extracted from the signal using classical signal processing algorithms and then described using a set of primitives, as proposed in:
R. De Mori, A. Giordana, P. Laface and L. Saitta: "An expert system for mapping acoustic cues into phonetic features", Information Sciences, vol. 33, pp. 115-155.
In particular, the graphs of the two features are segmented into contiguous intervals corresponding to four types of elementary shapes, further characterized by four numeric attributes.
Each instance is made of two groups of elementary objects, one describing properties of total energy segments (objects labelled "et") and one describing properties of zero-crossing segments (objects labelled "zc").
Data are stored in tabular form, much like a textual dump of a table in a relational database, with each line describing one segment of an instance. The data file has 10 columns, with the following meanings:

  • 1) #id of the instance
  • 2) #class (1="zero", 2="uno", 3="due", 4="tre", 5="quattro", 6="cinque", 7="sei", 8="sette", 9="otto", 10="nove")
  • 3) #id of the part of the instance (0-based)
  • 4) type of object: specifies whether the object belongs to the zero-crossing signal ("zc") or to the total energy signal ("et")
  • 5) shape of object: c1 (a peak), c2 (flat segment), c3 (monotonically increasing segment), c4 (monotonically decreasing segment)
  • 6) initial time in centiseconds.
  • 7) end time in centiseconds.
  • 8) width in centiseconds (this is redundant)
  • 9) max height in dB
  • 10) area of the segment (integral of the function over the interval)
Columns 1 (instance id) and 3 (object id) constitute a key to uniquely identify a particular segment in the dataset.
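
As an illustration of the column layout above, the following Python sketch reads such lines into typed records. It assumes the columns are separated by whitespace, which this page does not state. The names Segment and parse_segments and the sample rows are made up for the example and are not taken from the real files.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Segment:
    instance_id: int   # column 1
    digit_class: int   # column 2, 1..10
    part_id: int       # column 3, 0-based
    signal: str        # column 4, "zc" or "et"
    shape: str         # column 5, "c1".."c4"
    t_start: float     # column 6, centiseconds
    t_end: float       # column 7, centiseconds
    width: float       # column 8, centiseconds (redundant)
    height: float      # column 9, dB
    area: float        # column 10


def parse_segments(lines: Iterable[str]) -> Iterator[Segment]:
    """Yield one Segment per non-empty line.

    Assumption: fields are whitespace-separated.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 10:
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        yield Segment(int(fields[0]), int(fields[1]), int(fields[2]),
                      fields[3], fields[4], *map(float, fields[5:]))


# Hypothetical rows, for illustration only.
sample = [
    "1 1 0 et c2 0 5 5 0 0",
    "1 1 1 et c1 5 30 25 60 900",
    "1 1 2 zc c3 0 12 12 20 110",
]
segments = list(parse_segments(sample))

# Columns 1 and 3 form the key, so they can index a lookup table.
by_key = {(s.instance_id, s.part_id): s for s in segments}
```

Indexing by the (instance id, part id) pair mirrors the key described above; a parser for the real files may need a different separator.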

Relevant a-priori knowledge:

  • all instances start and end with a 'c2' shape object of null area (an artifact of the segmentation algorithm); since this might mislead some systems, we suggest discarding these objects (in all our systems we used a predicate 'central(x)', which is true if an object neither starts nor ends a signal).
  • useful features that can be extracted from the data and used for learning include:
    • comparing width, height and area of two segments with the same shape
    • comparing width, height and area of two segments from the same signal (et or zc)
    • comparing width, height and area of two segments from different signals
    • mutual position of two segments from the same signal
    • mutual position of two segments from different signals
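
The suggestions above can be sketched in Python as follows. This is only one possible reading: it assumes "starts or ends a signal" is decided separately for the et and zc segments of each instance, ordered by initial time, and it encodes mutual position as a coarse before/after/overlaps relation. The names Seg, central_segments, mutual_position and compare, and the sample values, are hypothetical and do not describe how our systems implement 'central(x)'.

```python
from collections import defaultdict
from typing import NamedTuple


class Seg(NamedTuple):
    instance_id: int
    part_id: int
    signal: str    # "et" or "zc"
    shape: str     # "c1".."c4"
    t_start: float
    t_end: float
    height: float
    area: float

    @property
    def width(self) -> float:
        return self.t_end - self.t_start


def central_segments(segments):
    """Keep segments that are neither first nor last in their (instance, signal) group."""
    groups = defaultdict(list)
    for s in segments:
        groups[(s.instance_id, s.signal)].append(s)
    kept = []
    for group in groups.values():
        group.sort(key=lambda s: (s.t_start, s.t_end))
        kept.extend(group[1:-1])
    return kept


def mutual_position(a, b):
    """Coarse temporal relation between two segments (an assumed encoding)."""
    if a.t_end <= b.t_start:
        return "before"
    if b.t_end <= a.t_start:
        return "after"
    return "overlaps"


def compare(a, b):
    """Pairwise features: attribute differences plus mutual position."""
    return {
        "same_shape": a.shape == b.shape,
        "same_signal": a.signal == b.signal,
        "d_width": a.width - b.width,
        "d_height": a.height - b.height,
        "d_area": a.area - b.area,
        "position": mutual_position(a, b),
    }


# Made-up segments of a single instance, for illustration only.
segs = [
    Seg(1, 0, "et", "c2", 0, 5, 0, 0),
    Seg(1, 1, "et", "c3", 5, 20, 55, 600),
    Seg(1, 2, "et", "c4", 20, 40, 55, 700),
    Seg(1, 3, "et", "c2", 40, 45, 0, 0),
    Seg(1, 4, "zc", "c2", 0, 3, 0, 0),
    Seg(1, 5, "zc", "c1", 3, 30, 25, 300),
    Seg(1, 6, "zc", "c2", 30, 45, 0, 0),
]
central = central_segments(segs)
same_signal_pair = compare(central[0], central[1])
cross_signal_pair = compare(central[0], central[2])
```

Differences are used here for simplicity; ratios or thresholded relations would serve equally well, depending on the learner.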
The file training.data.gz contains 219 instances (22 per class, 21 for the last class) used as the learning set, whereas the file test.data.gz contains 100 instances (10 per class) used as the test set.
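
A quick way to check these counts after downloading the files is sketched below in Python. It assumes the decompressed files are plain text with whitespace-separated columns and that instance ids are unique within a file. The function name instances_per_class is hypothetical.

```python
import gzip
from collections import Counter


def instances_per_class(path):
    """Count distinct instance ids per class in a gzip-compressed data file."""
    seen = set()
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()   # assumption: whitespace-separated columns
            if len(fields) < 2:
                continue
            instance_id, digit_class = fields[0], int(fields[1])
            if instance_id not in seen:
                seen.add(instance_id)
                counts[digit_class] += 1
    return counts


# Expected totals according to the description: 9 classes with 22 instances
# plus one class with 21 instances, and 10 classes with 10 instances.
expected_training_total = 9 * 22 + 21
expected_test_total = 10 * 10

# Usage, once the files are available locally:
#   counts = instances_per_class("training.data.gz")
#   sum(counts.values()) should then equal expected_training_total
```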

The best results obtained on this problem by our systems are the following:

    First version of ML-Smart (*)

Correct classifications:                         77   77.00 %
Wrong classifications:                            0    0.00 %
Ambiguously classified (with correct class):     23   23.00 %
    Ambiguities among two classes:                9    9.00 %
    Ambiguities among four classes:               7    7.00 %
    Ambiguities among six classes:                7    7.00 %
Ambiguously classified (without correct class):   0    0.00 %

    Smart+ (not published)

Correct classifications:                         75   75.00 %
Wrong classifications:                            9    9.00 %
Ambiguously classified (with correct class):     16   16.00 %
    Ambiguities among two classes:               14   14.00 %
    Ambiguities among three classes:              1    1.00 %
    Ambiguities among four classes:               1    1.00 %
Ambiguously classified (without correct class):   0    0.00 %

    Smart+ + NTR (**)

Correct classifications:                         82   82.00 %
Wrong classifications:                            9    9.00 %
Not classified:                                   9    9.00 %

    Smart+ + FONN (**)

Correct classifications:                         82   82.00 %
Wrong classifications:                            6    6.00 %
Not classified:                                  12   12.00 %

(*) Results published in F. Bergadano, A. Giordana and L. Saitta: "Automated Concept Acquisition in Noisy Environments", IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-10, 1988, pp. 555-578.
(**) Preliminary results published in M. Botta, A. Giordana, and R. Piola: "An integrated framework for learning numerical terms in FOL", Proc. of the ECAI-98, (Brighton, UK, 1998), pp. 415-419.
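
For readers who want to report results in the same form, the Python sketch below shows one plausible way to derive the outcome categories used in the tables. It assumes that a classifier returns a set of candidate classes for each test instance. This page does not define the categories formally, so the mapping in the hypothetical function outcome is an interpretation, not the procedure used by the systems above.

```python
from collections import Counter


def outcome(predicted, true_class):
    """Map a set of predicted classes to an outcome label (assumed definitions)."""
    predicted = set(predicted)
    if not predicted:
        return "not classified"
    if predicted == {true_class}:
        return "correct"
    if len(predicted) == 1:
        return "wrong"
    if true_class in predicted:
        return "ambiguous (with correct class)"
    return "ambiguous (without correct class)"


def tally(results):
    """results: iterable of (predicted_classes, true_class) pairs.

    Returns a dict mapping each outcome label to its percentage.
    """
    counts = Counter(outcome(p, t) for p, t in results)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Made-up predictions, for illustration only.
example = [({1}, 1), ({2}, 1), ({1, 7}, 1), ({2, 3}, 1), (set(), 4)]
percentages = tally(example)
```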


Further Information: botta@di.unito.it Last update: Sep 21, 1998

