PGS: Peculiar Genes Selection.
Developers: Federica Martina, Marco Beccuti, Gianfranco Balbo, Francesca Cordero.
We present a new three-step feature selection method for detecting class-specific biomarkers in high-dimensional data sets. The first step detects the genes that are differentially expressed across the conditions tested in the experimental design; the second step filters out the features with low discriminative power; the third step detects the class-specific features and defines the final biomarker as their union. Using the proposed feature selection procedure, the classification performance of a Support Vector Machine on the imbalanced data set reaches 82%, whereas other methods do not exceed 73%. The Gene Ontology enrichments performed on the signatures selected with the proposed pipeline confirm the biological relevance of our methodology. The PGS package is available for R users.
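The three steps can be illustrated with a small sketch. It is not the PGS implementation (PGS is an R package and its statistical tests are not described here). It is a minimal Python illustration under stated assumptions: a one-way ANOVA F statistic stands in for the differential expression test, a one-vs-rest separation score stands in for "discriminative power", and a gene is called class-specific for the class where that score is highest. The function names, the thresholds `f_min` and `score_min`, and the toy data are all hypothetical.

```python
from statistics import mean, pstdev

def anova_f(values, labels):
    """One-way ANOVA F statistic for one gene (assumed stand-in for step 1).
    Requires at least two classes and more samples than classes."""
    classes = sorted(set(labels))
    groups = [[v for v, l in zip(values, labels) if l == c] for c in classes]
    grand = mean(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (len(values) - len(groups))
    return between / within if within > 0 else float("inf")

def one_vs_rest_score(values, labels, cls):
    """Gap between the class mean and the mean of all other samples,
    scaled by the summed spreads (assumed stand-in for discriminative power)."""
    inside = [v for v, l in zip(values, labels) if l == cls]
    outside = [v for v, l in zip(values, labels) if l != cls]
    spread = pstdev(inside) + pstdev(outside)
    gap = abs(mean(inside) - mean(outside))
    return gap / spread if spread > 0 else float("inf")

def select_signature(expr, labels, f_min=5.0, score_min=1.0):
    """expr maps gene name -> list of expression values, one per sample."""
    classes = sorted(set(labels))
    # Step 1: keep genes that differ across the experimental conditions.
    de = {g: v for g, v in expr.items() if anova_f(v, labels) >= f_min}
    # Step 2: drop genes that separate no class well.
    scores = {g: {c: one_vs_rest_score(v, labels, c) for c in classes}
              for g, v in de.items()}
    kept = {g: s for g, s in scores.items() if max(s.values()) >= score_min}
    # Step 3: assign each gene to its best-separated class, then take the union.
    per_class = {c: set() for c in classes}
    for g, s in kept.items():
        per_class[max(s, key=s.get)].add(g)
    signature = set().union(*per_class.values())
    return per_class, signature

# Toy data: 3 classes with 3 samples each (hypothetical values).
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
expr = {
    "geneA": [9, 10, 11, 1, 2, 1, 2, 1, 2],   # high only in class A
    "geneB": [1, 2, 1, 9, 10, 11, 2, 1, 2],   # high only in class B
    "noise": [5, 6, 5, 6, 5, 6, 5, 6, 5],     # no class structure
}
per_class, signature = select_signature(expr, labels)
print(per_class, signature)
```

On this toy input the uninformative gene is removed in step 1, and the final signature is the union of the per-class gene sets, which mirrors the structure of the procedure rather than its actual statistics.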
PGS tool and datasets.
Chimera: a Bioconductor package for secondary analysis of fusion products.
Developers: Raffaele A. Calogero, Matteo Carrara, Marco Beccuti, Francesca Cordero.
Chimera is a Bioconductor package that organizes, annotates, analyses and validates fusions reported by different fusion detection tools. The current implementation can handle output from bellerophontes, chimeraScan, deFuse, fusionCatcher, FusionFinder, FusionHunter, FusionMap, mapSplice, Rsubread, tophat-fusion, and STAR. The core of Chimera is a fusion data structure that can store fusion events detected with any of the above-mentioned tools. Fusions can then be manipulated with standard R functions or through the set of functionalities developed specifically in Chimera to support the user in managing fusions and discriminating false positives.
Official Chimera web page.
HashFilter: a tool for supporting read de-convolution.
Developers: Francesca Cordero, Marco Beccuti.
HashFilter is a C++ tool implementing an innovative read de-convolution algorithm based on a hash table.
It was used to obtain the physical, genetic and functional sequence assembly of the barley genome in the project Advancing the Barley Genome (CRIS NUMBER: 0218967).
Official HashFilter web page.
CoCoClust: Constrained Co-clustering via Sum-Squared Residue Minimization
Developers: Ruggero G. Pensa, J-F. Boulicaut, Francesca Cordero, Maurizio Atzori, Dino Ienco.
In the generic setting of objects × attributes matrix data analysis, co-clustering is an interesting unsupervised data mining method. A co-clustering task provides a bi-partition made of co-clusters: each co-cluster is a group of objects associated with a group of attributes, and these associations can support expert interpretations. Many constrained clustering algorithms have been proposed to exploit domain knowledge and to improve partition relevancy in the mono-dimensional clustering case (e.g., using must-link and cannot-link constraints on one of the two dimensions). Here, we consider constrained co-clustering not only for extended must-link and cannot-link constraints (i.e., both objects and attributes can be involved), but also for interval constraints that enforce properties of co-clusters when considering ordered domains. We provide an iterative co-clustering algorithm that exploits user-defined constraints while minimizing two different residues: Hartigan's and Cheng-Church's.
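To make the two objective functions concrete, the sketch below computes the sum-squared residue of a given co-clustering under both definitions as they are commonly stated: Hartigan's residue of an entry is its deviation from the co-cluster mean, while Cheng-Church's residue additionally subtracts the row and column means within the co-cluster and adds the co-cluster mean back. This is only an evaluation of the objective for fixed partitions, not the CoCoClust algorithm, and it does not handle constraints; the function name, the `kind` parameter, and the toy matrix are hypothetical.

```python
def sum_squared_residue(matrix, row_labels, col_labels, kind="cheng_church"):
    """Sum of squared residues over all co-clusters of a fixed bi-partition.

    matrix     : list of rows (objects x attributes)
    row_labels : cluster id of each row
    col_labels : cluster id of each column
    kind       : "hartigan" or "cheng_church"
    """
    total = 0.0
    for r in set(row_labels):
        rows = [i for i, l in enumerate(row_labels) if l == r]
        for c in set(col_labels):
            cols = [j for j, l in enumerate(col_labels) if l == c]
            # Mean of the whole co-cluster, and of each row / column inside it.
            block_mean = sum(matrix[i][j] for i in rows for j in cols) / (len(rows) * len(cols))
            row_mean = {i: sum(matrix[i][j] for j in cols) / len(cols) for i in rows}
            col_mean = {j: sum(matrix[i][j] for i in rows) / len(rows) for j in cols}
            for i in rows:
                for j in cols:
                    if kind == "hartigan":
                        h = matrix[i][j] - block_mean
                    else:
                        h = matrix[i][j] - row_mean[i] - col_mean[j] + block_mean
                    total += h * h
    return total

# Toy matrix: rows 0 and 1 share a cluster, columns {0,1} and {2,3} form two clusters.
matrix = [
    [1, 2, 10, 11],
    [2, 3, 11, 12],
    [7, 7, 0, 0],
]
row_labels = [0, 0, 1]
col_labels = [0, 0, 1, 1]
hartigan = sum_squared_residue(matrix, row_labels, col_labels, kind="hartigan")
cheng_church = sum_squared_residue(matrix, row_labels, col_labels, kind="cheng_church")
print(hartigan, cheng_church)
```

In the toy matrix every co-cluster follows an additive row-plus-column pattern, so the Cheng-Church residue vanishes while Hartigan's residue, which rewards constant blocks, does not. An iterative algorithm of the kind described above would reassign rows and columns to reduce one of these totals.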
Upon request at: firstname.lastname@example.org
GOClust: Gene Ontology driven Co-clustering of Gene Expression Data
Developers: Alessio Visconti, Dino Ienco, Francesca Cordero, Ruggero G. Pensa.
Official GOClust web page.