The main goal of the project is the development of methodologies, techniques, and tools for the management and the analysis of large collections of temporal, and sometimes also spatial, data.
The first part of the project will be devoted to the analysis of existing tools for integrating data distributed over heterogeneous data sources (databases that may differ in their goals, structures, and contents) into a single data warehouse. Besides standard tools and methodologies for conceptual and logical design, we will consider existing tools for schema matching (the process of establishing whether, and to what extent, two components of two different schemas are semantically related), such as COMA; for schema mapping (the transformation of one schema into another), such as Clio; and for instance mapping (the transformation of an instance of one database into an instance of another).
Besides a comparative analysis of the various tools, we will carry out an experimental evaluation of them on case studies taken from different application domains, such as spatio-temporal databases for positioning systems, databases for configuration systems, and temporal databases for the management of inbound and outbound campaigns in call center systems.
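To make the notion of schema matching concrete, the following minimal sketch scores pairs of attribute names by string similarity and proposes a correspondence when the score reaches a threshold. It is an illustration only and does not reproduce how COMA or Clio work; the function names, the threshold of 0.6, and the two example schemas are assumptions introduced here.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case a name and drop separators so 'customer_id' ~ 'CustomerID'."""
    return name.lower().replace("_", "").replace("-", "")


def name_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1] for two attribute names (1 = identical)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_schemas(source, target, threshold=0.6):
    """For each source attribute, propose the best-scoring target attribute,
    keeping the pair only if its score reaches the (assumed) threshold."""
    matches = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            matches[s] = (best, round(score, 2))
    return matches


# Hypothetical schemas from two call center data sources.
source_schema = ["customer_id", "call_start", "agent"]
target_schema = ["CustomerID", "CallStartTime", "operator_name"]
print(match_schemas(source_schema, target_schema))
```

A purely name-based matcher of this kind finds no counterpart for "agent", although "operator_name" may well be semantically related to it. Cases like this are one reason why matching tools typically combine several sources of evidence rather than names alone.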
In the second part of the project, we will study existing tools for the temporal (and spatial) analysis of data, ranging from data mining tools to standard tools for statistical analysis. Special attention will be paid to techniques and tools for temporal aggregation and temporal data mining. Data mining aims at exploring a large set of data to identify existing regularities, to extract meaningful knowledge, and to derive recurrent rules. Temporal data mining can be exploited to analyse data collections where the time variable plays a fundamental role (time series). Time series make it possible to describe the evolution of a given phenomenon over time (its dynamics), and they can be analysed both to interpret the phenomenon under consideration, by identifying recurrent patterns and trends, and to foresee its future evolution.
In the specific case of temporal data mining, the study of time series allows one to identify the most appropriate representation of the available data, to group (cluster) and classify data, to perform similarity analysis across different time series, to formulate suitable hypotheses about the existence of specific causal relationships among the acquired data, to build models for predicting phenomena of interest, and to run real-time simulations of future scenarios based on historical data.
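As an illustration of similarity analysis between time series, the sketch below computes a dynamic time warping (DTW) distance, one commonly used measure that tolerates local stretching along the time axis. The choice of DTW is an assumption made here for the sake of the example; the project does not commit to a specific measure, and the function name is hypothetical.

```python
import math


def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost (quadratic time and space)."""
    n, m = len(x), len(y)
    # d[i][j] = cost of the best alignment of x[:i] with y[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # x advances, y repeats
                                 d[i][j - 1],      # y advances, x repeats
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]


# The second series is the first with one sample repeated: distance 0.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
# Two constant series one unit apart, two samples each: distance 2.
print(dtw_distance([0, 0], [1, 1]))
```

A distance of this kind can serve as the building block for the clustering and classification tasks mentioned above, for instance by feeding it to a nearest-neighbour classifier.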
Tools for statistical analysis, in turn, are traditionally used to detect relationships among the available data by identifying possible regularities (interpretation models, regression models, classifications, historical series).
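A regression model of the simplest kind can be sketched as follows: an ordinary least-squares line fitted to a series sampled at equally spaced instants, which yields a trend (the slope) and a naive one-step forecast. The equal spacing, the function names, and the sample values are assumptions introduced for illustration.

```python
def fit_trend(values):
    """Fit values[t] ~ intercept + slope * t for t = 0, 1, ..., n-1
    by ordinary least squares and return (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, v_mean - slope * t_mean


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line beyond the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical series growing by two units per time step.
series = [2, 4, 6, 8]
print(fit_trend(series))  # slope 2.0, intercept 2.0
print(forecast(series))   # 10.0
```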
As in the first phase of the project, the comparative analysis of existing techniques and tools will be paired with their experimental evaluation on the same case studies.
On the basis of the outcomes of the first two phases of the project, we will define the software architecture of a system for the management and analysis of (spatio-)temporal data, and we will develop a prototype of such a system to evaluate the proposal experimentally on one of the considered case studies. Besides the data warehouse component, the system will offer a set of basic and advanced functionalities for the analysis of temporal (and possibly spatial) data. More precisely, the system will feature (and integrate) the following three fundamental components: (i) a data warehouse, constantly loaded with new data, possibly at a high rate; (ii) a set of data mining, analysis, simulation, and optimization algorithms; (iii) a user interface to allow non-expert users to interact with the system.
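The interplay of the three components can be sketched, under strong simplifying assumptions, as follows: an in-memory list stands in for the data warehouse, a single temporal aggregation (totals per hour) stands in for the analysis algorithms, and a plain-text report stands in for the user interface. All class and function names, as well as the sample data, are hypothetical and do not describe the actual architecture, which is still to be defined.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Warehouse:
    """(i) Stand-in for the data warehouse: an append-only store of
    timestamped measurements."""
    rows: list = field(default_factory=list)

    def load(self, timestamp: datetime, value: float) -> None:
        self.rows.append((timestamp, value))


def hourly_totals(warehouse: Warehouse) -> dict:
    """(ii) One example analysis: temporal aggregation, summing values
    over one-hour windows and returning them in chronological order."""
    totals = defaultdict(float)
    for ts, value in warehouse.rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[hour] += value
    return dict(sorted(totals.items()))


def report(warehouse: Warehouse) -> str:
    """(iii) Stand-in for the user interface: a plain-text summary."""
    return "\n".join(f"{hour:%Y-%m-%d %H:00}  {total:g}"
                     for hour, total in hourly_totals(warehouse).items())


# Hypothetical call counts arriving out of order.
w = Warehouse()
w.load(datetime(2024, 1, 1, 10, 15), 4)
w.load(datetime(2024, 1, 1, 9, 5), 3)
w.load(datetime(2024, 1, 1, 9, 40), 2)
print(report(w))
```

Keeping the analysis functions separate from both storage and presentation, as in this sketch, is one way to let new algorithms be added without touching the other two components.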