Detection of Correlation and Causality in Non-Categorical Time Series (Detecção de Correlação e Causalidade em Séries Temporais não Categóricas)
Time series are present in many areas of daily life, in fields as diverse as astronomy, geophysics, economics and medicine. Information technologies can now generate large amounts of data, part of which is represented as time series, and analyzing this volume of data is a task that exceeds human capabilities. It was estimated that in 2005 over 150 exabytes of data were generated worldwide; for 2010, the figure is estimated to be close to 1200 exabytes. As a specific example, the camera of the VST (VLT Survey Telescope of the European Southern Observatory), made up of 32 CCDs for a total of 268 megapixels, is predicted to produce 30 terabytes of data annually. To extract information, and therefore generate knowledge, from such a vast amount of data, techniques are needed that automate the analysis efficiently.

This thesis aims to contribute specifically to the analysis of non-categorical time series (i.e., series of numeric values), with a set of tools that aid in detecting correlations among multiple time series and in detecting the periodicities associated with them. It also aims to determine causality between parameters, i.e., between different time series. The objectives of this thesis are:
- detection of positive and negative correlations in sets of parameters;
- detection of correlations taking into account time differences;
- detection of periodicities in time series;
- determination of the causality between parameters/time series;
- efficient computing with large time series.

Regarding the detection of periodicities, several techniques exist and are widely used, namely Fourier transforms and wavelets. Nonetheless, this thesis aims to present an alternative approach that is based on the definition of Pearson's correlation coefficient, which makes it computationally simpler, and that provides a much more intuitive visualization than the aforementioned approaches.
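The abstract does not describe the algorithm itself, so the following is only a minimal sketch of how Pearson's correlation coefficient can be used for two of the stated objectives: correlations with a time difference, and periodicities. It assumes that a periodicity can be read from the lag at which a series correlates best with a shifted copy of itself, which is one plausible reading and not necessarily the method of the thesis. The function names (`pearson`, `lagged_correlation`, `best_lag`, `candidate_period`) and the synthetic data are hypothetical and chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag] for a non-negative lag."""
    if lag < 0 or lag > len(x) - 2:
        raise ValueError("lag out of range")
    return pearson(x[:len(x) - lag], y[lag:])


def best_lag(x, y, max_lag):
    """Return (lag, r) with the largest |r| among lags 0..max_lag.

    Using |r| lets negative correlations compete with positive ones.
    """
    scores = [(lag, lagged_correlation(x, y, lag)) for lag in range(max_lag + 1)]
    return max(scores, key=lambda item: abs(item[1]))


def candidate_period(x, min_lag, max_lag):
    """Return (lag, r) where x correlates best with a shifted copy of itself.

    Assumption of this sketch: that lag is taken as a candidate period.
    """
    scores = [(lag, lagged_correlation(x, x, lag))
              for lag in range(min_lag, max_lag + 1)]
    return max(scores, key=lambda item: item[1])


# Synthetic data (hypothetical): y is a negated, rescaled copy of x delayed by 3 samples.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0, 0.0, 0.0] + [-2.0 * v + 1.0 for v in x[:-3]]
lag, r = best_lag(x, y, max_lag=10)

# Synthetic periodic series with a period of 20 samples.
s = [math.sin(2.0 * math.pi * t / 20.0) for t in range(200)]
period, r_period = candidate_period(s, min_lag=2, max_lag=30)
```

On this synthetic data, `best_lag` identifies the 3-sample delay with a correlation close to -1, and `candidate_period` identifies the 20-sample period. Each lag costs one pass over the series, so scanning many lags on very large series would need a more efficient formulation than this sketch.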
Regarding causality, although it can be detected in categorical data, determining the direction of cause and effect is difficult for real-valued numerical data. This thesis will propose an efficient method for detecting causality. The method is not based on linear regression, and so it does not depend on the quality of a regression fit. Solar astrophysics will be the main case study, analysing time series of solar parameters. Some periodicities and correlations in this area are already known (8), which provides a good basis for validating the proposed approach. The approach is also expected to create opportunities for new discoveries. The work will be carried out in close collaboration with an expert in astrophysics, who will provide the necessary validation and guidance on the data sets to explore in initial tests. However, the resulting methods and tools are intended to be applicable to any domain whose data can be expressed as time series, so no domain-specific characteristics will be built into the algorithms.
Start Date: 2011-04-12
End Date: 2012-05-22
Post-Graduation Student / Researcher / Professor: