Temporal Data Mining

The application of data mining techniques to the medical and biological domain has attracted great interest in the last few years, partly thanks to the encouraging results achieved in many fields. A topic of particular interest in this area is the analysis of temporal data, usually referred to as Temporal Data Mining (TDM). Within TDM, research focuses on the analysis of time series collected by measuring clinical or biological variables at different points in time. Handling time explicitly in the data mining process is highly attractive: it can deepen insight into the temporal behavior of complex processes, and it may help to forecast the future evolution of a variable or to extract causal relationships between the variables at hand.

An increasing number of TDM approaches are currently being applied to the analysis of biomedical data. In functional genomics, for example, clustering techniques have been widely used to analyze gene expression time series in order to assess the function of unknown genes. TDM has also been successfully used to study gene expression time series of specific cell lines that are crucial for understanding key molecular processes of clinical interest, such as insulin action in muscle and the cell cycle in normal and tumor cells. Several works have also addressed the representation and processing of time series obtained by monitoring clinical parameters, collected for example during a stay in an intensive care unit (ICU).

Temporal Rules

One of the most attractive applications of AI-based TDM is the extraction of temporal rules from data. Unlike association rules, temporal rules relate the consequent to the antecedent through a temporal relationship (for example, the antecedent precedes the consequent); moreover, a temporal rule typically suggests a cause-effect association between its antecedent and its consequent. In the biomedical domain this can be particularly useful, for example to reconstruct gene regulatory networks or to discover knowledge about the causes of a target event.
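The difference between the two kinds of relationship can be illustrated with a minimal sketch, assuming each pattern is an episode with integer start and end times. An association-style relation only asks whether two episodes overlap, whereas a temporal relation is directed. The `Episode` class, `co_occurs`, `precedes`, and the `max_gap` parameter are hypothetical names, and this definition of "precedes" is only one possible choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """A qualitative pattern of one variable over the time span [start, end]."""
    variable: str
    pattern: str
    start: int
    end: int


def co_occurs(a: Episode, b: Episode) -> bool:
    """Relation behind an association-style conjunction: the spans overlap."""
    return max(a.start, b.start) < min(a.end, b.end)


def precedes(a: Episode, b: Episode, max_gap: int) -> bool:
    """Hypothetical temporal relation: b starts once a has ended,
    at most max_gap time units later."""
    return a.end <= b.start <= a.end + max_gap


a_up = Episode("A", "increasing", 0, 5)
c_up = Episode("C", "increasing", 7, 12)
print(co_occurs(a_up, c_up))            # no overlap in time
print(precedes(a_up, c_up, max_gap=3))  # C starts 2 time units after A ends
print(precedes(c_up, a_up, max_gap=3))  # the relation is directed
```

Because `precedes` is not symmetric, swapping antecedent and consequent changes the answer, which is exactly what a co-occurrence test cannot express.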

Learning temporal rules with complex patterns in biomedical time series

We developed two algorithms for mining temporal rules from data. In the first, we defined a method for discovering both association and temporal rules, in order to gain insight into the possible causes of non-adherence to therapeutic protocols in hemodialysis through the analysis of a set of monitoring variables. Time series are first summarized through qualitative patterns extracted with knowledge-based Temporal Abstractions (TAs); possible associations between these patterns and the non-adherence events are then mined with an APRIORI-like procedure. The method only handles rules whose antecedents are conjunctions of simple patterns (i.e. patterns such as “increasing” or “decreasing”), where the conjunction is interpreted as a co-occurrence relationship (i.e. “variable A increasing” occurs at the same time as “variable B decreasing”). If this conjunction temporally precedes another simple pattern, say “variable C increasing”, sufficiently often, a rule of the form “variable A increasing and variable B decreasing precedes variable C increasing” is generated.
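The two steps of this first method can be sketched as follows, under several assumptions that are not taken from the published algorithm: the abstraction is a plain trend TA (sign of each sampling step, with a threshold), "precedes" means the consequent starts within `max_gap` steps after the antecedent ends, and antecedents are limited to one or two co-occurring patterns. As an APRIORI-like pruning criterion the sketch uses total time covered, which can only shrink when a conjunct is added. All names (`trend_abstraction`, `conjunction`, `mine_rules`, `min_coverage`, `min_confidence`, `max_gap`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Interval:
    start: int
    end: int


def trend_abstraction(values, threshold=0.0):
    """Basic trend TA: label each sampling step increasing/decreasing/steady,
    then merge consecutive steps with the same label into maximal episodes."""
    episodes = {"increasing": [], "decreasing": [], "steady": []}
    last_label = None
    for t in range(len(values) - 1):
        delta = values[t + 1] - values[t]
        if delta > threshold:
            label = "increasing"
        elif delta < -threshold:
            label = "decreasing"
        else:
            label = "steady"
        if label == last_label:
            # Extend the current episode by one sampling step.
            episodes[label][-1] = Interval(episodes[label][-1].start, t + 1)
        else:
            episodes[label].append(Interval(t, t + 1))
        last_label = label
    return episodes


def conjunction(xs, ys):
    """Co-occurrence of two patterns: the spans where their episodes overlap."""
    out = []
    for x in xs:
        for y in ys:
            start, end = max(x.start, y.start), min(x.end, y.end)
            if start < end:
                out.append(Interval(start, end))
    return sorted(out, key=lambda iv: iv.start)


def coverage(episodes):
    """Total time covered; it can only shrink when a conjunct is added."""
    return sum(e.end - e.start for e in episodes)


def precedes(a, c, max_gap):
    """Hypothetical PRECEDES: c starts once a has ended, within max_gap steps."""
    return a.end <= c.start <= a.end + max_gap


def mine_rules(patterns, consequent, min_coverage, min_confidence, max_gap):
    """Level-wise (APRIORI-like) search over antecedents made of one or two
    co-occurring simple patterns; patterns are keyed by (variable, label)."""
    target = patterns[consequent]
    # Level 1: single patterns of other variables that cover enough time.
    level1 = {(k,): eps for k, eps in patterns.items()
              if k[0] != consequent[0] and coverage(eps) >= min_coverage}
    # Level 2: pairs are built only from level-1 survivors (pruning step).
    level2 = {}
    for (a,), (b,) in combinations(level1, 2):
        if a[0] == b[0]:
            continue  # two labels of one variable never co-occur
        eps = conjunction(level1[(a,)], level1[(b,)])
        if coverage(eps) >= min_coverage:
            level2[(a, b)] = eps
    rules = []
    for antecedent, eps in {**level1, **level2}.items():
        if not eps:
            continue
        hits = sum(any(precedes(e, c, max_gap) for c in target) for e in eps)
        confidence = hits / len(eps)
        if confidence >= min_confidence:
            rules.append((antecedent, consequent, hits, confidence))
    return rules


series = {
    "A": [0, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4],
    "B": [5, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    "C": [0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2],
}
patterns = {(var, label): eps
            for var, values in series.items()
            for label, eps in trend_abstraction(values).items()}
for rule in mine_rules(patterns, ("C", "increasing"), min_coverage=2,
                       min_confidence=0.75, max_gap=1):
    print(rule)
```

On this toy data, “A increasing” alone is rejected (only one of its two episodes is followed by “C increasing”, confidence 0.5), while “A increasing and B decreasing” is kept with confidence 1.0; “B decreasing” alone also passes. A real application would add state abstractions and support thresholds tuned to the clinical question.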
In the second algorithm, we proposed an extension of the first method that extracts rules with arbitrarily complex patterns in both the antecedent and the consequent. Such patterns can be defined in advance by the user (typically relying on prior knowledge of the problem domain), or they can be generated automatically by a complex pattern extractor. This extension supports the search for relationships between complex behaviors, which can be particularly interesting in biomedical applications. For example, a drug is first absorbed and then utilized, so that its plasma distribution precedes its effect in the target tissue. In this case, it would be important to look for complex “up and down” episodes in the drug plasma concentration in order to automatically extract temporal knowledge from the data. The method enables the user to define episodes of interest, thus encoding domain knowledge about a specific process, and to search efficiently for specific temporal interactions between such complex episodes.
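The drug example can be sketched as follows, assuming each variable has already been abstracted into a time-ordered list of basic episodes. Here a complex pattern is modelled, as a hypothetical choice, as a sequence of contiguous basic episodes (so “up and down” is an increasing episode immediately followed by a decreasing one). `find_complex`, `complex_rule_confidence`, and the relaxed precedence test, which lets the effect start while the plasma episode is still ongoing, are illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A basic episode: (simple pattern label, start, end), e.g. the output of a trend TA.
Episode = Tuple[str, int, int]


@dataclass(frozen=True)
class ComplexEpisode:
    pattern: Tuple[str, ...]
    start: int
    end: int


def find_complex(episodes: Sequence[Episode],
                 pattern: Tuple[str, ...]) -> List[ComplexEpisode]:
    """Return every run of contiguous basic episodes whose labels spell out
    the user-defined pattern, e.g. ("increasing", "decreasing") for up-and-down."""
    found = []
    n = len(pattern)
    for i in range(len(episodes) - n + 1):
        window = episodes[i:i + n]
        labels = tuple(label for label, _, _ in window)
        contiguous = all(window[k][2] == window[k + 1][1] for k in range(n - 1))
        if labels == pattern and contiguous:
            found.append(ComplexEpisode(pattern, window[0][1], window[-1][2]))
    return found


def complex_rule_confidence(antecedents: Sequence[ComplexEpisode],
                            consequents: Sequence[ComplexEpisode],
                            max_gap: int) -> Tuple[int, float]:
    """Count antecedent episodes followed by a consequent episode that starts
    after the antecedent starts and at most max_gap steps after it ends."""
    if not antecedents:
        return 0, 0.0
    hits = sum(
        any(a.start < c.start <= a.end + max_gap for c in consequents)
        for a in antecedents
    )
    return hits, hits / len(antecedents)


# Toy episodes for a drug's plasma concentration and its effect in the target tissue.
plasma = [("steady", 0, 2), ("increasing", 2, 4), ("decreasing", 4, 8),
          ("steady", 8, 10), ("increasing", 10, 11), ("decreasing", 11, 13)]
effect = [("steady", 0, 3), ("increasing", 3, 6), ("decreasing", 6, 9),
          ("steady", 9, 14)]

up_down = ("increasing", "decreasing")
plasma_peaks = find_complex(plasma, up_down)
effect_peaks = find_complex(effect, up_down)
print(plasma_peaks)
print(complex_rule_confidence(plasma_peaks, effect_peaks, max_gap=2))
```

Only the first plasma peak is followed by a peak in the effect, so the sketch reports one hit out of two antecedent episodes (confidence 0.5). Exact label matching over contiguous episodes is the simplest choice; a gap-tolerant or regular-expression-like matcher would allow, for instance, a short steady plateau between the rise and the fall.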


People working on the topic:

Lucia Sacchi, Cristiana Larizza, Paolo Magni, Riccardo Bellazzi