Mining Usage Patterns from a Repository of Scientific Workflows
Title | Mining Usage Patterns from a Repository of Scientific Workflows |
Publication Type | Conference Proceedings |
Year of Conference | 2012 |
Authors | Diamantini C, Potena D, Storti E |
Conference Name | PROCEEDINGS OF THE 2012 ACM SYMPOSIUM ON APPLIED COMPUTING |
Pagination | 152 - 157 |
Date Published | 2012 |
Publisher | ACM |
Abstract | In many experimental domains, especially e-Science, workflow management systems are gaining increasing attention as a means to design and execute in-silico experiments involving data analysis tools. As a by-product, a repository of workflows is generated that formally describes experimental protocols and the way different tools are combined within experiments. In this paper we describe the use of the SUBDUE graph clustering algorithm to discover sub-workflows in such a repository. Since sub-workflows represent significant usage patterns of tools, the discovered knowledge can be exploited by scientists to learn by example about design practices, or to retrieve and reuse workflows. Ultimately, such knowledge leverages the potential of scientific workflow repositories to become a knowledge asset. A set of experiments is conducted on the myExperiment repository to assess the effectiveness of the approach. |