Mining Usage Patterns from a Repository of Scientific Workflows

Title: Mining Usage Patterns from a Repository of Scientific Workflows
Publication Type: Conference Proceedings
Year of Conference: 2012
Authors: Diamantini C, Potena D, Storti E
Conference Name: PROCEEDINGS OF THE 2012 ACM SYMPOSIUM ON APPLIED COMPUTING
Pagination: 152-157
Date Published: 2012
Publisher: ACM
Abstract: In many experimental domains, especially e-Science, workflow management systems are gaining increasing attention as a means to design and execute in-silico experiments involving data analysis tools. As a by-product, a repository of workflows is generated, which formally describes experimental protocols and the way different tools are combined inside experiments. In this paper we describe the use of the SUBDUE graph clustering algorithm to discover sub-workflows from a repository. Since sub-workflows represent significant usage patterns of tools, the discovered knowledge can be exploited by scientists to learn by example about design practices, or to retrieve and reuse workflows. Such knowledge ultimately leverages the potential of scientific workflow repositories to become a knowledge asset. A set of experiments is conducted on the myExperiment repository to assess the effectiveness of the approach.