Project No. 2016/21/B/ST6/02158
Supported by National Science Centre Poland (NCN)

About

Project registration no. 2016/21/B/ST6/02158 supported by National Science Centre Poland (NCN) 2016-2020.

Research project objectives

The scientific objective of the project is to develop methods for induction of context-free and probabilistic grammars, word graphs and deterministic and nondeterministic finite automata. The project will verify the hypotheses on the role of counterexamples in the induction of probabilistic grammars, as well as concerning the existence of polynomial algorithms for induction of context-free grammars and nondeterministic automata. The effectiveness of the proposed methods will be verified in the practical way by constructing the algorithms and their computer implementations (programs) carrying out the processes of induction. The developed methods, algorithms and programs will be applied to classification of actual bioinformatics data. Here are the groups of tasks that will be the subject of project research:

Induction of probabilistic grammars, including the following tasks: Model of induction of probabilistic grammars (MI), Role of counterexamples in learning of probabilistic grammars (RK) and Application of the model of grammar induction in classification of bioinformatics data (ZM). The research goal of the task MI is to improve our original system of deterministic context-free grammar induction by extending it to the probabilistic model. In this model, we will take into account the counterexamples (task RK) that should increase the efficiency of learning a probabilistic grammar. In the task ZM we will apply the developed model to solve the issues of classification of actual bioinformatics data—such as amyloid proteins—or for prediction of RNA secondary structure.

Induction of noncircular context-free grammars and word graphs, including the tasks: Induction of noncircular context-free grammars (GB), Induction of word graphs (GS) and Application of noncircular context-free grammars and word graphs in classification of bioinformatics data (ZG). The task GB will involve research on methods of noncircular context-free grammar induction with constraints imposed on commonly formulated requirements in induction problems. In the task GS the methods for creating word graphs will be examined, in particular a polynomial algorithm constructing an acyclic word graph will be developed. The task ZG aims at verification of effectiveness of binary classification using noncircular grammars and word graphs for actual bioinformatics data.

Induction of nondeterministic automata and decomposition of finite languages, including the tasks: Induction of nondeterministic automata (IA), Decomposition of finite languages (DJ) and Application of automata induction and decomposition of languages in classification of bioinformatics data (ZD). The research conducted in the IA task will consist in development of the approach to transform minimization of an automaton into a nonlinear integer programming problem. As a part of task DJ the new extended formulation of the problem of decomposition of finite languages will be investigated. To identify the protein sequence similarities, which allows for detecting several functions of a given protein in task ZD, we will use the pre-existing results of research on the extended problem of decomposition of finite languages.

Research agenda

  • Model of induction of probabilistic grammars.
  • Role of counterexamples in learning of probabilistic grammars.
  • Application of the model of grammar induction in classification of bioinformatics data.
  • Induction of noncircular context-free grammars.
  • Induction of word graphs.
  • Application of noncircular context-free grammars and word graphs in classification of bioinformatics data.
  • Induction of nondeterministic automata.
  • Decomposition of finite languages.
  • Application of automata induction and decomposition of languages in classification of bioinformatics data.

Investigators

prof. dr hab. inż. Olgierd Unold

Principal Investigator

Wroclaw University of Science and Technology

prof. dr hab. inż. Zbigniew Czech

Co-investigator

Silesian University of Technology

dr hab. Wojciech Wieczorek, prof. ATH

Co-investigator

University of Bielsko-Biala

dr inż. Tomasz Jastrząb

PhD student

Silesian University of Technology

mgr inż. Mateusz Gabor

Project scholarship holder

Wroclaw University of Science and Technology

Publications

  1. Tomasz Jastrząb, Zbigniew Czech, Wojciech Wieczorek.
    An adaptive parallel algorithm for finite language decomposition.
    Springer Science and Business Media LLC, 2021. [ BibTex ] [ DOI ]
  2. Mateusz Gabor, Wojciech Wieczorek, Olgierd Unold.
    Split-Based Algorithm for Weighted Context-Free Grammar Induction.
    Applied Sciences, 2021. [ BibTex ] [ DOI ]
  3. Tomasz Jastrząb, Zbigniew Czech, Wojciech Wieczorek.
    Parallel Algorithms for Minimal Nondeterministic Finite Automata Inference.
    Fundamenta Informaticae, 2021. [ BibTex ] [ DOI ]
  4. Wojciech Wieczorek, Łukasz Strąk, Arkadiusz Nowakowski, Olgierd Unold.
    Grammatical Inference by Answer Set Programming.
    Computational Science - ICCS 2020, 2020. [ BibTex ]
  5. Tomasz Jastrząb, Zbigniew Czech, Wojciech Wieczorek.
    Generating Minimal Nondeterministic Finite Automata Using a Parallel Algorithm.
    International Symposium on Parallel and Distributed Computing, 2020. [ BibTex ] [ DOI ]
  6. Łukasz Culer, Olgierd Unold.
    Visualization of Membership Distribution in Strings Using Heat Maps.
    International Conference on Artificial Intelligence and Soft Computing, 2020. [ BibTex ]
  7. Olgierd Unold, Mateusz Gabor, Witold Dyrka.
    Unsupervised Grammar Induction for Revealing the Internal Structure of Protein Sequence Motifs.
    International Conference on Artificial Intelligence in Medicine, 2020. [ BibTex ]
  8. Wojciech Wieczorek, Tomasz Jastrząb, Olgierd Unold.
    Answer Set Programming for Regular Inference.
    Applied Sciences, 2020. [ BibTex ] [ DOI ]
  9. Wojciech Wieczorek, Olgierd Unold, Łukasz Strąk.
    Parsing Expression Grammars and Their Induction Algorithm.
    Applied Sciences, 2020. [ BibTex ] [ DOI ]
  10. Olgierd Unold, Mateusz Gabor, Wojciech Wieczorek.
    Unsupervised Statistical Learning of Context-free Grammar.
    Proceedings of the 12th International Conference on Agents and Artificial Intelligence, 2020. [ BibTex ] [ DOI ]
  11. Tomasz Jastrząb.
    On Superlinear Speedups of a Parallel NFA Induction Algorithm.
    Parallel Computing: Technology Trends, 2020. [ BibTex ]
  12. Wojciech Wieczorek, Olgierd Unold.
    GP-Based Grammatical Inference for Classification of Amyloidogenic Sequences.
    Computational Intelligence Methods for Bioinformatics and Biostatistics, 2019. [ BibTex ] [ DOI ]
  13. Olgierd Unold, Mateusz Gabor.
    How implicit negative evidence improve weighted context-free grammar induction.
    International Conference on Artificial Intelligence and Soft Computing, 2019. [ BibTex ]
  14. Wojciech Wieczorek, Olgierd Unold, Łukasz Strąk.
    Suffix Classification Trees.
    Proceedings of The 14th International Conference on Grammatical Inference 2018, 2019. [ BibTex ] [ PDF ]
  15. Olgierd Unold, Agnieszka Kaczmarek, Łukasz Culer.
    Iterative method of generating artificial context-free grammars.
    arXiv preprint arXiv:1911.05801, 2019. [ BibTex ]
  16. Tomasz Jastrząb.
    Performance Evaluation of Selected Variable Ordering Methods for NFA Induction.
    14th International Conference on Grammatical Inference, 2019. [ BibTex ]
  17. Tomasz Jastrząb.
    A comparison of selected variable ordering methods for NFA induction.
    Computational Science – ICCS 2019, 2019. [ BibTex ]
  18. Tomasz Jastrząb.
    Two Parallelization Schemes for the Induction of Nondeterministic Finite Automata on PCs.
    Lecture Notes in Computer Science, 2018. [ BibTex ]
  19. Olgierd Unold, Agnieszka Kaczmarek, Łukasz Culer.
    Visual report generation tool for grammar-based classifier system.
    International Journal of Machine Learning and Computing, 2017. [ BibTex ]
  20. Tomasz Jastrząb.
    Parallel induction of nondeterministic finite automata revisited.
    AIP Conference Proceedings, 2017. [ BibTex ]

Summary

During the project period, three investigators received their next scientific degree: O. Unold promoted to full professor, W. Wieczorek defended his postdoctoral thesis, and T. Jastrząb granted a doctoral degree.

Contact

In case of any questions please write to prof. O. Unold