Text Mining in biological research literature (BioMinT)
I was involved in the EU-funded BioMinT research project (2003-2005), which dealt with text mining in biological research literature and online databases. I was directly responsible for:
- Implementation of a parser for fourteen online databases, computation of the transitive closure of the link graph between their entries, and output of the result as name pairs (synonyms), 12 million pairs in all.
- Research and development in biological species recognition, named entity recognition, ranking and filtering systems.
- Evaluation and validation of the system.
- Basic research on redundancy recognition (sentence entailment).
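The transitive closure step in the first item above can be illustrated with a minimal sketch. It is not the project's actual code. It assumes that links between names are symmetric, so the closure reduces to finding connected components, and that every pair of names within one component counts as a synonym pair. The function name `synonym_pairs` and the example names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def synonym_pairs(links):
    """Return all name pairs connected directly or indirectly by `links`.

    Assumption: links are symmetric, so the transitive closure is the set
    of pairs inside each connected component of the link graph.
    """
    parent = {}

    def find(x):
        # Union-find lookup: walk to the root, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every linked pair.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group names by the root of their component.
    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)

    # Emit every unordered pair within a component.
    pairs = set()
    for names in groups.values():
        pairs.update(combinations(sorted(names), 2))
    return sorted(pairs)


# Hypothetical cross-references between database entries (illustrative only).
example_links = [("name_a", "name_b"), ("name_b", "name_c"), ("name_d", "name_e")]
pairs = synonym_pairs(example_links)
```

Here `name_a` and `name_c` end up as a pair although no direct link connects them. The number of pairs grows quadratically with component size, which is one way a modest number of links can expand into millions of pairs.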
At the invitation of Dr. med. Michael Steffens of the Institute for Medical Biometry, Informatics and Epidemiology at the Medical Faculty of the University of Bonn, we gave a talk about our current projects.
Research, design and development of a SpamAssassin-based spam filter system (sampling methodology, training methodology, evaluation), initially with seven test users and prepared for institute-wide deployment. Involved in many locally funded and EU-funded research projects.
Seewald A.K., Kleedorfer F.: An Approximation of the String Subsequence Kernel for Practical SVM Classification and Redundancy Clustering. Journal for Advances in Data Analysis and Classification, Vol. 1, Number 3 / December 2007, pp. 221-239, DOI: 10.1007/s11634-007-0012-1.
Dehaspe L., Attwood T.K., Daelemans W. et al.: BioMinT: the Research Assistant for Biological Text Mining. Knowledge for Growth 2005, Gent, 3rd of June, 2005.
Pillet V., Zehnder M., Seewald A.K., Veuthey A.-L., Petrak J.: GPSDB: a new database for synonyms expansion of gene and protein names. Bioinformatics 2005 21: 1743-1744.
Seewald A.K., Kleedorfer F.: Lambda Pruning - An Approximation of the String Subsequence Kernel. Technical Report, Austrian Research Institute for Artificial Intelligence, Vienna, TR-2005-13, 2005.
Seewald A.K.: Ranking for Medical Annotation: Investigating Performance, Local Search and Homonymy Recognition. Proceedings of the Symposium on Knowledge Exploration in Life Science Informatics (KELSI 2004), Milano, Italy.
Seewald A.K.: Evaluating Protein Name Recognition: An Automatic Approach. Workshop on Data Mining and Text Mining for Bioinformatics, 14th European Conference on Machine Learning (ECML-2003), Dubrovnik-Cavtat, Croatia, 2003.
Seewald A.K.: Recognizing Domain and Species from MEDLINE Proteomics Publications. Workshop on Data Mining and Text Mining for Bioinformatics, 14th European Conference on Machine Learning (ECML-2003), Dubrovnik-Cavtat, Croatia, 2003.