Expert in Computer Science and Education
Selected projects
"Karect: Accurate Correction of Substitution, Insertion and Deletion Errors for Next-generation Sequencing Data":
I implemented the complete, efficient system for "Karect". Karect is a novel error
correction technique for next-generation sequencing data. It is based on multiple
alignment and supports substitution, insertion and deletion errors. It can handle
non-uniform coverage as well as moderately covered areas of the sequenced genome.
Experiments with data from Illumina, 454 FLX and Ion Torrent sequencing machines
demonstrate that Karect is more accurate than previous methods, both in correcting
individual base errors (up to 10% increase in accuracy gain) and in the quality of
subsequent de novo assembly (up to 10% increase in NGA50). Karect has been published in
Bioinformatics, a top-ranked journal in the bioinformatics area. Karect is
available here.
"ERA: Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings":
I implemented the complete, efficient system for "ERA", which constructs suffix trees for
very long strings. ERA indexes the entire human genome in 19 minutes on an ordinary
desktop computer. For comparison, the fastest existing method needs 15 minutes using
1024 CPUs on an IBM BlueGene supercomputer. I implemented three efficient variants of
the system:
a) Serial: for a single-core processor.
b) Parallel shared-memory: for a multicore processor.
c) Parallel shared-nothing: for a Linux cluster.
Research based on this system has been published in the Proceedings of the VLDB
Endowment (PVLDB), a top-ranked journal in the database area.