Mingyi Liu, Ph.D.

 

OBJECTIVE

 

Bioinformatics scientist position in a biotechnology or pharmaceutical company that demands solid software development skills and a strong biological research background
 

QUALIFICATIONS

 

·       3 years of Bioinformatics software development experience that includes:

o      Leading various large database projects integrating and assembling genome-based annotation

o      Database development for annotation, management and analysis of oncology target screening results

o      Substantial genome-wide analytical research experience, with one paper ready for submission, another in preparation

·       10 years of combined biology education and research experience in transcription and gene expression studies

o      Obtained Ph.D. degree for study of the RNA polymerase II transcript termination factor and Pol II elongation regulation

·       Programming experience since high school, with significant software development experience:

o      5 years of C++ programming on Win32/Linux/Unix, including a highly popular C++ shareware package developed for Win32/Linux/FreeBSD

o      3 years of Perl programming on Linux/Win32.  Experienced with Bioperl/EnsEMBL packages

o      2.5 years of SQL programming with PostgreSQL, MySQL, and Oracle.

o      1 year of Java programming experience

o      1 year of PHP experience

o      Other skills listed in detail below

·       Non-technical aspects:

o      Excellent communication skills and a team player

o      Highly motivated, creative and a fast learner

o      Quick at identifying questions and developing solutions

 

EDUCATION

 

Ph.D.  Biochemistry, University of Iowa, Iowa City, IA. July 2001

M.S.   Computer Science, University of Iowa, Iowa City, IA. May 2001
B.S.    Biochemistry and Mol. Biol., Peking University, Beijing, China. May 1995
 

SUMMARY OF SKILLS

 

Bioinformatics: Public protein and DNA databases such as EnsEMBL, SWISS-PROT, Genbank, EMBL, Interpro, PDB, among others, as well as several commercial databases; Bioinformatics algorithms such as HMMER, BLAST, Smith-Waterman, FASTA, gene finding algorithms, statistical analysis; Hands-on experience in SRS, Bioperl, R, GCG, ClustalW, Macaw, Spotfire, VectorNTI among other Bioinformatics and Statistics software packages and toolkits; Familiar with microarray and SAGE technologies

Computer Science: C/C++ (x-platform), Perl, Java (Swing, Jface/SWT), XML, PHP, Python, SQL, PostgreSQL, MySQL, Oracle, Unix IPC, RPC, network programming, CGI, JavaBeans, JavaScript, VBScript, X-platform support in Linux, Win32 and Unix flavors

Biology: In vitro transcription, protein expression in bacteria, insect and mammalian systems and purification with HPLC/FPLC and affinity chromatography; blotting, cloning, RT-PCR, cell culture, cytometry, confocal microscopy, ATPase and Kinase assays, immuno-precipitation, and more

 

EMPLOYMENT

 

Computational Biologist, GPC Biotech Inc. Waltham, MA.  August 2001 - present

·       Developed an analysis/database software package to analyze and annotate human EST sequences generated by internal target screening pipeline, as well as to store, track and analyze data from screening and gene expression assays. 

o      Developed Perl-based annotation pipeline that provided:

§       EST identification.  Multiple databases, screen-specific information and BLAST-based heuristic algorithms were used to reliably select the best hit for each clone automatically.

§       Public DB-based gene annotation from EnsEMBL, Refseq, Swissprot etc.

§       Alternative splicing annotation based on ab initio analysis.

o      Developed a series of visualization tools to help scientists quickly identify important information visually, including tools to:

§       Map clones to transcripts for selection of best clone.

§       BLAST two batches of sequences against each other and visualize the results; a more powerful BLAST result viewer than NCBI’s bl2seq visualization tool.

§       Retrieve and display domain information on transcripts and clones.

§       Map any cDNA sequences to genomic sequence and visualize the exons.

o      Developed web-based data analysis tools.

§       Oncology data, including Affymetrix results and other apoptosis and cell death assay results, were stored in a relational database (Postgres) at the backend.

§       Dynamic query builders were implemented to enable users to input arbitrary rules to filter multiple Affymetrix and cell death assay results at the same time.

§       Rich analysis interface allows scientists to gather valuable information with a single click (such as comparing genes identified in different stages of the target screen, or the redundancy of the current screen stage) to decide on further screening/sequencing strategy.

§       Developed multiple expression-based or efficacy-based analysis tools to aid the candidate selection process.

o      Other features and technical highlights of this project include:

§       Administration of Apache web server and internal EnsEMBL databases.

§       Designed a remote object retrieval method to help the annotation process.  The cross-language method is more robust and significantly faster to implement than SOAP/XML for complex objects.

§       Tight access control of data provides multi-user and multi-project support with individual permission settings.

§       User comment capability allows annotation of any database objects and has multi-user, multi-project support.  Comments are persistent over DB updates.

§       Designed in Perl a parallel processing paradigm to take advantage of multiple CPUs for any repetitive tasks.   

§       Co-developed and coded most of the front-end PHP/JS + Apache based web user interface.

§       Developed automated update process to keep DB annotation up-to-date.

§       Designed flexible data storage and retrieval model that provides easy back link to user files and data export functionality.

§       Provided context-sensitive web-based manual, user training sessions and interacted with scientists frequently to provide help with data analysis, database usage, bioinformatics training and decision support.

§       Performed software installation, configuration and other tasks to transfer database package to a client company.  Provided update, synchronization and maintenance of the databases.

·       Led the research effort by a subgroup of four Bioinformatics scientists in a proteomics research project to determine the correlation of gene structure or splicing profile with protein domain organization.

o      Using public databases such as EnsEMBL, Swissprot, Genbank and ASDB, and commercial databases such as Doubletwist Prophecy (integrated as described below), I identified and assessed statistically significant cases.  The subgroup combined efforts in the detailed investigation of individual cases.

o      Ready to submit one paper and preparing another.

·       Developed an XML-based database and associated analysis software to annotate known public and proprietary information for the human kinome.

o      Perl-based analysis software provides genome-based annotation and data processing for visualization by an internal visualization tool. 

o      Perl CGI-based web frontend enables search and retrieval of annotations. 

o      Developed an XML indexing scheme to enable fast searching and sorting of data in arbitrary XML fields.

·       Led a subgroup of two people to integrate Geneindex data from Doubletwist Prophecy into internal SRS system. 

o      Developed generic methodology to convert a complex and recursive data structure losslessly into a flat structure. 

o      Successfully integrated Prophecy data in an XML data structure with over 100 elements and attributes into SRS, despite the weak support of XML in SRS 6.x. 

o      Provided uniform data access through SRS for easy query and retrieval of Geneindex data.

·       Successfully integrated genome annotation of Doubletwist Prophecy into SRS.

o      Developed strategy to perform two-step integration that produced two cross-linked DNA and protein DBs with better querying capability, easier navigation, better annotation and visualization, as well as high-throughput processing. 

o      Prophecy data was rendered a seamless part of the internal data resource infrastructure with a flexible API that was previously only available from Doubletwist at a high cost.

o      The generic methods developed can serve as a general algorithm for similar problems that are frequently encountered in Bioinformatics when integrating different technology platforms and data resources.

·       Developed XML databases related to alternative splicing of human genes.  Integrated these databases into SRS to enable querying of the public resources and proprietary data known on splicing variants.

·       Evaluated Smith-Waterman vs. BLAST algorithms under different parameter settings.
The evaluation was run on a large-scale distributed computing platform.  Results were instrumental in computational decisions.

·       Co-developed a software implementation of a neural network solution to predict and report signal peptides, and automated the design of primers for signal-peptide-containing proteins.

·       Developed DB access modules for all SRS-integrated databases in Perl. 

o      Automated generation of these object-oriented Perl modules and test scripts. 

o      Completely internalized data resource access through these modules, which complement Bioperl modules for data access and analysis from different resources.

·       Helped test and diagnose internal Beowulf clusters.
Designed patterns for local multi-processing, and for monitoring and automatic resubmission of jobs that failed on the cluster (due to cluster failure).

·       Helped evaluate and extend various Bioinformatics software packages and databases for internal usage.

 

OTHER RELEVANT EXPERIENCE

 

University of Iowa, Iowa City, Iowa. August 1995 - July 2001

Part-Time Projects:

·       Developed both the front-end and back-end of a highly customizable internet database management system in C++ that was:

·       Downloaded more than 23,000 times.

·       Ranked as high as #1 and is currently the most popular and highest ranked C++ shareware on www.hotscripts.com, which has much more web traffic than apache.org, php.net, and mysql.com, based on Alexa rankings.

·       Ported to Linux and FreeBSD from Win32 MFC by myself.

·       Translated into 4 languages by volunteer users from at least 8 countries. 

·       Participated in a Java-XML project to search for and display BLAST alignments and annotations in NCBI style on a large scale.

·       Set up and administered HTTP and FTP servers on both Win9x/NT and Linux.
Helped build or maintain personal, departmental lab, Univ. of Iowa Chinese student organization and Peking University Biology Dept. alumni web sites.

·       Served on the Chinese student organization board during 1999-2000 and increased organization funding by 25%.
Made significant contributions to organization of events, coordination of activities, as well as article authoring, editing and publishing of a journal.

In Biochemistry Department:

·       Used various biochemistry, molecular and cellular biology techniques to clone and study the transcription termination factor for human RNA polymerase II.
Used GCG packages and EST database to identify the human homologue and its protein family.  Used 5’ RACE to get the full-length sequence and cloned the gene.  The protein was expressed using a baculovirus vector in the Sf9 insect cell line.  For details of the study, please see the skills and publications sections.

Teaching Assistant:     

142:215 “Molecular Biology of Gene Expression”

99:140 “Experimental Biochemistry” (undergraduate lab course)

In Computer Science Department:

·       Implemented Smith-Waterman local alignment algorithm in a multi-process C program on UNIX.

·       Developed a Swing-based object-oriented graphics program.

·       Various projects involving socket programming, Unix IPC, RPC, JavaBeans and CORBA technologies.  Coursework included advanced algorithms, operating systems, computer architecture and DBMS.

Teaching Assistant (both graduate-level core courses):

22C:153 “Design and Analysis of Algorithms”

22C:135 “Introduction to Computation Theory”

 

PUBLICATIONS

 

·       Liu, M., Xie, Z., and Price, D.H. (1998) J. Biol. Chem. 273 (40), 25541-25544

·       Peng, J., Liu, M., Marion, J., Zhu, Y., and Price D.H. (1998) 63rd Cold Spring Harbor Symposium on Quantitative Biology 63:365-370

·       Hara, R., Selby, C.P., Liu, M., Price, D.H., and Sancar, A. (1999) J. Biol. Chem. 274: 24779-24786

·       Liu, M. and Price, D.H. (1997) Promega Notes 64:21-25

·       Liu, M. and Price, D.H. in preparation.

·       Liu, M. et al. and Grigoriev, A. in preparation.

 

HONORS AND AWARDS

·       Received an award for excellence in organizing departmental activities in college.

·       Exempted from the national College Admission Test to enter Peking University, the top university in China.

·       Winner of National High School Competitions in Wuhan City (population 6 million): No. 1 in the Chemistry competition, top 10 in Physics and top 20 in Mathematics.

 

CONTACT

 

(978) 509-1077 (Cell)

(617) 527-4867 (Home)

E-Mail: mingyiliu@yahoo.com

URL: http://www.mingyi.org/

 

REFERENCES  Available upon request