Simone MARINI, PhD

(.) Postdoc Fellow, Laboratory for Biomedical Informatics, University of Pavia (since Dec 2016)

(.) Scientific Advisor, enGenome (since Dec 2016)


Postdoc fellow, Akutsu Laboratory, Kyoto University, Japan (2015 - 2016)

Laboratory for Biomedical Informatics, University of Pavia (2013 - 2015)

PhD Candidate, The Hong Kong University of Science and Technology (2008 - 2012)

Last update: Dec 2016

Laboratory for Biomedical Informatics,

University of Pavia,

via Ferrata 5, 27100 Pavia (Italy).

simone (_dot_) marini (at_) unipv (_dot) it

Who I am

As a scientist, I work in Bioinformatics, mainly applying Machine Learning to build prediction models and simulations.

I work with a wide variety of data, such as electronic health records, genomic variants, ontologies and protein sequences, and of techniques, e.g. support vector machines, random forests, Bayesian networks and data fusion.

My research projects span Italy, China and Japan, and involve people working for

I have lived in Madrid, Hong Kong, Beijing and Kyoto. I currently live in Pavia.


Protein cleavage target prediction

            Technique       Matrix tri-factorization
            Technology     Octave
            Data                KEGG, MEROPS, Domine, 3did, Negatome, BioGRID, Interpro, STRING

NGS epilepsy multiaxial association study

            Technique       Random Forest, Burden Methods
            Technology     Perl, Weka
            Data                NGS data, KEGG, Interpro, BioGRID

Cohort simulation of Type 1 and 2 diabetes

            Technique       Dynamic Bayesian Networks, Continuous Time Bayesian Networks
            Technology     MATLAB, R
            Data                EDIC, DCCT, Electronic Health Records

Genomic variant deleteriousness prediction

            Technique       Ensemble Learning, Cost-sensitive Learning
            Technology     Perl, Weka, AJAX, Glassfish
            Data                NGS, HGMD, 1TGP, NHLBI GO Exome Sequencing Project

SNP selection and effects of sample mislabeling on Machine Learning

            Technique       Markov Chain Monte Carlo, Machine Learning
            Technology     Weka, MATLAB
            Data                Genotyping

DNA-, RNA- and protein-protein interaction (or affinity) prediction

            Technique       Ensemble Learning, Support Vector Machines
            Technology     Weka, Perl
            Data                Dscam1, Protein-interactions

- - - - - - - - - - - - - - - - -


2015-2016       Japan Society for the Promotion of Science Postdoctoral Fellowship.

2015                Outstanding contribution in reviewing, Journal of Biomedical Informatics (Elsevier).

2011                Bioengineering Division Graduate Student Research Award, 1st ranked.

2010                HKUST Overseas Research Award for PhD Students.

- - - - - - - - - - - - - - - - -


2017                January 12. Investigating epileptogenesis with data fusion. University of Michigan, Ann Arbor, USA.

2016                September 8. Mining heterogeneous data sources to enhance association studies. University of Arizona, Tucson, USA.

June 10. Leveraging on public databases for novel peptidase target discovery. Electrical, University of Pavia, Pavia, Italy.

2011                May 13. Motif search, sequence alignment and Support Vector Regression for Dscam protein self- and hetero-binding affinity prediction. Institute of Biophysics, Chinese Academy of Sciences, Beijing, China.

- - - - - - - - - - - - - - - - -


 Kyoto University, Japan.

Supervision of summer internships (2016).

 University of Pavia, Italy.

Medical Informatics (2013-2015), Instructor of record, undergraduate.

Automatic Learning in Medicine (2013-2015), Instructor of record, postgraduate. 

Co-supervision of five MSc and one BSc dissertations (2013-2015; 2017-present).

Supervision of summer internships (2014).

The Hong Kong University of Science and Technology, China.

Introduction to Bioengineering (2010), Teaching assistant, postgraduate     

- - - - - - - - - - - - - - - - -


Journal Reviewer                    Journal of Biomedical Informatics (since 2014)

     Briefings in Bioinformatics (since 2015)

     Computers in Biology and Medicine (since 2016).


Conference Reviewer            Artificial Intelligence in Medicine (since 2016)

                                               American Medical Informatics Association joint Summits on Translational Science (since 2016)

My publons profile.

- - - - - - - - - - - - - - - - -

LANGUAGES                         (Reading)                                (Speaking)        

Italian                                       Native speaker                        Native speaker

English                                    Fluent                                      Fluent

Spanish                                   Fluent                                      Fluent

Chinese                                   -                                              Survival

 - - - - - - - - - - - - - - - - -


 2014                                        Software developer, DCPUK, Bangladesh. VSO Poverty Alleviation, remote services. Development of software to help manage dairy cooperatives.

 2006 – 2008                            Front desk volunteer, City social services of Pavia, Italy. Helping immigrants interact with local bureaucracy. 

- - - - - - - - - - - - - - - - -



2017                Dscam1 Web Server: online prediction of Dscam1 self- and hetero-affinity

                        Marini S*, Nazzicari N*, Biscarini F, Wang GZ. Bioinformatics, in press 

                        Machine learning methods to predict Diabetes complications

                        Dagliati A, Marini S, Sacchi L, Cogni G, Teliti M, Decata P, Chiovato L, Bellazzi R. Journal of Diabetes Science and Technology, in press

2016                A data fusion approach to enhance association study in epilepsy

                        Marini S, Limongelli I, Rizzo E, Errichiello E, Vetro A, Tan D, Zuffardi O, Bellazzi R. PLoS One, in press.

"Noisy beets": impact of phenotyping errors on genomic predictions for binary traits in Beta vulgaris

Biscarini F, Nazzicari N, Broccanello C, Stevanato P, Marini S. Plant Methods 2016, 12:36

2015                A Dynamic Bayesian Network model for long-term simulation of clinical complications in type 1 diabetes

                       Marini S, Trifoglio E, Barbarini N, Sambo F, Di Camillo B, Malovini A, Manfrini M, Cobelli C, Bellazzi R. Journal of Biomedical Informatics 2015, 57

                        PaPI: pseudo amino acid composition to score human coding variants

                        Limongelli I, Marini S, Bellazzi R. BMC Bioinformatics 2015, 16:123

                        Developing a parsimonius predictor for binary traits in sugar beet (Beta vulgaris)

Biscarini F, Marini S, Stevanato P, Broccanello C, Bellazzi R, Nazzicari N. Molecular Breeding 2015, 35:10

2014                Improvement of Dscam homophilic binding affinity throughout Drosophila evolution

Marini S*, Wang GZ*, Ma X, Yang Q, Zhang X, Zhu Y. BMC Evolutionary Biology 2014, 14:186 (*) equally contributed

2013                The role of SwrA, DegU and P(D3) in fla/che expression in B. subtilis.

Mordini S, Osera C, Marini S, Scavone F, Bellazzi R, Galizzi A, Calvio C. PLoS One 2013, 8:12:e85065.

2011                In silico Protein-Protein Interaction prediction with sequence alignment and classifier stacking.

                        Marini S, Xu Q, Yang Q. Curr Protein Pept Sci. 2011, 12:7



2016                Learning T2D evolving complexity from EMR and administrative data using Continuous Time Bayesian Networks

Marini S, Dagliati A, Sacchi L, Bellazzi R. 9th International Joint Conference on Biomedical Engineering Systems and Technologies, HEALTHINF 2016

2015                A genomic data fusion framework to exploit rare and common variants for association discovery.

Marini S, Limongelli I, Rizzo E, Da T, Bellazzi R. 15th Conference of Artificial Intelligence in Medicine 2015

                        Matrix tri-factorization for miRNA-gene association discovery in acute myeloid leukemia

De Martini A, Marini S, Vitali F, Bellazzi R. 15th Conference of Artificial Intelligence in Medicine [Workshop] 2015



2016                Data Fusion for cleavage target prediction

Marini S, Demartini A, Vitali F, Bellazzi R, Akutsu T. Bioinformatics Italian Society National Congress 

2015                A continuous time, multivariate model to simulate Type 2 Diabetes patients trajectories

Marini S, Dagliati A, Bellazzi R. American Medical Informatics Association joint Summits on Translational Science 2015

                        Predicting Microvascular Complications from Type 2 Diabetes Retrospective Data

Sacchi L, Colombo C, Dagliati A, Marini S, Cerra C, Chiovato L, Bellazzi R. 15th Annual Diabetes Technology Meeting

2014                A multivariate data-driven model to investigate the arising of complications in T2D patients

Marini S, Malavolti M, Dagliati A, Bellazzi R. 14th Annual Diabetes Technology Meeting

                        PaPI: the Pseudo Amino acid variant Predictor

Marini S, Limongelli I, Bellazzi R.  Bioinformatics Italian Society National Congress  

                        A novel algorithm to predict the deleteriousness of genomic coding variants

                        Limongelli I, Marini S, Bellazzi R. NGS (ISCB)

Dynamic Bayesian Networks to simulate type I diabetes patients cohorts

Barbarini N, Bellazzi R, Cobelli C, Di Camillo B, Manfrini F, Malovini A, Marini S, Sambo F, Trifoglio E. Economics, Modelling and Diabetes: Mount Hood Challenge 

PaPI: using pseudo amino acid composition to predict deleterious coding variants

                        Limongelli I, Marini S, Bellazzi R. Italian Bioengineering Group National Congress

[*] denotes equal contribution.

 - - - - - - - - - - - - - - - - -


In my spare time I enjoy (1) traveling; (2) playing nerdy pen-and-paper role-playing games; and (3) learning (or trying to learn) languages, history and philosophy.

- - - - - - - - - - - - - - - - -


I make prediction models and simulations applying several Machine Learning techniques. I work on a wide variety of data, in both Health Informatics and Bioinformatics. I exploit the hidden relations of heterogeneous data sources.