Soledad Galli

Data Science Coach

Talk Abstract: We use machine learning algorithms to find patterns in past data and then predict behaviour in future observations. However, the data available in business is generally not ready for use in machine learning modelling. On the contrary, we typically devote a considerable amount of time to pre-processing and selecting the variables that we will finally feed into our models. What are the typical problems we find in data? And what are the advantages of selecting variables when building models for business? In this talk, I will describe common data issues for numerical and categorical variables, highlighting which machine learning models are susceptible to them. I will introduce and compare different feature engineering techniques for imputation of missing data, processing of outliers and encoding of categorical variables. I will continue with an overview of different feature selection procedures, focusing on the limitations and advantages of each technique. By the end of the talk, I hope to give you a flavour of variable pre-processing and selection for building business models that can be put into production.

Audience level: Intermediate

Bio: Soledad is a Lead Data Scientist at LV=, with 2+ years of experience in data science and analytics in the financial sector, and 10+ years of experience in scientific research in academia. She is passionate about extracting meaningful information from data and helping institutions make solid and reliable data-driven decisions. At LV=, Soledad and the data science team are leading the implementation of machine learning across the company's multiple business areas. Having transitioned from academia to data science, Soledad is passionate about helping data scientists and academics transition into the field, and helping data scientists increase their breadth of knowledge. During the last year, Soledad has shared insights in blogs and talks in the data science community. She also created 2 online courses on machine learning, “Feature Engineering” and “Feature Selection” for machine learning, which are now live on Udemy. The courses have enrolled 400+ students from several parts of the world in just under 3 months, and have received very good reviews from the students.

If you are interested in learning more about feature engineering and feature selection for machine learning, you can take Soledad’s courses on Udemy at a discounted price using the voucher DSCOACH2018.


Saturday, April 21st, 2018, 9:00 am - 5:00 pm
Data Science Festival Mainstage (Ballot ticket only)
CodeNode - 10 South Pl, London EC2M 7EB
BALLOT TICKETS ARE NOW OPEN
Due to the popularity of Data Science Festival events, we are now allocating event tickets via a random ballot. Registering here enters you into the ticket ballot for…