This bootcamp is designed to introduce you to data science and machine learning using the Python programming language. Through this intense, weeklong program, you will begin to master the skills needed to manipulate, visualize, and explore datasets and to apply machine learning to them to extract valuable insights.
This course is taught by Ted Petrou, an expert in data exploration and machine learning with Python. He is the author of Pandas Cookbook, a thorough step-by-step guide to accomplishing a variety of data analysis tasks with Pandas, and of the Python data exploration libraries Dexplo and Dexplot. He is ranked in the top 0.1% of Stack Overflow users of all time.
Small Class Size
This is a small class of at most 15 participants, so everyone can participate fully and get questions answered quickly.
Live Class Dates
Nov 10-11, 17-18: Live classes held on the weekends near Times Square from 9 a.m. - 5 p.m.
Upon registration, students will receive a thorough, 200-page precourse assignment on the fundamentals of Python. It includes over 100 exercises and a final project in which students build a poker game with a rule-based artificial intelligence.
Nov 5th and 7th: Two online classes held from 7 - 10 p.m. Students will receive an assignment introducing them to the Pandas library and in particular how to select subsets of data. Over 100 pages and 100 exercises are available.
Live Classes Syllabus
Over 1,000 pages of material, 400 questions with detailed solutions, several mini-projects, and a few major case studies will be available. During the week, students will work on completing two of these case studies.
Day 1: Minimally Sufficient Pandas
The Pandas library is powerful yet confusing because there are often several ways to complete the same task. Students will learn a small yet powerful subset of Pandas that will allow them to complete many tasks without getting distracted by syntax. Students will also learn an efficient process for building a workflow in a Jupyter Notebook.
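As a small illustration of why one task can have several spellings in Pandas, the sketch below selects the same column four ways. The DataFrame and its column names are made up for this example and are not taken from the course material.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]})

# Four ways to select the same column
by_brackets = df["age"]         # bracket indexing
by_attribute = df.age           # attribute access
by_loc = df.loc[:, "age"]       # label-based selection
by_iloc = df.iloc[:, 1]         # position-based selection

# All four return an identical Series
print(by_brackets.equals(by_attribute))
print(by_brackets.equals(by_loc))
print(by_brackets.equals(by_iloc))
```

Settling on one of these forms and using it consistently is one plausible reading of working with a small subset of the library.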
Day 2: Split-Apply-Combine
Insights within datasets are often hidden amongst different groupings. The split-apply-combine paradigm is the fundamental procedure to explore differences amongst distinct groups within datasets.
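A minimal sketch of split-apply-combine with Pandas `groupby` follows. The data, column names, and choice of the mean as the aggregation are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "department": ["sales", "sales", "engineering", "engineering", "engineering"],
    "salary": [50_000, 60_000, 90_000, 100_000, 110_000],
})

# Split the rows by department, apply the mean to each group's salaries,
# and combine the per-group results into a single Series
mean_salary = df.groupby("department")["salary"].mean()
print(mean_salary)
```

The result is indexed by the grouping column, so `mean_salary["sales"]` gives the average for that group.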
During the week: Tidy Data
Real-world data is messy and not immediately ready for aggregation, visualization, or machine learning. Identifying messy data and transforming it into tidy data gives the data a structure that makes further analysis easier.
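One common kind of messiness is a variable's values stored as column names. The sketch below, using an invented table, reshapes such data with `DataFrame.melt` so that each variable has its own column and each observation its own row. The city names, years, and sales figures are hypothetical.

```python
import pandas as pd

# Hypothetical messy table: the years are column names rather than values
messy = pd.DataFrame({
    "city": ["Houston", "Boston"],
    "2017": [10, 20],
    "2018": [15, 25],
})

# Reshape so that each row is a single (city, year, sales) observation
tidy = messy.melt(id_vars="city", var_name="year", value_name="sales")
print(tidy)
```

In the tidy form, the year can be used directly as a grouping or plotting variable.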
Day 3: Exploratory Data Analysis
Exploratory data analysis (EDA) is a process for gaining understanding and intuition about datasets. Visualizations are the foundation of EDA and communicate the discoveries within. Matplotlib, the workhorse for building visualizations, will be covered, followed by pandas' convenient interface to it. Finally, the Seaborn library, which works directly with tidy data, will be used to create effortless and elegant visualizations.
Day 4: Applied Machine Learning
After tidying, exploring, and visualizing data, machine learning models can be applied to gain deeper insights into the data. Students will build workflows for preparing data, fitting and validating models, and making predictions with Scikit-Learn, Python's powerful machine learning library. The very latest additions to Scikit-Learn have been incorporated into this material. See this blog post from Ted for more info.
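A minimal sketch of such a workflow in Scikit-Learn is shown here. The dataset (the bundled iris data), the scaler, and the logistic regression model are assumptions chosen for brevity, not the course's own examples.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Prepare: load a small bundled dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model: chain preprocessing and the estimator into one pipeline
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Validate: 5-fold cross-validation on the training data only
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Predict: fit on all training data, then score the held-out test set
model.fit(X_train, y_train)
predictions = model.predict(X_test)
test_accuracy = model.score(X_test, y_test)

print(cv_scores.mean(), test_accuracy)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, which keeps information from the validation rows out of the preprocessing step.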
Students will complete three end-to-end data analyses which will be personally reviewed by Ted. Students will learn how to create interactive dashboards and present their results with them.
Learning is accomplished by working through difficult assignments and then receiving and reviewing model solutions. Using a 'flipped classroom', students will read and prepare each day's material before coming to class. In class, students will alternate between instructor-guided lessons and student-focused exercises and projects. Ted is a very active participant and will sit alongside students as they complete the exercises.