The AIAP Field Guide (Version 4.0)

A 12-month self-directed AI/ML learning journey.

Current Contributors

AIAP Team: Laurence Liew, Kevin Chng
100E Team: Kevin Oh, Siavash Sakhavi, Kenny Chua
Platforms Team: Najib Ninaba
Project Delivery Team: Ng Kim Hock

Past Contributors

Weng Jianshu
Azmi Mohamed Ridwan
Basil Han
Cheong Wei Yih
Maurice Manning
Jeanne Choo
Daniel Ng
Ryzal Kamis

The AI Apprenticeship Programme (AIAP)™ is a 9-month programme by AI Singapore (AISG) supported by the National Research Foundation (NRF) and the Infocomm Media Development Authority (IMDA) to develop a pipeline of local AI engineers for the industry.

AIAP is a deep-skilling programme, and selection into it is rigorous. Applicants should have solid foundational skills and knowledge in AI and machine learning (ML), preferably with hands-on experience with and exposure to real-world data and production environments.

This field guide serves several needs:

For aspiring AIAP™ applicants, this field guide provides a structured 12- to 18-month learning pathway to attain the requisite level for acceptance into AIAP™.
For experienced applicants to AIAP™, this field guide outlines the knowledge required, so applicants can check themselves against the content here before they apply for AIAP™.
Unsuccessful applicants can also evaluate their skill set against the content listed here and use this Field Guide to deepen their knowledge before applying to AIAP™ again.

As AI and machine learning are developing rapidly, we will update this content based on our engineers’ experience executing the 100E projects and on your feedback.

Let’s start your AI journey!

Laurence Liew

Director and Creator of the AIAP

AI Innovation, AI Singapore

The AI (or ML) engineer is a relatively recent specialisation with a skill set overlapping those of the data scientist and data engineer. An engineer, in general, builds things to solve real-world problems. An AI engineer, therefore, harnesses AI technologies to build AI systems that solve real-world problems.

Being an AI engineer requires a solid conceptual understanding of AI/ML algorithms and the software engineering skills to operationalise and optimise AI systems in production. AI engineers make intelligent use of the research of data scientists (or work in close collaboration with them) and, together with data engineers, build systems that solve a business or social problem. During this process, the AI engineer translates a real problem into a quantitative form and builds an implementation that delivers something of value to the organisation within ethical and regulatory boundaries. As can be seen, the AI engineering specialisation demands a wide range of skills and knowledge.

The various sections in this guide serve as your curriculum in your journey to becoming an AI Engineer. These are required competencies to work effectively as an AI Engineer.

The learning materials presented here are high quality and primarily free resources curated from the internet. They are resources some of us have used in our own AI learning journey.

It is not necessary to complete every one of the resources listed here. If you already know a particular topic, feel free to skip it or browse through it quickly.

For the more complex topics, you may spend more time than we have indicated, which is totally fine. Everyone learns at a different pace and has different strengths.

This field guide aims to share with you the skills and knowledge aspiring AI Engineers will need as they start on their AI journey. Being an AI Engineer is much more than just developing models within the confines of Jupyter notebooks!

To better prepare AIAP™ aspirants for selection into the programme, AISG has charted a 12-month self-learning roadmap. It is a curation of online courses and resources which provide the essential foundational knowledge grouped into five sections.

Details

Fundamentals
AI For Everyone
Python
Software Engineering
OOP, Data Structures and Algorithms
Databases and SQL
Computational Thinking
Cloud Computing

Machine Learning
The StatQuest Illustrated Guide to Machine Learning
Machine Learning Crash Course by Google
Optional: Mathematics for Machine Learning
Optional: The Data Science Design Manual
Optional: An Introduction to Statistical Learning (ISL)

Deep Learning
Google Cloud: Machine Learning and AI
Practical Deep Learning with Fast.ai
Dive into Deep Learning

Ethics and Governance
Model AI Governance Framework
Data Science Ethics Course

Practice
Competitions
AI Bricks

Section 1: Fundamentals

These topics are the basics we expect all candidates applying for AIAP to have. While completing every recommended resource here is not required, you should be at least familiar with them if you decide to skip them.

To set the stage and ensure everyone has the same understanding of what AI is, and what it is not, please complete AI4I® – Literacy in AI here:

https://learn.aisingapore.org/courses/ai-for-industry-part-1/

Code is what animates computers. An AI engineer must be able to write, execute and debug code as it is the means to translate concepts into real-world actions.

While several programming languages can be used for AI/ML development, Python remains the first choice for many developers. Python is known for its simple syntax and strong support community, which allow new learners to pick it up quickly. It has a rich and extensive set of libraries, including those required for building AI/ML models. Python is also actively developed, allowing the language to improve and evolve. Finally, Python is mature and used in many large-scale IT systems and software today.

There is no shortage of online learning materials to learn Python, with many freely accessible. You can always find one that will best suit your needs, even if you have no programming experience or are already an expert in another programming language. Here, we will highlight a couple of resources for the beginner:

The Python Tutorial

This course is designed for novices to teach you the foundations to write simple programs in Python using the most common structures. By the end of this course, you’ll understand the benefits of programming in IT roles; be able to write simple programs using Python; figure out how the building blocks of programming fit together; and combine all of this knowledge to solve a complex programming problem.

Crash Course on Python by Google

This course is offered on the Coursera platform. For many courses, you can access most of the learning materials for free using the ‘audit’ mode. If you are interested in accessing the graded assignments and earning a certificate (from Google in this case), you can purchase the Certificate Experience either before or after the audit.

YouTube is another source of excellent learning videos. Some creators of this content may have accompanying websites or code repositories that the learner can use to follow along. The channels and playlists listed below (in no particular order) are just some of the more popular ones for learning Python.

Python Tutorial for Beginners by Telusko
Python Tutorials for Absolute Beginners by CS Dojo
Python Programming Beginner Tutorials by Corey Schafer

The following learning resources will help you build vital software engineering skills an AI Engineer will need.

An AI Engineer’s toolbox: As an AI Engineer, you will be expected to build accurate, reproducible ML models and good-quality, fault-tolerant and well-designed applications. You will use various tools, from code editors and IDEs to command-line tools, software frameworks, etc.

The Missing Semester of Your CS Education by MIT

Version control: Also known as source control, version control is the practice of tracking and managing changes to software code. Version control systems are software tools that help software teams manage changes to source code over time. As development environments have accelerated, version control systems help software teams work faster and smarter. Two options are presented here for you to choose from, both targeted at beginners:

Introduction to Git and GitHub by Google (a Coursera course)
Version Control with Git on Udacity

Software Testing and Debugging: Software that does not behave consistently or correctly when faced with different scenarios can cause significant issues when deployed. The primary purpose of testing is to detect software failures so that defects may be discovered and corrected. This course is a good overview of this topic:

Getting Started With Testing in Python by Real Python
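To make the idea of detecting failures concrete, here is a minimal sketch using Python's built-in unittest module. The `normalise` function and the `run_tests` helper are hypothetical names invented for this illustration and do not come from any of the linked courses. The tests check a typical case, an edge case (all values identical) and an invalid input.

```python
import unittest


def normalise(values):
    """Scale a list of numbers to the range [0, 1] (hypothetical helper)."""
    lowest, highest = min(values), max(values)  # min() raises ValueError on []
    if lowest == highest:
        # Avoid division by zero when all values are identical.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


class NormaliseTests(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalise([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_does_not_crash(self):
        self.assertEqual(normalise([3, 3, 3]), [0.0, 0.0, 0.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            normalise([])


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormaliseTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes all three tests. If the code were saved in a file, running `python -m unittest` on that file would discover and run the same tests.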

Machine Learning (Data Science) Life Cycle: Building, training and deploying an AI model is just one small part of the whole life cycle of the machine learning process. Here are two good resources to quickly understand what a typical machine learning (or Data Science) life cycle is all about:

The Machine Learning Life Cycle Explained by DataCamp


The Team Data Science Process Lifecycle by Microsoft


Python is a flexible programming language that supports several programming paradigms. Object-oriented programming (OOP) is one such commonly used paradigm. It is based on the concept of “objects” that contain both data and code. Becoming highly competent in OOP is not required at the beginning of your learning journey. However, you may come across examples and projects written in this paradigm. Here, we present two short tutorials covering the basic concepts of OOP in Python.

Object-Oriented Programming (OOP) in Python 3 by Real Python
Learn Object-Oriented Programming Basics in 30 Minutes: A Free Crash Course by FreeCodeCamp
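As a rough illustration of objects containing data and code, the sketch below defines two small classes. The class names `Dataset` and `LabelledDataset` are hypothetical and chosen only for this example. It shows attributes (data), methods (code) and inheritance.

```python
class Dataset:
    """A hypothetical object bundling data (attributes) with code (methods)."""

    def __init__(self, name, rows):
        self.name = name        # data stored on the object
        self.rows = list(rows)  # copy so later changes to the input do not leak in

    def __len__(self):
        # Lets the built-in len() work on Dataset objects.
        return len(self.rows)

    def mean(self):
        # Code that operates on the object's own data.
        return sum(self.rows) / len(self.rows)


class LabelledDataset(Dataset):
    """Inheritance: reuses everything in Dataset and adds labels."""

    def __init__(self, name, rows, labels):
        super().__init__(name, rows)
        self.labels = list(labels)

    def positive_rate(self):
        # Fraction of labels equal to 1.
        return sum(1 for label in self.labels if label == 1) / len(self.labels)
```

For example, `LabelledDataset("demo", [1.0, 2.0, 3.0], [1, 0, 1])` creates an object on which both `mean()` (inherited) and `positive_rate()` (added by the subclass) can be called.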

A computer program is a collection of instructions to perform a specific task. For this, a computer program may need to store, retrieve, and perform computations on the data.

Data structures are the programmatic way of storing data so that it can be used efficiently. An algorithm is a step-by-step procedure that defines a set of instructions to be executed in a particular order to get the desired output. Learning data structures and algorithms allows us to write efficient and optimised computer programs.

Intro to Data Structures and Algorithms by Google (a Udacity course)

This learning resource covers commonly used data structures found in most programming languages and basic search algorithms.
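To show why the choice of algorithm matters, here is a small sketch comparing two basic search algorithms on a Python list. The function names are illustrative and not taken from the course. Linear search checks every item in turn, while binary search repeatedly halves a sorted list, so it needs far fewer comparisons on large inputs.

```python
def linear_search(items, target):
    """Return the index of target in items, or -1 if absent. Checks each item."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Return the index of target in an ascending sorted list, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1
```

Binary search only gives correct answers when the input is already sorted, which is the trade-off for its speed.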

A database is an organised collection of data, generally stored and accessed electronically from a computer system. With the enormous amount of data that is available and constantly being generated, an effective, fast and reliable database system is key to making AI possible.

If a database is the “housing complex” where data lives, then SQL (Structured Query Language) is like the address book that allows you to find someone quickly.

We have listed three learning resource options.

The first resource gives you a more in-depth understanding of database systems. If you are more interested in Data Engineering, this would be an excellent place to start, as you will need to interact with Databases often.

The second resource focuses on SQL, the ‘language of data’. This course is hosted by Khan Academy, a well-known, popular online learning platform that focuses on students of various levels. This could be an excellent, gentle introduction to the topic for some.

The third resource also focuses on SQL but uses the perspective of querying data for data analysis.

Database Management Essentials by University of Minnesota (Coursera)
Intro to SQL: Querying and managing data on Khan Academy
SQL for Data Analysis by Mode (Udacity)
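As a small taste of what querying looks like, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The `residents` table, its columns and the sample rows are made up for this illustration, loosely following the housing complex analogy above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up residents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE residents (name TEXT, block TEXT, unit INTEGER)")
    conn.executemany(
        "INSERT INTO residents VALUES (?, ?, ?)",
        [("Aisha", "A", 101), ("Ben", "A", 102), ("Chen", "B", 201)],
    )
    conn.commit()
    return conn


def residents_in_block(conn, block):
    """Use a parameterised SELECT to find residents of one block, sorted by name."""
    cursor = conn.execute(
        "SELECT name FROM residents WHERE block = ? ORDER BY name", (block,)
    )
    return [row[0] for row in cursor.fetchall()]
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to avoid SQL injection when a query includes user input.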

Computational thinking (CT) is a set of problem-solving methods involving expressing problems and their solutions in ways that a computer could execute. It involves the mental skills and practices for designing computations that get computers to do jobs for people and explaining and interpreting the world as a complex of information processes.

Computational thinking is critical if you want to become a good AI Engineer, as you will be asked to solve problems that do not have obvious solutions.

Computational Thinking for Problem-Solving by University of Pennsylvania (Coursera)

Cloud computing delivers on-demand computing services — from applications to storage and processing power — typically over the internet and on a pay-as-you-go basis. Rather than owning their computing infrastructure or data centres, companies can rent access to anything from applications to storage from a cloud service provider.

Some of the more well-known cloud service providers are:

If you are interested in understanding the landscape, Gartner’s report describes the state of cloud computing as of 2020.

For this Field Guide, we will focus on Google Cloud as it is user-friendly and has many learning resources. Alternatively, you may want to learn about the provider used by your organisation instead.

To begin learning, you must create an account on Google Cloud. To encourage new users, Google gives USD 300 in free credits, which allow you to explore the platform. In addition, you can use more than 20 products and services for free (up to a monthly limit).

You can learn about Google Cloud from its various documentation (shown above). However, we recommend the following resources if you prefer a more structured learning experience.

Google Cloud Computing Foundations

This is a set of four courses that will give an overview of concepts important to Cloud Computing and how Google Cloud fits in.

https://go.qwiklabs.com/gwg

Section 2: Machine Learning

In this section, we begin our journey with Machine Learning, also referred to as Statistical Learning. Machine Learning is a subset of Artificial Intelligence (AI).

There are several branches of AI, as described in the authoritative and most-used AI textbook – Artificial Intelligence: A Modern Approach (https://aima.cs.berkeley.edu/). However, this field guide will only focus on machine learning and deep learning – the two most popular ways to build AI systems today.

I would like to thank Josh Starmer for allowing us to link his book and videos. We highly recommend the StatQuest book and videos for professionals keen to understand more about machine learning, whether you intend to pursue AIAP or otherwise.

The StatQuest Illustrated Guide to Machine Learning covers the following topics:

Fundamental concepts in Machine Learning and Statistics

Cross Validation
Linear Regression
Logistic Regression
Gradient Descent
Naive Bayes
Confusion Matrices
Regularization
Decision Trees
Support Vector Machines
Neural Networks

You can buy the book here:

Physical or Kindle: https://www.amazon.com/dp/B0BLM4TLPY
PDF: https://statquest.gumroad.com/l/wvtmc?layout=profile

For EPOCH members, please remember to use your 10% discount code below to purchase the PDF!

The StatQuest YouTube channel is a rich source of easy-to-understand videos on the topics covered in the book:

https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw

Developed by Google, this free online course features a series of lessons with video lectures, real-world case studies, and hands-on practice exercises. The course does not presume or require any prior knowledge of machine learning. However, to understand the concepts presented and complete the exercises, the learning and preparation you have done up to this point will be very useful.

Machine Learning Crash Course by Google

The following resources are highly recommended but optional as they can be more challenging than the previous resources in this section. You should complete the above two and come back to these when you have time.

Mathematics is needed to understand how various Machine Learning algorithms truly work. Fortunately, you do not need advanced mathematics skills to do so. Use these resources to brush up on key concepts such as Linear Algebra and Calculus. Additionally, these resources have been developed specifically with Machine Learning in mind rather than being a generic course in Mathematics.

The first resource is similar to previous recommendations in that it is an online course. The second resource is an alternative where the authors, who are respected academics in their field, have made their books available for free. You can download a PDF copy from their website containing additional resources to support your learning.

Mathematics for Machine Learning Specialization by Imperial College London (Coursera). You should focus on the first two courses.
Mathematics for Machine Learning free eBook

Table of Contents

Part I: Mathematical Foundations

Introduction and Motivation
Linear Algebra
Analytic Geometry
Matrix Decompositions
Vector Calculus
Probability and Distributions
Continuous Optimization

Part II: Central Machine Learning Problems

When Models Meet Data
Linear Regression
Dimensionality Reduction with Principal Component Analysis
Density Estimation with Gaussian Mixture Models
Classification with Support Vector Machines

This is an excellent introductory and foundational course in Data Science and Machine Learning by Professor Steven Skiena, Distinguished Teaching Professor of Computer Science at Stony Brook University. His course and book, The Data Science Design Manual, provide an excellent introduction with exciting war stories. Additional resources, including data sets for projects and assignments, can be found on the book’s website.

With the kind permission of Prof Skiena, AI Singapore has mounted his lectures as a course module here.

Table of Contents:

What is Data Science
Mathematical Preliminaries
Data Munging
Scores and Ranking
Statistical Analysis
Visualizing Data
Mathematical Models
Linear Algebra
Linear and Logistic Regression
Distance and Network Methods
Machine Learning
Big Data

Written by well-respected academics, this is one of the most well-known books in Machine Learning. It focuses on explaining algorithms and techniques which are statistically based. Like the Mathematics learning resource earlier, the authors have made their book available for free download. In addition, a video recording of a course taught at Stanford by two of the authors (together with various course materials) is also available here.

Note that the book uses R. However, the equivalent Python notebooks kindly contributed to the community by Jordi Warmenhoven can be found here.

Table of Contents:

Introduction
Statistical Learning
Linear Regression
Classification
Resampling Methods
Linear Model Selection and Regularization
Moving Beyond Linearity
Tree-based Methods
Support Vector Machines
Unsupervised Learning

Section 3: Deep Learning

TensorFlow and PyTorch are two popular AI frameworks, and as an AI Engineer, you will probably use one of them at some point in your career. Hence, the resources recommended here use TensorFlow, PyTorch, or both.

This learning path from Google consists of 5 courses, which you can take on Coursera, and 21 labs on the Qwiklabs platform. This path allows you to go deeper into the Google products, such as data storage, pipelines and compute, that are typically used for building AI/ML products.

Google Cloud: Machine learning and AI

A highly rated, popular, practical course to get started quickly with PyTorch and AI applications.

https://course.fast.ai/

There are nine lessons, each around 90 minutes long. The course is based on the fast.ai 5-star rated book, which is also freely available online.

Lessons:

This unique resource is delivered in a single medium combining code, math and HTML.

Book: https://d2l.ai/index.html
Notebooks: https://github.com/d2l-ai/d2l-en

Section 4: Ethics and Governance

AI must be built and used ethically, fairly and responsibly. An AI engineer should be aware of these principles.

Singapore released the first edition of the Model AI Governance Framework in 2019 to provide guidance on key ethical and governance issues when deploying AI solutions. The second and latest edition was released on 21 January 2020.

Enrol in an online course to learn about data privacy issues, algorithmic bias, and fairness.

We recommend the Practical Data Ethics course from the same people that brought you fast.ai:

https://ethics.fast.ai/

Microsoft: AI Principles & Approach from Microsoft
Microsoft: FATE: Fairness, Accountability, Transparency, and Ethics in AI
Microsoft: Responsible bots: 10 guidelines for developers of conversational AI
Microsoft: The Future Computed (eBook)
Google: Responsible AI Practices
DeepMind: Safety & Ethics
Interpretable Machine Learning
Fairness and machine learning

Section 5: Practice

Put learning into practice by building an actual AI project. This can be through joining a competition platform or, even better, building your own real-world application.

Kaggle

Kaggle is the industry’s most popular AI/ML competition platform and has been a starting point for many practitioners. It also allows you to discuss with, learn from, and benchmark against other aspiring and experienced AI engineers and data scientists.

AI Singapore’s AI Bricks platform contains curated tools and resources for you to solve your own real-world AI problems. As of June 2021, we have the following collections:

Additional Resources

To supplement the journey, the following materials are also recommended. Note that some of the books do get updated regularly.

Python Crash Course, A Hands-On, Project-Based Introduction to Programming by Eric Matthes
Learning Python by Mark Lutz
Fluent Python: Clear, Concise, and Effective Programming by Luciano Ramalho
Learn Python the Hard Way by Zed Shaw, New York: Addison-Wesley (https://learnpythonthehardway.org/)

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney
Pandas for Everyone: Python Data Analysis by Daniel Y. Chen (2017), New York: Addison-Wesley
Python Data Science Handbook: Essential Tools for Working with Data by Jake VanderPlas

The Hundred-Page Machine Learning Book by Andriy Burkov
The Hundred-Page Machine Learning Engineering Book by Andriy Burkov
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Geron
Deep Learning with Python by Francois Chollet
Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence by Jon Krohn, Grant Beyleveld
Deep Learning Book at http://www.deeplearningbook.org/ (Advanced)

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications by Chip Huyen
Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps by Valliappa Lakshmanan
Software Engineering at Google: Lessons Learned from Programming Over Time by Titus Winters

Artificial Intelligence: A Modern Approach by Stuart Russell

A Thousand Brains: A New Theory of Intelligence by Jeff Hawkins
On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines by Jeff Hawkins
Weapons of Math Destruction by Cathy O’Neil
Rise of the Robots: Technology and the Threat of a Jobless Future by Martin Ford
Human Compatible by Stuart Russell
Architects of Intelligence by Martin Ford
Rebooting AI – Building Artificial Intelligence We Can Trust by Gary Marcus, Ernest Davis
Automating Inequality by Virginia Eubanks

AI Bricks collections:

TagUI: our open-source, full-featured desktop RPA tool. It helps you automate repetitive tasks, such as data acquisition and testing of web apps.
Speech Lab: speech recognition that enables you to convert audio to text. This is our uniquely developed code-switching speech engine, which can recognise English, Mandarin and Singlish.
Fine Pose: a social distancing app that utilises human pose estimation.
CUDO (Collaborative Urban Delivery Optimisation): our resource planning and scheduling tool for logistics service providers.
Computer Vision Hub: our open-source tools for Computer Vision.
AI-Ready Bricks: plug-and-play tools built on machine learning (ML) platforms.
Natural Language Processing Hub: our open-source tools for Natural Language Processing.
Synergos: our open-source platform for Federated Learning.
