
Training Course in Data Science, Big Data Analytics and Management with Python


About The Course


Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. This course provides a comprehensive introduction to programming with Python, starting from the basics. Beyond building confidence in Python itself, the training focuses on solving problems in data processing and analysis. We will also discuss the types of problems for which Python is the right choice. The overarching goal is to equip students with enough programming experience to start working in any area of computation and data-intensive research.

This comprehensive course will guide you in using Python to analyze big data, create clear visualizations, and apply machine learning algorithms. It is designed both for beginners with basic programming experience and for experienced developers looking to make the jump to Data Science and Big Data Analysis. Python is an adaptable, robust open-source language that is easy to learn and offers powerful libraries for data manipulation and analysis. The course provides a complete overview of data analysis techniques using Python and teaches you to master the core concepts of Python programming. Data Scientist is one of the most in-demand roles today, and Python is a crucial skill for it.


By the end of the course, participants should be able to:

  • Load data from a variety of common formats
  • Manipulate data efficiently with Pandas.
  • Use specialized Python packages such as data visualization libraries.
  • Produce comprehensive data visualizations.
  • Apply machine learning techniques such as clustering, classification and regression.
  • Perform basic data mining.
  • Work with arrays and vectorized computation
  • Work with tabular or heterogeneous data
  • Plot and visualize data.
  • Use Python for Data Science and Machine Learning
  • Use Spark for Big Data Analysis
  • Practice techniques to manage various types of data – ordinal, categorical, encoding.
  • Master the art of performing step-by-step data analysis.
  • Use tools and techniques for predictive modelling.
  • Discuss Machine Learning algorithms and their implementation.
  • Validate Machine Learning algorithms.
  • Explain Time Series and its related concepts.
  • Perform Text Mining and Sentiment Analysis.


  • Analytics Team Managers
  • Business Analysts who want to comprehend Machine Learning concepts.
  • Information Architects who want to gain proficiency in Predictive Analytics
  • Programmers, Developers, Technical Leads, Architects


Module 1: Research Design and Basic Statistical Terms and Concepts

  • Introduction to statistical concepts
  • Descriptive Statistics
  • Inferential statistics
  • Role and purpose of research design
  • Types of research designs
  • Research process
  • Practice Exercise: Identify a project of choice and develop a research design.

Module 2: Survey Planning, Implementation and Completion

  • Types of surveys
  • Survey Process and Survey design
  • Sampling Methods
  • Determining the Sample size
  • Planning a survey
  • Conducting the survey
  • Practice Exercise: Plan a survey based on the research design selected.
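
As an illustration of the "Determining the Sample size" topic, the sketch below uses Cochran's formula with an optional finite population correction. This is one common approach and an assumption on our part; the course may teach a different method. The function name `cochran_sample_size` and the example values (95% confidence, 5% margin of error, a population of 2,000) are hypothetical.

```python
import math


def cochran_sample_size(z, p, e, population=None):
    """Estimate a survey sample size with Cochran's formula.

    z: z-score for the desired confidence level (1.96 for 95%)
    p: expected proportion (0.5 is the most conservative choice)
    e: margin of error, as a proportion (0.05 for 5%)
    population: optional population size for the finite population correction
    """
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    if population is not None:
        # Finite population correction shrinks the sample for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Hypothetical example values, not taken from the course material.
n_large = cochran_sample_size(z=1.96, p=0.5, e=0.05)
n_small = cochran_sample_size(z=1.96, p=0.5, e=0.05, population=2000)
print(n_large, n_small)
```

With these inputs the estimate is 385 respondents for a very large population and 323 when the population is 2,000.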

Module 3: Data Science Overview

  • Introduction to Data Science
  • Different Sectors Using Data Science
  • Purpose and Components of Python

Module 4: Data Analytics Overview

  • Data Analytics Process
  • Knowledge Check
  • Exploratory Data Analysis (EDA)
  • EDA-Quantitative and Graphical Techniques
  • Data Analytics Conclusion or Predictions
  • Data Analytics Communication
  • Data Types for Plotting

Module 5: Statistical Analysis and Business Applications

  • Introduction to Statistics
  • Statistical and Non-statistical Analysis
  • Major Categories of Statistics
  • Statistical Analysis Considerations
  • Population and Sample
  • Statistical Analysis Process
  • Data Distribution – Measures of Central Tendency and Dispersion
  • Correlation and Inferential Statistics
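
The measures of central tendency and dispersion listed above can be sketched with Python's standard `statistics` module. The sample values below are made up for illustration and do not come from the course material.

```python
import statistics

# Hypothetical sample, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean_value = statistics.mean(values)
median_value = statistics.median(values)
mode_value = statistics.mode(values)

# Measures of dispersion.
value_range = max(values) - min(values)
population_sd = statistics.pstdev(values)  # treats the data as the whole population
sample_sd = statistics.stdev(values)       # treats the data as a sample (n - 1 divisor)

print(mean_value, median_value, mode_value, value_range, population_sd)
```

The sample standard deviation is slightly larger than the population one because it divides by n - 1 rather than n.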

Module 6: Python Environment Setup and Essentials

  • Anaconda
  • Installation of Anaconda Python Distribution
  • Data Types with Python
  • Basic Operators and Functions

Module 7: Mathematical Computing with Python (NumPy)

  • Introduction to NumPy
  • Activity-Sequence it Right
  • Creating and Printing an ndarray
  • Class and Attributes of ndarray
  • Basic Operations
  • Copy and Views
  • Mathematical Functions of NumPy
  • Evaluate datasets containing GDPs of different
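
A minimal sketch of the "Copy and Views" and "Basic Operations" topics in this module, assuming NumPy is installed (it ships with Anaconda). The array contents are arbitrary illustration values.

```python
import numpy as np

a = np.arange(6)           # ndarray holding 0..5
print(a.ndim, a.shape, a.dtype)

view = a[1:4]              # basic slicing returns a view onto the same memory
copy = a[1:4].copy()       # .copy() allocates independent memory

view[0] = 100              # writing through the view changes the original array

shares_view = np.shares_memory(a, view)
shares_copy = np.shares_memory(a, copy)

doubled = a * 2            # vectorized operation, no explicit Python loop
print(a, copy, doubled)
```

After the assignment, `a[1]` is 100 while `copy[0]` is still 1, which is the practical difference between a view and a copy.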

Module 8: Scientific Computing with Python (SciPy)

  • Introduction to SciPy
  • SciPy Sub Package – Integration and Optimization
  • SciPy Sub package
  • Demo – Calculate Eigenvalues and Eigenvector
  • Use SciPy to solve a linear algebra
  • Use SciPy to define 20 random variables for random

Module 9: Data Manipulation with Pandas

  • Introduction to Pandas
  • Understanding DataFrame
  • View and Select Data Demo
  • Missing Values
  • Data Operations
  • File Read and Write Support
  • Pandas SQL Operation
  • Analyze the Federal Aviation Administration (FAA) dataset using Pandas.

Module 10: Machine Learning with Scikit-Learn

  • Machine Learning Approach
  • Understand data sets and extract its
  • Identifying problem type and learning model
  • Train, test and optimize the
  • Supervised Learning Model Considerations
  • Scikit-Learn
  • Supervised Learning Models – Linear Regression and Logistic Regression
  • Unsupervised Learning Models
  • Pipeline
  • Model Persistence and Evaluation
  • Analyze a dataset to find the features and response

Module 11: Natural Language Processing with Scikit-Learn

  • NLP Overview and NLP Applications
  • NLP Libraries-Scikit
  • Extraction Considerations
  • Scikit-Learn: Model Training and Grid Search
  • Analyze a given spam collection
  • Analyze the sentiment dataset using

Module 12: Data Visualization in Python Using Matplotlib

  • Introduction to Data Visualization
  • Line Properties
  • Types of Plots - (x, y) Plot and Subplots
  • Analyze the “auto mpg data” and draw a pair
  • Draw a pie chart to visualize a

Module 13: Web Scraping with Beautiful Soup

  • Web Scraping and Parsing
  • Knowledge Check – Understanding and Searching the Tree
  • Navigating options
  • Demo 3: Navigating a Tree
  • Knowledge Check
  • Modifying the Tree
  • Parsing and Printing the Document
  • Scrape the Simplilearn website page to perform some tasks.

Module 14: Integration With Hadoop Map-Reduce and Spark

  • Why Big Data Solutions are Provided for Python
  • Big Data and Hadoop
  • Hadoop Core Components
  • Python Integration with HDFS using Hadoop Streaming
  • Using Hadoop Streaming for Calculating Word Count
  • Python Integration with Spark using PySpark.
  • Using PySpark to Determine Word Count
  • Determine the word count for Amazon dataset.
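
The word count exercises in this module follow the map and reduce pattern. The sketch below imitates that pattern locally in plain Python so it can be run without a Hadoop or Spark cluster; it is not the Hadoop Streaming or PySpark API itself. The names `mapper`, `reducer` and `word_count`, and the sample lines, are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce step: sum the counts for each word.

    Hadoop sorts pairs by key between the map and reduce steps;
    sorted() stands in for that shuffle-and-sort phase here.
    """
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    return dict(reducer(mapper(lines)))


# Hypothetical input, for illustration only.
sample = ["big data with python", "python for big data analysis"]
counts = word_count(sample)
print(counts)
```

In a real Hadoop Streaming job the mapper and reducer would be separate scripts reading from standard input, and the framework would handle the sorting between them.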


Foundations of Data Management in R. Your knowledge need not be extensive, but we'll assume you already know how to:

  • Create and assign variables.
  • Write programs with loops.
  • Write programs with conditions.
  • Author and use functions (methods)

Software Used

Anaconda version 5.2

It is available for Windows, Linux and OS X, on 32-bit or 64-bit systems, and can be downloaded here:


Participants should be reasonably proficient in English. Applicants must meet the Phoenix Center for Policy, Research and Training admission criteria.


  1. Discounts: Organizations sponsoring four participants will have a fifth attend free.
  2. What is catered for by the Course Fees: The fees cover all requirements for the training: learning materials, lunches, teas, snacks and certification. Participants cater for their own travel and accommodation expenses, visa application, insurance, and other personal expenses.
  3. Certificate Awarded: Participants are awarded Certificates of Completion at the end of the training.
  4. The program content shown here is for guidance purposes only. Our continuous course improvement process may lead to changes in topics and course structure.
  5. Approval of Course: Our programs are NITA approved. Participating organizations can therefore claim reimbursement on fees paid in accordance with NITA rules.

How to Book: Simply send an email to the Training Officer and we will send you a registration form. We advise you to book early to secure a seat in this training.

Or call us on: +254720272325 / +254737566961

Payment Options: We provide 3 payment options. Choose one for your convenience, and kindly make payment at least 5 days before the training start date to reserve your seat:

  1. Groups of 5 People and Above – Cheque Payments: cheques payable to Phoenix Center for Policy, Research and Training Limited should be paid in advance, at least 5 days before the training.
  2. Invoice: We can send a bill directly to you or your company.
  3. Deposit directly into Bank Account (Account details provided upon request)

Cancellation Policy

  1. Payment for all courses includes a non-refundable registration fee equal to 15% of the course fee.
  2. Participants may cancel attendance 14 days or more prior to the training commencement date.
  3. No refunds will be made for cancellations 14 days or less before the training commencement date. However, participants who are unable to attend may opt to attend a similar training at a later date or send a substitute participant, provided the participation criteria have been met.

Tailor Made Courses

This training course can also be customized for your institution upon request, for a minimum of 5 participants. You can have it delivered at our Training Centre or at a convenient location.

For further inquiries, please contact us on Tel: +254720272325 / +254737566961 or by email.

Accommodation: Accommodation is arranged upon request and at extra cost. For reservations, contact the Training Officer by email or on Tel: +254720272325 / +254737566961



Course Duration

10 Days

Course Price

USD 2,200
