Course Structure
Upon successful completion of this course, participants should be able to:
Data Warehousing with SQL and NoSQL
- Provide an overview of data warehouses
- Explain the purposes of databases and their various types
- Describe various SQL and NoSQL tools
ETL Offload with Hadoop and Spark
- Identify business challenges with ETL (Extract-Transform-Load)
- Explain the ETL and ELT (Extract-Load-Transform) processes
- Describe the Hadoop ecosystem as an ETL offload solution
Data Governance, Security, and Privacy for Big Data
- Describe data governance and its associated roles and responsibilities
- Discuss data governance models
- Describe metadata, its types, and its uses
- Explain master data, its framework, and its purpose
- Explain Hadoop security controls
- Discuss the data governance tools Apache Atlas, Apache Ranger, and Apache Knox
- Describe cloud security considerations
- Explain the GDPR (General Data Protection Regulation) and data ethics
Processing Streaming and IoT Data
- Describe streaming and IoT data environments
- Explain the Kafka messaging system, with examples
- Explain the key features, architecture, and use cases of stream processing tools such as Storm, Spark Streaming, and Flink
- Explain IoT-related projects such as Project Nautilus, Pravega, and EdgeX Foundry
Building Data Pipelines with Python
- Write Python scripts to perform key data processing activities
- Describe data pipelines and the tools used to build them
- Build data pipelines using Python