Course structure
Introduction to Big Data and Hadoop
- big data and Hadoop
- Hadoop ecosystem
SAS Data Loader Overview
- SAS Data Loader capabilities and architecture
- SAS Data Loader directives and tasks
- steps common to most directives
- preparing data for analysis and reporting
- course overview and logistics
Acquiring and Discovering Data
- introduction to acquiring and discovering data
- copying a table into Hadoop
- importing a delimited file into Hadoop
- profiling data for inconsistencies
- querying data for relevant columns and rows
Transforming and Transposing Data
- introduction to transforming and transposing data
- transforming data to be fit for purpose
- transposing data for use in analysis and reporting
Cleansing Data
- introduction to cleansing data
- parsing data into meaningful subsets
- standardizing data into consistent formats
- using match codes to determine data similarity
- using names to identify gender
- analyzing data for data types
- applying casing for data consistency
- extracting data into useful tokens
- analyzing data for inconsistent patterns
Integrating Data
- introduction to integrating data
- joining data in Hadoop
- sorting and de-duplicating data
- clustering and surviving data to determine a best record
- matching and merging data into a single table
- deleting rows in Hadoop tables
- running user-written programs inside Hadoop
Delivering Data
- introduction to delivering data from Hadoop
- loading data to the SAS LASR Analytic Server for analysis and reporting
- copying Hadoop data to SAS and relational database tables
Managing and Integrating Directives
- introduction to managing and integrating directives
- creating data flows by chaining directives
- integrating directives into SAS platform applications
- running directives as batch jobs
Additional Topics
- SAS and Hadoop data processing
- SAS DS2 programs
- debugging Hadoop jobs