Hello. Congratulations on enrolling at CodeClan.
Welcome to CodeClan and the introduction to your pre-course work! The aim of the pre-course work is to prepare you for the intensive 14-week Data Analysis course.
This means by the end of the next two weeks, you should:
- Be comfortable using a Mac.
- Have a typing speed that will help you keep pace in class.
- Understand the power of data.
- Have had a first look at R, the language we will be using throughout the course.
- Have refreshed any probability you might once have learnt at school.
We’ll check in with you a few times during the two weeks, but if anything is blocking your progress or you have any questions, don’t hesitate to get in touch.
Pre-Course Work Contents
The pre-course work resources are not the course materials that you will use during the 14 weeks: they are for exposure, not instruction. Learn as much of the terminology as you can before you start the course, as class is very immersive and this will help you in the first few weeks. Tackle the pre-course work content in the following order:
- Week 1
- Week 2
The first thing is to make sure you’re comfortable using a Unix-like computer. Your computer is going to be the tool of your trade, so it’s essential that you get comfortable using it like a coder. Challenge yourself to learn a keyboard shortcut each day.
Terminal & Command Line
Hopefully you’re already handy at using your computer’s Graphical User Interface (GUI). However, as coders we prefer the command line for a lot of tasks: it can be a lot quicker, and it is very powerful.
Git & GitHub
All the work you do as an analyst requires careful management to ensure reproducibility. We use version control tools to help us manage our files: they give us a safety net of backups, make sharing with colleagues easier and allow us to speed up our analysis. Of the many options available, we’re going to use a program called Git, which should already be installed on your Mac.
Practising your typing is very important. By the end of the two weeks, aim for a normal typing speed of at least 40-50 words per minute. Class can be quite fast-paced at times, so practising your typing is essential if you want to keep up. After all, practice makes perfect!
During the course, you will use Slack. Slack is a messaging service we use at CodeClan for communication between students and staff. Whilst completing the pre-course work, we encourage you to ask your fellow classmates questions using Slack. The benefits of using Slack are:
- You get to know your cohort by discussing your interests and problems.
- You can support each other through your pre-course work.
- All of the instructional team are on Slack, so we highly recommend drawing on their knowledge.
We will send you a link to Slack just before Meet Your Cohort and give you an intro on the day.
Understand the Power of Data
Data science is a vast and varied field. It is also still a young field, so its terminology is still growing and changing. The pre-course work provides a summary of commonly used terms and some views on the breadth and importance of data and statistics.
Introduction to R
There are a number of languages used for data science and data analysis. We are going to introduce you to a language called R. R is open source, which means it is free to use and you can take it with you into any role you move into at the end of the course. We will use swirl, a tutorial package for R, to help you get familiar with the syntax of the language and with RStudio, the tool you will use to write your code.
Analysis and Probability Refresher
We will use the Khan Academy materials on data analysis and probability to help you refresh or build a basic understanding of the following key concepts:
- Analysis: we would like you to be able to critically assess numerical data and plots of that data. Healthy skepticism is the most powerful tool you can develop as a data analyst.
- Probability: the theory of probability underpins the statistics we will study, so we would like you to have a reasonable understanding of probability by the end of the course.
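To make the link between theoretical and experimental probability a little more concrete (sections 4.1 to 4.4 of the Khan Academy materials cover this), here is a minimal sketch. It is written in Python purely for illustration; on the course you will do this kind of thing in R. The fair six-sided die, the function names `theoretical_probability` and `experimental_probability`, and the fixed `seed` are hypothetical choices made only for this example.

```python
import random
from fractions import Fraction

# Assumption for this example: one roll of a fair six-sided die,
# so every outcome in the sample space is equally likely.
SAMPLE_SPACE = [1, 2, 3, 4, 5, 6]


def theoretical_probability(event, sample_space):
    """P(event) = favourable outcomes / total outcomes (equally likely outcomes)."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


def experimental_probability(event, sample_space, trials, seed=None):
    """Estimate P(event) as the share of simulated rolls in which the event happened."""
    rng = random.Random(seed)  # seeded so a run can be repeated exactly
    hits = sum(1 for _ in range(trials) if event(rng.choice(sample_space)))
    return hits / trials


def is_even(outcome):
    """Hypothetical event: the die shows an even number."""
    return outcome % 2 == 0


if __name__ == "__main__":
    print("Theoretical P(even):", theoretical_probability(is_even, SAMPLE_SPACE))
    for n in (10, 100, 10_000):
        print(n, "rolls:", experimental_probability(is_even, SAMPLE_SPACE, n, seed=1))
```

The theoretical value prints as 1/2. The simulated estimates vary from run to run (unless seeded), but they tend to settle closer to 0.5 as the number of rolls grows, which is the idea behind comparing theoretical to experimental probabilities.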
The pre-course work must be completed. We will check in with you a few times during the two weeks to see how you are doing and to help with any queries or problems.
If you have any issues, responsibilities or commitments that mean you might struggle to complete the pre-course work, please contact us so that we can offer extra support and guidance.
Pre-course Work Resources
Below is the list of resources that you need to use to complete the pre-course work, what you need to do and when you should do it by. The pre-course work is designed to last the two weeks before the main course starts. However, we understand that everyone works at different speeds and may have different home/work responsibilities.
- Watch: How to use a Mac: Learn the Mac In Under An Hour (https://youtu.be/_7wmVxUCzs0)
- Read and watch: How to navigate the keyboard comfortably to find ‘special’ characters (focus on: Copy, Cut and Paste, Undo and Redo, Save, Move to Trash, Open the Spotlight tool):
- Keyboard shortcut cheat sheet (https://macmost.com/downloads/MacMostKeyboardShortcutsMojave.pdf)
- Top 10 Mac keyboard shortcuts (http://www.cultofmac.com/317935/top-10-mac-keyboard-shortcuts/)
- Mac Keyboard Shortcuts (http://www.danrodney.com/mac/)
- Watch: 25 Basic Mac Keyboard Shortcuts (https://www.youtube.com/watch?v=AdMuZses96Q)
- Browse: How to search using the Spotlight tools: Using “Spotlight” (https://support.apple.com/en-gb/HT204014)
Terminal & Command Line:
- Do: Find the Terminal application by typing “Terminal” into the search window of your Mac’s applications.
- Read: Know what UNIX is: UNIX Tutorial for Beginners. Do Tutorials 1, 2 and 4. You can do the other tutorials too if you’re feeling keen, but they’re not necessary for the start of the course. (http://www.ee.surrey.ac.uk/Teaching/Unix/)
- Watch: How to use the Command Line:
- Codecademy’s “Learn the Command Line” course (https://www.codecademy.com/learn/learn-the-command-line)
- David Baumgold’s “Getting to Know the Command Line” (http://www.davidbaumgold.com/tutorials/command-line/)
Understand the power of data:
- Browse: Become familiar with common data terminology: https://www.siliconrepublic.com/enterprise/what-is-data-terms-glossary
- Read: Digital literacy for digital transformation: https://digit.fyi/digital-literacy-digital-transformation-comment/
- Watch: Introduction to some course components:
Introduction to R:
- Do: Visit https://swirlstats.com/students.html and start at Step 3, since you should already have R and RStudio installed on your machine. At Step 5, choose the course called “R Programming”. To get swirl running, type the following commands into the console window of RStudio, pressing Return after each one:
install.packages("swirl")
library("swirl")
swirl()
Select the “R Programming” course and start with lesson 1.
Do: Complete lessons 1, 3, 4, 6 and 7.
Here’s what you’re aiming for in week 2:
- Be able to type quickly and accurately:
- Do: Take the normal typing speed test (http://10fastfingers.com/typing-test)
- Send a screenshot of your normal typing speed by the Friday of week 2 to email@example.com.
Git and GitHub:
- Understand the basics of source code version control and why it is used:
- Do: Sign up and create an account on GitHub
- Read: “Git vs. GitHub”. We asked you to register an account on GitHub; this article explains the differences between Git and GitHub. (http://jahya.net/blog/git-vs-github/)
- Do: Learn how to use Git by working through Codecademy’s “Learn Git” course (https://www.codecademy.com/learn/learn-git)
Analysis and Probability Refresher:
If you haven’t done so already, create a login for yourself on Khan Academy.
Don’t worry if you find any of this learning tough – we will revisit the concepts introduced here throughout the course! Keep your focus more on understanding broad concepts than on following all of the mathematical detail. A detailed understanding will come when you start manipulating and analysing data for yourself.
At the end of each video, try to summarise the contents in a series of mental or written ‘bullet points’. Remember to take frequent breaks!
In total, the videos and reading below should take around 5 hours to complete. The length of each video is given after its title so you can plan your progress.
1. Analysing categorical data
1.1 Analyzing one categorical variable
- Identifying individuals, variables and categorical variables in a data set (2:40)
- Reading pictographs (2:20)
- Reading bar graphs (2:58)
1.2 Two-way tables
- Two-way frequency tables and Venn diagrams (6:22)
- Two-way relative frequency tables (4:27)
- Interpreting two-way tables (1:43)
- Bivariate data (3:20)
- Analyzing trends in categorical data (8:52)
2. Displaying and describing data
2.1 Displaying quantitative data with graphs
2.2 Describing and comparing distributions
- Shapes of distributions (5:06)
- Examples analyzing clusters, gaps, peaks and outliers for distributions (6:31)
2.3 More on data displays
2.4 Introduction to scatterplots
- Constructing a scatter plot (2:31)
- Interpreting scatter plots (2:25)
- Bivariate relationship linearity, strength and direction (8:12)
3. Modelling data distributions
3.2 Density curves
- Density curves (9:33)
3.3 Normal distributions and the empirical rule
- Qualitative sense of normal distributions (10:52)
- Normal distribution problems: empirical rule (10:24)
4.1 Basic theoretical probability
- Basic probability (8:17)
- Reading: Probability, the basics
- Simple probability (2:55)
- Probability examples (9:55)
- Intuitive sense of probabilities (8:50)
4.2 Probability using sample spaces
- Probability: counting outcomes (2:08)
- Coin flipping example (2:13)
- Die rolling (5:14)
- Describing subsets of sample spaces exercise (5:43)
4.3 Basic set operations
4.4 Experimental probability
- Experimental probability (6:54)
- Comparing theoretical to experimental probabilities (7:01)
- Making predictions with probability (5:04)
4.5 Randomness, probability, and simulation
4.6 Addition rule
4.7 Multiplication rule for independent events
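The addition rule and the multiplication rule for independent events can be checked by brute force on a small sample space. The sketch below, again in Python for illustration only (the course itself uses R), lists all 36 ordered outcomes of rolling two fair dice and uses two hypothetical events: A, “the first die shows a six”, and B, “the second die shows a six”. The dice, the events and the helper name `prob` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

# Assumption for this example: two fair, independent six-sided dice,
# giving 36 equally likely ordered outcomes (first die, second die).
TWO_DICE = list(product(range(1, 7), repeat=2))


def prob(event):
    """Hypothetical helper: favourable outcomes / total outcomes."""
    favourable = sum(1 for outcome in TWO_DICE if event(outcome))
    return Fraction(favourable, len(TWO_DICE))


def first_is_six(outcome):
    return outcome[0] == 6  # event A


def second_is_six(outcome):
    return outcome[1] == 6  # event B


p_a = prob(first_is_six)
p_b = prob(second_is_six)
p_a_and_b = prob(lambda o: first_is_six(o) and second_is_six(o))
p_a_or_b = prob(lambda o: first_is_six(o) or second_is_six(o))

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert p_a_or_b == p_a + p_b - p_a_and_b  # 11/36 == 1/6 + 1/6 - 1/36

# Multiplication rule for independent events: P(A and B) = P(A) * P(B)
assert p_a_and_b == p_a * p_b  # 1/36 == 1/6 * 1/6

print("P(A or B) =", p_a_or_b, "| P(A and B) =", p_a_and_b)
# prints: P(A or B) = 11/36 | P(A and B) = 1/36
```

Subtracting P(A and B) matters because the outcome (6, 6) is counted once in P(A) and again in P(B); without the correction you would get 12/36 instead of 11/36.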