Basic statistics with Python
"It's easy to lie with statistics. It's hard to tell the truth without it."
— Andrejs Dunkels, Latvian-Swedish mathematician
The aim of this course is to teach you how to perform basic statistical analysis using Python. First we review the foundations (sampling theory, discrete and continuous distributions), then we focus on classical hypothesis testing. This course will improve your general knowledge of statistics. We cannot go into the specific data analysis problems of your particular project.
Instructor: András Aszódi.
Topics
This course teaches the same statistical concepts as the Basic statistics with R training but uses the Python programming language.
- Sampling theory: obtaining information about a population via sampling. Sample characteristics (location, dispersion, skewness), estimation of the mean, standard error of the mean.
- Discrete and continuous probability distributions. Central limit theorem.
- Hypothesis testing. Basic principles, one- and two-sided testing, types of errors, power calculations.
- "Cookbook of tests": location testing, normality, variance comparisons, counting statistics, contingency tables, correlation tests.
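To give a flavour of how these topics translate into Python, here is a minimal sketch, assuming NumPy and SciPy are installed. The simulated sample, the seed, the hypothesised mean of 5.0 and all variable names are made up for illustration and are not taken from the course material. It computes a few sample characteristics and then runs a two-sided one-sample t-test as an example of location testing.

```python
import numpy as np
from scipy import stats

# Hypothetical data: draw a sample of 50 values from a normal population
# with mean 5.0 and standard deviation 2.0 (seeded for reproducibility).
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=5.0, scale=2.0, size=50)

# Sample characteristics: location, dispersion, skewness.
mean = sample.mean()
sd = sample.std(ddof=1)            # ddof=1 gives the sample standard deviation
skewness = stats.skew(sample)

# Standard error of the mean: sd / sqrt(n).
sem = sd / np.sqrt(sample.size)

# Two-sided one-sample t-test of H0: population mean == 5.0.
result = stats.ttest_1samp(sample, popmean=5.0)
t_stat, p_value = result.statistic, result.pvalue

print(f"mean = {mean:.3f}, sd = {sd:.3f}, skewness = {skewness:.3f}")
print(f"standard error of the mean = {sem:.3f}")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

The manual standard error can be cross-checked against `scipy.stats.sem`, which also uses `ddof=1` by default.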
Out of scope
This course will not teach you bioinformatics. In particular, no high-throughput sequencing data will be used because they are impractically large, and not everyone on campus is working with sequencing.
If you are interested in the statistical background of gene expression analysis with high-throughput sequencing, then please take our RNA-Seq data analysis course.
Prerequisites
Python 3 knowledge is required. In particular, the following skills are necessary:
- Using the Python interpreter, either as the command-line program or in a Jupyter notebook
- Familiarity with standard Python data structures
- Familiarity with Python's object-oriented features
If you have attended our Python programming training then you are well equipped to take this course. We will use NumPy, SciPy, pandas and Matplotlib. The details needed to use these packages will be explained during the course.
Practical information
Number of participants: minimum 5, maximum 10.
Length: The course takes two half-days, from 09:00 to 13:00 with two breaks.