Probability Theory with R

The main aim of this course is to teach you the basics of probability theory,
which is indispensable for any scientific investigation. We will focus on the
intuition behind probability theory rather than a mathematically rigorous
presentation. Yes, there will be equations, but they will be kept to a minimum.

Instructor: András Aszódi.

Topics

  • Foundations of probability theory: basic identities (sum rule, product rule).
    Independent variates, conditional probability.
  • Discrete probability distributions: Uniform, Bernoulli, Binomial, Poisson, Negative Binomial.
  • Continuous probability distributions: Uniform, Exponential, Gamma etc.
  • Central Limit Theorem and the Normal distribution.
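
As a rough illustration of the last topic (a sketch, not taken from the course material): means of samples drawn from a skewed Exponential distribution behave approximately like a Normal variate. The sample size, the number of repetitions and the seed are arbitrary values chosen for this example.

```r
set.seed(42)    # arbitrary seed, only for reproducibility
n <- 50         # size of each sample (assumed value)
reps <- 10000   # number of repeated samples (assumed value)

# Each sample comes from Exponential(rate = 1): mean 1, standard deviation 1
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# The CLT predicts mean 1 and standard deviation 1 / sqrt(n) for the sample mean
print(c(observed_mean = mean(sample_means), predicted_mean = 1))
print(c(observed_sd = sd(sample_means), predicted_sd = 1 / sqrt(n)))

# Histogram of the sample means with the predicted Normal density overlaid
hist(sample_means, breaks = 50, freq = FALSE,
     main = "Means of Exponential samples")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE)
```

Increasing `n` should make the histogram narrower and closer to the overlaid curve.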

Prerequisites

Basic familiarity with R is required. In particular the following skills are necessary:

  • Using the R interpreter, either as the command-line program or in RStudio
  • How to invoke R functions, pass optional/named parameters
  • Some familiarity with simple plotting commands
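
As an informal self-check (our own illustration, not an official entry test): if the short snippet below reads naturally to you, you probably have these skills. It uses only base R functions; the numbers are arbitrary.

```r
# Positional vs. named arguments: both calls draw 5 values from Normal(10, 2)
x <- rnorm(5, 10, 2)
y <- rnorm(n = 5, mean = 10, sd = 2)

# Optional parameters have defaults: 'mean' defaults to 0 and 'sd' to 1
z <- rnorm(5)

# A simple plot with a few optional graphical parameters
plot(1:10, (1:10)^2, type = "b",
     xlab = "x", ylab = "x squared", main = "A simple plot")
```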

If you have attended our R as a programming language training
then you are well equipped to take this course.

Practical information

Number of participants: minimum 5, maximum 10.

Length: The course takes one half-day,
from 09:00 to 13:00 with 2 breaks.

Think Statistics with R

“It’s easy to lie with statistics. It’s hard to tell the truth without it.”
— Andrejs Dunkels, Latvian-Swedish mathematician

The main aim of this course is to teach you how to approach data analysis problems
with classical statistics. We focus on the intuition behind statistical methodologies
rather than on “how to run a t-test with R” (which we will also learn, by the way).

First we review the foundations (sampling theory and parameter estimation),
then we continue with hypothesis testing. The methodology itself is introduced using “Student”’s t-test
as an example, with a strong emphasis on errors (false positives, p-value distributions,
test power calculations). Finally a short “cookbook of tests” is offered.

Instructor: András Aszódi.

Topics

This course teaches the same statistical concepts as the Basic statistics with Python
training but uses the R programming language.

  • Sampling theory: obtaining information about a population via sampling.
    Sample characteristics (location, dispersion, skewness).
  • Central Limit Theorem and the Normal distribution (refresher).
  • The distribution of the sample mean. Confidence intervals.
  • Basic principles of hypothesis testing. “Student”’s t-test.
  • Type I and Type II errors. P-value distributions. Power calculations.
  • “Cookbook of tests”: distribution tests, parametric and non-parametric tests,
    counting statistics, contingency tables, correlation tests.

Exercises

Online exercises are available when this course is running. Please select
the option “R stats” from the dropdown in the “Request an exercise notebook” form.

Out of scope

We cannot go into the specific data analysis problems of your particular project.

Furthermore, this course will not teach you bioinformatics.
In particular, no high-throughput sequencing data will be used because they are impractically large,
and not everyone on campus is working with sequencing.

Prerequisites

Scientific prerequisite: probability theory basics

You are required to be familiar with the following:

  • Basic probability concepts: independence, sum rule, multiplication rule.
  • Discrete probability distributions: Uniform, Bernoulli, Binomial, Poisson.
  • Continuous probability distributions: Uniform, Exponential, Gamma etc.

Participation in our Probability Theory with R course
is strongly recommended before taking this course.

Technical prerequisite: familiarity with R

The following skills are necessary:

  • Using the R interpreter, either as the command-line program or in RStudio
  • How to invoke R functions, pass optional/named parameters
  • Some familiarity with simple plotting commands

If you have attended our R as a programming language training
then you are well equipped to take this course.

“Bring Your Own Data”

You can bring your own data to this course and run
a “Student”’s t-test on it.

The data set

Please prepare a comma-separated values (CSV) file with UNIX line endings (\n) that
consists of two columns, one for each of the two groups of data. You can do this easily
with Excel.
The first row must contain the group labels.
The two groups need not be the same size.
Save the CSV file to the laptop that you will bring to the course.
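
A minimal sketch of how such a file could be read and tested in R. The file name `mydata.csv`, the group labels `control` and `treated`, and all numbers are made up for illustration; the exact procedure used in the course may differ. The sketch assumes that the shorter column is padded with empty cells, which R reads as missing values.

```r
# Write a tiny example file so that the sketch is self-contained (made-up numbers)
csv_file <- file.path(tempdir(), "mydata.csv")   # hypothetical file name
writeLines(c("control,treated",   # first row: group labels
             "5.1,6.3",
             "4.9,5.8",
             "5.4,6.1",
             "5.0,6.6",
             "5.2,"),             # the 'treated' group has one value fewer
           csv_file)

dat <- read.csv(csv_file)         # the first row becomes the column names
control <- na.omit(dat$control)   # empty cells are read as NA, drop them
treated <- na.omit(dat$treated)

# var.equal = TRUE requests the classical Student's t-test;
# R's default (var.equal = FALSE) is Welch's variant
result <- t.test(control, treated, var.equal = TRUE)
print(result)
```

To try this with your own file, replace `csv_file` with the path to your CSV and the column names with your own group labels.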

Data confidentiality

The training VM is protected by a firewall and other security measures.
Your training account together with all data will be deleted immediately after the course.

Practical information

Number of participants: minimum 5, maximum 10.

Length: The course takes two half-days,
from 09:00 to 13:00 with 2 breaks.