Is F a Continuous Function of R
Sometimes it isn't that useful to know the probability of an exact outcome.
If you're trying to describe the average human height to someone, for example, does it do them much good if you tell them the probability that someone is exactly 62.345 inches tall? No, they likely want to know something more along the lines of the probability that someone is taller than 5'10.
This is where continuous probability comes into use.
In this post, we're taking a look at what continuous probability is, how it relates to theoretical distributions, and a few useful R functions that go along with continuous probability.
Continuous Probability
While discrete probability distributions have a countable set of outcomes (e.g. heads or tails, the result of a die roll, etc.), a continuous probability distribution can take any of infinitely many values within a range (e.g. height, weight, time, etc.).
The cumulative distribution function (CDF) is one way that we measure continuous probability.
The CDF gives, for any value a, the proportion of the data at or below a. Because values in a continuous distribution rarely match each other exactly (e.g. if someone weighs 182.34 pounds, it's unlikely that another person also weighs exactly 182.34 pounds), it's more helpful to look at the probability of falling within a range of values.
It makes more sense, for example, to look at the probability of something weighing above 180 pounds.
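As a rough sketch of that idea, the snippet below computes the proportion of values at or below a cutoff in two ways: directly with mean(), and with R's built-in ecdf(). The weights are simulated for illustration only (the mean of 196 and standard deviation of 20 are assumptions, not real measurements).

```r
# Hypothetical data: 1,000 simulated weights in pounds (not real measurements).
set.seed(1)
weights <- rnorm(1000, mean = 196, sd = 20)

# Empirical CDF evaluated at a: the proportion of observations at or below a.
a <- 180
prop_below <- mean(weights <= a)

# ecdf() builds the same step function for every value of a at once.
F_hat <- ecdf(weights)

prop_below      # proportion weighing 180 pounds or less
F_hat(a)        # same value from the ecdf() function
1 - F_hat(a)    # proportion weighing more than 180 pounds
```

The last line is the "above 180 pounds" question from the text: one minus the CDF at 180.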
Theoretical Distribution
Whether you're strapped for time or don't have access to the data that you need, you may find that you need to create a probability distribution without the appropriate data to do it.
In cases like this, you can create theoretical distributions.
Theoretical distributions make certain assumptions through logical and mathematical reasoning to create a distribution that approximates reality.
Some of the more common theoretical distributions are:
- Normal distribution
- Binomial distribution
- Poisson distribution
An example of a theoretical distribution would be in the case of rolling a six-sided die. We know that there is a 1 in 6 chance of rolling each number, so we can say that our probabilities would be:
- 1/6 to roll a 1
- 1/6 to roll a 2
- 1/6 to roll a 3
- 1/6 to roll a 4
- 1/6 to roll a 5
- 1/6 to roll a 6
If we roll the die 36 times, we can estimate that we would roll each number six times. In reality, it would likely look something like:
- Rolled a 1: 5 times
- Rolled a 2: 7 times
- Rolled a 3: 8 times
- Rolled a 4: 4 times
- Rolled a 5: 6 times
- Rolled a 6: 6 times
However, the observed proportions will get closer and closer to our theoretical distribution the more times we roll the die.
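One way to see this for yourself is to simulate the rolls. The sketch below uses sample() to roll a fair die 36 times and then 100,000 times, and measures how far the observed proportions are from 1/6. The helper name roll_proportions and the seed are my own choices for illustration.

```r
# Simulate rolls of a fair six-sided die and compare to the theoretical 1/6.
set.seed(42)  # arbitrary seed so the run is reproducible

# Hypothetical helper: proportion of each face in n simulated rolls.
roll_proportions <- function(n) {
  rolls <- sample(1:6, size = n, replace = TRUE)
  # factor() keeps all six faces in the table even if one never appears
  table(factor(rolls, levels = 1:6)) / n
}

small <- roll_proportions(36)
large <- roll_proportions(100000)

# Largest gap between an observed proportion and the theoretical 1/6
max(abs(small - 1/6))
max(abs(large - 1/6))
```

With the larger number of rolls, the gap should typically be much smaller, although the exact numbers depend on the seed.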
There are functions in R that help us create these theoretical distributions.
Important R Functions for Continuous Probability
When working with continuous probability in R, there are a few functions that will be helpful. Some of the most important are:
- pnorm
- dnorm
- qnorm
- rnorm
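So, let's take a look at rnorm, pnorm, and qnorm in turn. (The fourth, dnorm, returns the height of the normal density curve at a given value.)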
Rnorm
Probably the easiest to understand, rnorm generates random values drawn from a normal distribution.
In rnorm(), you have three arguments:
- n = number of observations
- mean = mean
- sd = standard deviation
So, what that looks like in R is this:
x <- rnorm(10, 10, 1)
hist(x)
Because the sample is small (10 observations), the histogram doesn't look much like a normal distribution.
But let's see what happens when I make the sample size 10,000:
x <- rnorm(10000, 10, 1)
hist(x)
As you can see, you get something that looks much more like the normal distribution.
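If you'd rather check this numerically than by eye, a quick sketch like the one below compares the sample mean and standard deviation of the two samples against the values passed to rnorm(). The seed and variable names are arbitrary choices of mine, and set.seed() is only there so the draws are reproducible.

```r
set.seed(123)  # arbitrary seed; any value works

# Named arguments make the roles of the three inputs explicit.
small_sample <- rnorm(10, mean = 10, sd = 1)
large_sample <- rnorm(10000, mean = 10, sd = 1)

# Sample mean and standard deviation for each sample
c(mean = mean(small_sample), sd = sd(small_sample))
c(mean = mean(large_sample), sd = sd(large_sample))
```

The larger sample's mean and standard deviation should typically land much closer to 10 and 1.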
Pnorm
Using pnorm(), we can calculate the value of the cumulative distribution function (CDF) of the normal distribution, as long as we have a value to evaluate it at, the mean of the data, and the standard deviation.
The value it returns is the area to the left of the given value in a normal distribution. To get the area to the right of the given value, you can add the argument lower.tail = FALSE.
So, let's say we know the mean weight for American men is 196 pounds with a standard deviation of 20 pounds. If we want to know the probability that someone picked at random will weigh less than 180 pounds, we can use the following code:
pnorm(180, 196, 20)
[1] 0.2118554
Given our inputs, this means there's a 21.19% chance of choosing someone who weighs less than 180 pounds. If we wanted to know the probability of someone weighing more than 180 pounds, we could use the following code:
pnorm(180, 196, 20, lower.tail = FALSE)
[1] 0.7881446
This shows us that there is a 78.81% chance of the person you select weighing more than 180 pounds.
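The same function also handles "between" questions: subtract one left-tail area from another. The sketch below reuses the mean of 196 and standard deviation of 20 from the code above. The variable names and the 180-to-200 range are my own picks for illustration.

```r
m <- 196  # mean weight in pounds, as used in the pnorm() calls above
s <- 20   # standard deviation in pounds

below_180 <- pnorm(180, mean = m, sd = s)
above_180 <- pnorm(180, mean = m, sd = s, lower.tail = FALSE)

# The two tails are complementary, so they should add up to 1
below_180 + above_180

# Probability of weighing between 180 and 200 pounds (hypothetical range):
# area to the left of 200 minus area to the left of 180
between <- pnorm(200, mean = m, sd = s) - pnorm(180, mean = m, sd = s)
between
```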
Qnorm
qnorm() calculates quantiles of the normal distribution for you.
So, say for example that you want to find the value that marks the 75th percentile of a dataset (i.e. the point at which 75% of the data falls below that number).
If I wanted to find the value that marks the 75th percentile in a dataset with a mean of 24 and a standard deviation of 2, I would use the following code:
qnorm(.75, 24, 2)
[1] 25.34898
In this distribution, 75% of the values are 25.34898 or lower.
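A handy way to think about qnorm() is as the inverse of pnorm(): one goes from a value to a probability, the other from a probability back to a value. The short sketch below checks that round trip using the same mean of 24 and standard deviation of 2. The variable names are my own.

```r
q75 <- qnorm(0.75, mean = 24, sd = 2)

# Feeding the quantile back into pnorm() should return the original probability
pnorm(q75, mean = 24, sd = 2)

# qnorm() is vectorized, so several percentiles can be requested at once
quartiles <- qnorm(c(0.25, 0.5, 0.75), mean = 24, sd = 2)
quartiles
```

Because the normal distribution is symmetric, the 50th percentile in the second call is simply the mean.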
Moving Forward With Continuous Probability
Remember, a continuous probability distribution can take any of infinitely many values within a range (e.g. height, weight, time, etc.). Because it isn't that useful to know the probability of someone being an exact height or weight, for example, we use functions like pnorm and qnorm to work with probabilities over ranges of values.
Next, I'll be diving into how to create random samples.
Source: https://thedatastudent.com/continuous-probability-a-brief-introduction-r-functions/