Probability of finding specific set of coloured balls within larger set of random-drawn balls
In this question I was helped with calculating the probability of drawing a specific set of M coloured balls from a set of N coloured balls.
Now I am looking for a solution to an extended problem: what is the probability of finding my specific set of M balls among Q balls (where Q > M) drawn from the same pool of N balls?
For example: there are 100 balls in the box: 50 red, 30 blue and 20 white. I randomly draw (without replacement) 15 balls from the box. What is the probability of finding among them a set of 6 balls, of which 2 are red, 2 blue and 2 white?
Using the variables:
N = 100: n1 = 50, n2 = 30, n3 = 20
M = 6: m1 = 2, m2 = 2, m3 = 2
Q = 15
While there are methods to calculate the result for any specific input values, I am looking for a universal algorithm that gives the result for any input values. Right now, I find it hard to make the jump from the simpler problem of drawing exactly M balls. It feels as though there should be some kind of multiplier that corrects for the extra freedom of drawing the additional Q-M balls, but it surely is not a simple one, because these extra balls can themselves contain the set M, or overlap with it partially, so there should be some smart way to deduct all those duplicate combinations.
Edit: supplemented with empirical data from a virtual experiment.
I went ahead and built a program to perform virtual experiments. Of course, computer random number generators are not truly random, but that's the best I have, and for the sake of this experiment I think they are random enough.
So, I created the pool: a set of 2 red, 3 blue and 4 white balls. My sample size is 6 balls, and I consider a sample a success if it contains (at least) 1 red, 1 blue and 1 white ball. Running the simulation one million times, I get a probability of 90.4858%. That does not necessarily match the calculated probability exactly (whatever it is), but it should be pretty close. So I'd expect the formula to give such a result for these input parameters.
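The experiment described above could be sketched roughly as follows. This is not the original program, which is not shown in the post; the names `POOL`, `REQUIRED`, `is_success`, `simulate` and `enumerate_exact` are made up for illustration. Because the pool is tiny, the sketch also enumerates every 6-ball subset, which gives an exact value to compare the simulation against.

```python
import random
from fractions import Fraction
from itertools import combinations

# Pool from the experiment: 2 red, 3 blue and 4 white balls.
POOL = ["R"] * 2 + ["B"] * 3 + ["W"] * 4
REQUIRED = {"R": 1, "B": 1, "W": 1}  # minimum count needed per colour
SAMPLE_SIZE = 6


def is_success(sample, required=REQUIRED):
    """True if the sample holds at least the required count of each colour."""
    return all(sample.count(colour) >= need for colour, need in required.items())


def simulate(trials, seed=0):
    """Estimate the success probability by repeated draws without replacement."""
    rng = random.Random(seed)
    hits = sum(is_success(rng.sample(POOL, SAMPLE_SIZE)) for _ in range(trials))
    return hits / trials


def enumerate_exact():
    """Exact probability: check every equally likely 6-ball subset of the pool."""
    # combinations() over indices treats the balls as individually labelled
    samples = list(combinations(range(len(POOL)), SAMPLE_SIZE))
    hits = sum(is_success([POOL[i] for i in s]) for s in samples)
    return Fraction(hits, len(samples))


if __name__ == "__main__":
    print("simulated:", simulate(100_000))
    print("exact:    ", enumerate_exact())
```

Of the 84 possible 6-ball subsets, 76 contain all three colours, so the exact value is 19/21, about 90.476%, which agrees with the simulated 90.4858% to within sampling noise.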
combinatorics
edited Apr 13 '17 at 12:20
Community♦
asked Mar 10 '13 at 19:45
Passiday
1 Answer
What you need is called hypergeometric probability. There are $\binom{100}{15}$ ways of selecting 15 balls out of 100 in total. You want 6 balls divided into three subsets with specific (color) properties, and for the rest (15-6) you don't care what properties they have. Since you sample without replacement and experiments are independent you get
$$
P(X)= \frac{\binom{50}{2} \cdot \binom{30}{2} \cdot \binom{20}{2} \cdot \binom{100-6}{15-6}}{\binom{100}{15}}
$$
Is it really so simple that the multiplier is the binomial coefficient (N-M, Q-M)? I was worried that this would result in several distinct combinations of Q balls being registered more than once. Think about it: if we label all the 100 balls with id numbers, we can imagine a set of 12 balls: RED1, RED2, BLUE3, BLUE4, WHITE5, WHITE6, RED7, RED8, BLUE9, BLUE10, WHITE11 and WHITE12. This is one unique combination of 12 drawn balls, but it actually contains several distinct sets of 2/2/2 balls. Shouldn't that cause the number in the numerator to decrease?
– Passiday
Mar 10 '13 at 20:58
If I understood your question correctly, you want two balls from each of 3 subsets (white, blue, red) and you 'don't care' about the rest. Since there are no other colors in the box, you have only these three options for the 'don't care' subset in the sample.
– Alex
Mar 10 '13 at 21:02
I'll try to say it in other words. The denominator describes the total number of unique sets of 15 balls, whatever colour, drawn from the pool of 100 balls. The numerator should describe the total number of unique sets of 15 balls that contain my preferred set of 6 balls. Isn't it the case that the identical sets (R1, R2)(B3, B4)(W5, W6)(R7, R8, B9, B10, W11, W12, R13, R14, R15) and (R7, R8)(B9, B10)(W11, W12)(R1, R2, B3, B4, W5, W6, R13, R14, R15) will be counted as distinct?
– Passiday
Mar 10 '13 at 21:36
Actually, if my calculation is correct, the formula in your example gives an absurd probability of 425.1, i.e. in the numerator each unique sample is registered on average at least 425 times.
– Passiday
Mar 16 '13 at 22:44
You are right. Probably my mistake is that the remaining 9 balls are also selected from the set of R, W and B. In that case I'm not sure how to interpret the problem: there are no other sets these 9 balls can come from.
– Alex
Mar 17 '13 at 0:40
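One way to avoid the double counting raised in these comments is to sum over colour counts instead of choosing the required balls and the remaining balls separately: add up $\prod_i \binom{n_i}{k_i}$ over all count vectors with $k_i \ge m_i$ and $\sum_i k_i = Q$, then divide by $\binom{N}{Q}$. The sketch below assumes the intended event is "at least $m_i$ balls of colour $i$ among the Q drawn", which is the reading used in the question's simulation. The function names `prob_at_least` and `overcounting_ratio` are illustrative and do not come from the thread.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_at_least(pool, required, draw):
    """P(sample of size `draw` has at least required[i] balls of colour i).

    Sums the multivariate hypergeometric terms over every colour-count
    vector (k_1, ..., k_c) with required[i] <= k_i <= pool[i] and sum == draw.
    """
    ranges = [range(m, n + 1) for n, m in zip(pool, required)]
    favourable = 0
    for counts in product(*ranges):
        if sum(counts) == draw:
            ways = 1
            for n, k in zip(pool, counts):
                ways *= comb(n, k)  # ways to pick k of the n balls of this colour
            favourable += ways
    return Fraction(favourable, comb(sum(pool), draw))


def overcounting_ratio(pool, required, draw):
    """Value of the formula from the answer above, kept only for comparison."""
    ways = comb(sum(pool) - sum(required), draw - sum(required))
    for n, m in zip(pool, required):
        ways *= comb(n, m)
    return Fraction(ways, comb(sum(pool), draw))


if __name__ == "__main__":
    print(prob_at_least([2, 3, 4], [1, 1, 1], 6))
    print(float(prob_at_least([50, 30, 20], [2, 2, 2], 15)))
    print(float(overcounting_ratio([50, 30, 20], [2, 2, 2], 15)))
```

For the small pool of 2 red, 3 blue and 4 white balls this gives 19/21, close to the simulated 90.4858%. For the 100-ball example, `overcounting_ratio` evaluates to about 425.1, the figure quoted in the comments, while `prob_at_least` stays between 0 and 1. The loop visits every candidate count vector, which is fine for a few colours but grows quickly as the number of colours increases.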
answered Mar 10 '13 at 19:53
Alex