Genetics Workshop Number Three
by Dr Jamie Love
2002 - 2010
Welcome to your third Genetics Workshop. Here we will go slowly and systematically through the steps needed to do chi-square analysis. This Workshop is divided into three different problems and a summary of general ideas. The first problem is an analysis of the results from a monohybrid cross, and I will walk you through it very slowly. We will go over the fundamentals of chi-square work and reinforce what you learned in the lesson. The second chi-square problem is another monohybrid problem. I will move a little faster while showing you how to organize your math, and that should give you a better feel for how to actually do a chi-square. The last chi-square involves a dihybrid cross, so it is a little harder, but it uses the same ideas as the simpler chi-square problems, and you will see that it helps to set up a table to keep track of the numbers. We will conclude this Workshop with a short series of questions to show you that chi-square analysis can be used for more than simply testing the ratios from crosses.
By the way, don't get chi-square mixed up with Punnett squares.
Punnett squares are a useful way to simulate how alleles come
together in a mating and they result in ratios of the different
genotypes of offspring. Punnett squares produce expected ratios
of genotypes, which you can now easily transform into ratios
of phenotypes.
A chi-square is a statistical tool that helps us to decide if
the observed ratio is close enough to the expected ratio to be
acceptable. Chi-square analysis can be used in any area, not just
genetics. Whenever you have to determine if an expected ratio
fits an observed ratio, you can use the chi-square.
Before we begin, pick up and print out a copy of your Chi-Square Worksheet. Fill it in as we work together and, when you
have finished each Part, check your answers with mine.
(The hyperlink for the answer sheet is at the end of this page.) There is a single worksheet for this entire (three part) Workshop and a single answer sheet too. Just do it a section at a time.
After this Workshop you will be well prepared to do the SAQs from
the chi-square lessons.
Mendel's data from one experiment was ...
P = smooth seeds crossed with wrinkled seeds
F1 = all smooth seeds (so smooth is dominant and wrinkled is recessive)
F2 = 5,474 smooth seeds and 1,850 wrinkled seeds
1. What ratio did he observe?
5474 / 1850 = 2.9589189 : 1 = 2.96 : 1
2. What ratio did he expect?
3 : 1
You should understand that the chi-square compares the NUMBER (not ratio) observed to the NUMBER (not ratio) expected. You are given the observed numbers, and from that data you might guess what the ratio should be. You then use that "guessed" ratio to calculate what the expected numbers would be.
Calculating the expected number is critical to doing the chi-square and many students have trouble with that first step - they forget how to do it, use it backwards or don't do it at all!
Let's work through this important step together so you will understand
that logic.
You already know the number observed.
Smooth = 5474
Wrinkled = 1850
3. What is the total number of seeds?
7324
4. What number of wrinkled is expected?
7324 / 4 = 1831
5. What number of smooth is expected?
1831 X 3 = 5493 or 7324 X 3/4 = 5493
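Steps 3 to 5 can also be sketched in a few lines of Python. This is only an illustration, assuming Python 3; the helper name `expected_counts` is hypothetical and not part of the original Workshop. It splits a total into expected numbers according to a ratio, exactly as you did by hand.

```python
def expected_counts(total, ratio):
    """Split `total` into expected numbers according to `ratio` (e.g. [3, 1])."""
    parts = sum(ratio)
    return [total * part / parts for part in ratio]

observed = {"smooth": 5474, "wrinkled": 1850}
total = sum(observed.values())             # 7324 seeds in the F2
expected = expected_counts(total, [3, 1])  # smooth : wrinkled = 3 : 1
print(total, expected)                     # 7324 [5493.0, 1831.0]
print(sum(expected) == total)              # True: expected total equals observed total
```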
OK, you now have the expected numbers calculated from the expected ratio.
The best (easiest) way to COMPARE two values is to find their DIFFERENCE (by SUBTRACTION).
6. What is the difference between observed and expected smooth?
5474 - 5493 = -19
7. What is the difference between observed and expected wrinkled?
1850 - 1831 = 19
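For "statistical magnification" we INCREASE those differences by squaring them. Squaring also makes every difference positive, so the -19 and the +19 cannot cancel each other out when we add them up later.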
8. What is the square of the difference between the observed and expected smooth?
$(-19)^2 = 361$ or -19 X -19 = 361
9. What is the square of the difference between the observed and expected wrinkled?
$19^2 = 361$ or 19 X 19 = 361
These "squares of the differences" are too large and must be "NORMALISED" by dividing each by the number EXPECTED (NOT the number observed), so that a difference of 19 counts for more in a small class than in a large one. This could be called the "squared differences per expected".
10. What is the square of the difference between the observed and expected smooth, divided by the expected number of smooth?
361 / 5493 = 0.06572 = 0.066
11. What is the square of the difference between the observed and expected wrinkled, divided by the expected number of wrinkled?
361 / 1831 = 0.19716 = 0.197
Lastly, we add together these "squared differences per expected" to give us the TOTAL "squared differences per expected".
12. What is the sum of the "squared differences per expected"?
0.066 + 0.197 = 0.263
Therefore, the chi-square for this experiment is $\chi^2 = 0.263$.
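The whole calculation so far (difference, square, divide by expected, add up) can be sketched as one small Python function. The name `chi_square` is a hypothetical helper introduced here for illustration; the comments point back to the numbered questions above.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes; always divide by E, never by O."""
    total = 0.0
    for o, e in zip(observed, expected):
        diff = o - e            # questions 6 and 7: the difference
        squared = diff ** 2     # questions 8 and 9: square of the difference
        total += squared / e    # questions 10 and 11: normalise by the expected number
    return total                # question 12: the sum is the chi-square

# Mendel's smooth vs wrinkled seeds, tested against 3 : 1
x2 = chi_square([5474, 1850], [5493, 1831])
print(round(x2, 3))  # 0.263
```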
OK - so what?
Statisticians have developed chi-square tables, based upon the
probabilities that a particular chi-square value will come about
purely by chance. There are two "features" to consider.
A. Significance Level
We (scientists) like to use the level of 5% as our significance
"cut-off". Any chi-square larger than the value from
the 5% Table indicates an experiment in which the ratios observed
are so far off the ratios expected that chance alone would produce
such a large deviation less than 5% of the time, so we conclude
that the ratios expected are wrong!
B. Degrees of Freedom
The more "classes" (categories) there are, the more
chance deviations get added into the total, so the
acceptable upper limit of the chi-square rises. The
"degrees of freedom" are one less than the number of classes.
13. Name all the different classes in the earlier experiment.
Smooth and Wrinkled
14. How many degrees of freedom were in that experiment?
2 - 1 = 1
One degree of freedom.
Here's a portion of the Chi Square Significance Table.
15. Is the chi-square you calculated within the boundary of "the possible"?
Yes! We calculated $\chi^2 = 0.263$. With one degree of freedom we could have a chi-square up to 3.84 before we would become suspicious that the observed data was in a ratio too far removed from the ratio we tested.
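If you do not have the Chi Square Significance Table at hand, its 5% cut-offs can be recomputed. The sketch below is an assumption on my part, not how the printed table was produced: it uses the standard closed form for the chance of getting a chi-square larger than x with a whole number of degrees of freedom (built from Python's `math.erfc`, `math.exp` and `math.lgamma`), then searches by bisection for the value where that chance drops to 5%. Both function names are hypothetical.

```python
import math

def chi2_tail(x, df):
    """Chance that luck alone gives a chi-square larger than x (integer df >= 1)."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # exact value for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # exact value for df = 1
    while a < df / 2.0:
        if y > 0:
            # each extra 2 degrees of freedom adds one term: y^a e^(-y) / Gamma(a + 1)
            q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q

def critical_value(df, alpha=0.05):
    """Largest chi-square still acceptable at significance level alpha (bisection)."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_tail(mid, df) > alpha:
            lo = mid   # tail chance still above 5%: the cut-off lies further right
        else:
            hi = mid
    return (lo + hi) / 2

classes = ["smooth", "wrinkled"]
df = len(classes) - 1                      # one less than the number of classes
print(df, round(critical_value(df), 2))    # 1 3.84
print(0.263 <= critical_value(df))         # True: accept the 3 : 1 ratio
```

With this sketch, `critical_value(2)` and `critical_value(3)` round to 5.99 and 7.81, and `chi2_tail(0.263, 1)` is well above 0.05, which is another way of saying the 3 : 1 ratio is acceptable.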
When doing a chi-square it helps to set it up as a table and to understand that everything we have been doing is represented by the equation $\chi^2 = \sum \frac{(O - E)^2}{E}$, where the sum runs over all the classes.
Consider these results among the F2s
4,400 yellow seeds
1,624 green seeds
First, set up a table like the one below.
Phenotype | O    | E    | O - E | (O - E)^2 | (O - E)^2/E
Yellow    |      |      |       |           |
Green     |      |      |       |           |
Total     |      |      |       |           |
Second, enter the data. Remember, data is what is observed. So data goes in the "observed" (O) column.
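Phenotype | O    | E    | O - E | (O - E)^2 | (O - E)^2/E
Yellow    | 4400 |      |       |           |
Green     | 1624 |      |       |           |
Total     | 6024 |      |       |           |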
Next you fill in the "expected" (E) column.
Using the total as a starting point, divide that number into
the two sets of data that would produce the 3 : 1 ratio you expect.
Note that it might be easier to do the 1 (green) of the 3 : 1 ratio
first. However, if you are comfortable with fractions it shouldn't
be too hard to do them in any order.
Phenotype | O    | E    | O - E | (O - E)^2 | (O - E)^2/E
Yellow    | 4400 | 4518 |       |           |
Green     | 1624 | 1506 |       |           |
Total     | 6024 | 6024 |       |           |
Notice that the total expected is the same as the total observed. If they don't add up to the same number you have made an error in the math.
Now fill in the rest of the table. It's a lot of work but, now that you have it all organized, it should be just a matter of using your calculator correctly. There is no reason to "total" the $O - E$ or $(O - E)^2$ columns, so leave them blank. However, it is very important to complete the "total" in the last column, $(O - E)^2/E$, because that is the chi-square!
Fill in the rest of the table.
Phenotype | O    | E    | O - E | (O - E)^2 | (O - E)^2/E
Yellow    | 4400 | 4518 | -118  | 13,924    | 3.08
Green     | 1624 | 1506 | 118   | 13,924    | 9.25
Total     | 6024 | 6024 |       |           | 12.33
Is the chi-square you calculated here within the boundary of "the possible"?
(To answer that, refer back to the Chi Square Significance Table you saw earlier.)
No! $\chi^2 = 12.33$, but with one degree of freedom we cannot accept any ratio that gives a chi-square larger than 3.84.
Do we accept that these results are within acceptable range of a 3 : 1 ratio?
No! We must reject the 3 : 1 ratio. This data is far off the 3 : 1 ratio.
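Here is a sketch of the same table worked in Python for the yellow and green seeds. It is illustrative only; the row layout and variable names are assumptions, and the 3.84 cut-off is the one-degree-of-freedom value quoted above. Each row uses its own expected number in the last column, and the final line applies the 5% decision.

```python
rows = [("Yellow", 4400, 3), ("Green", 1624, 1)]   # phenotype, O, part of the 3 : 1 ratio
total = sum(o for _, o, _ in rows)                  # 6024 seeds observed
parts = sum(p for _, _, p in rows)                  # 3 + 1 = 4

print(f"{'':8}{'O':>6}{'E':>8}{'O-E':>8}{'(O-E)^2':>10}{'(O-E)^2/E':>11}")
chi2 = 0.0
for name, o, part in rows:
    e = total * part / parts       # expected number for this class
    term = (o - e) ** 2 / e        # divide by the E of this same class
    chi2 += term
    print(f"{name:8}{o:>6}{e:>8.0f}{o - e:>8.0f}{(o - e) ** 2:>10.0f}{term:>11.2f}")
print(f"{'Total':8}{total:>6}{total:>8}{'':>8}{'':>10}{chi2:>11.2f}")
print("reject 3 : 1" if chi2 > 3.84 else "accept 3 : 1")   # 3.84 = 5% cut-off for 1 df
```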
Consider these results from a dihybrid cross
30 red tall
65 white tall
83 red short
206 white short
Before we dive into the chi-square we have to first determine what ratio we will test and which category (class) fits with each part of the ratio.
Based upon these numbers, which phenotypes are dominant and recessive for the two loci? (Remember, these are the F2s from a dihybrid cross so they should be close to a specific ratio that you learned earlier. And you also learned which traits end up in each part of that ratio.)
Also, as best you can, assign genotypes to these phenotypes.
A dihybrid cross should produce a 9 : 3 : 3 : 1 ratio in the F2s
and a simple look at the numbers will give you an idea of which
belongs to each category.
The biggest group is the white shorts so they must be the doubly
dominant class. In other words, white shorts can be assigned
the genotype W-S-.
At the opposite end of the ratio, the least represented group
would be the doubly recessive class, so the red talls are the "1"
in the 9 : 3 : 3 : 1 ratio and have the genotype wwss.
You can deduce the other two classes, making up the "3"
in the ratio. The white talls have the genotype W-ss
and the red shorts are wwS-.
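The "biggest group gets the 9, smallest group gets the 1" reasoning can be sketched as a simple sort. This is only the first-pass heuristic described above, and it assumes the data really are close to 9 : 3 : 3 : 1; it does not by itself prove which traits are dominant, and the two "3" classes simply come out in the order of their counts.

```python
observed = {"red tall": 30, "white tall": 65, "red short": 83, "white short": 206}
ratio = [9, 3, 3, 1]

# Largest count pairs with the "9", smallest with the "1"; the middle two share the "3"s.
ranked = sorted(observed.items(), key=lambda item: item[1], reverse=True)
for (phenotype, count), part in zip(ranked, ratio):
    print(f"{phenotype:12} {count:4}  -> {part}")
```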
Now that you have identified each category and assigned it to the ratio, we can begin the chi-square to determine if it fits.
Let's begin by first arranging our computation table. It will be twice the size of the previous table. It might help to arrange the classes in descending order to represent the 9 : 3 : 3 : 1 ratio. Draw the appropriate table including the observed numbers.
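Phenotype (genotype) | O   | E   | O - E | (O - E)^2 | (O - E)^2/E
White short (W-S-)   | 206 |     |       |           |
Red short (wwS-)     | 83  |     |       |           |
White tall (W-ss)    | 65  |     |       |           |
Red tall (wwss)      | 30  |     |       |           |
Total                | 384 |     |       |           |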
Great! We are ready to start. First determine the "expecteds". It might be easier to do the "1" part of the ratio first and work up the table. Regardless, take your time and calculate what the expected numbers should be and fill in the "E" column.
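Phenotype (genotype) | O   | E   | O - E | (O - E)^2 | (O - E)^2/E
White short (W-S-)   | 206 | 216 |       |           |
Red short (wwS-)     | 83  | 72  |       |           |
White tall (W-ss)    | 65  | 72  |       |           |
Red tall (wwss)      | 30  | 24  |       |           |
Total                | 384 | 384 |       |           |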
I hope you were able to work through that and get these numbers too. Did you check your math by adding up the E column to make sure its total equals the total of the O column?
Now it is time to fill in the rest of the table and calculate the chi-square.
Go ahead and complete the calculations before paging down.
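Phenotype (genotype) | O   | E   | O - E | (O - E)^2 | (O - E)^2/E
White short (W-S-)   | 206 | 216 | -10   | 100       | 0.463
Red short (wwS-)     | 83  | 72  | 11    | 121       | 1.681
White tall (W-ss)    | 65  | 72  | -7    | 49        | 0.681
Red tall (wwss)      | 30  | 24  | 6     | 36        | 1.500
Total                | 384 | 384 |       |           | 4.325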
Did you get 4.325 for the answer?
If you didn't, look over my answer and figure out where you went wrong - and try to learn from your error so you can do it right next time. [A common mistake occurs in the last column - many students divide by either the observed or by some other expected number. Remember to always divide by the expected number for that category.]
OK, you have calculated the chi-square and it is now time to do
something with it.
Here's a portion of the Chi Square Significance Table. How many "classes" (categories, groups) are in this experiment?
Four (Red and tall, White and tall, Red and short, White and short)
Some students get through the difficult chi-square but then make a simple mistake at this point. Some get confused, pick a number out of the ratio and say there are nine classes! Or three. Or some other number, and I cannot figure out where it came from. So, just to keep yourself thinking clearly, it is smart to list the categories.
Now, how many degrees of freedom are in this experiment?
Three (4 - 1)
Does the 9 : 3 : 3 : 1 ratio fit the data?
Yes! With three degrees of freedom we can have a chi-square as large as 7.81 before we would go beyond our 5% significance cut-off.
Notice that if you had been so foolish as to stick with the one degree of freedom (that we were using with the monohybrid crosses) you would have decided that the chi-square was too large and would have (WRONGLY) rejected the ratio!
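The whole dihybrid test, including the degrees-of-freedom trap just described, can be sketched like this. It is an illustration under simple assumptions: the observed numbers are listed in the same 9 : 3 : 3 : 1 order as the table, and the two cut-offs are the 5% values quoted in this Workshop (3.84 for one degree of freedom, 7.81 for three). Computed without intermediate rounding, the chi-square is about 4.324; the 4.325 above comes from adding the rounded entries of the last column, and the conclusion is the same either way.

```python
observed = [206, 83, 65, 30]        # W-S-, wwS-, W-ss, wwss
ratio = [9, 3, 3, 1]
total = sum(observed)               # 384
expected = [total * part / sum(ratio) for part in ratio]   # 216, 72, 72, 24

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))               # 4.324

df = len(observed) - 1              # 4 classes -> 3 degrees of freedom
cutoffs_5pct = {1: 3.84, 3: 7.81}   # 5% values quoted in this Workshop
print(chi2 <= cutoffs_5pct[df])     # True: accept 9 : 3 : 3 : 1
print(chi2 <= cutoffs_5pct[1])      # False: the wrong (1 df) cut-off would reject it
```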
What is the expected ratio of boys to girls?
1 : 1
What is the degrees of freedom in that example?
There are two categories (classes) so there is one degree of freedom.
If a particular IVF clinic can, indeed, increase the odds, would you expect the chi-square to be above or below the value of 3.84 (which I got from the table above)?
If the IVF clinic can change the ratio from the expected 1 : 1 then the chi-square, calculated on the number of daughters or sons born, would be greater than 3.84.
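A sketch of how the clinic test would be set up. The birth counts below are HYPOTHETICAL, invented only to show the arithmetic; no real clinic data are given in this Workshop. With a 1 : 1 expectation, each sex is expected in half of the births, and the 3.84 cut-off applies because there is one degree of freedom.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# HYPOTHETICAL clinic records: these counts are made up for illustration only.
daughters, sons = 60, 40
births = daughters + sons
expected = [births / 2, births / 2]     # 1 : 1 ratio -> 50 and 50

x2 = chi_square([daughters, sons], expected)
print(x2)            # 4.0
print(x2 > 3.84)     # True: with 1 df, these made-up numbers would reject 1 : 1
```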
I hope you understand that here we are "hoping" that the ratio will NOT be 1 : 1. (In point of fact, scientists aren't supposed to "hope" for results, but the fact remains that they often hope a lot!)
You are the district manager of three fast food restaurants and you are looking over the revenues. You see that store A made $1,000,000, store B made $3,000,000 and store C brought in $5,000,000. You wonder if that is just a statistical blip. How would you use the chi-square to test the idea that these stores are different, beyond luck? (Don't do the chi-square - just tell me how you would set it up.)
You would "expect" a 1 : 1 : 1 ratio in the revenues
if they were all the same. In other words, the total revenues
of $9,000,000 would be distributed evenly. You would expect ...
Store A = $3,000,000
Store B = $3,000,000
Store C = $3,000,000
You could now find, for each store, the difference between expected
and observed revenues, square the difference, divide that by the
expected and then add all three together to get a chi-square value.
Suppose the manager of store A complains that you are not being fair because you haven't taken into account the differences in local population around each store. His store serves a smaller community. So, you go to the population records and discover that store A serves a population that is only a quarter the size of the communities served by stores B and C. Can you redo the chi-square? How?
The information about the populations tells you that there are
four times as many likely customers for stores B and C as A. You
can express that as a ratio of 1 : 4 : 4. If revenues are dependent
upon population you would expect ("expect" is the magic
word that means "here comes a chi-square")
Store A = $1,000,000
Store B = $4,000,000
Store C = $4,000,000
The observed revenues were
Store A = $1,000,000
Store B = $3,000,000
Store C = $5,000,000
Now you would do another chi-square to determine if these numbers
fit a 1 : 4 : 4 ratio (thus showing that revenues are probably
dependent upon population).
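This sketch stops at the set-up the question asks for: it computes the expected revenues under the two ideas (1 : 1 : 1 and 1 : 4 : 4) and the degrees of freedom. The helper name `expected_shares` is hypothetical.

```python
def expected_shares(total, ratio):
    """Spread `total` over the classes in proportion to `ratio` (hypothetical helper)."""
    return [total * part / sum(ratio) for part in ratio]

observed = {"A": 1_000_000, "B": 3_000_000, "C": 5_000_000}
total = sum(observed.values())                      # $9,000,000

equal = expected_shares(total, [1, 1, 1])           # idea 1: no difference between stores
by_population = expected_shares(total, [1, 4, 4])   # idea 2: revenue follows population
df = len(observed) - 1                              # 3 stores -> 2 degrees of freedom
print(equal, by_population, df)
```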
And finally, how many degrees of freedom are there in this three-store problem?
There are three categories (stores A, B and C), so there are two degrees of freedom.
These last few puzzles, about sex ratios and revenue ratios, are to show you that the chi-square has many uses and that all you have to do is identify how to think about the ratios, expectations and outcomes.
If you haven't done so already, pick up a copy of the answers to The Chi-Square and compare it to your own Worksheet. Make sure you understand it.
This work was created by Dr Jamie Love and licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.