
Plotting and Understanding Cumulative Probability

Date: 2005-09-11 16:01  Source: bio.net  Author: bioguider

To get the cumulative probability, here's what you do:

1) Make a standard histogram of interevent intervals (IEI), amplitudes, rise times, or whatever. That is, for the parameter of interest (say, IEI), create a number of bins spanning the range of values you observed (say, from 0 to 1000 ms in steps of 5 ms). Then, for each bin, count the number of events whose value falls in that bin. For example, the interevent intervals of a Poisson process should give a histogram that decays exponentially. Amplitudes of mEPSCs at the neuromuscular junction give an amplitude histogram shaped like a Gaussian (but usually not at central synapses, where the distribution is skewed). These are actually "frequency histograms", since you are looking at how often particular events are observed.

2) Divide each bin count by the sum of all the counts. This makes the bin heights sum to 1 (the area under the histogram equals 1 only if you also divide by the bin width). So the height of each bin is now approximately the probability of observing that class of events. This is the "probability distribution".

3) To get the cumulative probability distribution, make a new histogram using the same bins, but fill each bin with the SUM of all the bin heights from step 2) up to and including the current bin. The height of each new bin now gives the probability of observing an event less than or equal to the upper edge of the current bin. This distribution starts near zero, never decreases, rises (approximately sigmoidally for a bell-shaped distribution), and asymptotes toward 1 (i.e., after examining all events, the probability is 1 that you will have observed events less than or equal to the largest event).
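The three steps above can be sketched in Python as follows. This is a minimal, non-authoritative sketch that uses only the standard library. The function names (`histogram`, `probabilities`, `cumulative`) and the simulated Poisson interevent intervals (mean 50 ms, 2000 events) are hypothetical choices for illustration. They are not data from the post.

```python
import random
from itertools import accumulate


def histogram(values, start, stop, width):
    """Step 1: count events per bin; bin i covers [start + i*width, start + (i+1)*width).

    Values outside [start, stop) are ignored.
    """
    n_bins = round((stop - start) / width)
    counts = [0] * n_bins
    for v in values:
        if start <= v < stop:
            # min() guards against float round-off pushing a value just below stop past the last bin
            counts[min(int((v - start) // width), n_bins - 1)] += 1
    return counts


def probabilities(counts):
    """Step 2: divide each bin by the total so the bin heights sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    return [c / total for c in counts]


def cumulative(counts):
    """Step 3: fraction of binned events in this bin or any earlier one,
    i.e. P(value <= upper edge of the bin)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    # Summing integer counts before dividing makes the last bin exactly 1.0.
    return [c / total for c in accumulate(counts)]


# Hypothetical example: Poisson-process interevent intervals with a mean of
# 50 ms, binned from 0 to 1000 ms in 5 ms steps.
random.seed(1)
ieis = [random.expovariate(1 / 50) for _ in range(2000)]
counts = histogram(ieis, 0, 1000, 5)
probs = probabilities(counts)
cdf = cumulative(counts)
```

`cumulative` works from the counts rather than from the step-2 heights. The result is mathematically the same, but it avoids floating-point drift in the running sum. Events outside the binned range are dropped, so the curve reaches 1 over the binned events only.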

The cumulative distribution is useful because:

1) It is smoother than the raw distribution, because the summation smooths out bin-to-bin fluctuations, much like a running average. This also means that you can place two similar CDFs on top of each other and it's easier to see whether they're different or not. This is hard to do with the raw histograms, because they're usually all lumpy.

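2) It is easy to tell whether the parent distribution was symmetric or skewed. For a symmetric distribution the curve is point-symmetric about the spot where it crosses 0.5. A skewed one is stretched out on the side of the long tail.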

3) Certain parameters of interest can be read right off the graph. For example, the point on the x-axis where the curve crosses 0.5 on the y-axis is the median of the parent distribution, the point where it crosses 0.95 is the 95th percentile, and so on.
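The three properties above can also be checked numerically from a raw sample instead of a plotted curve. The sketch below rests on stated assumptions. `read_percentile` uses the step-function definition of the empirical CDF (the smallest observed x with F(x) >= p). The decile-based skew ratio and the two-sample Kolmogorov-Smirnov gap are common quantitative stand-ins for "looking at the curve", but they are not methods described in the post. All function names are hypothetical.

```python
import math
import statistics
from bisect import bisect_right


def ecdf(sample):
    """Empirical CDF: returns F with F(x) = fraction of sample values <= x."""
    data = sorted(sample)
    return lambda x: bisect_right(data, x) / len(data)


def read_percentile(sample, p):
    """Item 3: smallest observed x with F(x) >= p, i.e. draw a horizontal line
    at height p and drop down to the x-axis. p = 0.5 gives the (lower) median."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    data = sorted(sample)
    return data[math.ceil(p * len(data)) - 1]


def skew_ratio(sample):
    """Item 2 heuristic: (90th - 50th percentile) / (50th - 10th percentile).

    About 1 for a symmetric distribution, > 1 when the upper tail is longer,
    < 1 when the lower tail is longer.
    """
    q = statistics.quantiles(sample, n=10)  # 9 cut points: 10th ... 90th percentile
    return (q[8] - q[4]) / (q[4] - q[0])


def max_cdf_gap(a, b):
    """Item 1: largest vertical distance between two empirical CDFs
    (the two-sample Kolmogorov-Smirnov statistic D)."""
    fa, fb = ecdf(a), ecdf(b)
    # Both curves are step functions that only change at observed values,
    # so checking those points is enough to find the largest gap.
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
```

For example, `read_percentile(range(1, 21), 0.5)` returns 10, the lower of the two middle values. A symmetric sample such as `range(1, 100)` gives a skew ratio of 1.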

Reading Cumulative Probability Plots

The time to most recent common ancestor (TMRCA) calculations are given as cumulative probability plots. These curves plot the probability that the TMRCA is equal to or less than a given number of generations; i.e., the value on the vertical axis at a point T is Probability(TMRCA ≤ T), the cumulative probability.

Consider the curve below, which is a function of the marker data and assumed mutation rates. Suppose you wish to assume a stepwise mutation model (the green curve). What is the probability that the time to the MRCA is 400 generations or less?

Reading from the curve, the value is 44%.
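The curve gives a cumulative probability F(T). Two other quantities follow from it by simple subtraction. The last line applies this to the 44% reading above.

```latex
\begin{aligned}
F(T) &= P(\mathrm{TMRCA} \le T) \\
P(\mathrm{TMRCA} > T) &= 1 - F(T) \\
P(T_1 < \mathrm{TMRCA} \le T_2) &= F(T_2) - F(T_1), \qquad T_1 < T_2 \\
P(\mathrm{TMRCA} > 400) &= 1 - 0.44 = 0.56 \quad \text{(stepwise model)}
\end{aligned}
```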

Hence, for any given T (number of generations), we can easily read off the probability that the actual TMRCA is that value or less.
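The curve may be available as a table of (generations, probability) points rather than as a picture. In that case, "reading off" a value amounts to interpolating between neighbouring points. This is a minimal sketch, assuming linear interpolation and a table sorted by generation. The function `prob_at` and the three-point table are hypothetical and are not the curve in the figure.

```python
from bisect import bisect_left


def prob_at(ts, ps, t):
    """Read a tabulated cumulative curve at t by linear interpolation.

    ts: increasing x-values (e.g. generations); ps: cumulative probabilities
    at those points. Outside the table, the nearest end value is returned.
    """
    if len(ts) != len(ps) or not ts:
        raise ValueError("ts and ps must be non-empty and the same length")
    if t <= ts[0]:
        return ps[0]
    if t >= ts[-1]:
        return ps[-1]
    i = bisect_left(ts, t)  # ts[i - 1] < t <= ts[i]
    frac = (t - ts[i - 1]) / (ts[i] - ts[i - 1])
    return ps[i - 1] + frac * (ps[i] - ps[i - 1])


# Hypothetical toy table, not the curve from the figure.
ts = [0, 100, 200]
ps = [0.0, 0.5, 1.0]
```

With this table, `prob_at(ts, ps, 50)` returns 0.25.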

One can also read these curves in reverse: fix a probability value and then ask how many generations correspond to it.

For example, how many generations are required for the probability that TMRCA ≤ T to reach 80%? Reading from the curve (in this case for the infinite alleles model), this is 48 generations.
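The reverse reading can be sketched the same way. Find the first tabulated point at or above the target probability, then interpolate back to the x-axis. This sketch again assumes linear interpolation and a non-decreasing table. The function name (`t_at`) and the toy table are hypothetical, and this is not the method used to produce the figure.

```python
from bisect import bisect_left


def t_at(ts, ps, p):
    """Read a tabulated cumulative curve in reverse: the smallest x at which
    the linearly interpolated curve reaches probability p.

    ps must be non-decreasing, as any cumulative probability is.
    """
    if not ps[0] <= p <= ps[-1]:
        raise ValueError("p is outside the range covered by the table")
    i = bisect_left(ps, p)  # first index with ps[i] >= p
    if i == 0:
        return ts[0]
    # Here ps[i - 1] < p <= ps[i], so the denominator is never zero.
    frac = (p - ps[i - 1]) / (ps[i] - ps[i - 1])
    return ts[i - 1] + frac * (ts[i] - ts[i - 1])


# Hypothetical toy table, not the curve from the figure.
ts = [0, 100, 200]
ps = [0.0, 0.5, 1.0]
```

The curve can be flat over a stretch of the table. In that case `bisect_left` returns the earliest x that reaches p, which matches reading the curve from the left.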


 

(Editor: 泉水)