II. Investment Tools: Quantitative Methods
1.A.: Time Value of Money
a: Calculate the future value (FV) and present value (PV) of a single sum of money.
Future Value:
FV = PV(1 + I/Y)^N
Where PV = the amount of money invested today, I/Y = the rate of return, and N = the length of the holding period.
Example: Using a financial calculator, here's an example of how you would find the FV of a $300 investment (PV), given you earn a compound rate of return (I/Y) of 8% over a 10-year (N) period of time:
N = 10, I/Y = 8, PV = 300; CPT FV = $647.68 (ignore the sign).
Present Value:
PV = FV / (1 + I/Y)^N
Example: Using a financial calculator, here's an example of how you'd find the PV of a $1,000 cash flow (FV) to be received in 5 (N) years, given a discount rate of 9% (I/Y).
N = 5, I/Y = 9, FV = 1,000; CPT PV = $649.93 (ignore the sign).
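The two single-sum formulas above can also be checked without a financial calculator. The following is a minimal Python sketch; the function names `future_value` and `present_value` are illustrative choices rather than anything from these notes, and rates are entered as decimals (0.08) instead of calculator-style percentages (8).

```python
def future_value(pv, rate, n):
    """FV of a single sum: PV * (1 + rate)^n."""
    return pv * (1 + rate) ** n


def present_value(fv, rate, n):
    """PV of a single sum: FV / (1 + rate)^n."""
    return fv / (1 + rate) ** n


fv_example = future_value(300, 0.08, 10)    # the $300, 8%, 10-year example
pv_example = present_value(1000, 0.09, 5)   # the $1,000, 9%, 5-year example
print(f"FV = {fv_example:.2f}")
print(f"PV = {pv_example:.2f}")
```

Because the functions work with positive amounts throughout, the calculator's sign convention ("ignore the sign") does not come into play here.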
b: Calculate an unknown variable, given the other relevant variables, in single-sum problems.
Example 1: Solving for I/Y
In this example, you want to find the rate of return (I/Y) that you'll have to earn on a $500 investment (PV) in order for it to grow to $2,000 (FV) in 15 years (N). The same problem could also be set up in terms of growth rates: for example, what rate of growth (I/Y) is necessary for a company's sales to grow from $500 per year (PV) to $2,000 per year (FV) in 15 years (N)?
N = 15, PV = -500, FV = 2,000; CPT I/Y = 9.68%
Example 2: Solving for N
In this example, you want to find out how many years (N) it will take for a $500 investment (PV) to grow to $1,000 (FV), given that you can earn 7% annually (I/Y) on your money.
I/Y = 7, PV = -500, FV = 1,000; CPT N = 10.24 years.
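Both unknowns follow from rearranging FV = PV(1 + I/Y)^N. As a rough sketch (the helper names `solve_rate` and `solve_periods` are assumptions made for illustration), the rate is an N-th root and the number of periods is a ratio of logarithms:

```python
import math


def solve_rate(pv, fv, n):
    """Rate per period that grows pv into fv over n periods."""
    return (fv / pv) ** (1 / n) - 1


def solve_periods(pv, fv, rate):
    """Number of periods needed for pv to grow into fv at the given rate."""
    return math.log(fv / pv) / math.log(1 + rate)


rate_example = solve_rate(500, 2000, 15)       # Example 1
n_example = solve_periods(500, 1000, 0.07)     # Example 2
print(f"I/Y = {rate_example:.2%}")
print(f"N = {n_example:.2f} years")
```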
c: Calculate the FV and PV of an ordinary annuity and an annuity due.
Calculate the FV of an ordinary annuity:
Example: Find the FV of an ordinary annuity that will pay $150 per year at the end of each of the next 15 years, given the investment is expected to earn a 7% rate of return.
N = 15, I/Y = 7%, PMT = $150; CPT FV = $3,769.35 (ignore the sign).
Calculate the FV of an annuity due:
Example: Find the FV of an annuity due that will pay $100 per year for each of the next three years, given the cash flows can be invested at an annual rate of 10%.
Note: When solving for a FV of an annuity due, you MUST put your calculator in the beginning of year mode (BGN), otherwise you'll end up with the wrong answer.
N = 3, I/Y = 10%, PMT = $100; CPT FV = $364.10 (ignore the sign).
Calculate the PV of an ordinary annuity:
Example: Find the PV of an annuity that will pay $200 per year at the end of each of the next 13 years, given a 6% rate of return.
N = 13, I/Y = 6, PMT = 200; CPT PV = $1,770.54
Calculate the PV of an annuity due:
Example: Find the PV of a 3-year annuity due that will make a series of $100 beginning of year payments, given a 10% discount rate.
Note: There are two ways to approach this question. The first is to put your calculator in BGN mode and then input all the variables as you normally would. The second is to shorten the annuity by one year (N - 1) and find the PV of that shortened annuity as if it were an ordinary annuity, then add the first annuity payment (PMT0) to it to come up with the PV of this annuity due. In this second alternative, you will leave your calculator in the END mode.
1. BGN mode: N = 3, I/Y = 10, PMT = 100; CPT PV = $273.55
2. END mode: N = 2, I/Y = 10, PMT = 100; CPT PV = $173.55; then PV of the annuity due = 173.55 + 100 = $273.55
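The four annuity calculations in this section can be reproduced with the standard closed-form annuity factors. This is a sketch under the assumption that an annuity due is simply an ordinary annuity shifted one period earlier, so its value is the ordinary-annuity value times (1 + rate); the function names and the `due` flag are illustrative stand-ins for the calculator's END and BGN modes.

```python
def annuity_fv(pmt, rate, n, due=False):
    """FV of n level payments; due=True mimics the calculator's BGN mode."""
    fv = pmt * ((1 + rate) ** n - 1) / rate
    return fv * (1 + rate) if due else fv


def annuity_pv(pmt, rate, n, due=False):
    """PV of n level payments; due=True mimics the calculator's BGN mode."""
    pv = pmt * (1 - (1 + rate) ** -n) / rate
    return pv * (1 + rate) if due else pv


fv_ordinary = annuity_fv(150, 0.07, 15)
fv_due = annuity_fv(100, 0.10, 3, due=True)
pv_ordinary = annuity_pv(200, 0.06, 13)
pv_due = annuity_pv(100, 0.10, 3, due=True)

# Second approach from the note: shorten by one year, then add the first payment.
pv_due_shortcut = annuity_pv(100, 0.10, 2) + 100

for label, value in [("FV ordinary", fv_ordinary), ("FV due", fv_due),
                     ("PV ordinary", pv_ordinary), ("PV due", pv_due),
                     ("PV due (N - 1 method)", pv_due_shortcut)]:
    print(f"{label}: {value:.2f}")
```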
d: Calculate an unknown variable, given the other relevant variables, in annuity problems.
Example: Find the PMT required to fund a retirement program of $3,000 at the end of 15 years, given a rate of return of 7%.
N = 15, I/Y = 7%, FV = 3,000; CPT PMT = $119.38 (ignore the sign).
Example: Suppose that you will deposit $100 at the end of each year for 5 years into an investment account. At the end of 5 years, the account will be worth $600. What is the rate of return?
N = 5, FV = 600, PMT = -100; CPT I/Y = 9.13%.
Example: Solve for the PMT given a 13-year annuity with a discount rate of 6%, and a PV of $2,000.
N = 13, I/Y = 6, PV = 2,000; CPT PMT = $225.92.
Example: Suppose that you have $1,000 in the bank today. If the interest rate is 8%, how many annual, end-of-year payments of $150 can you withdraw?
I/Y = 8, PMT = 150, PV = -1,000; CPT N = 9.9 years.
Example: What rate of return will you earn on an annuity that costs $700 today and promises to pay you $100 per year for each of the next 10 years?
N = 10, PV = -700, PMT = 100; CPT I/Y = 7.07%.
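The annuity problems above can be cross-checked in code. Payments and the number of periods have closed-form solutions, but the rate does not, so the sketch below finds it numerically with a simple bisection search. That search is an illustrative substitute for whatever method a financial calculator uses internally, and all function names are assumptions.

```python
import math


def fv_factor(rate, n):
    """Future value of 1 per period for n periods (ordinary annuity)."""
    return ((1 + rate) ** n - 1) / rate


def pv_factor(rate, n):
    """Present value of 1 per period for n periods (ordinary annuity)."""
    return (1 - (1 + rate) ** -n) / rate


def bisect_rate(f, lo=1e-6, hi=1.0, tol=1e-10):
    """Find a rate where f changes sign between lo and hi."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2


pmt_from_fv = 3000 / fv_factor(0.07, 15)      # fund $3,000 in 15 years
pmt_from_pv = 2000 / pv_factor(0.06, 13)      # 13-year annuity worth $2,000 today
n_withdrawals = -math.log(1 - 1000 * 0.08 / 150) / math.log(1.08)
rate_from_fv = bisect_rate(lambda r: 100 * fv_factor(r, 5) - 600)
rate_from_pv = bisect_rate(lambda r: 100 * pv_factor(r, 10) - 700)

print(f"PMT (FV target) = {pmt_from_fv:.2f}")
print(f"PMT (PV given)  = {pmt_from_pv:.2f}")
print(f"N withdrawals   = {n_withdrawals:.1f}")
print(f"I/Y (FV = 600)  = {rate_from_fv:.2%}")
print(f"I/Y (PV = 700)  = {rate_from_pv:.2%}")
```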
e: Calculate the PV of a perpetuity.
Example: Assume a certain preferred stock pays $4.50 per year in annual dividends (and they're expected to continue indefinitely). Given an 8% discount rate, what's the PV of this stock?
PVperpetuity = PMT / (I/Y)
PVperpetuity = 4.50 / .08 = $56.25
This means that if the investor wants to earn an 8% rate of return, she should be willing to pay $56.25 for each share of this preferred stock.
f: Calculate an unknown variable, given the other relevant variables, in perpetuity problems.
Example: Continuing with our example from LOS 1.A.e, what rate of return would the investor make if she paid $75.00 per share for the stock?
I/Y = PMT / PVperpetuity
4.50 / 75.00 = 6.0%
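Both perpetuity calculations are the same one-line relationship solved for different unknowns. A minimal sketch, with illustrative function names:

```python
def perpetuity_pv(pmt, rate):
    """Price of a level perpetuity: PMT / rate."""
    return pmt / rate


def perpetuity_rate(pmt, price):
    """Rate of return implied by paying `price` for a level perpetuity."""
    return pmt / price


preferred_price = perpetuity_pv(4.50, 0.08)
implied_return = perpetuity_rate(4.50, 75.00)
print(f"Price at 8%: {preferred_price:.2f}")
print(f"Return at $75: {implied_return:.1%}")
```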
g: Calculate the FV and PV of a series of uneven cash flows.
FV Example: Given: I = 9%; PMT1 is $100; PMT2 is $500; and PMT3 is $900. How much is this future stream worth at the end of the 3rd year?
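One way to work this example is to compound each cash flow separately to the end of year 3 and add the results. The sketch below assumes each payment arrives at the end of its year (so PMT1 compounds for two years, PMT2 for one, and PMT3 not at all); that timing is an assumption, since the example does not state it. The function names are illustrative.

```python
def fv_uneven(cash_flows, rate):
    """FV at the date of the last cash flow; flows occur at end of periods 1..n."""
    n = len(cash_flows)
    return sum(cf * (1 + rate) ** (n - t)
               for t, cf in enumerate(cash_flows, start=1))


def pv_uneven(cash_flows, rate):
    """PV at time 0; flows occur at end of periods 1..n."""
    return sum(cf / (1 + rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


flows = [100, 500, 900]
fv_stream = fv_uneven(flows, 0.09)   # 100(1.09)^2 + 500(1.09) + 900
pv_stream = pv_uneven(flows, 0.09)
print(f"FV at end of year 3 = {fv_stream:.2f}")
print(f"PV today = {pv_stream:.2f}")
```

Under that timing assumption the future value works out to $1,563.81, and discounting that figure back three years gives the same present value as discounting each flow individually.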
Example: Given: a 10% annual rate paid quarterly; PV = 500; time is 5 years; compute FV.
Solve: I/Y = 10/4 = 2.5; N = 5 * 4 = 20; PV = 500; CPT FV = $819.31.
i: Distinguish between the stated annual interest rate and the effective annual rate.
The stated rate of interest is known as the nominal rate, and represents the contractual rate. The periodic rate, in contrast, is the rate of interest earned over a single compound period - e.g., a stated (nominal) rate of 12%, compounded quarterly, is equivalent to a periodic rate of 12/4 = 3%. Finally, the true rate of interest is known as the effective rate and represents the rate of return actually being earned, after adjustments have been made for different compounding periods.
j: Calculate the effective annual rate, given the stated annual interest rate and the frequency of compounding.
Example: Compute the effective rate of 12%, compounded quarterly. Given m = 4, and periodic rate = 12/4 = 3%.
Effective rate = (1 + periodic rate)^m - 1
Where m = the number of compounding periods in a year.
(1 + 0.03)^4 - 1 = 1.1255 - 1 = 0.1255, or 12.55%
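The periodic-rate logic can be sketched in a few lines of Python. The function name `effective_annual_rate` is an illustrative choice; the second calculation repeats the earlier quarterly-compounding example ($500 at a stated 10% for 5 years) using the same periodic-rate idea.

```python
def effective_annual_rate(stated_rate, m):
    """EAR for a stated annual rate compounded m times per year."""
    periodic_rate = stated_rate / m
    return (1 + periodic_rate) ** m - 1


ear_quarterly = effective_annual_rate(0.12, 4)
fv_quarterly = 500 * (1 + 0.10 / 4) ** (5 * 4)   # periodic rate 2.5%, 20 periods
print(f"EAR = {ear_quarterly:.2%}")
print(f"FV with quarterly compounding = {fv_quarterly:.2f}")
```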
k: Draw a time line, specify a time index, and solve problems involving the time value of money as applied to mortgages, credit card loans, and saving for college tuition or retirement.
Example: Paying off a Loan (or Mortgage)
A company wants to borrow $50,000 for five years. The bank will lend the money at a 9% rate of interest and will require that the loan be paid off in five equal, annual (end-of-year) installment payments. What are the annual loan payments that this company will have to make in order to pay off this loan?
N = 5, I/Y = 9, PV = 50,000; CPT PMT = $12,854.62
This loan can be paid off in five equal annual payments of $12,854.62.
Example: Loan Amortization
An individual borrows $10,000 at 10% today amortized over 5 years. What are his payments?
PV = 10,000, N = 5, I/Y = 10; CPT PMT = $2,637.97
He will pay $2,637.97 at the end of each of the next 5 years.
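To see how a level payment retires a loan, it can help to split each payment into interest and principal. The following sketch builds a year-by-year schedule for the $10,000 loan; the schedule itself is not part of the original example, and the function names are illustrative.

```python
def level_payment(principal, rate, n):
    """End-of-period payment that fully amortizes the loan over n periods."""
    return principal * rate / (1 - (1 + rate) ** -n)


def amortization_schedule(principal, rate, n):
    """Rows of (year, payment, interest, principal repaid, ending balance)."""
    pmt = level_payment(principal, rate, n)
    balance = principal
    rows = []
    for year in range(1, n + 1):
        interest = balance * rate          # interest accrues on the opening balance
        principal_paid = pmt - interest    # remainder of the payment reduces the loan
        balance -= principal_paid
        rows.append((year, pmt, interest, principal_paid, balance))
    return rows


company_pmt = level_payment(50_000, 0.09, 5)
schedule = amortization_schedule(10_000, 0.10, 5)
print(f"Company loan payment: {company_pmt:.2f}")
for year, pmt, interest, paid, balance in schedule:
    print(f"{year}  {pmt:9.2f}  {interest:8.2f}  {paid:9.2f}  {abs(balance):9.2f}")
```

The interest portion shrinks each year while the principal portion grows, and the ending balance reaches zero after the fifth payment.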
Example: Funding a Retirement Program
A 35-year old investor wants to retire in 25 years at age 60. Given he expects to earn 12.5% on his investments prior to his retirement, and then 10% thereafter, how much must he deposit annually (at the end of each year) for the next 25 years in order to be able to withdraw $25,000 per year (at the beginning of each year) for the next 30 years?
This is a two-part problem. First, use PV to compute the present value, as of the retirement date, of the 30-year, $25,000 annuity due. Second, use FV to find the fixed annual deposit that must be made at the end of each year of the first 25-year period to accumulate the needed funds.
Step 1: N = 29, I/Y = 10, PMT = 25,000; CPT PV = 234,240 + 25,000 = $259,240
Step 2: N = 25, I/Y = 12.5, FV = 259,240; CPT PMT = $1,800.02
The investor will need a nest egg of $259,240. He will then have to put away $1,800 per year at the end of each of the next 25 years in order to accumulate a nest egg worth $259,240 - which will enable him to withdraw $25,000 per year for the following 30 years.
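The two steps can be chained in code. This sketch follows the same approach as the calculator solution (29 end-of-year withdrawals plus the immediate first withdrawal, then a 25-year sinking fund); the variable names are illustrative, and the nest egg is carried forward unrounded, so it can differ from the rounded figure by a few cents.

```python
withdrawal = 25_000
retirement_rate = 0.10
saving_rate = 0.125

# Step 1: value at retirement of a 30-payment annuity due
# = first payment now + PV of the remaining 29 end-of-year payments.
nest_egg = withdrawal + withdrawal * (1 - (1 + retirement_rate) ** -29) / retirement_rate

# Step 2: end-of-year deposit whose future value after 25 years equals the nest egg.
deposit = nest_egg * saving_rate / ((1 + saving_rate) ** 25 - 1)

print(f"Nest egg needed at retirement: {nest_egg:,.0f}")
print(f"Required annual deposit: {deposit:,.2f}")
```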
1.B: Statistical Concepts and Market Returns
a: Differentiate between a population and a sample.
A population is defined as all members of a specified group. Any descriptive measure of a population characteristic is called a parameter. Populations can have many parameters, but investment analysts are usually only concerned with a few, such as the mean return, or the standard deviation of returns.
A sample is defined as a portion, or subset of the population of interest. Even if it is possible to observe all members of a population, it is often too expensive or time consuming to do so. Once the population has been defined, we can take a sample of the population with the view of describing the population as a whole.
b: Explain the concept of a parameter.
Any descriptive measure of a population characteristic is called a parameter.
c: Explain the differences among the types of measurement scales.
Nominal scale: this represents the weakest level of measurement. Observations are classified or counted with no particular order. An example would be assigning the number one to a large cap value fund, the number two to a large cap growth fund, etc.
Ordinal scale: this is a higher level of measurement. All observations are placed into separate categories and the categories are placed in order with respect to some characteristic. An example would be ranking 100 large cap growth mutual funds by performance and assigning the number one to the 10 best performing funds and the number ten to the 10 worst performing funds.
Interval scale: this scale provides ranking and assurance that differences between scale values are equal. Measuring temperature is a prime example.
Ratio scale: these represent the strongest level of measurement. In addition to providing ranking and equal differences between scale values, ratio scales have a true zero point as the origin. Money is a good example.
d: Define and interpret a frequency distribution.
A frequency distribution is a grouping of raw data into categories (called classes) so that the number of observations in each of the nonoverlapping classes can be seen and tallied. The purpose of constructing a frequency distribution is to group raw data into a usable visual framework for analysis and presentation.
e: Define, calculate, and interpret a holding period return.
Holding period return (HPR) measures the total return for holding an investment over a certain period of time, and can be calculated using the following formula:
HPR = (P_t - P_{t-1} + D_t) / P_{t-1}
Where: P_t = price per share at the end of time period t, P_{t-1} = price per share at the end of the previous period (t - 1), and D_t = cash distributions received during time period t.
Example: A stock is currently worth $60. If you purchased the stock exactly one year ago for $50 and received a $2 dividend over the course of the year, what is your HPR?
(60 - 50 + 2) / 50 = 24%
f: Define and explain the use of intervals to summarize data.
An interval is the set of return values within which an observation falls. Each observation falls into only one interval, and the total number of intervals covers the entire population. It is important to consider the number of intervals to be used. If too few intervals are used, too much data may be summarized and we may lose important characteristics; if too many intervals are used, we may not summarize enough. Each interval has a lower limit and an upper limit. Intervals must be all-inclusive and non-overlapping.
After intervals have been defined, you must tally the observations and assign each observation to its respective interval.
Once the data set has been tallied, you should count the number of observations that were placed in each interval. The actual number of observations in a given interval is called the absolute frequency, or simply the frequency.
g: Calculate relative frequencies, given a frequency distribution.
Another useful way to present data is the relative frequency. Relative frequency is calculated by dividing the frequency of each return interval by the total number of observations. Simply, relative frequency is the percentage of total observations falling within each interval.
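The tallying steps described above (define intervals, assign each observation to exactly one interval, count, then divide by the total) can be sketched as follows. The return data and interval edges are hypothetical values invented purely for illustration, and each interval is treated as including its lower limit but not its upper limit so that the intervals do not overlap.

```python
from bisect import bisect_right

# Hypothetical annual returns in percent (not from the notes).
returns = [-4.2, 1.5, 3.8, 7.1, 12.4, -0.6, 5.5, 9.9, 2.2, 6.3]

# Interval limits: [-5, 0), [0, 5), [5, 10), [10, 15)
edges = [-5, 0, 5, 10, 15]

absolute_freq = [0] * (len(edges) - 1)
for r in returns:
    interval = bisect_right(edges, r) - 1   # index of the interval containing r
    absolute_freq[interval] += 1

relative_freq = [count / len(returns) for count in absolute_freq]

for i, (count, share) in enumerate(zip(absolute_freq, relative_freq)):
    print(f"{edges[i]:>4} to {edges[i + 1]:>3}: frequency {count}, relative {share:.0%}")
```

The relative frequencies always sum to 100%, which is a quick check that every observation was assigned to exactly one interval.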
h: Describe the properties of data presented as a histogram or a frequency polygon.
A histogram is the graphical equivalent of a frequency distribution. It is a bar chart of continuous data that has been grouped into a frequency distribution. The advantage of a histogram is that we can quickly see where most of the observations lie. To construct a histogram, the class intervals are scaled on the horizontal axis and the absolute frequencies are scaled on the vertical axis.
A second graphical tool used for displaying data is the frequency polygon. To construct a frequency polygon, we plot the midpoint of each interval on the horizontal axis and the absolute frequency for that interval on the vertical axis. Each point is then connected with a straight line.
i: Define, calculate, and interpret measures of central tendency, including the population mean, sample mean, arithmetic mean, geometric mean, weighted mean, median, and mode.
A population is the entire group of objects that are being studied, and the population mean is the average of all of its values. To find the population mean, sum up all the observed values in the population (sum X) and divide this sum by the number of observations (N) in the population.
A sample mean is the sum of all the values in a sample of a population divided by the number of values in the sample. The sample mean is used to make inferences about the population mean.
Example: A stock you and your research partner are analyzing has 12 years of annualized return data. The returns are 12%, 25%, 34%, 15%, 19%, 44%, 54%, 33%, 22%, 28%, 17%, and 24%. Your research partner is exceedingly lazy and has decided to collect data based on only five years of returns: 25%, 34%, 19%, 54%, and 17%. Given this data, calculate the population mean and calculate the sample mean.
Population mean = (12 + 25 + 34 + 15 + 19 + 44 + 54 + 33 + 22 + 28 + 17 + 24) / 12 = 27.25%
Sample mean = (25 + 34 + 19 + 54 + 17) / 5 = 29.8%
Arithmetic mean is the sum of the observation values divided by the number of observations. It is the most widely used measure of central tendency, and is the only measure where the sum of the deviations of each value from the mean is always zero.
Example: A data set contains the following numbers: 5, 9, 4, and 10. The mean of these numbers is: ( 5 + 9 + 4 + 10) / 4 = 7. The sum of the deviations from the mean is: (5 - 7) + (9 - 7) + (4 - 7) + (10 - 7) = -2 + 2 - 3 + 3 = 0.
Geometric mean is often used when calculating investment returns over multiple periods, or to find a compound growth rate.
Example: For the last three years the returns for Acme Corporation common stock have been -9.34%, 23.45%, and 8.92%. Find the geometric mean.
Take the cube root of (-0.0934 + 1)(0.2345 + 1)(0.0892 + 1), that is, the cube root of 1.21903. On your TI calculator, enter 1.21903 and hit the y^x key, then enter 0.3333 and press = to get 1.06825. Now subtract one from this number to get an answer of 0.06825, or 6.825%.
Weighted mean is a special case of the mean that allows different weights on different observations.
Example: A portfolio consists of 50% common stocks, 40% bonds, and 10% cash. If the return on common stocks is 12%, the return on bonds is 7%, and the return on cash is 3%, what is the return to the portfolio?
Weighted mean = [(0.50 *0.12) + (0.40 * 0.07) + (0.10 * 0.03)] = 0.091, or 9.1%
The median is the mid-point of the data when the data is arranged from the largest to the smallest values. Half the observations are above the median and half are below the median. To determine the median, arrange the data from highest to the lowest and find the middle observation.
Example: The five-year annualized total returns for five investment managers are 30%, 15%, 25%, 21%, and 23%. Find the median return for the managers.
First, arrange the returns from highest to lowest: 30, 25, 23, 21, 15.
The return observation half way down from the top is 23%.
The mode of a data set is the value of the observation that appears most frequently.
Example: In the following set of numbers, 30%, 28%, 25%, 23%, 28%, 15%, and 5%, the value 28% occurs most frequently, so the mode is 28%.
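The measures of central tendency worked through above can be reproduced with Python's standard library. This is a sketch using the `statistics` and `math` modules; the variable names are illustrative, and the geometric mean is computed directly from the definition (the n-th root of the product of 1 + return, minus one).

```python
import math
import statistics

population = [12, 25, 34, 15, 19, 44, 54, 33, 22, 28, 17, 24]
sample = [25, 34, 19, 54, 17]
pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# Geometric mean of the Acme returns.
acme = [-0.0934, 0.2345, 0.0892]
geo_mean = math.prod(1 + r for r in acme) ** (1 / len(acme)) - 1
arith_mean = statistics.mean(acme)   # for comparison with the geometric mean

# Weighted mean: portfolio weights times component returns.
weights = [0.50, 0.40, 0.10]
component_returns = [0.12, 0.07, 0.03]
weighted_mean = sum(w * r for w, r in zip(weights, component_returns))

median_return = statistics.median([30, 15, 25, 21, 23])
mode_return = statistics.mode([30, 28, 25, 23, 28, 15, 5])

print(f"Population mean {pop_mean}, sample mean {sample_mean}")
print(f"Geometric mean {geo_mean:.3%}, arithmetic mean {arith_mean:.3%}")
print(f"Weighted mean {weighted_mean:.1%}")
print(f"Median {median_return}, mode {mode_return}")
```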
j: Distinguish between arithmetic and geometric means.
The geometric mean is always less than or equal to the arithmetic mean. In general, the difference between the two means increases with the variability between period-by-period observations. The only time when the two means will be equal is when there is no variability in the observations (e.g., all observations are 10%).
k: Define, calculate, and interpret (1) a portfolio return as a weighted mean, (2) a weighted average or mean, (3) a range and mean absolute deviation, and (4) a sample and a population variance and standard deviation.
Refer to LOS 1.B.i for a review of weighted mean and weighted average.
Range is the distance between the largest and the smallest value in the data set.
Example: The five-year annualized total returns for five investment managers are 30%, 12%, 25%, 20%, and 23%. What is the range of the data? Range = 30 - 12 = 18%.
Mean absolute deviation (MAD) is the average of the absolute values of the deviations of individual observations from the arithmetic mean. Remember that the sum of all of the deviations from the mean is equal to zero. To get around this zeroing out problem, the mean deviation uses the absolute values of each deviation.
Example: Continuing from above, what is the mean absolute deviation of investment returns and how is it interpreted?
MAD = { |30 - 22| + |12 - 22| + |25 - 22| + |20 - 22| + |23 - 22| } / 5
MAD = [8 + 10 + 3 + 2 + 1] / 5 = 4.8%
On average, an individual return deviates 4.8% from the mean return of 22%.
Population variance is the mean of the squared deviations from the mean. The population variance is computed using all members of a population.
Example: Assume the five-year annualized total returns for the five investment managers used in the earlier example represent all of the managers at a small investment firm. What is the population variance?
μ = {30 + 12 + 25 + 20 + 23} / 5 = 22%
σ^2 = { (30 - 22)^2 + (12 - 22)^2 + (25 - 22)^2 + (20 - 22)^2 + (23 - 22)^2 } / 5 = 35.60 (in % squared)
Population standard deviation is the square root of the population variance.
Example: Continuing with our example, take the square root of 35.60 = 5.97%
Sample variance applies when we are dealing with a subset, or sample of the total population.
Example: Assume the five-year annualized total returns for the five investment managers used in the earlier example represent only a sample of the managers at a large investment firm. What is the sample variance?
sample mean = {30 + 12 + 25 + 20 + 23} / 5 = 22%
s^2 = { (30 - 22)^2 + (12 - 22)^2 + (25 - 22)^2 + (20 - 22)^2 + (23 - 22)^2 } / (5 - 1) = 44.5 (in % squared)
Sample standard deviation can be found by taking the positive square root of the sample variance.
Example: Continuing with our example, take the square root of 44.50 = 6.67%
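The dispersion measures for the five managers can be verified with the standard `statistics` module, which distinguishes population (`pvariance`, `pstdev`) from sample (`variance`, `stdev`) calculations. The mean absolute deviation is computed by hand in this sketch, and the variable names are illustrative.

```python
import statistics

returns = [30, 12, 25, 20, 23]   # percent

mean_return = statistics.mean(returns)
data_range = max(returns) - min(returns)
mad = sum(abs(r - mean_return) for r in returns) / len(returns)

pop_variance = statistics.pvariance(returns)   # divides by N
pop_std = statistics.pstdev(returns)
sample_variance = statistics.variance(returns)  # divides by n - 1
sample_std = statistics.stdev(returns)

print(f"Mean {mean_return}, range {data_range}, MAD {mad}")
print(f"Population variance {pop_variance:.2f}, std dev {pop_std:.2f}")
print(f"Sample variance {sample_variance:.2f}, std dev {sample_std:.2f}")
```

The only difference between the population and sample figures is the divisor (5 versus 4), which is why the sample measures are larger.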
l: Calculate the proportion of items falling within a specified number of standard deviations of the mean, using Chebyshev's inequality.
Chebyshev's inequality states that for any set of observations (sample or population, regardless of the shape of the distribution), the proportion of the observations within k standard deviations of the mean is at least 1 - 1/k^2 for all k > 1. If we know the standard deviation, we can use Chebyshev's inequality to find the minimum proportion of observations that lie within a given number of standard deviations of the mean, regardless of the shape of the distribution.
Chebyshev's inequality states that for any distribution, at least the following proportions (rounded) of observations lie within the stated distance of the mean:
36% of observations lie within 1.25 standard deviations of the mean
56% of observations lie within 1.50 standard deviations of the mean
75% of observations lie within 2 standard deviations of the mean
89% of observations lie within 3 standard deviations of the mean
94% of observations lie within 4 standard deviations of the mean
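The percentages listed above all come from the single expression 1 - 1/k^2. A minimal sketch, with an illustrative function name:

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if not k > 1:
        raise ValueError("Chebyshev's inequality is only informative for k > 1")
    return 1 - 1 / k ** 2


bounds = {k: chebyshev_min_proportion(k) for k in (1.25, 1.5, 2, 3, 4)}
for k, proportion in bounds.items():
    print(f"k = {k}: at least {proportion:.1%}")
```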
m: Define, calculate, and interpret the coefficient of variation.
The coefficient of variation expresses how much dispersion exists relative to the mean of a distribution and allows for direct comparison of dispersion across different data sets.
CV = [standard deviation of returns]/[Expected rate of return]
Example:
Investment A has an ER of 7% and a σ of 0.05.
Investment B has an ER of 12% and a σ of 0.07.
Which is riskier?
A’s CV is .05/.07 = .714
B’s CV is .07/.12 = .583
A has 0.714 units of risk for each unit of return, while B has 0.583 units of risk for each unit of return. A is riskier because it has more risk per unit of return.
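The comparison can be expressed in a few lines. In this sketch the function name is illustrative, and both inputs are entered as decimals so that the ratio is unit-free.

```python
def coefficient_of_variation(std_dev, expected_return):
    """Risk per unit of expected return."""
    return std_dev / expected_return


cv_a = coefficient_of_variation(0.05, 0.07)
cv_b = coefficient_of_variation(0.07, 0.12)
riskier = "A" if cv_a > cv_b else "B"
print(f"CV of A = {cv_a:.3f}, CV of B = {cv_b:.3f}, riskier per unit of return: {riskier}")
```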
n: Define, calculate, and interpret the Sharpe measure of risk-adjusted performance.
The Sharpe measure seeks to measure excess return per unit of risk. The numerator of the Sharpe measure recognizes the existence of a risk-free return. Portfolios with large Sharpe ratios are preferred to portfolios with smaller ratios because it is assumed that rational investors prefer return and dislike risk. The Sharpe ratio is also called the reward-to-variability ratio.
Example: The mean monthly return on T-bills is 0.25%. The mean monthly return on the S&P 500 is 1.30% with a standard deviation of 7.30%. Calculate the Sharpe measure for the S&P 500 and interpret the results.
Sharpe measure = (1.30 - 0.25) / 7.30 = 0.144
The S&P 500 earned 0.144% of excess return per unit of risk, where risk is measured by standard deviation.
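The same calculation as a short sketch; the function name and argument names are illustrative, and all three inputs are monthly figures in percent, as in the example.

```python
def sharpe_measure(mean_return, risk_free_return, std_dev):
    """Excess return over the risk-free rate per unit of standard deviation."""
    return (mean_return - risk_free_return) / std_dev


sp500_sharpe = sharpe_measure(1.30, 0.25, 7.30)
print(f"Sharpe measure = {sp500_sharpe:.3f}")
```

Because the numerator and denominator must be in the same units and cover the same period, mixing monthly returns with an annual standard deviation would give a meaningless ratio.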
o: Describe the relative locations of the mean, median, and mode for a nonsymmetrical distribution.
For a symmetrical distribution, the mean, median, and mode are equal.
For a positively skewed distribution, the mode is less than the median, which is less than the mean. Recall that the mean is affected by outliers. In a positively skewed distribution, there are large, positive outliers which will tend to "pull" the mean upward.
For a negatively skewed distribution, the mean is less than the median, which is less than the mode. In this case, there are large, negative outliers which tend to "pull" the mean downward.
p: Define and interpret skewness and explain why a distribution might be positively or negatively skewed.
Skewness refers to the extent to which a distribution is not symmetrical.
A positively skewed distribution is characterized by many outliers in its upper or right tail. Recall that an outlier is defined as an extraordinarily large outcome in absolute value. Positively skewed distributions have long right tails.
A negatively skewed distribution is the opposite of a positively skewed distribution. A negatively skewed distribution has a disproportionately large number of outliers on its left side. In other words, a negatively skewed distribution is said to have a long tail on its left side.
q: Define and interpret kurtosis and explain why a distribution might have positive excess kurtosis.
Kurtosis deals with whether or not a distribution is more or less "peaked" than a normal distribution.
A distribution that is more peaked than normal is leptokurtic. A leptokurtic return distribution will have more returns clustered around the mean and more returns with large deviations from the mean (fatter tails).
A distribution that is less peaked, or flatter than normal is said to be platykurtic.
For all normal distributions, kurtosis is equal to three. Statisticians, however, sometimes report excess kurtosis, which is defined as kurtosis minus three. A normal distribution has excess kurtosis equal to zero, a leptokurtic distribution has excess kurtosis greater than zero, and platykurtic distribution will have excess kurtosis less than zero.
r: Explain why a semi-logarithmic scale is often used for return performance graphs.
Semi-logarithmic scales use an arithmetic scale on the horizontal axis, but use a logarithmic scale on the vertical axis. Hence, values on the vertical axis are spaced according to their logarithms. On a semi-logarithmic scale, equal movements on the vertical axis reflect equal percentage changes.
1.C: Probability Concepts
a: Define a random variable.
A random variable is a quantity whose outcomes are uncertain. A realized random variable is a number associated with the outcome of an experiment. When rolling a conventional six-sided die, the random variable might be the number that faces up when the die stops rolling.
b: Explain the two defining properties of probability.
The probability of any event "i" is between zero and one.
If a set of events E1, E2, ..., En is mutually exclusive and exhaustive, then the sum of the probabilities of those events equals one.
Mutually exclusive means that the events do not share any outcomes. Knowing that you have an outcome in one event excludes the possibility of an outcome in another event.
Exhaustive means that a given list of events represent all possible outcomes.
c: Distinguish among empirical, a priori, and subjective probabilities.
We can assign probabilities to events three ways:
We calculate an empirical probability by analyzing past data.
We calculate an a priori probability by using formal reasoning and inspection.
A subjective probability is less formal and involves personal judgment.
d: Describe the investment consequences of probabilities that are inconsistent.
With respect to investment opportunities, when two assets are priced based upon different probabilities being assigned to the same event, this is called inconsistent probabilities. It is best defined by a general example.
Example: Event E will increase the return of both stock A and B. The price of stock A incorporates a higher probability of E than does stock B. All other things equal, stock A is overpriced when compared to stock B. Therefore, an investor should lower holdings of stock A and increase holdings of stock B. An investor that is not too risk averse might engage in a pairs arbitrage trade, where he/she short sells A and uses the proceeds to buy stock B.
e: Distinguish between unconditional and conditional probabilities.
An unconditional probability is also called a marginal probability, and it is the most basic type of probability. It is the probability of an event where the occurrence of other events is not important. We might be concerned with the probability of an economic recession where we do not care about interest rates, inflation, etc. In such a case, we would be concerned with the unconditional probability of a recession.
A conditional probability is one where the knowledge of some other event is important. We might be concerned about the probability of a recession given that the monetary authority increases interest rates. This is a conditional probability. The key thing to look for is "the probability of A given B." This is noted by a vertical bar symbol.
f: Define a joint probability.
A joint probability is the probability that both events occur at the same time, but neither is certain or a given. We write the probability of A and B as P(AB). Unless both A and B occur, it does not qualify as the event "A and B."
g: Calculate, using the multiplication rule, the joint probability of two events.
There is a relationship between the probabilities P(AB) and P(A | B). It is called the multiplication rule for probabilities:
P(AB) = P(A | B) * P(B)
In words this is: "the probability of A and B is the probability of A given B times the unconditional probability of B."
We can manipulate this to give the following representation for a conditional probability: P(A | B) = P(AB) / P(B)
Example: We will assume the probabilities in the list below:
The probability of the monetary authority increasing interest rates "I" is 40%: P(I) = .4
The probability of a recession "R" given an increase in interest rates is 70%: P(R given I) = .7
The probability of "R" without an increase in interest rates is 10%: P(R given I^C) = .1
Without additional information, we can assume that the events "increase in interest rates" and "no increase in interest rates" are the only possible events. They are mutually exclusive and exhaustive, and since there are only two events, they are called complements. The superscript "C" stands for complement.
P(I^C) = 1 - P(I) = .60
What is the probability of "recession and an increase in interest rates?"
P(RI) = P(R given I) * P(I) = .7 * .4 = .28
h: Calculate, using the addition rule, the probability that at least one of two events will occur.
The general rule of addition states that if two events A and B are not mutually exclusive, then you must account for the joint probability of the events, that is, the possibility that both events will occur. In a traditional Venn diagram, the joint probability is shown by the overlap of the two event circles.
P(A or B) = P(A) + P(B) - P(A and B), where P(A and B) is the joint probability of A and B.
The joint probability P(A and B) is the probability that measures the likelihood that two or more events will happen concurrently.
P(A and B) = P(A) * P(B) for independent events, or
P(A and B) = P(A) * P(B given that A occurs) for dependent events.
i: Distinguish between dependent and independent events.
Independent events are events for which the occurrence of one has no influence on the probability of the other. That is easily expressed using conditional probabilities. A and B are independent if:
P(A | B) = P(A), and P(B | A) = P(B)
The best examples of independent events are found with the a priori probabilities of dice throws or coin flips. A die has no memory; therefore, the event of a "4" on a second throw of a die is independent of a "4" on the first throw.
j: Calculate a joint probability of any number of independent events.
The multiplication rule for independent events is:
P(A | B) * P(B) = P(A) * P(B) = P(AB)
P(B | A) * P(A) = P(B) * P(A) = P(AB)
Example: On the roll of two dice, the probability of getting two "4s" is:
P(4 on first die and 4 on second die) = P(4 on first die) * P(4 on second die)
P(4 on first die and 4 on second die) = (1/6) *(1/6) = 1/36 = .0278.
k: Calculate, using the total probability rule, an unconditional probability.
The total probability rule is used to demonstrate how joint probabilities tie in with unconditional probabilities. If we continue with our example from LOS 1.C.g about interest rates and recession, and assume that the events "I" and "I^C" are mutually exclusive and exhaustive, then a recession can only occur with either of these two events. In that case, the sum of these two joint probabilities is the unconditional probability of a recession:
P(R) = P(R given I) * P(I) + P(R given I^C) * P(I^C)
P(R) = P(RI) + P(RI^C)
P(R) = .28 + .06 = .34
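The interest-rate and recession example chains the complement, the multiplication rule, and the total probability rule. A short sketch with illustrative variable names:

```python
p_increase = 0.4                  # P(I)
p_recession_given_increase = 0.7  # P(R given I)
p_recession_given_none = 0.1      # P(R given I^C)

p_no_increase = 1 - p_increase    # complement: P(I^C)

# Multiplication rule: joint probability = conditional * unconditional.
p_recession_and_increase = p_recession_given_increase * p_increase
p_recession_and_none = p_recession_given_none * p_no_increase

# Total probability rule: sum the joint probabilities over the exhaustive events.
p_recession = p_recession_and_increase + p_recession_and_none

print(f"P(RI) = {p_recession_and_increase:.2f}")
print(f"P(RI^C) = {p_recession_and_none:.2f}")
print(f"P(R) = {p_recession:.2f}")
```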
l: Define and calculate expected value.
The expected value is the probability-weighted average of the possible outcomes of the random variable.
E(X) = Σ xi * P(xi) = x1 * P(x1) + x2 * P(x2) + … + xn * P(xn)
Here, the "E" denotes expected value. The symbol x1 is the first realization of random variable X. The symbol x2 is the second realization, etc. In the long run, the realizations should average to the expected value. This is most easily seen using the a priori probabilities associated with a coin toss. On the flip of one coin, we might designate the event "head" as letting the random variable equal one. Alternatively, the event "tail" means the random variable equals zero. A statistician would write:
If head, then X = 1
If tail, then X = 0
For a fair coin where P(head) = P(X = 1) = 0.5 and P(tail) = P(X = 0) = 0.5, the probability weighted average or expected value is:
E(X) = P(X = 0) * 0 + P(X = 1) * 1 = 0.5
For the coin flip, X cannot assume a value of 0.5 in any single experiment. Over the long term, however, the average of all the outcomes should be 0.5.
m: Define, calculate, and interpret variance and standard deviation.
The variance is the expected value of the squared deviations of each observation from the random variable's expected value. As an expected value, the variance uses the probability of each observation xi to weight the associated squared deviation: [xi - E(X)]^2. The formula for variance is:
σ^2(X) = Σ [xi - E(X)]^2 * P(xi)
The standard deviation is the positive square root of the variance. It may be represented by σ(X) or just σ.
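As a rough illustration, the variance formula can be applied to the fair coin used for expected value (X = 0 for tail, X = 1 for head). The helper name variance is an assumption made for this sketch.

```python
import math

def variance(outcomes):
    """Probability-weighted squared deviation from the expected value."""
    mean = sum(x * p for x, p in outcomes)
    return sum((x - mean) ** 2 * p for x, p in outcomes)

coin = [(0, 0.5), (1, 0.5)]  # fair coin: (value, probability) pairs
var = variance(coin)
std = math.sqrt(var)         # standard deviation is the positive square root
print(var, std)  # 0.25 0.5
```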
Example: Electcomp Corporation manufactures electronic components for computers and other devices. There is speculation that Electcomp will soon announce a major expansion into overseas markets. Electcomp would only do this if its managers estimated the demand to be sufficient to support the sales. If demand is sufficient, Electcomp would also be more likely to raise prices. For ease of notation, let expand overseas = "O" and let increase prices = "I."
An industry analyst determines the following probabilities:
P(I) = 0.3 and P(IC) = 0.7
P(O given I) = 0.6 and P(O given IC) = 0.2
The probabilities P(I) and P(IC) are also called the priors. Bayes' formula now allows us to compute P(I given O) where this is the updated probability given new information about "I." We recall the following formulas:
Conditional probability: P(I given O) = P(IO) / P(O)
Joint probability: P(IO) = P(O given I) * P(I)
For this case, Bayes' formula becomes:
P(I given O) = {P(O given I) / P(O)} * P(I)
Use the total probability rule to compute P(O):
P(O) = P(O given I) * P(I) + P(O given IC) * P(IC)
P(O) = 0.6 * 0.3 + 0.2 * 0.7 = 0.32
P(I given O) = {.6 / .32} * .3 = .5625
If the new information of expand overseas is announced, the prior probability of P(I) = 0.30 must be updated with the new information to give P(I given O) = .5625.
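The Electcomp update can be reproduced with a short Python sketch. The function name bayes_posterior and its parameter names are hypothetical, and the sketch assumes only two mutually exclusive and exhaustive states (I and IC).

```python
def bayes_posterior(prior, likelihood, likelihood_complement):
    """Return (P(evidence), P(event given evidence)) for a two-state prior."""
    # Total probability rule: P(O) = P(O given I) * P(I) + P(O given IC) * P(IC)
    p_evidence = likelihood * prior + likelihood_complement * (1 - prior)
    # Bayes' formula: P(I given O) = P(O given I) * P(I) / P(O)
    return p_evidence, likelihood * prior / p_evidence

p_o, p_i_given_o = bayes_posterior(prior=0.3, likelihood=0.6, likelihood_complement=0.2)
print(round(p_o, 4), round(p_i_given_o, 4))  # 0.32 0.5625
```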
u: Calculate the number of ways a specified number of tasks can be performed using the multiplication rule of counting.
The multiplication rule of counting applies when we have a list of k choices, and each choice "i" has ni ways of being chosen. The number of total choices is n1 * n2 *...*nk.
Example: An analyst is interested in whether a firm raises prices, lowers prices, or keeps prices the same. The analyst also is interested in whether the firm expands overseas. Each of these represents a separate choice that can occur in different ways: n1 = 3 and n2 = 2. This gives n1 * n2 = 3 * 2 = 6 possible ways of arranging the choices. The list of possible pairs of choices is: (raise prices, expand), (raise prices, not expand), (lower prices, expand), (lower prices, not expand), (keep prices the same, expand), (keep prices the same, not expand).
If a third item is up for consideration for which there are n3 choices, then there will be n1 * n2 * n3 ways of arranging the items.
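One way to see the multiplication rule at work is to enumerate the pairs directly. The sketch below uses itertools.product from the Python standard library; the variable names are illustrative.

```python
from itertools import product

price_actions = ["raise prices", "lower prices", "keep prices the same"]  # n1 = 3
expansion = ["expand", "not expand"]                                      # n2 = 2

# Every combination of one price action with one expansion decision
pairs = list(product(price_actions, expansion))
print(len(pairs))  # 6
```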
v: Solve counting problems using the factorial, combination, and permutation notations.
Labeling is where there are n items of which each can receive one of k labels. The number of items that receive label "1" is n1 and the number that receive label "2" is n2, etc. Also, the following relationship holds:
n1 + n2 + n3 +...+nk = n.
The number of ways labels can be assigned is:
n! / [(n1!) * (n2!) * ... * (nk!)].
On your TI financial calculator, factorial is [2nd], [x!].
Example: A portfolio consists of eight stocks. The goal is to designate four of the stocks as "long-term holds," designate three of the stocks as "short-term holds," and designate one stock a "sell." How many ways can these labels be assigned to the eight stocks? The answer is (8!) / (4!*3!*1!) = (40,320) / (24*6*1) = 280.
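For readers without a calculator at hand, the labeling formula can be sketched in Python. The function name labeling_count is an assumption made for illustration.

```python
from math import factorial

def labeling_count(group_sizes):
    """Number of ways to assign labels: n! / (n1! * n2! * ... * nk!)."""
    ways = factorial(sum(group_sizes))
    for size in group_sizes:
        ways //= factorial(size)  # each partial quotient is a whole number
    return ways

# Eight stocks: 4 long-term holds, 3 short-term holds, 1 sell
print(labeling_count([4, 3, 1]))  # 280
```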
w: Distinguish between problems for which different counting methods are appropriate.
The multiplication rule of counting is used when there are two or more groups. The key is that we choose only one item from each group.
Factorial is used by itself when there are no groups - we are only arranging a given set of n items. Given n items, there are n! ways of arranging them.
The labeling formula applies to three or more sub-groups of predetermined size. Each element of the entire group must be assigned a place or label in one of the three or more sub-groups.
The combination formula applies to only two groups of predetermined size. Look for the word "choose" or "combination."
The permutation formula applies to only two groups of predetermined size. Look for a specific reference to "order" being important.
x: Calculate the number of ways to choose r objects from a total of n objects, when the order in which the r objects is listed does or does not matter.
A special case of labeling is when k = 2. That is, the n items can only be in one of two groups, and n1 + n2 = n. In that case, we can let r = n1 and n2 = n - r. Since there are only two categories, we usually talk about choosing r items. Then (n - r) are not chosen. The general formula for labeling when k = 2 is called the combination formula (or binomial formula):
n! / [(n - r)! * r!]
Example: In a portfolio of eight stocks, we decide to sell three stocks. How many ways can we choose three of the eight to sell? The answer is 8! / (5! * 3!) = 56.
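Python's standard library (version 3.8 or later) has both counts built in, which gives a quick check. The combination count matches the example above; the permutation count, n! / (n - r)!, is the standard formula for the case where the order of the r chosen items matters.

```python
from math import comb, perm

n, r = 8, 3
print(comb(n, r))  # 56, order does not matter: n! / ((n - r)! * r!)
print(perm(n, r))  # 336, order matters: n! / (n - r)!
```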
1.D: Common Probability Distributions
a: Explain a probability distribution.
A probability distribution specifies the probabilities of all the possible outcomes of a random variable. Those outcomes may be discrete or continuous.
b: Distinguish between and give examples of discrete and continuous random variables.
A discrete random variable is one where we can list all the possible outcomes, and for each possible outcome there is a measurable, positive probability. An example of a discrete random variable is the number of days it rains in a given month. A continuous random variable is one where we cannot list the possible outcomes because we can always find a third number between any two numbers on the list. An example is the actual amount of rainfall in that month. The number of outcomes is essentially infinite, even if lower and upper bounds exist.
c: Describe the range of possible outcomes of a specified random variable.
In the discrete case, such as the number of days of rain in a month, there are a finite number of outcomes as defined by the number of days in the month. In the continuous case, such as the amount of rainfall, the outcome can be recorded out to many decimal places. We say the probability of two inches of rainfall is essentially zero because it is a single point. We must talk about the probability of the amount of rain being between two and three inches. In other words:
For a discrete distribution p(x) = 0 when "x" cannot occur, or p(x) > 0 if it can.
For a continuous distribution p(x) = 0 even though "x" can occur, but we can only consider P(x1 ≤ X ≤ x2), where x1 and x2 are actual numbers.
d: Define a probability function and state whether a given function satisfies the conditions for a probability function.
The probability function specifies the probability that the random variable takes on a specific value.
Example: The following is a probability function:
For X: (1, 2, 3, 4), p(x) = x / 10, else p(x) = 0.
The probabilities are p(1) = 0.1, p(2) = 0.2, p(3) = 0.3, and p(4) = 0.4, all of which are between zero and one. Also, 0.1 + 0.2 + 0.3 + 0.4 = 1.
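The same check can be sketched in Python. The function name is_probability_function is hypothetical, and exact fractions are used here only so that the sum compares equal to one without floating-point rounding.

```python
from fractions import Fraction

def is_probability_function(probs):
    """Check 0 <= p(x) <= 1 for every x and that the probabilities sum to one."""
    return all(0 <= p <= 1 for p in probs.values()) and sum(probs.values()) == 1

# p(x) = x / 10 for x in (1, 2, 3, 4)
p = {x: Fraction(x, 10) for x in (1, 2, 3, 4)}
print(is_probability_function(p))  # True
```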
e: State the two key properties of a probability function.
The two key properties of a probability function are:
0 ≤ p(x) ≤ 1
The sum of all the probabilities for mutually exclusive and exhaustive outcomes must equal one.
f: Define a cumulative distribution function and calculate probabilities for a random variable, given a cumulative distribution function.
The cumulative distribution function or just "distribution function" defines the probability that a random variable takes a value equal to or less than a given number: P(X ≤ x) or F(x). Using the probability function defined earlier: X:(1, 2, 3, 4), p(x) = x / 10, F(1) = 0.1, F(2) = 0.3, F(3) = 0.6, F(4) = 1. In other words, F(3) represents the cumulative probability that outcomes 1, 2, and 3 occur.
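A cumulative distribution function for a discrete variable is just a running sum of the probability function. The following sketch builds it with itertools.accumulate; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

values = (1, 2, 3, 4)
pmf = [Fraction(x, 10) for x in values]   # p(x) = x / 10
cdf = dict(zip(values, accumulate(pmf)))  # F(x) = P(X <= x), a running sum

print([float(cdf[x]) for x in values])  # [0.1, 0.3, 0.6, 1.0]
```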
g: Define a discrete uniform random variable and calculate probabilities, given a discrete uniform probability distribution.
The discrete uniform random variable is one where the probabilities are equal for all possible outcomes. One example is X:(1,2, 3, 4, 5), p(x) = 0.2. In this case, the probabilities are equal for each possible outcome (20%). The probability of any one outcome is 0.2 and the probability of any "n" outcomes is n * 0.2. For example, p(2 ≤ X ≤ 4) = p(2) + p(3) + p(4) = 0.6, and F(2) = p(1) + p(2) = 0.4.
h: Define a binomial random variable and calculate probabilities, given a binomial probability distribution.
The binomial random variable is the number of "successes" in a given number of "trials," where the outcome of each trial can be either "success" or "failure." The probability of success is constant for each trial, and the trials are independent. A trial is like a mini-experiment and the final outcome is the number of successes in the series of n trials. Under these conditions, the probability of "x" successes in "n" trials is calculated using the following formula:
p(x) = P(X = x) = [number of ways to choose x from n] * p^x * (1 - p)^(n - x)
Where the term in brackets = n! / {(n - x)! * x!}, and p = the probability of "success" on each trial.
i: Calculate the expected value and variance of a binomial random variable.
Expected value of X: E(X) = n * p
Variance of X: V(X) = n * p * (1 - p)
Example: Using empirical probabilities, for any given day the probability that the DJIA will increase is 0.67. We will assume that the only other outcome is that it decreases in a day. Hence, p(UP) = 0.67 and p(DOWN) = 0.33. We will assume that whether the DJIA increases in one day is independent of whether it decreases in another day. What is the probability that the DJIA will increase three out of five days? What is the expected number of up days in a five-day period?
Here we define success as UP, so p = 0.67. The definition of success is critical to any binomial problem. The n items are the five days: n = 5. The number of successes we are computing the probability for is x = 3. The formula is:
p(3) = P(X = 3) = 5! / [(5 - 3)! * 3!] * (0.67^3) * (0.33^(5 - 3))
p(3) = 10 * 0.301 * 0.109
p(3) = 0.328
Expected value of X: E(X | n = 5, p = 0.67) = 5 * 0.67 = 3.35
Variance of X: V(X) = n * p * (1 - p) = 5 * 0.67 * 0.33 = 1.106
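The DJIA example can be verified with a short Python sketch. The function name binomial_pmf is an assumption for illustration; math.comb supplies the "number of ways to choose x from n" term.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.67
print(round(binomial_pmf(3, n, p), 3))  # 0.328
print(round(n * p, 2))                  # 3.35, expected number of up days
print(round(n * p * (1 - p), 4))        # 1.1055, variance
```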
j: Construct a binomial tree to describe stock price movement and calculate the expected terminal stock price.
Example: Random variable X follows a continuous uniform distribution over 12 to 28, that is a = 12, and b = 28. The probability of an outcome between 15 and 25 is:
P(15 ≤ X ≤ 25) = (25 - 15) / (28 - 12) = 10 / 16 = 0.625
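For a continuous uniform distribution, the probability of an interval is its length divided by the length of the whole range. A minimal sketch follows; the function name uniform_prob and the clipping of the interval to [a, b] are choices made here, not part of the reading.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b], clipping the interval to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)

print(uniform_prob(15, 25, a=12, b=28))  # 0.625
```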
l: Explain the key properties of the normal distribution.
The normal distribution has the following key properties:
It is completely described by its mean and variance. We write X ~ N(μ, σ^2), which says that "X" is normally distributed with mean μ and variance σ^2.
Skewness = 0. This means the normal distribution is symmetric about its mean, so that P(X ≤ μ) = P(μ ≤ X) = 0.5, and mean = median = mode.
Kurtosis = 3; this is a measure of how "flat" the distribution is. Recall that excess kurtosis is measured relative to the number "3."
A linear combination of normally distributed random variables is also normally distributed.
The following general properties are also very important:
34% of the area falls between 0 and +1 standard deviations from the mean. So, 68% of the observations fall within +/- one standard deviation of the mean.
45% of the area falls between 0 and +1.65 standard deviations from the mean. So, 90% of the observations fall within +/- 1.65 standard deviations of the mean.
47.5% of the area falls between 0 and 1.96 standard deviations from the mean. So, 95% of the observations fall within +/- 1.96 standard deviations of the mean.
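These areas can be confirmed without a printed table. The sketch below assumes Python 3.8 or later, where statistics.NormalDist is available in the standard library; it prints the area within +/- 1, 1.65, and 1.96 standard deviations of the mean.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1.0, 1.65, 1.96):
    within = z.cdf(k) - z.cdf(-k)  # area within +/- k standard deviations
    print(k, round(within, 3))
```

The printed areas are close to 68%, 90%, and 95%, in line with the rules of thumb above.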
m: Construct confidence intervals for a normally distributed random variable.
Often we must use an approximation of μ and σ with the sample mean and sample standard deviation denoted as "X bar" and "s." These are point estimates. We often frame probability statements for a random variable using confidence intervals that are built around these point estimates. Some important examples of confidence intervals are below. We should note the similarity between these and the statements above using μ and σ.
P(X will be within X bar +/- 1.65 * s) = 90%. We say: The 90 percent confidence interval for X is X bar - 1.65 * s to X bar + 1.65 * s.
P(X will be within X bar +/- 1.96 * s) = 95%. We say: The 95 percent confidence interval for X is X bar - 1.96 * s to X bar + 1.96 * s.
P(X will be within X bar +/- 2.58 * s) = 99%. We say: The 99 percent confidence interval for X is X bar - 2.58 * s to X bar + 2.58 * s.
Example: Using a 20-year sample, the average return of a mutual fund has been 10.5 % per year with a standard deviation of 18 %. Using these two point estimates, what is the 95% confidence interval for the return next year?
The interval runs from 10.5 - 1.96 * 18 = -24.78 to 10.5 + 1.96 * 18 = 45.78.
In symbols, P(-24.78 < return < 45.78) = 95%.
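The mutual fund interval can be reproduced in a few lines. The function name confidence_interval is illustrative; the multiplier z is passed in directly (1.65, 1.96, or 2.58 for the 90, 95, or 99 percent intervals listed above).

```python
def confidence_interval(mean, std, z):
    """Return (lower, upper) = mean -/+ z * std."""
    return mean - z * std, mean + z * std

lower, upper = confidence_interval(mean=10.5, std=18, z=1.96)
print(round(lower, 2), round(upper, 2))  # -24.78 45.78
```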
n: Define the standard normal distribution and explain how to standardize a random variable.
The standard normal distribution is a normal distribution that has been "normalized" so that it has a mean of zero and a standard deviation of one. To standardize an observation from any given normal curve, you must calculate the observation's Z-value. The Z-value tells you how far away the given observation is from the population mean in units of standard deviation. This is how we standardize a random variable.
Z = (observation - population mean) / standard deviation = (X - μ) / σ
Example: The EPS figures for a large group of firms are normally distributed with a mean of $6 and a standard deviation of $2. What is the Z-value given an EPS of $8, that is X = 8? How about X = 2?
If X = 8, then Z = (X - μ) / σ = (8 - 6) / 2 = +1
If X = 2, then Z = (X - μ) / σ = (2 - 6) / 2 = -2
The Z of +1.00 indicates that an EPS of $8 is one standard deviation above the mean, and a Z of -2.00 shows that the EPS of $2 is two standard deviations below the mean. Using the symmetric properties of the normal distribution, we can approximate that the probability of the EPS falling between $2 and $8 is 0.475 + 0.34 = 0.815.
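The standardization step, and a more exact version of the 0.815 approximation, can be sketched as follows. The function name z_value is an assumption, and statistics.NormalDist requires Python 3.8 or later.

```python
from statistics import NormalDist

def z_value(x, mean, std):
    """Number of standard deviations between x and the mean."""
    return (x - mean) / std

print(z_value(8, 6, 2), z_value(2, 6, 2))  # 1.0 -2.0

# Probability of EPS between $2 and $8 from the normal CDF, for comparison
# with the rule-of-thumb figure of 0.815
eps = NormalDist(mu=6, sigma=2)
print(round(eps.cdf(8) - eps.cdf(2), 4))  # 0.8186
```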
o: Calculate probabilities using the standard normal probability distribution.
With the aid of a normal probability table, we can precisely compute the probability of a normally distributed random variable falling between any two values.
Example: The Z table tells us that F(2) = 0.9772, thus F(-2) = 1 - 0.9772 = 0.0228. This tells us that 0.0228 or 2.28% of the area falls below Z = -2 and an equal amount falls above Z = +2. Furthermore, P(-2 ≤ Z ≤ 2) = 1 - 0.0228 - 0.0228 = 0.9544. Another way to do this is to write:
P(-2 ≤ Z ≤ 2) = F(2) - F(-2) = 0.9772 - 0.0228 = 0.9544.
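In place of a printed Z table, the standard normal cumulative distribution function is available in Python 3.8 or later through statistics.NormalDist. A brief sketch, with F chosen as the name to mirror the notation above:

```python
from statistics import NormalDist

F = NormalDist().cdf  # cumulative distribution function of the standard normal

print(round(F(2), 4))          # 0.9772
print(round(F(-2), 4))         # 0.0228
print(round(F(2) - F(-2), 4))  # 0.9545
```

The last figure differs from 0.9544 in the fourth decimal only because the table values are rounded before they are subtracted.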
p: Distinguish between a univariate and a multivariate distribution.
A multivariate distribution specifies the probabilities for a group of random variables. It is meaningful when the random variables in the group are not independent. A joint probability table describes the multivariate distribution between two discrete random variables. Multivariate relationships can exist between two or more continuous random variables, e.g., inflation, unemployment, interest rates, and exchange rates.
A multivariate normal distribution applies when all the random variables have a normal distribution. As mentioned earlier, one of the characteristics of a normal distribution is that a linear combination of normal random variables is normal as well. For example, if the return of each stock in a portfolio is normally distributed, then the return on the portfolio will be normally distributed.
q: Explain the role of correlation in the multivariate normal distribution.
The relationships between the individual returns can be succinctly and completely described with a specific set of parameters. Just like a univariate normal distribution is completely described by its mean and variance, the multivariate normal distribution for two random variables is completely described by their means, variances, and the correlation between the two. More generally, and using the stock returns as an example, the multivariate normal distribution for the returns on n stocks is completely defined by the following three sets of parameters.
The n means of the n series of returns (μ1, μ2, ..., μn).
The n variances of the n series of returns (σ1^2, σ2^2, ..., σn^2).
The (n * (n - 1)) / 2 pair-wise correlations.
If there are only two random variables, n = 2, then 2 * (2 - 1) / 2 = 1 and there is only one correlation. If there are three random variables, n = 3, then 3 * (3 - 1) / 2 = 3 and there are three correlations. The existence of correlation is the feature that distinguishes a multivariate normal distribution from a univariate normal distribution. These correlations indicate the strength of the linear relationships between each pair of variables.
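The count of pair-wise correlations is the number of distinct pairs that can be formed from n variables. The sketch below lists those pairs with itertools.combinations; the stock labels are placeholders.

```python
from itertools import combinations

def correlation_pairs(names):
    """All distinct pairs of variables; there are n * (n - 1) / 2 of them."""
    return list(combinations(names, 2))

print(len(correlation_pairs(["A", "B"])))       # 1
print(len(correlation_pairs(["A", "B", "C"])))  # 3
```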
r: Define shortfall risk.
Shortfall risk is the risk that a portfolio will fall below a particular value over a given horizon and is the focus of safety-first rules.
s: Select an optimal portfolio using Roy's safety-first criterion.
Roy's safety-first criterion states that the optimal portfolio minimizes the probability that the return of the portfolio falls below some minimum acceptable level. This minimum acceptable level is called the "threshold" level.
Rp = portfolio return
RL = threshold level return
Minimize P(Rp < RL)
SFRatio = (E[Rp] - RL) / σp
Example:
A college endowment is $120 million. Over the next year, the managers of the endowment plan to withdraw $1.6 million for operations and have set a minimum acceptable goal at the end of the year of $122 million for the endowment. They are considering three portfolios for the endowment with the following expected returns and standard deviations:
Portfolio A: E(Rp) = 9%, σp = 12%
Portfolio B: E(Rp) = 11%, σp = 20%
Portfolio C: E(Rp) = 6.6%, σp = 8.2%
The threshold is RL = (1.6 + 122 - 120) / 120 = 0.030, or 3%. The SFRatios and the corresponding values of F(-SFRatio) are below.
Portfolio A: SFRatio = (9 - 3) / 12 = 0.5 and F(-SFRatio) = 0.3085
Portfolio B: SFRatio = (11 - 3) / 20 = 0.4 and F(-SFRatio) = 0.3446
Portfolio C: SFRatio = (6.6 - 3) / 8.2 = 0.44 and F(-SFRatio) = 0.3300
The best choice is Portfolio A because its value for F(-SFRatio) = 0.3085 is the lowest of the three choices. Note, choosing the portfolio with the largest SFRatio gives the same result as choosing based on the smallest F(-SFRatio). Hence, you really don't have to look up -SFRatios in the normal table! Just pick the highest SFRatio.
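The endowment example can be worked through in Python as a rough check. The function name sf_ratio and the dictionary layout are assumptions for this sketch; F(-SFRatio) is computed with statistics.NormalDist (Python 3.8 or later), which presumes normally distributed returns.

```python
from statistics import NormalDist

def sf_ratio(expected_return, threshold, std):
    """Roy's safety-first ratio: (E[Rp] - RL) / sigma_p."""
    return (expected_return - threshold) / std

threshold = (1.6 + 122 - 120) / 120 * 100  # RL in percent, about 3.0
portfolios = {"A": (9, 12), "B": (11, 20), "C": (6.6, 8.2)}  # (E[Rp], sigma_p) in percent

ratios = {name: sf_ratio(er, threshold, sd) for name, (er, sd) in portfolios.items()}
for name, ratio in ratios.items():
    # F(-SFRatio) is the shortfall probability if returns are normally distributed
    print(name, round(ratio, 2), round(NormalDist().cdf(-ratio), 4))

best = max(ratios, key=ratios.get)  # largest SFRatio = smallest shortfall probability
print(best)  # A
```

Because the unrounded ratio for Portfolio C is used, its shortfall probability may differ from the table value in the last decimal place; the ranking is unchanged.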
t: Explain the relationship between the lognormal and normal distributions.
A lognormal distribution is simply the distribution of a random variable Y = e^X, where X ~ N(μ, σ^2) and e ≈ 2.718, the base of the natural logarithms.
What you need to know for the exam:
The lognormal distribution is skewed to the right.
The lognormal distribution is bounded from below by zero.
The random variable Y = e^X. Alternatively, from the properties of logarithms, ln[Y] = X and X is normally distributed. To see this, remember the following fun fact about logarithms: ln[e^X] = X.
u: Distinguish between discretely and continuously compounded rates of return.
A discretely compounded rate of return is based on the relationship of the change of an asset's price and its starting price. It is denoted in the readings as "R." If the beginning price and ending price of an asset are S0 and S1 respectively, R = S1 / S0 - 1. Note that this is the same thing as the holding period return that we calculated in Chapter 3. The only difference is that there are no periodic cash flows considered here.
A continuously compounded rate of return measures the change over the average of all numbers between S0 and S1. This gives a useful perspective on the return earned over a period. This is done with the use of the natural logarithm (ln). Given a holding period return of R, the continuously compounded rate of return is:
ln(1 + R) = ln(S1 / S0).
v: Calculate a continuously compounded return, given a specific holding period return.
Example: A stock increases from $100 to $110 during a year. From a discretely compounded return perspective, we would say that the return was R = ($110 / $100) - 1 = 0.10, and this is the holding period return. Yet, if the stock falls from $110 to $100, using the same technique, the return is R = ($100 / $110) - 1 = -0.0909. This could be misleading because the stock ends at the same value it started at, $100, but the average of the discretely compounded returns is (0.1000 + [-0.0909]) / 2 = 0.00455 rather than zero.
A continuously compounded return measures the nominal return with respect to the average of all values between the beginning and ending value: r = ln($110 / $100) = 0.0953 and it is also true that r = ln($100 / $110) = -0.0953. The fact that the stock ends at the same value it started at and (0.0953 - 0.0953) = 0 more accurately depicts the true movement of the stock.
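The round trip from $100 to $110 and back can be checked with the natural logarithm in Python. The function name continuous_return is illustrative.

```python
from math import log

def continuous_return(start, end):
    """Continuously compounded return: ln(S1 / S0)."""
    return log(end / start)

up = continuous_return(100, 110)
down = continuous_return(110, 100)
print(round(up, 4), round(down, 4))  # 0.0953 -0.0953

# The two continuously compounded returns offset each other (up to rounding)
print(abs(up + down) < 1e-12)  # True
```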
w: Explain Monte Carlo simulation and describe its major implications.
Monte Carlo simulation is used to find approximate solutions to complex problems. The procedure usually uses a random number generator from a computer to generate sets of realized random variables from specified distributions. The results combine to form a new series of variables for which the true distributions are too mathematically complex to define. After generating a large number of sets of realizations, the statistics of the generated numbers can be used as estimates of the true parameters of the complex distribution.
Major applications:
Projecting the interaction of pension plan assets and liabilities.
Developing estimates of value at risk.
Estimating the potential success of a given trading strategy.
Estimating the distribution of the return of a portfolio composed of assets that do not have normally distributed returns.
Estimating the distribution of an asset that has features such as embedded options, call features, and parameters that change as market conditions change.
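To make the procedure concrete, here is a deliberately simple Monte Carlo sketch. It draws normally distributed continuously compounded returns, converts each to a lognormal terminal price, and averages the results. The inputs (starting price 100, mean 0.05, standard deviation 0.2, 100,000 trials) and the function name are hypothetical values chosen only for illustration. This case has a known closed-form mean, S0 * e^(μ + σ^2 / 2), which lets the estimate be checked; in practice simulation is reserved for problems without such a formula.

```python
import math
import random

def simulate_mean_terminal_price(s0, mu, sigma, n_trials, seed=0):
    """Monte Carlo estimate of E[S0 * e^X] with X ~ N(mu, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n_trials):
        x = rng.gauss(mu, sigma)   # one realized continuously compounded return
        total += s0 * math.exp(x)  # corresponding lognormal terminal price
    return total / n_trials

# Hypothetical inputs chosen only for illustration
estimate = simulate_mean_terminal_price(s0=100, mu=0.05, sigma=0.2, n_trials=100_000)
exact = 100 * math.exp(0.05 + 0.2 ** 2 / 2)  # known lognormal mean, for comparison
print(round(estimate, 2), round(exact, 2))
```

With more trials the estimate should settle closer to the exact value, which is the sense in which the simulated statistics estimate the true parameters.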
x: Explain historical simulation and describe its limitations.
Historical simulation uses historical data to generate the sets of realized random variables (as opposed to a random number generator as in Monte Carlo simulation).
Limitations:
Historical simulation cannot take into account the effect of significant events that did not occur during the sample period. For example, if a particular security only came into existence after 1987, we do not have observations for its behavior during a "market crash."
It cannot perform "what if" analysis. The source of the sample data is a fixed set, and we cannot investigate the effects of changing parameters in certain ways.