Statistics demystified pdf free download

Four stars. By Dr. Réagan Lorraine Lavorata: "Helped with my research. I'm a prof and doctoral student."

This is a great book. By Valerie Thompson: "I like this book, because I was able to utilize it as a reference."

About the Author: Stan Gibilisco has authored or co-authored more than 50 nonfiction books in the fields of electronics, general science, mathematics, and computing.

Here's your solution. This unique self-teaching guide offers problems at the end of each chapter and part to help expose potential weaknesses, and a question "final exam" to reinforce what you learned. If you want to build or refresh your understanding of statistics, here's a fast and entertaining self-teaching course that's specially designed to reduce anxiety.

Let it be your direct route to learning or brushing up on this essential topic.

Suggestions for future editions are welcome.

The output for this example is as follows. The mean for each sample is shown by a line within the dot plot. The data in this table are more variable than the data in Table Example. The output for this example with the new data is as follows. Figure is a dot plot of the data in Table. Note that the four sample standard deviations are 2. They are small when compared with the sample standard deviations in the second example: This causes the error sum of squares to increase from

The patients are randomly divided into three groups. One group is treated with a very restrictive diet, another group is treated with a strict exercise program, and the third serves as a control group. The response variable is the change in diastolic blood pressure after six months of treatment. In this highly unlikely example, there is no variation within the three groups (Table).

Table. An experiment with no variation within treatments.

Diet    Exercise    Control
10      13          2
10      13          2
10      13          2
10      13          2
10      13          2

The error term in the analysis of variance has a sum of squares of zero. All the variation is between the three treatments; there is no variation within treatments.
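To make the zero error term concrete, here is a minimal Python sketch (standard library only; `one_way_anova_ss` is a hypothetical helper written for illustration, not a Minitab or Excel routine) that splits the total sum of squares for the diet, exercise, and control table into a between-treatments part and a within-treatments (error) part.

```python
from statistics import fmean


def one_way_anova_ss(groups):
    """Split total variation into treatment (between) and error (within) sums of squares."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = fmean(all_obs)
    # Between treatments: squared deviation of each group mean, weighted by group size
    ss_treat = sum(len(ys) * (fmean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Within treatments (error): squared deviation of each observation from its own group mean
    ss_error = sum((y - fmean(ys)) ** 2 for ys in groups.values() for y in ys)
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    return {"treatments": ss_treat, "error": ss_error, "total": ss_total}


# Changes in diastolic blood pressure from the table above (five patients per group)
data = {"Diet": [10] * 5, "Exercise": [13] * 5, "Control": [2] * 5}
ss = one_way_anova_ss(data)
print(ss)  # error is 0.0; treatments and total both come to about 323.33
```

Because every observation equals its group mean, every within-group deviation is zero, so all of the total variation shows up as treatment variation.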

The treatments have been randomly assigned to experimental units within the blocks.

Assumptions: (1) The probability distributions of observations corresponding to all block-treatment combinations are normal.

Five golfers of varying ability each hit the three brands of balls in random order.

The letters C, B, and A are randomly pulled out of a hat in that order. Jones will hit brand C, followed by brand B, followed by brand A. Continuing in this manner will ensure the random assignment of treatments within blocks.
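The hat-drawing procedure can be mimicked in code. This is a minimal sketch assuming Python's `random` module as the source of randomness; `randomize_within_blocks` is a hypothetical helper, and since only the golfer Jones is named in the text, the other four block labels are placeholders.

```python
import random


def randomize_within_blocks(treatments, blocks, seed=None):
    """Give every block its own independent random order of all the treatments."""
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    # rng.sample(..., k=len(...)) returns a shuffled copy; `treatments` itself is unchanged
    return {block: rng.sample(treatments, k=len(treatments)) for block in blocks}


brands = ["A", "B", "C"]
golfers = ["Jones", "Golfer 2", "Golfer 3", "Golfer 4", "Golfer 5"]  # placeholder labels
plan = randomize_within_blocks(brands, golfers, seed=1)
for golfer, order in plan.items():
    print(f"{golfer}: hits brand {', then '.join(order)}")
```

Each golfer gets a fresh draw of all three brands, which is what randomization within blocks requires; treatments are never traded between golfers, because each golfer is a block.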

The three brands of balls are the treatments and the five golfers are the blocks. The distance that each ball travels is the response variable.
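The Minitab and Excel output for a randomized block design rests on splitting the total sum of squares into treatment, block, and error parts. Here is a minimal sketch of that arithmetic, assuming one observation per block-treatment cell; `randomized_block_anova` is a hypothetical helper, and the small data set in the usage lines is invented for illustration (it is not the golf data).

```python
from statistics import fmean


def randomized_block_anova(y):
    """y[i][j] is the response for block i under treatment j (one observation per cell)."""
    b, k = len(y), len(y[0])
    grand = fmean([v for row in y for v in row])
    treat_means = [fmean([row[j] for row in y]) for j in range(k)]
    block_means = [fmean(row) for row in y]
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    ss_block = k * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_error = ss_total - ss_treat - ss_block  # variation left after treatments and blocks
    df_treat, df_block = k - 1, b - 1
    df_error = df_treat * df_block
    ms_error = ss_error / df_error
    return {
        "SS": {"treatments": ss_treat, "blocks": ss_block, "error": ss_error, "total": ss_total},
        "df": {"treatments": df_treat, "blocks": df_block, "error": df_error, "total": b * k - 1},
        "F_treatments": (ss_treat / df_treat) / ms_error,
        "F_blocks": (ss_block / df_block) / ms_error,
    }


# Invented data: 3 blocks (rows) by 2 treatments (columns)
toy = [[1, 3],
       [2, 6],
       [3, 6]]
result = randomized_block_anova(toy)
print(result["SS"])  # treatments 13.5, blocks 7.0, error 1.0, total 21.5
print(result["F_treatments"], result["F_blocks"])  # 27.0 7.0
```

Each F-ratio is then compared with a critical F-value whose numerator degrees of freedom are those of treatments (or blocks) and whose denominator degrees of freedom are those of error.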

Table. Statistical layout showing treatments and blocks.

The output shown below is given by Minitab.

In Excel, enter the data into the worksheet and execute the pull-down sequence Tools > Data Analysis.

The output is given in Fig. The Excel output is the same as the Minitab output. The only thing that Excel gives that Minitab does not is the critical F-value for blocks, 3.

Assumptions: (1) The response is normally distributed.

Factor A is diabetes. None of the twenty were on medication for high blood pressure. The diastolic blood pressure of the twenty participants is measured, and the results are given in Table. We are interested in the interaction of weight and diabetes on the blood pressure.

                Normal weight          Overweight
Non-diabetic    75, 80, 83, 85, 65     85, 80, 90, 95, 88
Diabetic        85, 90, 95, 90, 86     90, 95, , ,

If there is no significant interaction, then we are interested in the effect of diabetes on blood pressure and in the effect of weight on blood pressure.
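Whether the interaction is significant, and then whether each main effect is, comes from a two-way ANOVA with replication. The following is a minimal sketch of the sums of squares for a balanced a × b factorial with r replicates per cell; `two_way_anova` is a hypothetical helper, and because one cell of the blood pressure table above is incomplete, the usage lines run on a small invented 2 × 2 data set instead.

```python
from statistics import fmean


def two_way_anova(cells):
    """cells[i][j] holds the r replicate responses at level i of factor A and level j of factor B."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[fmean(obs) for obs in row] for row in cells]
    grand = fmean([m for row in cell_mean for m in row])  # valid because the design is balanced
    a_mean = [fmean(row) for row in cell_mean]
    b_mean = [fmean([cell_mean[i][j] for i in range(a)]) for j in range(b)]
    ss_a = b * r * sum((m - grand) ** 2 for m in a_mean)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_mean)
    # Interaction: how far each cell mean is from what the two main effects alone predict
    ss_ab = r * sum(
        (cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_e = sum((y - cell_mean[i][j]) ** 2
               for i in range(a) for j in range(b) for y in cells[i][j])
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "error": a * b * (r - 1)}
    ms_e = ss_e / df["error"]
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "error": ss_e}
    f = {key: (ss[key] / df[key]) / ms_e for key in ("A", "B", "AB")}
    return ss, df, f


# Invented 2 x 2 data with r = 2 replicates per cell
toy = [[[1, 3], [4, 6]],
       [[3, 5], [8, 10]]]
ss, df, f = two_way_anova(toy)
print(ss)  # A 18.0, B 32.0, AB 2.0, error 8.0
print(f)   # A 9.0, B 16.0, AB 1.0
```

The interaction F-ratio is examined first; only when it is not significant do the main-effect F-ratios for A and B have a simple interpretation.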

We refer to this as a 2 × 2 factorial experiment with 5 replicates whose response variable is diastolic blood pressure. The dialog box is shown in Fig. The interaction plot is shown in Fig. In Fig., the response for both diabetics and non-diabetics shows an increase in diastolic blood pressure when the weight level changes from normal weight to overweight.

The fact that the lines are nearly parallel indicates there is no interaction. The dialog box and interaction plot are shown in Figs.

Minitab output: means of diastolic blood pressure for FactorA (levels 1 and 2, N = 10 each) and FactorB (levels 1 and 2, N = 10 each).

Similarly, the low level of weight had a mean of

The company is faced with the problem of advertising the new camera. One factor deals with what advertising approach to emphasize.

The price and the quality of pictures are the two levels of advertising approach the company decides to use. The other factor of interest is the advertising medium to use. The levels of advertising medium that the company will use are radio, newspaper, and Internet.

The response variable is the number of weekly sales. The data are shown in Table. We notice first that the interaction is significant. Thus our objective is to explain the nature of the interaction. In doing this, we will discover what the experiment has really found about how sales are affected. Look at the results from all angles. The interaction plots are given in Figs. Radio sales are relatively low and are the same for both the price and the quality approach.

For newspaper advertising, sales are higher for the price approach than for the quality approach. For Internet advertising, sales are greater for the quality approach than for the price approach. The greatest sales are for Internet advertising with the quality approach; in other words, when the quality approach is used, the Internet is the best advertising medium.

First, the data is entered into the worksheet as shown in Fig. Look at Fig. The following Excel output, shown in Figs. The output in Fig.

For example, suppose we wanted to investigate the effect of three factors on the amount of dirt removed from a standard load of clothes.

The three factors are brand of laundry detergent (A), water temperature (B), and type of detergent (C). The two levels of brand of detergent are brand X and brand Y. The two levels of water temperature are warm and hot. The factorial design that applies to this experiment is called a 2³ factorial design. There are eight treatments possible in a 2³ design.

They are shown in Table. Eight standard loads were randomly assigned to the eight treatments. This experiment was then replicated so that two observations for each treatment were obtained, which required 16 standard loads of clothes in all. The expressions for the sums of squares are omitted. The steps to follow when using Minitab are shown in Figs.

Table. Data for 2³ experiment.

Figure indicates that no interaction is present, since the lines are nearly parallel in all three graphs.
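The eight treatments of a 2³ design are every combination of two levels of three factors, and a main effect is the difference between the average response at a factor's high level and at its low level. Here is a minimal sketch using Python's `itertools.product` for the enumeration; `full_factorial` and `main_effect` are hypothetical helpers, the second level of detergent type is not named in this section so it is labeled "other", and the responses are made up so that only temperature matters.

```python
from itertools import product
from statistics import fmean


def full_factorial(levels):
    """Every treatment combination, one dict per run, e.g. {'Brand': 'X', 'Temp': 'warm', ...}."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


def main_effect(runs, responses, factor, low, high):
    """Average response at the high level minus average response at the low level."""
    at_high = [y for run, y in zip(runs, responses) if run[factor] == high]
    at_low = [y for run, y in zip(runs, responses) if run[factor] == low]
    return fmean(at_high) - fmean(at_low)


levels = {"Brand": ["X", "Y"], "Temp": ["warm", "hot"], "Type": ["other", "liquid"]}
runs = full_factorial(levels)
print(len(runs))       # 8 treatments in a 2^3 design
print(len(runs) * 2)   # 16 loads with two replicates
print(2 ** 4 * 2)      # 32 units for a 2^4 design with two replicates

# Made-up responses in which only water temperature matters
dirt = [10.0 if run["Temp"] == "hot" else 5.0 for run in runs]
print(main_effect(runs, dirt, "Temp", "warm", "hot"))  # 5.0
print(main_effect(runs, dirt, "Brand", "X", "Y"))      # 0.0
```

The doubling with each added factor is why the number of experimental units grows so quickly as factors are added.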

The means at the low and high levels of the factors are as follows.

Minitab output: means of the response for Brand (levels 1 and 2, N = 8 each) and Temp (levels 1 and 2, N = 8 each).

That is, Brand Y removes 0. That is, 5.

That is, liquid detergent on average removes 4. The brand of detergent (X or Y) is not as important as the temperature and the type of detergent. We would recommend using hot water and a liquid detergent. There are no Excel routines for three or more factors, but Minitab has routines for any number of factors. The number of experimental units required grows very quickly as the number of factors increases.

For example, a 2⁴ factorial experiment with two replications requires 2⁴ × 2 = 32 experimental units. Topics involving large numbers of factors are beyond the scope of this book.

Assumptions: These vary, depending on the method or procedure used.

To illustrate, suppose an analysis of variance has led to the conclusion that, of four means, not all are equal. Or, we might be interested in comparisons such as the following: (1) Comparing the average of treatments 1, 2, and 3 with the mean of treatment 4.

Some procedures require equal sample sizes, while some do not. The choice of a multiple comparison procedure to use with an ANOVA will depend on the type of experimental design used and the comparisons of interest to the analyst.
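Comparison (1) above is a contrast: a weighted sum of treatment means whose weights add to zero, here 1/3, 1/3, 1/3, and -1. The sketch below estimates such a contrast and its standard error, assuming the error mean square (MSE) from the ANOVA is available; `contrast` is a hypothetical helper, the means and MSE in the usage lines are invented, and the resulting t-ratio still has to be judged against whatever critical value the chosen multiple comparison procedure prescribes.

```python
from math import sqrt


def contrast(means, sizes, coefs, mse):
    """Estimate of sum(c_i * mean_i), its standard error, and the ratio of the two."""
    if abs(sum(coefs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coefs, means))
    # Var(sum c_i * ybar_i) = MSE * sum(c_i^2 / n_i) for independent samples
    se = sqrt(mse * sum(c * c / n for c, n in zip(coefs, sizes)))
    return estimate, se, estimate / se


# Invented summary statistics for four treatments of 15 observations each
means = [70.0, 72.0, 74.0, 80.0]
est, se, t = contrast(means, [15] * 4, [1 / 3, 1 / 3, 1 / 3, -1], mse=30.0)
print(round(est, 3), round(se, 3), round(t, 3))  # -8.0 1.633 -4.899
```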

One method is the traditional chalk-and-blackboard method, referred to as treatment 1. A second method utilizes Excel weekly in the teaching of algebra and is called treatment 2. A third method utilizes the software package Maple weekly and is called treatment 3.

A fourth method utilizes both Maple and Excel weekly and is called treatment 4. Sixty students are randomly divided into four groups, and the experiment is carried out over one semester.

The response variable is the score made on a common comprehensive final in the course. The scores made on the final are shown in Table. Fill in the dialog box as shown. Click Comparisons. This brings up a new dialog box, shown in Fig. Fill in the One-way Multiple Comparisons dialog box as shown.

Treatment means in increasing order: 3, 5, 2, 4, 1.

There are 10 pairs that are compared. The results are as follows: treatment mean 3 is less than treatment mean 4, treatment mean 3 is less than treatment mean 1, treatment mean 5 is less than treatment mean 1, and treatment mean 2 is less than treatment mean 1.
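The count of 10 is simply the number of ways to choose 2 of the 5 treatment means, k(k - 1)/2; the same formula gives the 15 pairs for six means in the exercises. A small sketch using Python's `itertools.combinations` and `math.comb` (Python 3.8+); the last lines show one common way to control the overall error rate, a Bonferroni split of alpha, offered only as an illustration and not necessarily the procedure behind the Minitab output above.

```python
from itertools import combinations
from math import comb

treatments = [1, 2, 3, 4, 5]
pairs = list(combinations(treatments, 2))
print(len(pairs), comb(5, 2))  # 10 10
print(comb(6, 2))              # 15 pairs for six means

# Bonferroni (illustrative only): split an overall alpha of 0.05 across all 10 comparisons
alpha = 0.05
print(alpha / len(pairs))      # per-comparison level, about 0.005
```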

Forty individuals are selected and paid to participate in the experiment. Ten are randomly assigned to each of four groups. The time required by each person in each group is recorded; the recorded data is the time in hours required to complete the form. (Table columns: Form 1, Form 2, Form 3, Current form.) Give the dot plot and box plot comparisons of the four means.

Refer to Exercise 1 of this chapter. Suppose that a block design was used in Exercise 1. (Table columns: Form 1, Form 2, Form 3, Current form; rows: Group 1, ….) Which form would you recommend that the state choose?

Their cumulative GPA was the response recorded. The results of the study are shown in Table. Interpret the results of the experiment.

A study was undertaken to determine what combination of products maximized the score that a pizza received. Factor A was cheese and the levels were small and large, factor B was meat and the levels were small and large, and factor C was crust and the levels were thin and thick.

The data are given in Table, where 0 is low and 1 is high (columns: Cheese, Meat, Crust, Rep 1, Rep 2). What is your general recommendation?

Table. ANOVA table with rows Treatments, Blocks, Error, and Total and columns Source of variation, Degrees of freedom, Sum of squares, Mean squares, F-statistic, and p-value.

A 2² factorial has been replicated 5 times in a completely randomized design. A 2³ factorial has been replicated 3 times in a completely randomized design.

Compare all 15 pairs of means a pair at a time. There are n1 elements from population 1, n2 elements from population 2, …

They are so named because they allow us to determine the value of the dependent variable from the values of the independent variables.

These deterministic models are usually from the natural sciences. Some examples of deterministic models are as follows:

Probabilistic models are more realistic for most real-world situations. We know, for example, that the cost of twenty homes, each of square feet, would likely vary. The actual costs might be given by the twenty costs in Table. This is equivalent to assuming that the mean value of y, E(y), equals the deterministic component.

Fitting this model to a data set is an example of regression modeling or regression analysis. The height x is in centimeters and the weight y is in kilograms. The error component is normally distributed with mean equal to 0 and standard deviation 0.

Note that the population relationship is usually not known, but we are assuming it is known here to develop the concepts. In fact, we are usually trying to establish the relationship between y and x.
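To see how observed data scatter about the line of means, one can simulate the probabilistic model y = β0 + β1x + ε, with ε normally distributed with mean 0. This is a minimal sketch using `random.Random.gauss`; the intercept, slope, error standard deviation, and heights below are hypothetical placeholders, since the rodent population's actual line is not reproduced here.

```python
import random
from statistics import fmean


def simulate(xs, beta0, beta1, sigma, seed=None):
    """Draw y = beta0 + beta1*x + error, where error ~ Normal(mean 0, sd sigma)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical line of means: weight (kg) = 0.05 + 0.01 * height (cm); error sd 0.02 kg
heights = [20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
weights = simulate(heights, beta0=0.05, beta1=0.01, sigma=0.02, seed=7)
for x, y in zip(heights, weights):
    print(f"height {x} cm: weight {y:.3f} kg, line of means {0.05 + 0.01 * x:.3f} kg")

# Averaging many simulated weights at one height recovers the line of means
many = simulate([30] * 10_000, beta0=0.05, beta1=0.01, sigma=0.02, seed=7)
print(round(fmean(many), 3))  # close to 0.05 + 0.01 * 30 = 0.35
```

Setting `sigma` to 0 turns the probabilistic model back into the deterministic one, which is a quick way to see the role of the error component.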

We capture ten of these rodents and determine their heights and weights. These data (height, x, and weight, y) are given in Table, and a plot is shown in Fig. The actual captured rodents have weights that vary about the line of means. Also note that the taller the rodent, the heavier it is. As mentioned earlier, we do not usually know the equation of the deterministic line that connects y with x.

What we shall see in the next section is that we can sample the population, gather a set of data such as that shown in the height-weight table above, and estimate the deterministic equation. The assumptions of regression are:

(1) Normality of error. The error terms are assumed to be normally distributed with a mean of zero for each value of x.
(2) Constant variance. The error terms have the same variance for each value of x. This means that the errors vary by the same amount for small x as for large x.

Assumptions: The assumptions of regression.

The relationship between the number of hours studied, x, and the score, y, made on a mathematics test is postulated to be linear. Ten students are sampled, and the scores and hours studied are recorded as in Table. A scatter plot is shown in Fig.; the Minitab pull-down is Graph > Scatterplot. The scatter plot shows a clear linear trend. The data for x and y are entered into columns C1 and C2 of the Minitab worksheet. Since 0 is outside the range of hours studied, the y-intercept does not have an interpretation in the context of the scores.
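The least squares estimates that Minitab and Excel report come from two closed-form formulas: b1 = Sxy/Sxx and b0 = (mean of y) - b1 × (mean of x). A minimal sketch of them follows; `least_squares` is a hypothetical helper (Python 3.10+ also ships `statistics.linear_regression`, which returns the same slope and intercept), and the usage data are invented, not the ten students in the table.

```python
from statistics import fmean


def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared vertical distances to the line."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # slope: change in mean score per additional hour studied
    b0 = y_bar - b1 * x_bar   # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1


hours = [1, 2, 3, 4, 5]        # invented data
scores = [55, 62, 68, 77, 83]
b0, b1 = least_squares(hours, scores)
print(f"y-hat = {b0:.1f} + {b1:.1f}x")
```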

Table compares the observed and predicted values. The values Fig.

Assuming the trend continues into , predict the US sales for

Suppose we code the years as 1, 2, 3, 4, 5, and 6. Note that the slope of this line is negative since, as the years increase, the sales are decreasing. It is estimated that as the year increases by one, sales decrease by 11 million.

The coded years and sales are entered into columns A and B. Figure gives the output for Excel.

If the null hypothesis is not rejected, then we do not have evidence that a straight line models the relationship between x and y.
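The test of the null hypothesis that the slope is zero uses t = b1 / SE(b1), where SE(b1) = s / √Sxx and s = √(SSE / (n - 2)); under that hypothesis, t has n - 2 degrees of freedom. A minimal sketch of the computation, with `slope_t_test` as a hypothetical helper and a tiny invented data set; the p-value or critical value would still come from a t table or from the Minitab or Excel output.

```python
from math import sqrt
from statistics import fmean


def slope_t_test(xs, ys):
    """Least squares slope, its standard error, and t = slope / SE (n - 2 degrees of freedom)."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))   # estimate of the error standard deviation
    se_b1 = s / sqrt(sxx)     # standard error of the slope
    return b1, se_b1, b1 / se_b1


b1, se_b1, t = slope_t_test([1, 2, 3, 4], [1, 3, 2, 4])  # invented data
print(round(b1, 3), round(se_b1, 3), round(t, 3))  # 0.8 0.424 1.886
```

With only 2 degrees of freedom, a t of about 1.89 falls well short of the two-sided 5% critical value of 4.303, so these invented data would not refute the null hypothesis.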

The slope of the model tells you how y changes with a unit change in x. A plot of systolic blood pressure versus weight is shown as a Minitab output in Fig.

The test statistic is computed as follows. The data would refute the null hypothesis. Each additional pound would increase the systolic pressure by less than 1. Note: The T value 3.

Assumptions: The random variables X and Y have a bivariate distribution. The following Minitab output results. The plot of the data is shown in Fig.

The correlation dialog box is filled in as in Fig. The following output is obtained.

The following example illustrates two variables that are negatively correlated. The sample consists of fifteen high school freshmen. The plot shows the negative linear relationship between the two variables. The value of r indicates the strength of the relationship.
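The sample correlation is r = Sxy / √(Sxx × Syy), and for a straight-line fit its square equals the coefficient of determination R² = 1 - SSE/SST, the fraction of the variation that is explained. A minimal sketch with hypothetical helpers `pearson_r` and `r_squared` (Python 3.10+ also provides `statistics.correlation`); the data are invented.

```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Sample correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """1 - SSE/SST for the least squares line, i.e. explained / total variation."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained (residual)
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total
    return 1 - sse / sst


xs = [1, 2, 3, 4]
print(pearson_r(xs, [1, 3, 2, 4]))  # 0.8: positive association
print(pearson_r(xs, [8, 6, 4, 2]))  # -1.0: perfect negative linear relationship
print(r_squared(xs, [1, 3, 2, 4]))  # about 0.64, the square of r
```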

In the analysis of variance table (Table), explained variation is also called regression variation, and unexplained variation is also called residual variation. The data are shown in Table. The coefficient of determination from the Excel worksheet is shown as R Square 0. The interpretation Fig.

For example, consider a regression study where y represents systolic blood pressure and x represents weight. Now suppose we wish to use the regression equation to estimate the expected value of the systolic blood pressure of all individuals who weigh pounds.
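For an estimation problem like this one, the interval for the mean response and the interval for a single new observation differ by one term: the first uses √(1/n + (x0 - mean of x)²/Sxx), while the second uses √(1 + 1/n + (x0 - mean of x)²/Sxx). A minimal sketch follows; `intervals` is a hypothetical helper, the critical value must be supplied from a t table with n - 2 degrees of freedom (4.303 is the two-sided 95% value for 2 degrees of freedom), and the data are invented, not the blood pressure data.

```python
from math import sqrt
from statistics import fmean


def intervals(xs, ys, x0, t_crit):
    """Confidence interval for the mean response and prediction interval for one new y, at x0."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    s = sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat = b0 + b1 * x0
    core = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s * sqrt(core)      # mean response at x0
    pi_half = t_crit * s * sqrt(1 + core)  # one new observation at x0: note the extra 1
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


y_hat, ci, pi = intervals([1, 2, 3, 4], [1, 3, 2, 4], x0=2.5, t_crit=4.303)
print(y_hat, ci, pi)  # the prediction interval is wider than the confidence interval
```

Both intervals are centered on the same predicted value; only their half-widths differ.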

We would predict that an individual who weighs pounds would have a systolic blood pressure equal to Likewise, we would estimate the expected systolic blood pressure of all individuals who weigh pounds to be This additional 1 makes the prediction interval wider.

The independent variable x was the hemoglobin A1C value, taken after three months of measuring the fasting blood glucose value each morning of the three-month period and averaging the values.

The latter value was the dependent variable, y. The data are shown in Table. The data are entered in the Minitab worksheet as shown in Fig. Choosing the options box in Fig. In Excel, enter the data into the worksheet, select Regression in order to perform a regression analysis, and fill out the Regression dialog box as shown in Fig.

This will produce the output given in Fig. Figure shows the calculations using the Excel worksheet. The lower and upper prediction intervals are shown in rows 3 and 4. The values needed are shown in row 1. This computation is also performed when you use Minitab.

Give the deterministic equation for the line passing through the following pairs of points: (a) (1, 1), …

Give the slope and y-intercept of the deterministic equations in problem 1.

Table (columns x and y).

The systolic blood pressure, y, and the number of drinks per week, x, were recorded for a group of ten patients with poorly controlled high blood pressure.


