Introduction to Statistical Data Analysis

Statistics is the science of collecting, interpreting, and validating data. Statistical data analysis is the process of applying various statistical operations to data. It is a form of quantitative research, which seeks to express observations as numbers and then analyze them statistically. Quantitative data are numerical measurements or counts, collected, for example, through surveys and observational studies.

Statistical data analysis generally relies on statistical tools that are difficult to apply correctly without some statistical knowledge. Various software packages support this work, including the Statistical Package for the Social Sciences (SPSS) and Stata.

The data in a statistical analysis consist of one or more variables. Data on a single variable are univariate; data on several variables are multivariate. The researcher chooses a statistical technique according to the number of variables.

If the data contain several variables, multivariate techniques such as factor analysis and discriminant analysis can be performed. If the data contain a single variable, univariate techniques are used, such as the t test, the z test, the F test, and one-way ANOVA.
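A minimal Python sketch of two of these univariate tests, the one-sample t test and the z test, follows. It assumes a single sample of measurements, a hypothesized mean `mu0`, and, for the z test, a known population standard deviation `sigma`; the function names and sample values are hypothetical. Only the standard library is used, so the t test returns its statistic and degrees of freedom rather than a p-value.

```python
import math
from statistics import NormalDist, mean, stdev

def one_sample_t(data, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)  # standard error from the sample standard deviation s
    return (mean(data) - mu0) / se, n - 1

def one_sample_z(data, mu0, sigma):
    """Return the z statistic and two-sided p-value when sigma is known."""
    n = len(data)
    z = (mean(data) - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails of the standard normal
    return z, p

# Hypothetical sample values, for illustration only
sample = [12.1, 11.8, 12.4, 12.0, 12.6, 11.9, 12.3, 12.2]
t, df = one_sample_t(sample, mu0=12.0)
z, p = one_sample_z(sample, mu0=12.0, sigma=0.3)
print(f"t = {t:.3f} with {df} df")
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

The t statistic would be compared with a t table for `df` degrees of freedom; the z test can report a p-value directly because the standard normal is available through `statistics.NormalDist`.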

Numerical data in statistical analysis are of two types: continuous data and discrete data. Continuous data are measured rather than counted and can take any value within a range. For example, the intensity of a light can be measured but cannot be counted. Discrete data can be counted. For example, the number of bulbs can be counted.

Continuous data are described by a continuous probability distribution, which is specified by a probability density function, or simply pdf.
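In the usual notation, with X the random variable, f(x) its density, and a ≤ b the endpoints of an interval (symbols introduced here for illustration), a pdf satisfies:

```latex
f(x) \ge 0 \quad \text{for all } x, \qquad
\int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad
P(a \le X \le b) = \int_{a}^{b} f(x)\,dx .
```

Probabilities for a continuous variable are therefore areas under the density curve, not values of f itself.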

Discrete data are described by a discrete probability distribution, which is specified by a probability mass function, or simply pmf.
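Writing p(x) for the pmf of a discrete random variable X and A for any set of its possible values (symbols introduced only for illustration), a pmf satisfies:

```latex
p(x) = P(X = x) \ge 0, \qquad
\sum_{x} p(x) = 1, \qquad
P(X \in A) = \sum_{x \in A} p(x) .
```

Here the probability of a single value is read off directly, and probabilities of sets of values are sums rather than integrals.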

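The word ‘density’ is used for continuous data because a pdf gives probability per unit of the variable, much as physical density gives mass per unit of volume. A probability is obtained only by integrating the density over an interval, and the probability of any single exact value is zero. The word ‘mass’ is used for discrete data because the probability is concentrated, like point masses, on individual countable values, and the pmf gives the probability of each value directly.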
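The difference can be checked numerically. This illustrative Python sketch uses a continuous uniform distribution on [0, 0.5], whose density equals 2 everywhere on that interval (so a density value is not itself a probability), recovers a probability by integrating with a simple trapezoidal rule, and contrasts this with the pmf of a fair die, where each value carries its probability mass directly. The helper names are hypothetical.

```python
from fractions import Fraction

def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, steps=10_000):
    """Approximate the integral of f over [lo, hi] with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, steps))
    return total * h

density = lambda x: uniform_pdf(x, 0, 0.5)

# Continuous: the density is 2 on [0, 0.5], yet probabilities come from areas.
print(uniform_pdf(0.25, 0, 0.5))     # 2.0
print(integrate(density, 0.1, 0.3))  # approximately 0.4 = P(0.1 <= X <= 0.3)
print(integrate(density, 0.25, 0.25))  # 0.0: a single exact value has zero probability

# Discrete: a fair die puts mass 1/6 on each of the values 1..6.
die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}
print(die_pmf[3])              # 1/6
print(sum(die_pmf.values()))   # 1
```

Exact fractions are used for the die so that the masses sum to exactly 1 without floating-point rounding.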

There are many pdfs and pmfs in statistical data analysis. For example, the Poisson distribution is a commonly used pmf, and the normal distribution is a commonly used pdf.
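Both distributions have closed-form formulas. This sketch implements them with the standard `math` module; the function names and parameter values (a Poisson mean of 3 and the standard normal) are chosen only for illustration. Python's `statistics.NormalDist` also provides a `pdf` method if a library routine is preferred.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam, k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Poisson with mean 3: probabilities of exactly 0, 1, and 2 occurrences.
for k in range(3):
    print(k, round(poisson_pmf(k, 3.0), 4))

# Standard normal density at its mean: a height of the curve, not a probability.
print(round(normal_pdf(0.0), 4))
```

The Poisson values are genuine probabilities of individual counts, while the normal value is a density height that becomes a probability only after integration over an interval.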

These distributions help us decide which model suits a given data set. Count data, such as the number of bulbs that burn out in a week, may follow a Poisson distribution, whereas measurements, such as the intensity of a bulb, call for a continuous distribution, such as the normal distribution.
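One rough way to act on this, sketched here under stated assumptions, is to check first whether the values are whole-number counts and, if so, compare the sample variance with the sample mean. A Poisson distribution has variance equal to its mean, so a ratio far from 1 casts doubt on a Poisson model. This ratio (the index of dispersion) is a screening heuristic, not a formal goodness-of-fit test, and the data and function names are hypothetical.

```python
from statistics import mean, variance

def looks_discrete(values):
    """True if every value is a non-negative whole number, as counts are."""
    return all(float(v).is_integer() and v >= 0 for v in values)

def dispersion_index(counts):
    """Ratio of sample variance to sample mean for count data.

    A Poisson distribution has variance equal to its mean, so values near 1
    are consistent with a Poisson model; values well above 1 suggest
    overdispersion and values well below 1 suggest underdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m

# Hypothetical weekly counts of burned-out bulbs, for illustration only
weekly_failures = [2, 4, 3, 1, 5, 3, 2, 4, 3, 3]
print(looks_discrete(weekly_failures))
print(round(dispersion_index(weekly_failures), 2))

# Hypothetical light-intensity readings: measurements, not counts
intensity = [412.7, 398.2, 405.9, 420.3]
print(looks_discrete(intensity))
```

With so few observations the ratio is noisy, so in practice it would be followed by a formal test or a comparison of observed and expected frequencies.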

A major task in statistical data analysis is statistical inference, which has two main parts: estimation and hypothesis testing.
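The estimation part can be illustrated with a point estimate and an interval estimate of a population mean. This sketch assumes the population standard deviation `sigma` is known, so a z-based confidence interval applies; the function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean

def z_interval(data, sigma, confidence=0.95):
    """Point estimate and confidence interval for a population mean, sigma known."""
    n = len(data)
    xbar = mean(data)                                   # point estimate of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)                   # margin of error
    return xbar, (xbar - margin, xbar + margin)

# Hypothetical sample values, for illustration only
sample = [12.1, 11.8, 12.4, 12.0, 12.6, 11.9, 12.3, 12.2]
est, (lo, hi) = z_interval(sample, sigma=0.3)
print(f"point estimate {est:.4f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

When sigma is unknown, the sample standard deviation and a t critical value would replace `sigma` and `z`.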

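Estimation uses sample data to approximate population parameters, such as a population mean or proportion, either as a single value (a point estimate) or as a range of values (an interval estimate). Hypothesis testing uses sample data to decide whether a claim about a population is supported. Both can be carried out with parametric methods, which assume the data follow a specified family of distributions such as the normal, or with nonparametric methods, which make fewer assumptions about the form of the distribution.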
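As an illustration of a nonparametric method, this sketch implements the exact two-sided sign test for a population median. Unlike the one-sample t test, which assumes roughly normal data, it uses only whether each observation lies above or below the hypothesized median `m0`; under the null hypothesis the count of either sign follows a Binomial(n, 1/2) distribution. Dropping ties is one common convention; the function name and data are hypothetical.

```python
from math import comb

def sign_test(data, m0):
    """Two-sided exact sign test of H0: population median = m0.

    Returns (number above m0, number below m0, p-value).
    Observations equal to m0 are dropped.
    """
    above = sum(1 for x in data if x > m0)
    below = sum(1 for x in data if x < m0)
    n = above + below
    k = min(above, below)
    # Under H0 each sign is + or - with probability 1/2: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return above, below, min(1.0, 2 * tail)

# Hypothetical sample values, for illustration only
sample = [12.1, 11.8, 12.4, 12.0, 12.6, 11.9, 12.3, 12.2]
print(sign_test(sample, m0=12.0))
```

For this sample, 5 values lie above 12.0 and 2 below (the tie at 12.0 is dropped), so the call prints (5, 2, 0.453125).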