Characteristics of Data - Central Tendency and Dispersion
Converting Data to Information: The goal of a Six Sigma project is not to produce an overwhelming amount of data that intimidates the people concerned. The goal is to collect as much data as possible and convert it into meaningful information that the people concerned can use to make sound decisions about the process. To do that, however, one needs to learn how to deal statistically with large amounts of data.
Data needs to be understood primarily through two characteristics: central tendency and dispersion. Data tends to be centred around a point, commonly summarised by the average. The degree to which the data is spread out from that point is also important, because it determines how likely a value is to fall at a given distance from the centre. For this reason, the following characteristics are used to make sense of the data involved:
Measures of Central Tendency: Different types of data need different measures of central tendency. Some of the important, commonly used measures are as follows:
- Mean: This usually refers to the arithmetic mean, or simply the average, of the data points involved: the sum of the values divided by the number of values. It could also be the geometric or harmonic mean, but that is unusual. The mean is the most popular measure of central tendency, and many statistical techniques use it as the primary measure of the centrality of a given set of data points.
- Median: If all the data points in a data set are arranged in ascending or descending order, the value in the centre is called the median. Where a data set has an odd number of elements, such as 7, the median is the 4th item because it has 3 data points on each side. Where the number is even, such as 8, the median is the average of the 4th and 5th data points. The median is used where there are outliers, i.e. unusually large or small values that distort the mean and give a false picture of the data involved.
- Mode: This is the value that occurs most frequently in the data set, and so it is the single value most likely to be observed. A data set can have more than one mode if several values share the highest frequency.
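The three measures above can be compared on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the `cycle_times` values are invented purely for illustration, and the last one is a deliberate outlier so that the difference between the mean and the median is visible.

```python
from statistics import mean, median, mode

# Hypothetical process cycle times in minutes (illustrative values only).
# The final value is a deliberate outlier.
cycle_times = [4, 5, 5, 6, 7, 8, 40]

avg = mean(cycle_times)      # sum of the values divided by their count
mid = median(cycle_times)    # 7 items, so the 4th item of the sorted data
common = mode(cycle_times)   # the most frequently occurring value

# With an even number of items the median is the average of the two
# middle values (here the 4th and 5th of 8 items).
even_mid = median([1, 2, 3, 4, 5, 6, 7, 8])

print(f"mean={avg:.2f} median={mid} mode={common} even_median={even_mid}")
```

In this sketch the single outlier pulls the mean well above the median, which is the situation in which the median gives the truer picture of a typical value.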
Measures of Dispersion: The degree of spread determines how much confidence one can place in the results obtained from the measures of central tendency: the wider the spread, the less a single central value says about any individual data point. Common measures of dispersion are as follows:
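- Range: The range is defined by the two endpoints, the minimum and the maximum, between which all the values of a data set fall; it is usually reported as the difference between them. It is important because it covers every observed value, although for the same reason it is sensitive to a single extreme value.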
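- Quartiles: The data set is sorted and divided into 4 parts containing an equal number of elements; the three values at the boundaries between these parts are the quartiles, and the second quartile is the median. Similar measures include the deciles and the percentiles, which divide the data into 10 and 100 parts respectively. However, quartiles remain the most widely used.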
- Standard Deviation: The standard deviation is the square root of the average squared deviation of the data points from the mean. Just as the mean is the most popular measure of central tendency, the standard deviation is the most important measure of dispersion and is used extensively in almost every statistical technique.
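The measures of dispersion listed above can be sketched in the same way. The example below again uses Python's standard `statistics` module on an invented data set. Two assumptions are worth noting: quartile values depend on the interpolation convention used (the sketch relies on the default method of `statistics.quantiles`), and the standard deviation is shown in both its population form (`pstdev`, dividing by n) and its sample form (`stdev`, dividing by n - 1).

```python
from statistics import median, pstdev, quantiles, stdev

# Hypothetical measurements (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: the two endpoints and the distance between them.
low, high = min(data), max(data)
data_range = high - low

# Quartiles: the three cut points that split the sorted data into 4 parts.
# Other tools may use a different interpolation rule and give slightly
# different values for q1 and q3.
q1, q2, q3 = quantiles(data, n=4)

# Standard deviation: square root of the average squared deviation
# from the mean.
population_sd = pstdev(data)  # divides by n
sample_sd = stdev(data)       # divides by n - 1

print(f"range={low}..{high} (width {data_range})")
print(f"quartiles={q1}, {q2}, {q3}")
print(f"population sd={population_sd:.3f} sample sd={sample_sd:.3f}")
```

The sample form is normally the one to use when the data is a sample drawn from a larger process rather than the complete population, which is the usual case in process measurement.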