Statistics is an important data analysis technique. The term statistics refers to the process of collecting, classifying, and comparing data. Statistics helps in deriving meaningful interpretations from raw data.
It plays an important role in handling numerical data. Statistical techniques involve procedures such as tabulation, frequency calculation, and measures of central tendency (averages) and dispersion.
Analysis of raw data is essential for measuring and interpreting it. It is further used for drawing inferences, testing hypotheses, and making suggestions and recommendations. In this blog, we will read about frequency distribution, one of the basic statistical techniques.
Before considering frequency distribution, let us first look at an overview of the term statistics.
The term statistics is defined as the techniques and process of collecting, describing, analyzing and interpreting numerical data. It deals with aggregates of facts that are affected to a marked extent by a multiplicity of causes, are numerically expressed, and are collected in a systematic manner.
There are four main components of statistics that are also known as the process of statistics. These four components are defined by Croxton and Cowden:
Collection of data.
Presentation of data.
Analysis of data.
Interpretation of data.
Statistics is further classified into two branches: descriptive statistics and inferential statistics.
Descriptive statistics, as the name suggests, is concerned with description and detail. It includes collecting, organizing, summarizing and presenting data.
Inferential statistics helps in making predictions from the collected data. It includes estimation, hypothesis testing, the study of relationships, and drawing inferences from the data.
Statistics serves the following purposes:
To effectively conduct research.
For accessing numerical information.
For reading the journals and other data easily.
For developing critical and analytic research skills.
For quick decision making.
Biostatistics is a branch of statistics. The term is used when the tools of statistics are applied to the biological sciences, such as drug development, medicine or clinical research. Biostatistics, also known as biometry, is a fast-growing and well-recognized subject.
Biostatistics has been widely applied to find the relative potency of a new drug, to compare the efficacy of a particular drug, to find associations between two medical attributes, and to identify the signs and symptoms of diseases.
Frequency distribution is the first method used to organize raw data effectively. It arranges the raw data systematically: the observations are first counted by value or by class and then presented as a frequency table.
Frequency distribution is defined as the systematic representation of the different values of a variable along with their corresponding frequencies; it is classified on the basis of class interval.
Class interval is defined as the size of each class into which the range of a variable is divided. The classes can then be represented as a histogram or bar graph.
Class intervals are divided into two categories: exclusive and inclusive. Here is an example of each:
A class interval where the upper limit of one class is the same as the lower limit of the next class is called an exclusive class interval. For example,
S. No | Marks | No. of students
1 | 0-20 | 8
2 | 20-40 | 7
3 | 40-60 | 3
A class interval where the upper limit of one class is not the same as the lower limit of the next class, so that both limits belong to the class itself, is called an inclusive class interval. For example,
S. No | Marks | Number of students
1 | 1-20 | 7
2 | 21-40 | 9
3 | 41-60 | 8
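To make the difference concrete, here is a minimal Python sketch that assigns a mark to its class under each scheme. The function names and the fixed class width of 20 are assumptions chosen to mirror the two tables above, not part of any standard library.

```python
def exclusive_class(mark, width=20):
    """Exclusive classes 0-20, 20-40, ...: a mark equal to an upper
    limit is counted in the next class."""
    lower = (mark // width) * width
    return (lower, lower + width)


def inclusive_class(mark, width=20):
    """Inclusive classes 1-20, 21-40, ...: both limits belong to the class.
    Assumes marks start at 1, as in the inclusive table."""
    lower = ((mark - 1) // width) * width + 1
    return (lower, lower + width - 1)


# A mark of exactly 20 is where the two schemes differ.
print(exclusive_class(20))  # (20, 40)
print(inclusive_class(20))  # (1, 20)
```

With exclusive intervals the boundary value moves up into the next class; with inclusive intervals it stays in the class whose upper limit it equals.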
Frequency distributions are further classified into two types based upon class interval: the discrete frequency table and the continuous frequency table. Here are examples of each:
If the data is not divided into class intervals, the result is termed a discrete frequency distribution. For example,
S. no. | Number of items | Number of packets
1 | 1 | 23
2 | 2 | 12
3 | 3 | 34
4 | 4 | 20
5 | 5 | 72
 | Total | 161
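A discrete frequency table like this can be built by simply counting how often each value occurs. The sketch below uses Python's `collections.Counter`; the raw observations are hypothetical and only illustrate the counting step.

```python
from collections import Counter

# Hypothetical raw observations: number of items found in each packet.
items_per_packet = [1, 2, 2, 3, 1, 3, 3, 5, 4, 5, 5, 2]

# Count how many packets contain each number of items.
frequency = Counter(items_per_packet)

# Print the table in ascending order of the variable's value.
for value in sorted(frequency):
    print(value, frequency[value])

# The total frequency must equal the number of observations.
total = sum(frequency.values())
print("Total", total)
```

Checking that the total equals the number of raw observations is a quick way to catch counting mistakes in a frequency table.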
When the data is divided into class intervals, it is called a continuous frequency distribution. For example,
S. No | Marks | Number of students
1 | 0-10 | 5
2 | 20-30 | 7
3 | 30-40 | 12
4 | 40-50 | 32
5 | 50-60 | 4
 | Total | 60
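As a rough illustration, the following Python sketch groups raw marks into exclusive classes of width 10 and counts each class. The marks are hypothetical, and the fixed width is an assumption made to match the table above.

```python
from collections import Counter

# Hypothetical marks for ten students.
marks = [3, 27, 35, 41, 48, 44, 52, 31, 9, 40]
width = 10

# Exclusive classes: a mark equal to an upper limit falls in the next class,
# so 40 is counted in 40-50, not in 30-40.
counts = Counter((m // width) * width for m in marks)

# Only classes that contain at least one observation appear in the table.
table = [(f"{lower}-{lower + width}", counts[lower]) for lower in sorted(counts)]
for label, freq in table:
    print(label, freq)
```

Note that a class with no observations (10-20 in this sample) simply does not appear in the output.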
There are two types of frequency distribution methods:
Grouped frequency distribution.
Ungrouped frequency distribution.
As the name suggests, a grouped frequency distribution divides the data into groups. When the variables are continuous, the data is gathered as a grouped frequency distribution. Measures such as age or salary are recorded during data collection, and the entire data set is then classified into class intervals. For example,
Family Income | Number of persons
Below 20,000 | 52
20,001-30,000 | 14
30,001-40,000 | 6
40,001-50,000 | 8
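When the classes are listed by their upper limits, as in the income table, a binary search is one convenient way to find the class of each observation. The sketch below uses `bisect_left` from Python's standard library. The incomes are hypothetical, and placing an income of exactly 20,000 in the first class is an assumption, since the table's labels leave that boundary open.

```python
from bisect import bisect_left
from collections import Counter

# Upper limits of the classes, in ascending order.
# Incomes above the last limit are not covered by this sketch.
upper_limits = [20_000, 30_000, 40_000, 50_000]
labels = ["Below 20,000", "20,001-30,000", "30,001-40,000", "40,001-50,000"]

# Hypothetical family incomes.
incomes = [12_500, 20_000, 20_001, 34_000, 45_500, 18_000, 30_000]


def income_class(income):
    # bisect_left returns the index of the first upper limit >= income,
    # so an income of exactly 30,000 lands in "20,001-30,000".
    return labels[bisect_left(upper_limits, income)]


grouped = Counter(income_class(x) for x in incomes)
for label in labels:
    print(label, grouped[label])
```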
As the name suggests, an ungrouped frequency distribution does not use class intervals. It is applied to discrete data rather than continuous data. Such data usually includes gender, marital status, medical data, etc. For example,
Variable | Number of persons
GENDER |
Female | 19
Male | 22
MARITAL STATUS |
Single | 32
Married | 4
Divorced | 4
A cumulative frequency distribution is usually presented together with a percentage frequency distribution. The percentage distribution gives the percentage of the sample whose scores fall in each group, while the cumulative distribution gives the running total of frequencies (or percentages) up to and including each score.
This type of distribution is quite useful for comparing data with the findings of other studies that have different sample sizes. Here, frequencies and percentages are presented together in a single table. For example,
Score | Frequency | Percentage | Cumulative frequency | Cumulative percentage
1 | 4 | 8 | 4 | 8
2 | 14 | 28 | 18 | 36
4 | 6 | 12 | 24 | 48
5 | 8 | 16 | 32 | 64
7 | 8 | 16 | 40 | 80
8 | 6 | 12 | 46 | 92
9 | 4 | 8 | 50 | 100
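The columns of such a table follow mechanically from the frequencies. Here is a minimal Python sketch of the calculation using `itertools.accumulate`; the scores and frequencies are hypothetical and kept small so the arithmetic is easy to check by hand.

```python
from itertools import accumulate

# Hypothetical scores and their frequencies.
scores = [1, 2, 3, 4]
frequencies = [2, 5, 2, 1]

total = sum(frequencies)

# Percentage of the sample that received each score.
percentages = [100 * f / total for f in frequencies]

# Running totals of the frequencies, in the order the scores are listed.
cumulative = list(accumulate(frequencies))

# Running totals expressed as a percentage of the sample.
cumulative_pct = [100 * c / total for c in cumulative]

for row in zip(scores, frequencies, percentages, cumulative, cumulative_pct):
    print(*row)
```

The last cumulative frequency always equals the sample size, and the last cumulative percentage is always 100, which makes a handy sanity check.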
A bivariate frequency distribution is a frequency distribution involving exactly two variables. It has two marginal distributions, one for each variable. For example,
Age | Salary per month
20-30 | 15
30-40 | 5
40-50 | 7
Total | 27
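To show how the joint and marginal distributions relate, here is a small Python sketch that counts pairs of values. The (age group, salary band) pairs and the band names "low" and "high" are hypothetical.

```python
from collections import Counter

# Hypothetical (age group, salary band) pairs, one per person.
people = [
    ("20-30", "low"), ("20-30", "low"), ("20-30", "high"),
    ("30-40", "low"), ("30-40", "high"), ("40-50", "high"),
]

# Joint frequencies: how often each combination of the two variables occurs.
joint = Counter(people)

# Marginal distributions: totals for each variable taken on its own.
age_marginal = Counter(age for age, _ in people)
salary_marginal = Counter(salary for _, salary in people)

print(joint[("20-30", "low")])
print(dict(age_marginal))
print(dict(salary_marginal))
```

Each marginal distribution sums to the same total as the joint table, because every person is counted exactly once in each.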
As the name suggests, a multivariate frequency distribution is one where the frequency table involves more than two variables.
Graphical representation of frequency distribution
Data representation is the next step after data gathering. The data organized in a frequency distribution is then represented in different forms of figures and graphs. Important forms of frequency distribution graphs are as follows:
Histogram
Line frequency graph
Frequency polygon
Frequency curve
Here is a brief introduction to each of them:
A line frequency graph represents data as vertical lines and is used to depict discrete data. The values of the variable are plotted on the x-axis and the frequencies on the y-axis; the length of each line is proportional to the corresponding frequency.
A histogram is a graphical representation of a frequency distribution. The data is shown as adjacent rectangular bars rising from the x-axis. The classes are represented on the x-axis and the frequencies on the y-axis.
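For a quick look at the shape of a distribution without a plotting library, a text histogram can be sketched in a few lines of Python. The classes and frequencies below are taken from the exclusive class interval table earlier in this blog; drawing the bars horizontally, one `#` per observation, is a simplification for text output.

```python
# Classes and frequencies from the exclusive class interval example.
classes = [("0-20", 8), ("20-40", 7), ("40-60", 3)]

# One line per class: right-aligned label, a bar of '#' characters, the count.
lines = [f"{label:>6} | {'#' * freq} {freq}" for label, freq in classes]
print("\n".join(lines))
```

In a drawn histogram the same bars would stand upright, with the classes along the x-axis and the frequencies along the y-axis.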
There exist four types of histograms:
A polygon is a closed figure with many sides. A frequency polygon is obtained by joining the mid-points of the tops of the rectangles of a histogram with straight lines. Like other graphs, the variable is taken on the x-axis and the frequencies on the y-axis.
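The points of a frequency polygon can be computed directly from the class limits. The sketch below uses the classes from the exclusive class interval example; closing the polygon with a zero-frequency point one class width beyond each end is a common convention, and is an assumption here rather than something stated above.

```python
# (lower limit, upper limit) and frequency of each class.
classes = [((0, 20), 8), ((20, 40), 7), ((40, 60), 3)]

# Each vertex sits above the mid-point of its class, at the class frequency.
midpoints = [((lower + upper) / 2, freq) for (lower, upper), freq in classes]

# Assumed convention: close the polygon on the x-axis at both ends.
width = classes[0][0][1] - classes[0][0][0]
first_mid, last_mid = midpoints[0][0], midpoints[-1][0]
polygon = [(first_mid - width, 0)] + midpoints + [(last_mid + width, 0)]

print(polygon)
```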
A frequency curve is a smooth curve obtained by joining the vertices of the frequency polygon with a freehand curve.
A cumulative frequency curve is also known as an ogive. It is the graph of cumulative frequencies plotted against the upper limits of the classes; the plotted points are joined by a freehand curve.
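The points to plot for such a curve are simply each class's upper limit paired with the running total of frequencies up to that class. A minimal Python sketch, again using the classes from the exclusive class interval example:

```python
from itertools import accumulate

# (lower limit, upper limit) and frequency of each class.
classes = [((0, 20), 8), ((20, 40), 7), ((40, 60), 3)]

upper_limits = [upper for (_, upper), _ in classes]

# Running total of the frequencies, class by class.
cumulative = list(accumulate(freq for _, freq in classes))

# Each point: (upper limit of class, cumulative frequency up to that limit).
ogive_points = list(zip(upper_limits, cumulative))
print(ogive_points)
```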
Ogive is further divided into two types:
When we talk about statistics, we cannot avoid the term central tendency. A measure of central tendency is a single value used to represent a whole data series; it is commonly referred to as the average of the series.
Measures of central tendency locate the central value around which the data are concentrated. They are an attempt to find one single figure that describes the whole data set.
The main purposes for measuring central tendency are as follows:
1. For comparing two data quantities.
2. To derive a quantitative relationship between different group averages.
3. For quicker decision making.
4. For obtaining a single value for the entire data series.
The central tendency of data is measured in three ways: mean, median and mode.
The mean of the data is also known as the arithmetic mean or the mathematical average. The arithmetic mean is defined as the sum of all the observations divided by the total number of observations. The mathematical average is further classified into the following types:
Simple arithmetic mean
Weighted arithmetic mean
Geometric mean
Harmonic mean
For instance, consider a data set where the heights of five students are 160, 162, 175, 158, and 166.
Arithmetic mean = sum of observations / number of observations
= (160 + 162 + 175 + 158 + 166) / 5
= 821 / 5
= 164.2
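The same calculation, along with the other kinds of mean listed above, can be sketched in Python. The heights are the ones from the example; the weights for the weighted mean are hypothetical, and `statistics.geometric_mean` assumes Python 3.8 or later.

```python
import statistics

heights = [160, 162, 175, 158, 166]

# Simple arithmetic mean: sum of observations / number of observations.
simple_mean = sum(heights) / len(heights)

# Weighted arithmetic mean with hypothetical weights (the third
# observation counts twice as much as the others).
weights = [1, 1, 2, 1, 1]
weighted_mean = sum(w * h for w, h in zip(weights, heights)) / sum(weights)

# Geometric and harmonic means from the standard library.
geometric_mean = statistics.geometric_mean(heights)
harmonic_mean = statistics.harmonic_mean(heights)

print(simple_mean, weighted_mean)
print(geometric_mean, harmonic_mean)
```

For positive data that is not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean.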
The median is defined as the middle value of a data set when the data is arranged in ascending or descending order of magnitude. It is also called the positional average. For example, consider the data set 168, 173, 153, 163, and 158 (in cm).
In ascending order, the data is 153, 158, 163, 168, and 173.
The median is the middle (third) value, 163 cm.
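Here is a minimal Python sketch of the same procedure. The function name `median_of` is an assumption for illustration; for an even number of observations it averages the two middle values, which is the usual convention and also what `statistics.median` does.

```python
import statistics


def median_of(values):
    """Return the middle value of the sorted data (or the average of the
    two middle values when the number of observations is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


heights = [168, 173, 153, 163, 158]
print(median_of(heights))          # 163
print(statistics.median(heights))  # 163
```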
Mode is defined as the value which has the highest frequency. In other words, the item that has occurred the maximum number of times is called the mode of data. For example, consider the following data:
Height (in cm) | 145 | 160 | 165 | 168 | 170
No. of students | 3 | 16 | 8 | 20 | 6
For the given data, the mode is 168 cm, as the largest number of students (20) have this height.
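Finding the mode from a frequency table amounts to picking the value with the largest count. The Python sketch below uses a hypothetical table of heights; `statistics.multimode` (Python 3.8 or later) is used as a cross-check because it also reports ties.

```python
import statistics

# Hypothetical frequency table: height in cm -> number of students.
height_counts = {150: 2, 155: 9, 160: 5, 165: 4}

# The mode is the value whose frequency is largest.
# If two values tie, max() returns the one listed first.
mode = max(height_counts, key=height_counts.get)

# Cross-check by expanding the table back into raw observations.
raw = [height for height, n in height_counts.items() for _ in range(n)]
modes = statistics.multimode(raw)

print(mode, modes)
```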
These are the basics of statistics that you need to know. Statistics is an important part of data management, and it is an essential element of organizational decision-making and project management.