5 Rules to Construct a Frequency Distribution




A tabular organization of data that shows how the observations are distributed across classes or groups, along with the number of observations in each class or group, is called a frequency distribution. The class frequency is the number of observations in a particular class. Frequency distributions are a powerful statistical tool, frequently used in descriptive and predictive analytics.

The following are five fundamental rules to keep in mind when constructing a grouped frequency distribution.

1. Number of Classes

The number of classes depends largely on the size of the data. In statistics, it is common practice to keep the number of classes between 5 and 20. Too many classes defeat the purpose of condensing the data into meaningful groups. At the same time, too few classes result in a loss of information. Therefore, we always need to strike an appropriate balance.
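As a rough illustration of striking that balance, the sketch below uses Sturges' rule, a common rule of thumb that is not part of the rules above, and then clamps the result to the 5 to 20 range mentioned in the text. The function name `suggest_num_classes` and its parameters are hypothetical, and the result is only a starting point, not a prescribed answer.

```python
import math


def suggest_num_classes(n, lower=5, upper=20):
    """Suggest a number of classes for n observations.

    Assumption: Sturges' rule (k = 1 + log2(n), rounded up) is used as a
    starting point; the article itself only asks for 5 to 20 classes.
    """
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = math.ceil(1 + math.log2(n))
    # Clamp to the range recommended in the text.
    return max(lower, min(upper, k))


print(suggest_num_classes(100))  # 1 + log2(100) is about 7.64, so 8
```

For very small or very large data sets the clamp takes over, so the suggestion never leaves the 5 to 20 range.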

2. Range of Variable

It is vital to determine the range of the variable by taking the difference between the largest and the smallest values in the data. The range helps us choose a suitable number of classes.
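The range calculation can be sketched in a few lines. The `marks` list is made-up sample data used only for illustration, and `data_range` is a hypothetical helper name.

```python
def data_range(values):
    """Return the difference between the largest and smallest values."""
    if not values:
        raise ValueError("values must not be empty")
    return max(values) - min(values)


# Hypothetical sample data: ten exam marks.
marks = [52, 67, 45, 88, 73, 91, 60, 58, 79, 84]
print(data_range(marks))  # 91 - 45 = 46
```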

3. Class Interval - Divide Range by Number of Classes

To determine the approximate width of the class interval, divide the range (from step 2) by the number of classes and round up to the next higher whole number. This division gives equal class intervals. If equal class intervals are inconvenient or undesirable, classes of unequal size may be used. In practice, intervals that are multiples of 5 or 10 are commonly used because people understand them easily.
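A minimal sketch of this step follows. It assumes "next higher whole number" means strictly higher, so an exact quotient such as 50 / 5 = 10 becomes 11; that interpretation keeps the largest value inside the last class, but it is an assumption, not something the text states. The function name `approximate_class_width` is hypothetical.

```python
import math


def approximate_class_width(range_of_data, num_classes):
    """Divide the range by the number of classes and move to the next
    higher whole number.

    Assumption: an exact quotient is also bumped up by one, so that
    num_classes * width is always greater than the range.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return math.floor(range_of_data / num_classes) + 1


print(approximate_class_width(46, 5))  # 46 / 5 = 9.2, so 10
print(approximate_class_width(50, 5))  # 50 / 5 = 10 exactly, so 11
```

In practice the result would often be adjusted further to a convenient multiple of 5 or 10.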

4. Determine Class Limits

The lowest class usually starts at the smallest data value or at a number slightly below it, preferably a multiple of the class interval. Find the upper class boundary by adding the width of the class interval to the lower class boundary, and write down the upper class limit too. Open-ended classes, i.e., classes whose lowermost or uppermost class boundary is unknown, should be avoided if possible.
Determine the remaining class limits and class boundaries by adding the class interval repeatedly. Place the lowest class at the top, with the rest following in order of size. In some cases, we may prefer to put the highest class at the top.
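The procedure above can be sketched as follows for whole-number data. The function name `build_classes`, the `unit` parameter (the measurement precision, assumed to be 1), and the example values are all hypothetical. The sketch starts the lowest class at the largest multiple of the width that does not exceed the smallest value, and places each boundary half a unit outside the corresponding limit.

```python
def build_classes(smallest, largest, width, unit=1):
    """Return classes, lowest first, each with limits and boundaries.

    Assumptions: data are recorded to the nearest `unit`, and the first
    lower limit is a multiple of `width` at or below `smallest`.
    """
    start = (smallest // width) * width
    classes = []
    lower_limit = start
    while lower_limit <= largest:
        upper_limit = lower_limit + width - unit
        classes.append({
            "limits": (lower_limit, upper_limit),
            # Upper boundary minus lower boundary equals the class width.
            "boundaries": (lower_limit - unit / 2, upper_limit + unit / 2),
        })
        lower_limit += width  # add the class interval repeatedly
    return classes


for c in build_classes(smallest=45, largest=91, width=10):
    print(c["limits"], c["boundaries"])
```

Because the starting point is moved down to a multiple of the width, the number of classes produced may be one more than originally planned; here the data from 45 to 91 end up in six classes, from 40 to 49 through 90 to 99.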

5. Distribute Data into Classes

The best way to distribute the data into the appropriate classes is to use a “Tally-Column”, where each value is recorded against its class by making a short bar or tally mark. For convenience in counting, it is customary to draw the first four bars vertically and the fifth diagonally across them, and then to leave a space before the next group. We then write the number of tallies in the frequency column. We usually omit the tally column in the final presentation of the frequency distribution. When the number of values is small, however, the actual values should be shown against each class to reduce the chance of error.
Finally, we need to total the frequency column to verify that all the data have been included.
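The tallying and the final check can be sketched as below. The sample `marks`, the `class_limits` list, and the function names are hypothetical; the string `||||/` is only a plain-text stand-in for four vertical bars crossed by a diagonal fifth.

```python
def tally_marks(count):
    """Render a count as groups of five tally marks separated by spaces."""
    groups, rest = divmod(count, 5)
    parts = ["||||/"] * groups
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)


def frequency_table(values, class_limits):
    """Count the values falling within each (lower, upper) class limit pair.

    Raises ValueError if the frequencies do not add up to the number of
    values, which would mean some value fell outside every class.
    """
    rows = []
    for lower, upper in class_limits:
        count = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, tally_marks(count), count))
    total = sum(row[3] for row in rows)
    if total != len(values):
        raise ValueError("frequencies do not account for all the data")
    return rows


# Hypothetical sample data and inclusive class limits.
marks = [52, 67, 45, 88, 73, 91, 60, 58, 79, 84]
class_limits = [(40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

for lower, upper, tally, freq in frequency_table(marks, class_limits):
    print(f"{lower}-{upper}  {tally:<8} {freq}")
```

The total check mirrors the last step in the text: if a value is missed or the classes leave a gap, the sum of the frequency column no longer matches the number of observations.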

Note: We apply these rules when grouping raw data, which are assumed to be continuous. In the case of discrete data that take only integral values, the concept of a class boundary is unrealistic, as there can be no points where adjoining classes meet. Despite this logical difficulty, when the discrete data are sufficiently large, they are treated as continuous for convenience of calculation and are hence grouped in the same way as continuous data.
