Huffman Encoding is a lossless compression algorithm. It was developed by David A. Huffman while he was an Sc.D. student at MIT, and published in the 1952 paper “A Method for the Construction of Minimum-Redundancy Codes”.
As the term “compression technique” suggests, the aim is to encode the same data in a way that takes up less space. When data is encoded with Huffman Coding, we get a unique code for each symbol in the data. For example, the string “ABC” occupies 3 bytes without any compression (one byte per character). Let’s assume that, as a result of encoding, the character A is given the code 00, the character B the code 01, and the character C the code 10. To store the same data, we would then need only 6 bits instead of 3 bytes (24 bits). Before examining how Huffman Encoding works, I hope this makes clear what I mean by compression!
Huffman Encoding is an algorithm that uses the frequency (or probability) of symbols and a binary tree structure. It consists of the following 3 steps:
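The arithmetic in the “ABC” example can be checked with a short Python sketch. The code table is the illustrative one from the text (a fixed 2-bit code, not yet the output of Huffman Encoding), and one byte per character is assumed for the uncompressed size.

```python
# Illustrative code table taken from the "ABC" example in the text.
codes = {"A": "00", "B": "01", "C": "10"}
text = "ABC"

# Concatenate the code of every character to get the encoded bit string.
encoded = "".join(codes[ch] for ch in text)

# Uncompressed size, assuming one byte (8 bits) per character.
original_bits = len(text.encode("ascii")) * 8

print(encoded)        # 000110
print(original_bits)  # 24
print(len(encoded))   # 6
```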
- Probability Calculation & Ordering the Symbols
- Binary Tree Transformation
- Assigning Codes to the Symbols
Probability Calculation & Ordering the Symbols
We count the number of occurrences of each symbol in the whole data, then we calculate the “probability” of each symbol by dividing that count by the total number of symbols in the data. Since it is an algorithm based on probability, more common symbols (those with higher probability) are generally represented using fewer bits than less common symbols. This is one of the advantages of Huffman Encoding.
As an example, for the following data with 5 different symbols, A, B, C, D, and E, we have the probabilities shown on the right:
Then we order the symbols by their probabilities, representing each symbol as a node, and call the result our “collection”. Now we are ready to move on to the next step.
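As a minimal sketch of the counting step, the probabilities can be computed with `collections.Counter`. The function name `symbol_probabilities` and the sample string are made up for illustration and are not the article's data.

```python
from collections import Counter


def symbol_probabilities(data):
    """Return {symbol: count / total} for every symbol in data."""
    counts = Counter(data)   # occurrences of each symbol
    total = len(data)        # total number of symbols
    return {symbol: count / total for symbol, count in counts.items()}


# Hypothetical sample input: A appears 3 times, B twice, C once.
probs = symbol_probabilities("AAABBC")
print(probs)
```

Here A gets probability 0.5, B gets 1/3, and C gets 1/6, and the values sum to 1.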
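One way to sketch the ordering step in Python is shown here. The `Node` class, the ascending sort direction, and the probabilities are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    symbol: str
    prob: float


# Made-up probabilities, not the article's example data.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Represent each symbol as a node and sort by probability (ascending here,
# so the two least likely symbols sit at the front of the collection).
collection = sorted(
    (Node(symbol, prob) for symbol, prob in probs.items()),
    key=lambda node: node.prob,
)

print([node.symbol for node in collection])  # ['C', 'D', 'B', 'A']
```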
Binary Tree Transformation
- From the collection, we pick out the two nodes with the smallest probabilities and combine them into a new tree whose root has a probability equal to their sum.
- We add the new tree back into the collection.
- We repeat this process until one tree encompassing all the input probabilities has been constructed.
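The three steps above can be sketched with Python's `heapq` module, which keeps the lowest-probability entry at the front of the collection. This is a sketch under stated assumptions rather than the article's implementation: the function name `build_tree`, the nested-tuple tree representation, and the sample probabilities are all hypothetical, and the input is assumed to contain at least one symbol.

```python
import heapq
from itertools import count


def build_tree(probs):
    """Merge the two lowest-probability trees until only one remains.

    Leaves are plain symbols; internal nodes are (left, right) tuples.
    Returns (root_probability, tree).
    """
    tiebreak = count()  # unique counter so ties never compare the trees
    heap = [(p, next(tiebreak), symbol) for symbol, p in probs.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Pick out the two nodes with the smallest probabilities...
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # ...and add the combined tree back into the collection.
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))

    root_prob, _, root = heap[0]
    return root_prob, root


# Made-up probabilities, not the article's example data.
root_prob, tree = build_tree({"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125})
print(root_prob)  # 1.0
print(tree)       # ('A', ('B', ('C', 'D')))
```

A heap is used here only so that finding the two smallest nodes stays cheap. Re-sorting a plain list after every merge would build the same tree.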
Assigning Codes to Symbols
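A minimal sketch of this step, assuming the common convention that each left branch appends a 0 and each right branch appends a 1. The convention, the function name `assign_codes`, and the nested-tuple tree are assumptions made for illustration.

```python
def assign_codes(tree, prefix=""):
    """Walk the tree and return {symbol: bit string}.

    Leaves are plain symbols; internal nodes are (left, right) tuples.
    """
    if not isinstance(tree, tuple):
        # A leaf: the path taken so far is its code. A tree consisting of a
        # single symbol has an empty path, so it is given the one-bit code "0".
        return {tree: prefix or "0"}
    left, right = tree
    codes = assign_codes(left, prefix + "0")         # left branch adds 0
    codes.update(assign_codes(right, prefix + "1"))  # right branch adds 1
    return codes


# Hypothetical tree in which A is the most probable symbol.
tree = ("A", ("B", ("C", "D")))
codes = assign_codes(tree)
print(codes)  # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
```

In this example the most probable symbol, A, ends up with the shortest code, and no code is a prefix of another, so an encoded bit string can be decoded unambiguously.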