Have you heard about Run Length Encoding ( RLE ) before ? My guess would be YES, that is why you are here ( DUH! ). But have you actually struggled understanding how the Run Length Encoding works in actual code ? But fear not, I have written this tutorialish article to implement the Run Length Encoding in Java.
Quick Navigation
What is Run Length Encoding (RLE) ?
Let us start with the basics first without actually diving into the code part. Let us understand what is RLE and how does it work.
RLE is a form of lossless data compression. That stores the consecutive occurrence of the same element. But this method in itself isn’t quite useful but this provides the basic understanding into complex algorithms of compression.
Let me try to explain RLE with an example. Imagine we have the following text
JJJJJJJJJJJJJJJJGGGGGGGGGGGGGGG@@@@@@@@@@@@@@@@@@@@@
Let us see what the output from the algorithm should look like :
J16G15@21
Code language: CSS (css)
That is what data compression looks like. And it is very naive approach to lossless encoding. The resulting data will not be directly usable. It will have to be converted back to the original form before being used.
Before diving right into the code let us understand how our algorithm will work.
Also Read : 3 Essential Datastructures in Object Oriented Programming
TECHENUM
What are the uses of RLE ?
Here let us first look at how the RLE might be used in computing. There are multiple use cases of RLE such as :
- Image compression
- Video compression
- File compression
- Audio compression, etc
The things I have mentioned above are just sample use-cases. RLE might not be directly used in those scenarios but the algorithm derived for compression will have something to do with RLE. As it is one of the very naive way of encoding data.
The use of RLE was practical almost 60 years ago when the data for television was transmitted using this approach. But in today’s world RLE alone will not be as useful.
We have grown a lot since then. And data and technology has evolved tremendously. We had to find other ways but to understand compression this might be a good start.
Also Read : Interface in OOP: Guide to Polymorphism and Callbacks
TECHENUM
How does RLE work ?
Before we implement the Run Length Encoding in Java. Le us see what the algorithm does. Our input will look something like :
JJJJJJJJJJJJJJJJGGGGGGGGGGGGGGG@@@@@@@@@@@@@@@@@@@@@
Now we will be describing our algorithm based on the string value above
- Pick the first character from the source
- Count subsequent occurrence of the same character in the given input
- If a new character is found append the character and count in the result
- Continue the process until the end of the input
That was a description about how our encoding algorithm will work. But what about decoding the encoded string ?
We will have the compressed form of our original string as :
J16G15@21
Code language: CSS (css)
To reverse the above string we will
- Get the first character of the input
- Get the subsequent number until we find non digit character
- Run loop that number of times and append the character to the result
- Repeat the process until we have completed reading input
Also Read : How to learn programming and where to start ?
TECHENUM
Implementing the algorithm
Now we have finally come to the part of actual action. Let use see how we can implement the algorithm in Java.
Create a Java class with name : RunLengthEncodingExample
and insert the following snippet. Though we can make the methods static I have followed the traditional approach.
Also the code is pretty self explanatory I have tried my best to not confuse the reader. If you get confused please do comment below.
Encoding
Let us compress the data in a lossless way.
private String encode(String input) {
long nano1 = System.nanoTime();
StringBuilder result = new StringBuilder();
int lengthOfInput = input.length();
char lastCharacter = input.charAt(0);
int lastCharacterCount = 1;
// we will go until equal to the length of input and we will do the final flush inside the loop
// in this way we will have roughly 90% - 95% performance improvent over nested iteration
// but this might take a little more mory as compared to the nested loop iteration
for (int index = 1; index <= lengthOfInput; index++) {
if (index == lengthOfInput) {
// we have already completed everything let us append the final value of index - 1 iteration
result.append(lastCharacter).append(lastCharacterCount);
break;
}
char currentCharacter = input.charAt(index);
if (lastCharacter == currentCharacter) {
lastCharacterCount++;
} else {
result.append(lastCharacter).append(lastCharacterCount);
lastCharacter = currentCharacter;
lastCharacterCount = 1;
}
}
long nano2 = System.nanoTime();
long result1 = TimeUnit.NANOSECONDS.toMicros(nano2 - nano1);
System.out.println("encoding total time taken : nano seconds -> " + result1);
return result.toString();
}
Code language: Java (java)
Decoding
Now that we have successfully encoded our string in the section above. Let us work towards bringing back the original form of data.
private String decode(String encoded) {
long nano1 = System.nanoTime();
StringBuilder result = new StringBuilder();
int lengthOfEncodedString = encoded.length();
StringBuilder timesToRepeatLastCharacter = new StringBuilder("");
char lastCharacter = encoded.charAt(0);
for (int index = 1; index <= lengthOfEncodedString; index++) {
if (index == lengthOfEncodedString) {
// we have reached to the end of encoding ; do the final round
// this code looks repeated
for (int i = 0; i < Integer.parseInt(timesToRepeatLastCharacter.toString()); i++) {
result.append(lastCharacter);
}
break;
}
char currentCharacter = encoded.charAt(index);
if (Character.isDigit(currentCharacter)) {
timesToRepeatLastCharacter.append(currentCharacter);
} else {
// try parsing the timesToRepeatLastCharacter and get the number of times the character should be repeated
for (int i = 0; i < Integer.parseInt(timesToRepeatLastCharacter.toString()); i++) {
result.append(lastCharacter);
}
lastCharacter = currentCharacter;
timesToRepeatLastCharacter = new StringBuilder();
}
}
long nano2 = System.nanoTime();
long result1 = TimeUnit.NANOSECONDS.toMicros(nano2 - nano1);
System.out.println("decoding total time taken : nano seconds -> " + result1);
return result.toString();
}
Code language: Java (java)
Also Read : 7 Sites to Learn Programming Languages Online For Free
TECHENUM
Benchmark
I have also included the nanoseconds to microseconds measurement to track the execution time of the method. This will vary depending on the system configuration and system load at the time of execution.
Though this is not the ideal way of bench-marking. I had to do some quick tests before publishing this article.
As we can see from the image it took about 40 microseconds to run the code to encode and about 85 microseconds to decode.
Please take this benchmark with a grain a salt. And it’s always better to perform our own standard test.
Things to watch out for
Even though we have successful encoded and decoded the input back and forth. This will not work with numeric values and might produce negative compression ratio if every character is unique.
But this article was only for demonstrating how one might encode or decode. That one might use for educational purposes ? Regardless, keep learning and keep searching.
If you feel I have misrepresented anything at all feel free to comment below.
Also Read : Consume an API with Retrofit in Android
TECHENUM