PROBLEM : Group anagrams together from a given list of strings.
UNDERSTANDING : When we say group anagrams together we would have to understand what an anagram is. Two words are said to be anagram when one consists of the same letters other is made up of. For example, below list are said to be anagrams of each other.
So in this problem we would have to group the anagrams in the exact format as mentioned above when we have a list say [ 'listen', 'eat' , 'bat' , 'ate', 'silent', 'tab', 'tea' ]
APPROACH : First, let's understand how we will solve this problem. Since we know that anagrams are the words which have same letters with each other and for same amount of time. we can approach the solution of the given problem using sorting.
Sorting any of the anagram alphabetically would result in the same word. Let's understand this further. If we have a list of anagrams ['ate', 'eat', 'tea'], sorting this alphabetically we would have
We can see that the sorting anagrams alphabetically would give us same word. We will use this aspect to get to our solution.
We can hold the result of the sorting as the key and the words as the anagram. We can use any dictionary or map as the data structure based on the choice of one's programming language.
So for our list above, we would have
KEY | VALUE |
aet | [ 'ate', 'eat', 'tea' ] |
eilnst | [ 'silent', 'listen' ] |
abt | [ 'tab', 'bat' ] |
This would give us a dataset with anagrams mapped together.
IMPLEMENTATION
We would leverage JAVA as the programming language here for solving this problem at hand. Now even in JAVA there are different ways we can solve this. Either we can solve this using traditional approach [pre JAVA 8] or using streams [JAVA 8]. We will go through each of the approach here.
Traditional Approach:
Map<String, List<String>> anagramMap = new HashMap<String, List<String>>();
for(String data : dataset) {
//Convert the string into character array
char[] dataCharArr = data.toCharArray();
//Sort the character array using Arrays sort
Arrays.sort(dataCharArr);
//Sorted Key
String key = String.valueOf(dataCharArr);
//Check if anagram map has this key
// and if yes return the container or else create new list
List<String> dataList = anagramMap.get(key);
//The above operation will return null if it does not contain the key
//If the list is null we will create new list
if(dataList==null) {
dataList = new ArrayList<String>();
}
//Now we will add to the data list
//and append the list to the map against the key
dataList.add(data);
anagramMap.put(key, dataList);
/**
* The above multiple steps can also be done with just one line
*
* anagramMap.computeIfAbsent(key, list-> new ArrayList<String>()).add(data);
*
* What this will do is if there is no list against the key
* it will create the list
* append the data
* map it against the key
*
* This is JAVA 8 though!!
*/
}
//Printing the values
System.out.println(anagramMap.values());
Using Streams:
Arrays.stream(data)
.collect(Collectors.groupingBy(str -> {
char[] arr = str.toCharArray();
Arrays.sort(arr);
return new String(arr);
},Collectors.toList()))
.values()
.forEach(str -> System.out.println(str));
Let's break down the code and understand what happens with each line
Create a stream from array
Arrays.stream(data)
Group the data, but how that will be mentioned in the data
.collect(Collectors.groupingBy(str -> {
char[] arr = str.toCharArray();
Arrays.sort(arr);
return new String(arr);
},Collectors.toList()))
We will get only the values which are the grouped list and print them.
.values()
.forEach(str -> System.out.println(str));
COMPLEXITY
We would also like to about the complexity right? So let's say we have m strings in a list, we also sort each string right? So we have to understand the sorting complexity.
Arrays.sort as of JAVA 8 kind of uses two sorting algorithms one which is kind of modification of Quick Sort called Dual-Pivot Quick Sort and another which is a adaptation of Merge Sort called as Tim Sort. Time complexity for both of them is O(n log n) where n is total items in the array.
So from the grouping anagram solution perspective, the time complexity would be
O (m * n log n)
where m = no. of strings in the list
n = no of characters in each string [used in sorting]
Space Complexity would be O (m * n) since we would need storage for m words and n letters in each words.
That's about the grouping anagrams problem and the solution for the same.
We will be back soon with another problem to work on!!
Powered by Froala Editor