The Code Talks - Binary Search - Search Algorithm Explained

Binary Search - Search Algorithm Explained

What if someone is asked to search for a name in a dictionary or a phone number for a person from a phonebook?? What would one do? Let's say one if the word starts from letter K. So one would open the book see which letter the page is on. If the starting letter is say P which is after K we would shift the search is the left side or if it is say D which is before K we would shift the search onto the right side.

If one is able to understand this, then one would have understood exactly how binary search works.

Binary search is one of the most useful search algorithm which is used on a sorted dataset. It significantly reduces the time that is required to search a sorted dataset as compared to linear search where we go through each item and check. Taking the above example of a dictionary or phone book, starting from A until we get our target.

So let's look at how the Algorithm for binary search would look for a given sorted dataset.

Set the first(left) and last(right) boundary of search as the first and last element position. So left=0, right=last index-1. For example for a dataset of 4 elements. Initially left=0, right=3;
Start from the middle of the dataset, Check if the target matches the mid
If the target matches, return the index/position.
If not check whether the target is lower than the middle element, if yes set the right boundary to the middle position + 1 (because we have already validated the middle element) . right=mid-1 .If the target is greater than the middle element set the left boundary to the middle position. light=mid+1
Repeat the above step until we have a match.
If we do not get a match even after traversing the whole dataset return that no data found.

Let's look at how the flow would go by taking an example.

We have a sorted input array like [2, 5, 10. 15, 44, 87, 92, 101, 474] and we have target as 92 i.e. the last index. Let's see how this would go.
We set the left as 0, right as dataset size - 1 which is 9-1 i.e. 8 here.
The middle position here is 4 which has a value 44. This does not match our target.
Let's see whether our target is greater than or less than the middle value. Since 474 is greater than 44, as mentioned in the algorithm. left would be set as 4+1 i.e. 5 and right would remain as 8. Our search boundary would now reduce to values between 5th and 8th index
Again we would get the middle index which would be 6, which has value 92 which does match the target wo we return the index of the position i.e. 6.

Well, it is simple ain't it? Now there is a small thing which we might have to understand before moving on to how the code would look like in say JAVA

Now that small thing is about Finding The Mid.

Generally how one would find the mid would be

mid = (right-left)/2

while this might seem perfect from human standpoint, using only this would lead to certain overflow errors.

Say in above example for first time, (8-0)/2 would return 4 which is perfect. However when we go for search the 2nd time after narrowing the boundaries we would have left=5 and right as 8 so now the middle value as per above formulae would be (8-5)/2 which would give 3/2 i.e. 1.5 however it would not make sense to search for value at index 1.5 when the positions to search is between 5 and 8.

The solution to this is adding the resultant value to the left index which would be 5+1.5 so either 6 or 7 depending upon the autoboxing. Hence the formulae to find the mid would be

mid = left + (right-left)/2

Since we have now solved the small thing, let's focus on how the code would look like.

There are two ways to approach this.

Iterative approach
Recursive approach

Iterative approach

int search(int[] data, int target) {

int left=0, right = data.length-1;

while(left <= right){

int mid = data[left] + ( data[right] - data[left] ) / 2;

if(data[mid]==target)

return mid;

//set search boundary to right side if middle element is less than target

if(data[mid] < target)

left = mid + 1;

//search to be done from left side if middle element is greater than target

if(data[mid] > target)

right = mid - 1;

}

return -1;

}

Recursive Approach

Here instead of initializing left and right in the method itself, we would send the

int search(int[] data, int target, int left, int right) {

while(left <= right){

int mid = data[left] + ( data[right] - data[left] ) / 2;

if(data[mid]==target)

return mid;

//search to be done from right side if middle element is less the target

if(data[mid] < target)

return search(data, target, mid+1, right);

//to search from left side if middle element is greater than target

if(data[mid] > target)

return search(data, target, left, mid-1);

}

return -1;

}

We can see that the irrespective of the iterative or recursive approach one uses this is so much faster than the linear search approach where one goes by item one by one. From the complexity of binary search the best case would be O(1) where we find the element at the first ask itself. The worst case would be probably we would have to query till the end. This would essentially mean the worst case and average case complexity leading to O(log n)

From space complexity aspect, the iterative approach would take O(1) while the recursive approach would be O(log n)

Application of binary search can be found in database searches, machine learning and complex applications.

The disadvantage of this algorithm would be that this algorithm ask the data set should be sorted i.e. it would not work on a unsorted data set.

Binary search has been found very useful when searching in large datasets and finds it's use cases even in modern problems. Well that's about the binary search. Until next time before we explore some other interesting topic for reading.