Let’s write a binary search algorithm which is correct. This is done using Invariants.
Consider a sorted array arr
of size n
. We want to find if the number k
is present in arr
or not using binary search. This algorithm will be written with the aid of invariants so that it can be proven correct too.
def find(arr, n, k):
begin = 0
end = n - 1
# INV: arr[0..begin-1] and arr[end+1..n-1] doesn't contain k
while begin <= end:
mid = (begin + end) / 2
if arr[mid] == k:
return true
else if arr[mid] < k:
# arr[mid] < k => arr[begin..mid] < k as arr is sorted
# => arr[0..mid] doesn't contain k
begin = mid + 1
else:
# arr[mid] > k => arr[mid..end] > k as arr is sorted
# => arr[mid..n-1] doesn't contain k
end = mid - 1
return false
Following the comments along with the code, we can easily see that the invariant holds. We also have to show that the loop terminates, and also when it terminates we’ll use the invariant to show the algorithm terminates correctly.
As per the definition of mid
, we have begin <= mid <= end
. Let the new begin
and end
after an iteration be begin'
and end'
. Then we have:
end' - begin' = (mid - 1) - begin <= end - 1 - begin < end - begin
Or, end' - begin' = end - (mid + 1) <= end - begin - 1 < end - begin
Also, we can see that
end' - begin' = (mid - 1) - begin >= (begin - 1) - begin >= -1
Or: end' - begin' = end - (mid + 1) >= -1
=> end' - begin' >= -1
So, the difference between end
and begin
decreases in every iteration. Given that begin <= end
i.e. end - begin >= 0
in the beginning, eventually we’ll hit the point where end - begin
will become negative and the loop will terminate.
By the second deduction we can see the difference will be atleast -1, so we know that end - begin = -1
when the loop terminates. Combining that with the invariant, we can say that arr[0..end] and arr[end..n-1] doesn't contain k
. This means k
is not present in the entire array and returning false is correct.