Sorting is the process of putting the elements of an array or a list in order using an algorithm. One sorting algorithm may be more efficient than another depending on its performance characteristics. Sorting is useful in many real-life problems. For example, to calculate the median of a list of numbers, the list must first be sorted, and the median is then the element in the middle of it. To solve such a problem quickly, we have to choose among many sorting algorithms, which behave differently depending on their input.
In computer science we use Big-O notation to represent the efficiency of an algorithm over a large input of size n. This notation describes how an algorithm's running time grows as n grows. Depending on the algorithm, the growth can be constant, logarithmic, linear, quadratic, exponential, or a combination of these. Some commonly seen functions, ordered from best to worst performance, are:
O(1) <= O(log n) <= O(n) <= O(n^2) <= O(n^3) <= O(2^n) <= O(n^n)
There are several sorting algorithms that run in O(n log n) time, like quicksort, merge sort and Timsort, while the slower ones, which run in approximately O(n^2) time, include insertion sort, selection sort and bubble sort.
Two of these algorithms are described below:
– Quicksort:
Here, if the length of the list is greater than 1, the first element of the list is taken as the pivot. Using this pivot, the function breaks the list into two lists. The first list contains the elements that are less than the pivot (let’s call this list “less_than_pivot”) and the other list contains the elements that are greater than the pivot (called “more_than_pivot”). After that, recursion is applied to those two lists. The smaller lists are broken in two in the same way, and so on, until their lengths reach 0 or 1 (this approach of recursively breaking the problem into two is also called divide and conquer). Finally, less_than_pivot (sorted), the pivot itself and more_than_pivot (sorted) are concatenated, and this final sorted list is returned.
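The steps above can be sketched as follows (a minimal version; elements equal to the pivot are placed in less_than_pivot here so that duplicates are not lost):

```python
def quicksort(items):
    # Base case: lists of length 0 or 1 are already sorted.
    if len(items) <= 1:
        return items
    # The first element is taken as the pivot.
    pivot = items[0]
    # Partition the rest of the list around the pivot.
    less_than_pivot = [x for x in items[1:] if x <= pivot]
    more_than_pivot = [x for x in items[1:] if x > pivot]
    # Recursively sort each partition, then concatenate the results.
    return quicksort(less_than_pivot) + [pivot] + quicksort(more_than_pivot)
```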
The problem with this approach is that Python has a recursion limit (1000 calls by default). If the input triggers deep recursion — for example, a very long list that is already sorted, where the recursion depth grows linearly with the list's length — this function will fail because the recursion limit is exceeded.
To prevent this, an iterative approach (using a while loop and an explicit stack of pending sublists) can be used. The manual stack replaces the call stack, so Python’s recursion limit no longer applies. (This is sometimes confused with memoization, but memoization is a different technique: it caches previously computed results to avoid repeating work, and it does not apply to quicksort, whose subproblems do not repeat.)
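A sketch of the iterative version, using an explicit stack of pending index ranges and an in-place Lomuto-style partition (this variant takes the last element of each range as the pivot rather than the first, but the idea is the same):

```python
def quicksort_iterative(items):
    items = list(items)  # sort a copy in place
    # Explicit stack of (low, high) ranges replaces the recursive calls.
    stack = [(0, len(items) - 1)]
    while stack:
        low, high = stack.pop()
        if low >= high:
            continue  # ranges of length 0 or 1 are already sorted
        # Lomuto partition: pivot is the last element of the range.
        pivot = items[high]
        i = low
        for j in range(low, high):
            if items[j] <= pivot:
                items[i], items[j] = items[j], items[i]
                i += 1
        items[i], items[high] = items[high], items[i]
        # Push both partitions; no recursion, so no recursion limit.
        stack.append((low, i - 1))
        stack.append((i + 1, high))
    return items
```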
The worst case of this sorting algorithm, when the first element is always chosen as the pivot, is a list that is already sorted: every partition is maximally unbalanced, and the efficiency is O(n^2). In the average case its efficiency is O(n log n), as the main list and each sublist are divided roughly in half and the pivot is compared with every element in them.
– Merge Sort:
Unlike quicksort, merge sort splits the input in half repeatedly until each sublist has a single element. Then, the sublists are merged pairwise (comparing their elements to build ordered merged lists) until only one sublist remains. This remaining list is the sorted list and is the one returned by the function. This algorithm’s performance is O(n log n) in every case, because the input is always split in half regardless of its contents; the comparison and sorting work happens while merging.
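The splitting and merging described above can be sketched as:

```python
def merge(left, right):
    # Merge two already-sorted lists into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One of the two lists is exhausted; append the rest of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def merge_sort(items):
    # Base case: a single element (or empty list) is already sorted.
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    # Split in half, sort each half recursively, then merge the results.
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```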
Here you can see a visualization of 15 different sorting algorithms: