Nik Kantar

Tuesday, December 2, 2014

Sorting with Randomization

Sorting with pure chance, with some elementary stats.

I found a fascinating post last week on /r/ProgrammerHumor: optimize later.

Here's the original image:

Original image

Yes, this piece of code sorts a specific string by randomizing it infinitely and comparing it to the desired result. Once the desired result has been achieved, it prints the string and stops execution.

Without even getting into why one would need to sort a known string and compare it to a known result before returning, I want to look into the (in)efficiency of the code in question.

Reproducing the "Solution" to the Problem

Let's start with some Python code that operates more/less the same way:

#!/usr/bin/python
import sys, random

string_source = 'typewriter'
string_target = 'eeiprrttwy'

for i in xrange(0, sys.maxint):

    string_list = list(string_source)
    random.shuffle(string_list)
    string_result = ''.join(string_list)

    if (string_result == string_target):
        break

    if i == sys.maxint - 1:
        i = 0

Since simply rewriting the code in Python isn't particularly interesting, we should run it a few times and keep track of some numbers:

#!/usr/bin/python
import sys, random

string_source = 'typewriter'
string_target = 'eeiprrttwy'
runs = []

for run in xrange(0, 100):

    for i in xrange(0, sys.maxint):

        string_list = list(string_source)
        random.shuffle(string_list)
        string_result = ''.join(string_list)

        if (string_result == string_target):
            runs.append(i)
            break

        if i == sys.maxint - 1:
            i = 0

print runs

The Long-Awaited Results

The raw result is a 100-element list full of hilariously large numbers:

[269549, 137496, 240377, 484345, 673976, 1613987, 92846, 352143, 1136790, 2108562, 876595, 642806, 323282, 1513, 109430, 591031, 515341, 49572, 1079214, 1718, 336642, 85076, 187450, 34702, 1220284, 684380, 1053765, 1793595, 300108, 1459472, 636922, 276027, 201474, 706707, 492543, 1462999, 425504, 330468, 539478, 563176, 623868, 245793, 7367, 23726, 623156, 353938, 6188, 33855, 1148584, 7989, 840779, 52267, 236986, 429023, 672752, 986512, 345828, 987815, 987144, 26443, 1232278, 2447360, 276129, 319292, 1180522, 99683, 627885, 348592, 828685, 181212, 113842, 194562, 1433020, 352353, 1437230, 60914, 712325, 286642, 348062, 516207, 539552, 154839, 65763, 17699, 36090, 911231, 26066, 1898790, 407310, 625485, 77086, 453708, 1246709, 154345, 844968, 395168, 181864, 13757, 566216, 282737]

After 100 runs in 4 minutes, 47 seconds, we have a high of 2,447,360, low of 1,513, mean of 546,275, median of 374,553, and average run time of 2.87 seconds . These are numbers of shuffles necessary to sort via randomization.

If it's still not obvious why this is an absolutely appalling way to sort strings, consider this far simpler solution (again in Python):

#!/usr/bin/python

string_source = 'typewriter'
string_result = ''.join(sorted(string_source))

print string_result

This gives us 100 runs in 0.019 seconds, for an average run time of 0.00019 seconds, which is roughly 15,105.26 times faster.

In Conclusion

When you're sorting things, make sure you're, uhh, actually sorting them...


The presumed author of the code is Beck3570 and I'm not 100% sure if the whole thing (along with their previous posts) is a joke or not.


Image saved from Imgur.