Sunday, January 10, 2016

Maths Puzzle: The self descriptive number

I came across a maths puzzle in a youtube video presented by a person named James, who talks about numbers.
It goes like this: "I have a ten digit number. The first digit tells me how many zeros are in the number. The second digit tells me how many ones are in the number. The third digit tells me how many twos are in the number, and so on until the tenth digit which tells me how many nines are in the number. What is my number?"
(In this article and respective source code the first digit will also be the left-most digit or also digit with index zero.
Individual digits will be referred to as x[i], where i is the index, e.g. x[0] is the first and left-most digit.)

One may solve this by just scribbling on a sheet of paper. A sharp mind could probably manage even without aid. BUT can a computer do it? Or in other words, is there a systematic way to retrieve the solution so you don't go in circles, in case you go past it? Can we also solve for numbers of different lengths? Can we find all solutions if there's more than one? Why would we want to do all of this? For all good reasons!
You can find the python code I used to test my solutions on GitHub https://github.com/zaraken/maths-puzzle-self-desc-num

Brute force won't take us far but might help us with some useful observations. The following is the result of trying to solve the puzzle with that method.

  1. def increment(number, base):
  2.     for i in range(len(number)-1,-1,-1):
  3.         if number[i] == base-1:
  4.             number[i] = 0
  5.         else:
  6.             number[i] = number[i]+1
  7.             return True
  8.     return False # overflow
  9. def check(number, base):
  10.     if sum(number) > base:
  11.         return False
  12.     for i in range(base):
  13.         if number[i] != number.count(i):
  14.             return False
  15.     return True
  16.    
  17. def test(base):
  18.     number = [0 for i in range(base)]
  19.     while increment(number, base):
  20.         if check(number, base):
  21.             print number
  22.             # break if you only want one solution
  23.            # continue loop to find all solutions
Below we can see the solutions and and execution times for number-size of up to 9. Above that it becomes unreasonable to use the brute force method and with that we haven't even solved the initial puzzle for ten digits.

Brute-force
base= 3                                          
       time ->  0.000236065809332
base= 4 
       num= [1, 2, 1, 0]
       num= [2, 0, 2, 0]
       time ->  0.00142494796502
base= 5
       num= [2, 1, 2, 0, 0]
       time ->  0.00402423780038
base= 6
       time ->  0.0397031044792
base= 7
       num= [3, 2, 1, 1, 0, 0, 0]
       time ->  0.689402398548
base= 8
       num= [4, 2, 1, 0, 1, 0, 0, 0]
       time ->  14.0752002176
base= 9
       num= [5, 2, 1, 0, 0, 1, 0, 0, 0]

You may already see a pattern there and you could go about deriving/proving the solution mathematically, but the initial programming solution is so poor that we have to try and do something about it. Otherwise we've learnt nothing ...

Let us consider a few things.
1. You've probably already noticed in the results above that we refer to the different cases as bases. That is because in a 10-digit number, for example, each digit refers to one of the digits 0 to 9, which make up the decimal system - a system with 10 digits or base 10.  And for the sake of the puzzle in a 10-digit number it only makes sense to work with the decimal system, in an 8-digit number - octal, etc. We are not going to work with digit-values that are above (or equal to) the base.
2. The first digit cannot be equal to zero, because if it is, then there is at least one zero in the number and then the first digit must be at least one. Which also means that there is always at least one zero in the number that is not at the first position.
3. Intuitively we could guess that digits to the left would probably have higher values than those to the right. Why? Well, if the 9th digit is anything else than a zero, then there would be at least one other digit that would be 9 and then there would have to be nine digits of that and so on and so forth. Lets write it down a bit more formally and see what comes out of it.
In an n-digit number, if the last digit, x[n-1] = 1  then  must x[1] = n-1, from which follows that there must be at least (n-1) ones. So far we have established that at least one digit always has the value zero, x[1]=n-1, and further n-1 digits must have the value 1. But that is already one digit more than what we have. Therefore the last digit cannot be anything else than a zero no matter the length of the number. (For completeness this proof must be further generalized for x[n-1]>=1) Consequently, no digit can have the value n-1, because it would break the aforementioned rule.
4. Each digit in the number refers to some number of digits in that number. Furthermore they are not overlapping, meaning that the digits that are referred to by some digit are not referred to by any other digit. From this we come with the invariant that the sum of all digits in such a self descriptive number must always be equal to the length of the number. sum(x[0]...x[n])=n.

With this information we can update the algorithm for finding a solution. First we can fix the last digit of the number to the value of zero or omit it altogether to save on memory. We don't have to perform validity checks on numbers that don't satisfy the sum invariant..
We can do even better. We could only iterate over or generate numbers that already satisfy the sum invariant. Lets view the sum as a value budget. We can distribute the value between the different digits in the number. Starting from the first digit provided with the whole budget it could iterate over different possible values and then recursively pass the remainder to the next digit to use as it pleases.
Version 2
  1. def increment(number, digit, budget, base, check_all):
  2.     if budget == 0:
  3.         if check(number, base):
  4.             print '     '' num=', number + [0]
  5.             return True
  6.     elif digit == base-2# last digit, must consume whole budget
  7.         number[digit] = budget # because of sum invariant
  8.         if check(number, base):
  9.             print '     '' num=', number + [0]
  10.             return True
  11.     else:
  12.         for val in range(budget,-1,-1) if not digit==0 else range (budget-2,0,-1):
  13.             number[digit] = val
  14.             if increment(list(number), digit+1, budget-val, base, check_all) and not check_all:
  15.                 return True
  16.     return False
  17. def check(number, base):
  18.     if sum(number) > base:
  19.         return False
  20.     for i in range(base-1):
  21.         cnt = number.count(i)
  22.         if (i == 0): cnt+=1 # +1 for last zero
  23.         if number[i] != cnt:
  24.             return False
  25.     return True
  26. def test(base):
  27.     number = [0 for i in range(base-1)]
  28.     increment(number, 0, base, base, True)

Find Any Solution
base= 8
       num= [4, 2, 1, 0, 1, 0, 0, 0]
       time ->  0.00231318833818
base= 20
       num= [16, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
       time ->  0.0061573831934
base= 100
       num= [96, 2, 1, 0, ... 0, 1, 0, 0, 0]
       time ->  0.694509032297

Exhaustive Check For Solutions
base= 8
       num= [4, 2, 1, 0, 1, 0, 0, 0]
       time ->  0.00517847986448
base= 15
       num= [11, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
       time ->  36.2679355489

As we can see in the result of the program this introduces a significant boost when trying to find a solution. But what if we want to check for all possible solutions. We are still doing a lot of unnecessary checks. Lets consider what happens when we change a value. Each change that we make to a digit affects the values of two other digits in a very precise manner, namely one digit is increased by 1 and another digit is decreased by 1. So could maybe help to keep track of the count of each digit with each change so that we can further limit the possible solutions, at the expense of some extra memory of course?
Inspecting several numbers may bring something interesting to your attention.
According to the problem statement  lets count the number of same valued digits and record that in the appropriate position in a new derived number, which we will call a reference count. We find that the same reference count corresponds to several different numbers, basically permutations of the same digits. The reference count of an actual solution yields the same number. So there's an idea for an optimization. Let us do a reference count for only one permutation and check if that derived number is a valid solution.

_______base 5_______
candidate       reference
solution         count
22100      ->   21200
22010      ->   21200
22001      ->   21200
21200      ->   21200

Version 3
  1. def increment(number, digit, budget, base, check_all):
  2.     if budget == 0:
  3.         cnt_num = getCount(number,base)
  4.         if check(cnt_num, base):
  5.             print '     '' num=', cnt_num + [0]
  6.             return True
  7.     elif digit == base-2# last digit, must consume whole budget
  8.         number[digit] = budget # because of sum invariant
  9.         cnt_num = getCount(number,base)
  10.         if check(cnt_num, base):
  11.             print '     '' num=', cnt_num + [0]
  12.             return True
  13.     else:
  14.         for val in range(min(budget,number[digit-1]),0,-1) if not digit==0 else range (budget-2,0,-1):
  15.             number[digit] = val
  16.             if increment(list(number), digit+1, budget-val, base, check_all) and not check_all:
  17.                 return True
  18.     return False
  19. def getCount(number,base):
  20.     cnt_num = list()
  21.     for i in range(base-1):
  22.         cnt = number.count(i)
  23.         if (i == 0): cnt+=1 # +1 for last zero
  24.         cnt_num.append(cnt)
  25.     return cnt_num

Find Any Solution
base= 8
       num= [4, 2, 1, 0, 1, 0, 0, 0]
       time ->  0.00235766450515
base= 100
       num= [96, 2, 1, 0, ... 0, 1, 0
       time ->  0.0111079227021
base= 1000
       num= [996, 2, 1, 0, ... 0, 1, 0, 0, 0]
       time ->  0.237362460661


Exhaustive Check For Solutions 
base= 8
       num= [4, 2, 1, 0, 1, 0, 0, 0]
       time ->  0.00251333108957
base= 15
       num= [11, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
       time ->  0.0036611583219
base= 50
       num= [46, 2, 1, 0, ... 1, 0, 0, 0]
       time ->  10.5631863068

Conclusion 
Did we find a systematic way of retrieving a solution to the above mentioned problem? We certainly did for a computer. How about for a human? Well a human could do something similar. I imagine it like a big volume of water on the left side of the number being gradually distributed to the right :P

No comments:

Post a Comment