Finding Mona

 

But Mona Lisa must have had the highway blues.
You can tell by the way she smiles.
— Bob Dylan

 

Our little girl playing Hide-and-Seek!
 

100x142 = 14200
Finding Mona
Zip format ~8MB.

(Requires VBA6 which is included in Office 2000 and Excel 2000.  
Please save and extract all files to a folder before running software.)
Certified Virus-Free

©2006 Zachriel


 


The Discussion

This project is based on a particular exchange with 'topmind' on the newsgroup talk.origins.

Zachriel: If there was a bit-map of the Mona Lisa within the human genome, there is a very high probability it would have been noticed because it would stick out like a statistical sore thumb.
topmind: I am very skeptical of that. Can you provide a demonstration?

Zachriel: Statistical methods used to analyze the human genome are more than capable of detecting intelligent patterns, including a bitmap of the Mona Lisa. 
topmind: Prove it.

topmind: I asked which prior searches would have found Mona Lisa bitmaps if they existed.
topmind: The algorithms to find such would have to be explicitly tuned.
topmind: the "test" suggested was bitmapped images.
topmind: Put your Mona where your mouth is! 

And so on. 


The Hypothesis

As an image is generally distinguished by having regions of self-similarity, this should result in a localized statistical anomaly in a random sequence, and quite possibly in genomic data, as well. 
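A rough worked estimate shows why such an anomaly should be detectable. It assumes the background values are independent and uniformly distributed over 0 to 255, an idealization that fits the random-data case with four bases per pixel and may hold only loosely for genomic data. The symbols \sigma, n and \bar{x}_n are introduced here for illustration.

```latex
\begin{align}
\sigma^2 &= \frac{256^2 - 1}{12} = 5461.25, & \sigma &\approx 73.9 \\
\operatorname{SE}(\bar{x}_n) &= \frac{\sigma}{\sqrt{n}}, & \operatorname{SE}(\bar{x}_{100}) &\approx 7.4
\end{align}
```

Under these assumptions, the mean of a 100-value segment of background data rarely strays more than a few multiples of 7.4 from the global mean, whereas a segment filled with similar-valued pixels can sit much farther away.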

Finding Mona will demonstrate that a bitmap image has a very distinctive statistical footprint, and that nothing else within the examined genome has this footprint. It uses an exceedingly simple statistical test, yet is quite powerful. 


The Data

The bitmap of the Mona Lisa was downloaded from the Louvre, then reduced to just 10x14 pixels to provide the smallest discernible image, Tiny Mona.

The E. coli genome was downloaded from the E. coli Genome Project and is composed of about 4 million nucleotide bases. 


The Algorithm

Finding Mona divides a sequence up into a number of equal-length segments, determines the arithmetic mean of each segment, and identifies the segment that varies most from the global mean. The sequence can be a genome or random data. As a test, we can choose to randomly insert Tiny Mona into the sequence. 
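The test described above can be sketched in a few lines of Python. This is a minimal illustration, not the original VBA code: the function name, the choice of consecutive non-overlapping segments, and the stand-in "image" (a run of similar dark values rather than the real Tiny Mona) are all assumptions.

```python
import random


def most_anomalous_segment(values, num_segments, seg_len):
    """Return (index, mean) of the segment whose mean is farthest from the global mean.

    Assumption: segments are consecutive and non-overlapping, starting at index 0.
    """
    if num_segments * seg_len > len(values):
        raise ValueError("segments do not fit in the sequence")
    global_mean = sum(values) / len(values)
    best_index, best_mean, best_dev = -1, global_mean, -1.0
    for i in range(num_segments):
        segment = values[i * seg_len:(i + 1) * seg_len]
        mean = sum(segment) / seg_len
        dev = abs(mean - global_mean)
        if dev > best_dev:
            best_index, best_mean, best_dev = i, mean, dev
    return best_index, best_mean


if __name__ == "__main__":
    rng = random.Random(0)
    # Random background bytes, standing in for the random-data option.
    data = [rng.randrange(256) for _ in range(100_000)]
    # Hypothetical stand-in for an image: 1000 similar, dark values.
    start = 42_000
    data[start:start + 1000] = [rng.randrange(20, 40) for _ in range(1000)]
    index, mean = most_anomalous_segment(data, num_segments=100, seg_len=1000)
    print("most anomalous segment:", index, "mean:", round(mean, 1))
```

Comparing each segment mean with the global mean is about the simplest test possible, which is the point: no tuning to the image is needed for a self-similar region to stand out.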

Finding Mona consistently identifies Tiny Mona as an anomaly, demonstrating that a conventional bitmap of the Mona Lisa has a very distinctive statistical footprint, and that nothing else within the examined genome has this footprint. Finding Mona uses an exceedingly simple statistical test, yet is quite powerful. This result has been verified for a variety of different values for Base/Pixel and Base Assignment parameters. 

ctrl-g to compute (go)


The Parameters

The algorithm runs rather slowly, so the computation is divided into three parts. You can limit the computation by leaving certain parameters unchanged. 

HASH: Converts the genome into numbers. Depends on the Base/Pixel and Base Assignment settings. The genome is encoded numerically by treating each nucleotide base as a quaternary (base-4) digit. You can set the number of digits per pixel from 2 to 4; the default is three, the number of bases in a natural codon. You can also set which quaternary numeral each base is assigned. The Hash can take several minutes.

BUILD: Reads the Hash into arrays. Depends on the Sequence size parameter, as well as the Mona and Genome flags.

CALC: Sums the segments. Depends on the number of Segments and the Length of the segments.

BASES a, c, g, t: Each base is a quaternary (base-4) digit, so it takes two binary bits. Each base can be assigned a value from 0 to 3. Be sure to assign each base a unique value. 

BASE/PIXEL: The number of bases per color-pixel, from 2 to 4. Each color-pixel is a single byte in memory. If Base/Pixel is set to less than four, the quaternary digits occupy the high-order bits of each byte.

SEQUENCE: The size of the Sequence to be considered.

MONA: If true, then Tiny Mona will be hidden randomly within the sequence.

GENOME: If true, then the sequence will be the E. coli genome. If false, then the sequence will be filled with the appropriate random numbers.

SEGMENTS: The number of Segments (max 32000). 

LENGTH: The Length of each segment.
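The Base Assignment and Base/Pixel settings can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the original VBA: the function name is hypothetical, the first base of each group is treated as the most significant digit, and any incomplete trailing group is dropped.

```python
def encode_pixels(bases, assignment, bases_per_pixel=3):
    """Convert a string of bases into one byte value per pixel.

    assignment maps each of 'a', 'c', 'g', 't' to a unique digit 0..3.
    Assumptions: the first base in a group is the most significant digit,
    and an incomplete trailing group is ignored.
    """
    if not 2 <= bases_per_pixel <= 4:
        raise ValueError("bases_per_pixel must be 2, 3 or 4")
    if sorted(assignment.values()) != [0, 1, 2, 3]:
        raise ValueError("each base needs a unique value from 0 to 3")
    # With fewer than four digits, shift them into the high-order bits of the byte.
    shift = 2 * (4 - bases_per_pixel)
    usable = len(bases) - len(bases) % bases_per_pixel
    pixels = []
    for i in range(0, usable, bases_per_pixel):
        value = 0
        for base in bases[i:i + bases_per_pixel]:
            value = value * 4 + assignment[base]
        pixels.append(value << shift)
    return pixels


if __name__ == "__main__":
    mapping = {"a": 0, "c": 1, "g": 2, "t": 3}
    print(encode_pixels("acgtac", mapping, 3))  # [24, 196]
```

In this sketch, three bases per pixel give values that are multiples of 4 between 0 and 252, because the two low-order bits of each byte stay zero.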

REMEMBER
If you change the light-yellow settings, it requires a simple Calc.
If you change the bright-yellow settings, it requires a Build.
But if you change the orange settings, beware, it requires a Hash.

NOTE: When Finding Mona first runs, it has to Build the arrays, but already has a default Hash table. After that, you can Calc with the Segment length and number without having to Build or Hash.

ctrl-g to compute (go)

 


The Results

The algorithm determines the segment that varies the most from the global mean. It then checks whether that segment overlaps the randomly placed Tiny Mona. If it does, it is recorded as a match.

The individual segment averages are listed in column G. The segment averages in the vicinity of a successful match are in columns H and I.
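The match test amounts to an interval-overlap check. The sketch below is one way to express it; the function names are hypothetical, and positions are assumed to be zero-based offsets into the sequence with half-open ranges.

```python
def overlaps(seg_start, seg_len, mona_start, mona_len):
    """True if [seg_start, seg_start + seg_len) and [mona_start, mona_start + mona_len) intersect."""
    return seg_start < mona_start + mona_len and mona_start < seg_start + seg_len


def is_match(segment_index, seg_len, mona_start, mona_len):
    """True if the flagged segment overlaps the inserted image.

    Assumption: segment i covers positions i * seg_len up to (i + 1) * seg_len.
    """
    return overlaps(segment_index * seg_len, seg_len, mona_start, mona_len)


if __name__ == "__main__":
    # An image of 140 values placed at offset 450 spans segments 4 and 5 of length 100.
    for index in range(3, 7):
        print(index, is_match(index, 100, 450, 140))
```

Because the image is placed at a random offset, it will usually straddle a segment boundary, so either neighbouring segment counts as a match in this sketch.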


The Conclusion

There are an infinite number of ways of encoding an image. There are an infinite number of algorithms for detecting an image. There are an infinite number of possible false positives, depending on the algorithm. However, we can say with some certainty that standard statistical methods would easily discover a conventional bitmap of the Mona Lisa, and that no such image exists within the genome examined. 

 


 

Zachriel's Blog
Civilization, Mutagenation and more!
http://zachriel.blogspot.com/