Thursday, March 12, 2015

Split letters and their position into two files for simple "double strand" obfuscation

For the original description of a mechanism that achieves some degree of message security without encryption, see "Double Strand" text security, available in "CuttleFish".

Here are two basic diagrams to illustrate the fundamental principle of the double strand concept.

Below is a visual illustration of the algorithm. The steps "original", "alphabetically sorted", "randomized", and "split" produce the message divided between File 1 and File 2.

To reconstitute the message, only two steps are needed: "realign" and "sorted on number strand, ascending".
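The steps above can be sketched in a few lines of Python. This is a minimal sketch, not CuttleFish's actual source: the function names `split_strands` and `reconstitute` are hypothetical, and so is representing the two "files" as a list of characters and a list of position numbers. It uses the OS random source by default and a seeded generator only for reproducible tests.

```python
import random


def split_strands(message, seed=None):
    """Split a message into a letter strand (File 1) and a number strand (File 2)."""
    # A seed gives a reproducible shuffle for testing; otherwise use the OS RNG.
    rng = random.Random(seed) if seed is not None else random.SystemRandom()
    # "original": pair every character with its position in the message
    pairs = [(ch, pos) for pos, ch in enumerate(message)]
    # "alphabetically sorted": order the pairs by character
    pairs.sort(key=lambda pair: pair[0])
    # "randomized": shuffle the pairs; each letter stays linked to its number
    rng.shuffle(pairs)
    # "split": letters go to one file, position numbers to the other
    letters = [ch for ch, _ in pairs]
    numbers = [pos for _, pos in pairs]
    return letters, numbers


def reconstitute(letters, numbers):
    """Rebuild the message from the two strands."""
    # "realign": zip the strands back into (number, letter) pairs
    realigned = zip(numbers, letters)
    # "sorted on number strand, ascending", then read off the letters
    return "".join(ch for _, ch in sorted(realigned))


if __name__ == "__main__":
    file1, file2 = split_strands("attack at dawn")
    print("File 1:", "".join(file1))
    print("File 2:", ",".join(map(str, file2)))
    print("Restored:", reconstitute(file1, file2))
```

Because the shuffle comes after the sort, File 1 on its own reveals only which characters occur and how often, not their order, and File 2 on its own is just a permutation of the numbers 0 to n-1.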

More considerations:
1. Add random characters to the original to make cracking the message harder.
As implemented at present, CuttleFish adds "trash" to the original message before it is sorted alphabetically.
2. Further shuffling of the randomized but still linked strands can be added in the source code (also on the site) without requiring an upgrade by other users.
3. The same is true for the step "alphabetical sort". It is currently a plain ascending sort, but nothing prevents you from changing this in the source code. For example, you could do a descending sort, create chunks - again sorted in various ways - or skip the step altogether.
4. "Stuff" output File 1 or File 2 into another CuttleFish envelope to obscure the file contents. Within a small group of participants, this would be trivial to do and would add more security.
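Considerations 1 to 3 can be illustrated by extending the basic split. This sketch does not reproduce how CuttleFish actually marks its "trash": the convention of giving trash characters negative position numbers is a hypothetical choice made here so that reassembly can drop them, and it means that only File 1 is padded in a way an attacker cannot see through (whoever holds File 2 can spot the negative entries). The function names are also hypothetical.

```python
import random
import string


def split_with_trash(message, n_trash, descending=False, rng=None):
    """Split a message into two strands after padding it with "trash" characters.

    Hypothetical convention: trash characters get negative position numbers
    (-1, -2, ...), so reassembly can tell them apart from real characters.
    """
    if rng is None:
        rng = random.SystemRandom()
    pairs = [(ch, pos) for pos, ch in enumerate(message)]
    # Consideration 1: add random lowercase "trash" before the alphabetical sort
    pairs += [(rng.choice(string.ascii_lowercase), -(i + 1)) for i in range(n_trash)]
    # Consideration 3: the sort order is a free choice (ascending or descending here)
    pairs.sort(key=lambda pair: pair[0], reverse=descending)
    # Consideration 2: extra shuffles of the still-linked pairs would go here
    rng.shuffle(pairs)
    return [ch for ch, _ in pairs], [pos for _, pos in pairs]


def reconstitute_without_trash(letters, numbers):
    """Realign, drop trash (negative numbers), and sort on the number strand."""
    kept = sorted((pos, ch) for pos, ch in zip(numbers, letters) if pos >= 0)
    return "".join(ch for _, ch in kept)
```

Reassembly reads only the number strand, never the sort order or the shuffle, which is why those steps can be changed on the sending side without the receiving side changing anything.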

Advanced considerations:
1. Modify the sources for use on binary files or executables.
2. Figure out how to do this on larger streams.
3. Write an OS level service for on-the-fly operation, with one part of the output on a USB stick.
4. Do some creative RNG switching; maybe write a "noise" module that perturbs RNG output just a little.
5. [Update 3/13] To make text even more resilient against cracking, added "random" characters would ideally follow the statistical distribution of character frequency for the given language. A frequency table of the most common n-grams would be cool. If you feel like being extra creative, you could add "trash" frequency tables from a second language. [/Update]
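Item 5 could be approached by building the frequency table from reference text instead of hard-coding one. The sketch below is only an illustration under that assumption: `ngram_table` and `frequency_trash` are hypothetical helpers, the corpus is whatever text you supply, and sampled n-grams are drawn independently of each other. Tables from two languages can be merged with `Counter` addition (`table_a + table_b`).

```python
import random
from collections import Counter


def ngram_table(corpus, n=1):
    """Count overlapping n-grams in a reference text."""
    text = corpus.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def frequency_trash(table, k, rng=None):
    """Draw k n-grams, each weighted by how often it occurs in the table."""
    if rng is None:
        rng = random.SystemRandom()
    grams = list(table)
    weights = [table[g] for g in grams]
    # The result has k * n characters, ready to be mixed into the message
    return "".join(rng.choices(grams, weights=weights, k=k))


if __name__ == "__main__":
    english = ngram_table("the quick brown fox jumps over the lazy dog", n=2)
    print(frequency_trash(english, k=10))
```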

[Update 3/13/2015]
Yes, the required coding skill level is "CS 101", that's part of the charm.
The fundamental idea of a map with numbers and two sorts, one alphabetical and one numeric, dates back to the early Silicon Age, when the author was faced with a text content question.

The problem: Find duplicate pieces of text in several million words of technical documentation.
The solution: Break the docs into sentences (headline counts as a sentence, list items do too), create a table out of the text, add a column for sequential numbers, sort on the text.
Voila, all duplicates next to each other.
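The same trick can be sketched in modern Python. This is an illustration, not the original tool: the naive sentence splitter (break after `.`, `!` or `?` followed by whitespace, or at a line break, so headlines and list items on their own lines count as sentences) and the name `find_duplicates` are assumptions.

```python
import re


def find_duplicates(text):
    """Number each sentence, sort on the text, and report adjacent duplicates."""
    # Break the docs into sentences, keeping the end punctuation
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]
    # The table: (sentence, sequential number)
    table = [(s, n) for n, s in enumerate(sentences, start=1)]
    # Sort on the text; identical sentences end up next to each other
    table.sort()
    dups = {}
    for (s1, n1), (s2, n2) in zip(table, table[1:]):
        if s1 == s2:
            dups.setdefault(s1, {n1}).add(n2)
    # Map each duplicated sentence to the numbers where it occurs
    return {s: sorted(ns) for s, ns in dups.items()}
```

The sequential numbers play the same role as the number strand above: after the sort they still say where each duplicate came from.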

This was at a time when the amount of RAM was measured in the low megabytes, as in wow, a machine with 4 megabytes of RAM.
[/Update]
