On the Synthesis of DNA Error Correcting Codes
Submitted to Biosystems
Daniel Ashlock,
with Sheridan K. Houghten, Joseph Alexander Brown, John Orth

An error correcting code is a collection of strings over a given alphabet that are well separated from one another. The separation property means that small numbers of errors in the transmission of a code word can be both detected (by noting that the word received is not a code word) and corrected (by assuming that the code word transmitted is the one most similar to the word received). In this study we create codes for a transmission channel that consists of incorporating the code word, in the form of an oligonucleotide, as a label in a genetic construct and later reading it out when the entire construct is sequenced. In this situation the Hamming metric [15], which counts only substitutions, is inappropriate because sequencers can skip a base or read one that is not there. Codes relative to the edit metric, or Levenshtein distance [17], which counts insertions and deletions as well as substitutions, are therefore required for the detection and correction of sequencing errors.
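
The difference between the two metrics, and the nearest-code-word correction rule described above, can be illustrated with a minimal Python sketch. This is not the construction method of the paper; the function names and the three-word toy code are hypothetical and chosen only for illustration, and the edit distance is computed with the standard dynamic programming recurrence.

```python
def hamming_distance(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


def decode(received, code):
    """Return the code word closest to the received word in edit distance."""
    return min(code, key=lambda word: edit_distance(received, word))


# A skipped first base plus a spurious trailing base shifts every position.
sent, read = "ACGTACGT", "CGTACGTA"
print(hamming_distance(sent, read), edit_distance(sent, read))

# Hypothetical toy code (illustration only); the read has lost one base.
toy_code = ["AAAAAA", "CCCCCC", "GGGGGG"]
print(decode("AAAAA", toy_code))
```

In the first example the two reads differ at every position under the Hamming metric, yet they are only two edits apart, which is why a code designed for substitutions alone gives little protection against skipped or spurious bases.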