Header-based Fixed-length Block Encoding with Self-describing Protocol
About Glycan.rna.mk
This tool translates complex branched glycan sequences into dense, orthogonal DNA barcodes, and decodes them back with error correction.
Designed for DNA Encoded Glycan Libraries (DEGL), it supports the standard IUPAC-condensed nomenclature and offers multiple optimization modes that respect biological synthesis constraints (GC limits, palindrome avoidance).
🧬 DEGL Encoding Control Panel
Examples: Sialyl Lewis X, Lewis Y, N-Glycan Core, LacNAc, Complex N-Glycan
📋 IUPAC Format Support: In the custom format, glycan monomers, linkages, modifications and brackets are separated by " : ". The standard IUPAC-condensed format is also supported.
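As a rough illustration of the colon-separated input, the sketch below splits a glycan string into tokens and concatenates one fixed-length barcode per token. The function names, the example string and the 5-nt codebook entries are all hypothetical; the real codebook is generated by the tool and is not reproduced here.

```python
def split_glycan(text: str) -> list[str]:
    """Split a colon-delimited glycan string into trimmed, non-empty tokens."""
    return [part.strip() for part in text.split(":") if part.strip()]


# Hypothetical 5-nt barcodes, chosen only for this example.
TOY_CODEBOOK = {
    "Neu5Ac": "ACGTA",
    "a2-3": "CAGTC",
    "Gal": "GTCAG",
    "b1-4": "TGACT",
    "GlcNAc": "AGCTC",
}


def encode(text: str, codebook: dict[str, str]) -> str:
    """Concatenate one fixed-length barcode per token, in input order."""
    tokens = split_glycan(text)
    unknown = [t for t in tokens if t not in codebook]
    if unknown:
        raise KeyError(f"tokens missing from codebook: {unknown}")
    return "".join(codebook[t] for t in tokens)


dna = encode("Neu5Ac : a2-3 : Gal : b1-4 : GlcNAc", TOY_CODEBOOK)
print(dna, len(dna))
```

Because every barcode in a given mode has the same length, the encoded length is simply the token count times the token length, before any header is added.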
Modular N-Glycan Builder
Search External Databases
Search by Common Name (via PubChem) or GlyTouCan Accession ID (e.g., G23535HQ).
Result: Input Glycan, DNA Code, Header Tag, Total Length, Mode Used, IUPAC Format
Decode DNA to Glycan (Auto-detect Header)
The encoding mode is detected automatically from the Header tag. Max Density Mode sequences that carry no Header are also supported.
Enter Nucleotide:
Examples (Legacy): AAGAATAATAGATAAAGA, Lewis Y (Legacy)
Result: Input DNA, Detected Mode, Decoded Glycan, IUPAC Format, SNFG Structure
Current Codebook Reference
View the complete mapping of DNA barcodes to Glycan IUPAC codes for the currently selected encoding mode.
Note: Changing the token length or mode will dynamically regenerate this table.
Columns: DNA Barcode | Glycan Token / IUPAC Name | Category
Methodology: DEGL Encoding Modes
Table 1: Characteristics and constraints of the encoding modes available in the DEGL system.
Encoding Mode                  | Token Length | Theoretical Capacity    | Error Correction Capability    | GC Constraints     | Palindrome Check
-------------------------------|--------------|-------------------------|--------------------------------|--------------------|-----------------
Max Density (Legacy)           | Variable     | Dependent on dictionary | Heuristic matching (d=N/A)     | None               | Off
Optimized (Ultra-compact)      | 4 nt         | ~256 tokens             | None (d=1)                     | 0–100% (No limits) | Off
Optimized (Balanced / Default) | 5 nt         | ~996 tokens             | None (d=1)                     | 0–100% (No limits) | Off
Optimized (Thermodynamic)      | 6 nt         | >500 tokens             | Error detection only (d=2)     | 40–60%             | On
Optimized (High Fidelity)      | 7 nt         | ~350 tokens             | 1-bp active correction (d=3)   | 40–60%             | On
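For orientation, the capacities in Table 1 can be compared with the unconstrained ceiling of 4^n distinct sequences for an n-nt token. The short sketch below computes that ceiling; the function name is illustrative, and the gap between the ceiling and the listed capacities is presumably what the GC, palindrome and distance requirements cost.

```python
def max_tokens(token_len: int) -> int:
    """Number of distinct DNA sequences of the given length (4 bases per position)."""
    return 4 ** token_len


for n in (4, 5, 6, 7):
    print(f"{n} nt: at most {max_tokens(n)} sequences")
```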
Table 1 Notes:
Token Length: The number of nucleotides (nt) assigned to each glycan building block or macro-compressed core.
Error Correction Capability: The minimum Hamming distance (d) enforced between any two barcodes in the codebook. A distance of d=2 allows any single-base substitution to be detected, while d=3 allows a 1-bp substitution to be actively corrected by assigning the read to its nearest barcode.
GC Content & Palindrome: GC limits and palindrome screening act as strict thermal stability controls, intended to support optimal PCR amplification and to prevent polymerase slippage during library construction.
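The constraints in these notes can be sketched as a simple filter-and-select loop. The code below is not the tool's generator: it assumes a greedy lexicographic search, treats a palindrome as a sequence equal to its own reverse complement, and uses illustrative function names. Greedy selection is not guaranteed to find the largest possible codebook.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def is_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_codebook(token_len, min_dist, gc_min=0.0, gc_max=1.0,
                   reject_palindromes=False):
    """Greedily keep candidates that pass the filters and stay at least
    min_dist away from every barcode already chosen."""
    chosen = []
    for bases in product("ACGT", repeat=token_len):
        seq = "".join(bases)
        if not gc_min <= gc_fraction(seq) <= gc_max:
            continue
        if reject_palindromes and is_palindrome(seq):
            continue
        if all(hamming(seq, c) >= min_dist for c in chosen):
            chosen.append(seq)
    return chosen


def correct(read, codebook):
    """Return the unique barcode within one substitution of the read, else None.
    With min_dist >= 3 at most one barcode can qualify."""
    hits = [c for c in codebook if hamming(read, c) <= 1]
    return hits[0] if len(hits) == 1 else None


# Parameters mirroring the Thermodynamic row of Table 1 (6 nt, d=2, GC 40-60%).
thermo = build_codebook(6, 2, gc_min=0.4, gc_max=0.6, reject_palindromes=True)
print(len(thermo), thermo[:3])
```

Calling `build_codebook(7, 3, 0.4, 0.6, True)` would mirror the High Fidelity row, and `correct` can then map a read with one substituted base back to its barcode. Note that an odd-length sequence can never equal its own reverse complement, so under this definition the palindrome filter only has an effect on even token lengths.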