Nearly 30% of human proteins contain tandem repeating sequences. The structure of terminal repeats is well understood for many repeat proteins built from the common α-helix and β-sheet folds. By contrast, the sequence-structure interplay of the terminal repeats of the collagen triple helix remains unexplored. Collagen is the most abundant human repeat protein and the most prevalent structural component of the extracellular matrix. Its hallmark triple helix is formed by three supercoiled polypeptide chains composed of long repeats of Gly-X-Y triplets. Here, using circular dichroism (CD) characterization of 28 collagen-mimetic peptides (CMPs) with various terminal motifs, together with differential scanning calorimetry (DSC), crystal structure analysis, and computational simulations, we show that CMPs differing only in their terminal repeat can have distinct end structures and stabilities. We reveal that cross-chain hydrogen bonding mediated by the terminal repeat is key to maintaining the end structure of the triple helix, and that disrupting it with a single amide-to-carboxylate substitution can destabilize the triple helix by as much as 19 °C. We further demonstrate that the terminal repeat also affects how strongly CMP strands form hybrid triple helices with unfolded natural collagen chains in tissue. Our findings provide a new spatial profile of hydrogen bonding within the CMP triple helix, offering critical guidelines for future crystallographic and NMR studies of collagen, for algorithms that predict triple-helix stability, and for the design of peptide-based collagen assemblies and materials. This study will also inspire a new understanding of the sequence-structure relationships of many other complex structural proteins with repeating sequences.
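Stability differences quoted in °C for CMPs are usually differences in melting temperature (Tm) taken from CD thermal unfolding curves. As a minimal sketch, assuming a CD signal (for example, the positive band near 225 nm characteristic of the triple helix) recorded over a temperature ramp, the hypothetical helpers below estimate Tm as the temperature of steepest signal loss and report ΔTm between a reference peptide and a variant. The function names, the wavelength and the first-derivative criterion are illustrative assumptions, not the authors' analysis procedure, which this abstract does not describe.

```python
from typing import Optional, Sequence


def melting_temperature(temps_c: Sequence[float], signal: Sequence[float]) -> Optional[float]:
    """Hypothetical helper: Tm as the temperature of the most negative dSignal/dT.

    Assumes temps_c is strictly increasing and that unfolding lowers the signal,
    as it does for the positive ~225 nm CD band of a collagen triple helix.
    """
    if len(temps_c) != len(signal) or len(temps_c) < 3:
        raise ValueError("need matching sequences with at least 3 points")
    best_t, best_slope = None, float("inf")
    # Central finite differences at interior points; endpoints are skipped.
    for i in range(1, len(temps_c) - 1):
        slope = (signal[i + 1] - signal[i - 1]) / (temps_c[i + 1] - temps_c[i - 1])
        if slope < best_slope:
            best_slope, best_t = slope, temps_c[i]
    return best_t


def delta_tm(tm_reference: float, tm_variant: float) -> float:
    """Positive result means the variant is destabilized relative to the reference."""
    return tm_reference - tm_variant
```

A first-derivative maximum is simple and model-free; fitting a two-state unfolding model to the full curve is a common, more rigorous alternative when baselines are well defined.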
Supplementary info and methods