Using computational techniques to analyze more than 47,000 different characters from 133 living and extinct scripts, co-authors Helena Miton of the Santa Fe Institute and Oliver Morin of the Max Planck Institute for the Science of Human History addressed several questions about why and how the characters of different writing systems vary in how complex they appear.
“When we started this project, we wanted to test whether you find a general simplification of characters over time,” Miton says. “Do scripts simplify their characters as they spend more time exposed to evolutionary pressures from the humans who are learning them and using them?”
We interact with most types of writing through our visual system, so the characters and scripts that make up the hundreds of writing systems humans have used through history are limited to, and optimized for, the way our brains process visual information. Part of that optimization, write the authors, is the graphic complexity of the characters in a script.
Morin illustrates this in a Twitter thread, offering an image of two characters, one apparently more complex, with more detail and contours, than the other. He writes, “Why care about this? Because your brain does. Simpler letters are easier and faster to process.” He goes on, “Any small improvements in processing speed can accumulate into big-time gains for readers. Letters are under pressure to simplify, but also have to carry information.”
A highly cited study from 2005 suggests that writing systems tend to settle on a common solution to these pressures: using about three strokes per character. In this new paper, Miton and Morin push back against that finding, and others, by studying a larger and broader set of scripts and incorporating new methods that account for cultural evolution and lineages in writing.
Miton and Morin used two measures of graphic complexity to compare characters and scripts in this large dataset, which draws on scripts from around the world. The first measure, “perimetric” complexity, is a ratio of inked surface to its perimeter. The other measure, “algorithmic” complexity, is the number of bytes needed to store a compressed image of a character.
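To make the two measures concrete, here is a minimal Python sketch that computes both for tiny hand-drawn bitmaps. It is an illustration under stated assumptions, not the authors' pipeline. For the perimetric measure it uses a formulation common in the vision literature, perimeter squared divided by inked area, which may differ from the exact ratio used in the paper. For the algorithmic measure it uses zlib-compressed size as a stand-in for whatever image format and compressor the study relied on. The function names and the two toy glyphs are hypothetical.

```python
import zlib


def parse_glyph(rows):
    """Turn strings of '#' (ink) and '.' (blank) into a grid of 0/1 ints."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def ink_area(grid):
    """Number of inked pixels."""
    return sum(sum(row) for row in grid)


def ink_perimeter(grid):
    """Count unit edges where an ink pixel meets blank space or the image border."""
    height, width = len(grid), len(grid[0])
    edges = 0
    for y in range(height):
        for x in range(width):
            if not grid[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not grid[ny][nx]:
                    edges += 1
    return edges


def perimetric_complexity(grid):
    """Assumed formulation: perimeter squared over inked area."""
    area = ink_area(grid)
    if area == 0:
        return 0.0
    return ink_perimeter(grid) ** 2 / area


def algorithmic_complexity(grid):
    """Bytes needed to store the zlib-compressed bitmap (a stand-in compressor)."""
    raw = bytes(cell for row in grid for cell in row)
    return len(zlib.compress(raw, 9))


# Two toy glyphs on a 6x6 grid: a solid square and a cross.
square = parse_glyph([
    "......",
    ".####.",
    ".####.",
    ".####.",
    ".####.",
    "......",
])
cross = parse_glyph([
    "..##..",
    "..##..",
    "######",
    "######",
    "..##..",
    "..##..",
])

for name, glyph in (("square", square), ("cross", cross)):
    print(name, perimetric_complexity(glyph), algorithmic_complexity(glyph))
```

Under this formulation the cross scores higher than the square on the perimetric measure because it has more outline per unit of ink. The compressed sizes of such tiny bitmaps are dominated by format overhead, so the algorithmic measure only becomes informative on realistically sized character images.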
Among their results, they found that large scripts (those with more than 200 characters) had, on average, more complex characters than scripts with fewer characters. Relatedly, the study suggests that the main driver of characters’ complexity was which linguistic units (such as a phoneme, a syllable, or an entire word) the characters encode.
They were surprised to find little evidence for evolutionary change in complexity: scripts that were invented in the past 200 years used characters of similar complexity to those that have been around for longer. In forthcoming work led by Piers Kelly, Miton and Morin investigate whether written characters follow an optimization process that happens more quickly than was captured in the current study’s dataset.