Earth’s History Through Tree Leaves

By Shelley Littin, CyVerse

Researchers used a computer vision algorithm to analyze images now archived in the Cleared Leaf Image Database, a Powered by CyVerse project, addressing one of the most challenging problems in botany and paleobotany research: understanding the venation characteristics of angiosperm leaves.

CyVerse supported the Cleared Leaf Image Database in 2013 as the project set out to create an online amalgamation of images of cleared leaf specimens from research groups around the world.

Today, the database uses CyVerse data services to facilitate discoveries in the study of leaf structure and function, preserve and archive cleared leaf images in an accessible electronic format, and promote the exchange of new data and ideas among plant biologists.

Cleared leaves are leaf specimens that have been chemically bleached, then stained to reveal characteristics of the leaf venation networks. Veins in living leaves transport water to the leaf cells where photosynthesis takes place. Leaves from different species and higher-level groups of angiosperms develop systematically different characteristics in their leaf venation networks.

A Computer Can See the Past

Now, researchers, including partners at Pennsylvania State University, Brown University, Harbin Institute of Technology, Microsoft, l’Université Paris-Sud, and the Smithsonian Institution National Museum of Natural History, have published a study in Proceedings of the National Academy of Sciences describing how machine learning may be used to analyze large numbers of cleared leaf specimens and reveal novel features, which in turn could help to classify previously unknown variables in the study of leaf venation networks.

According to the authors: “The botanical value of angiosperm leaf shape and venation (‘leaf architecture’) is well known, but the astounding complexity and variation of leaves have thwarted efforts to access this underused resource.”

The authors demonstrate with their study that a computer vision algorithm analyzing several thousand images of diverse cleared leaves was able to successfully learn architectural features of the leaves.

Then, when analyzing new leaf specimens, the algorithm categorized them into natural groups above the species level. In addition, the system produces visually intuitive heat maps that display previously unidentified characteristics for the researchers.

Understanding leaf architecture is especially important for the field of paleobotany, the authors note, because the fossils of angiosperm leaves typically are found as isolated, unidentified leaves.

A thorough scientific understanding of leaf venation characters could enable paleobotanists to infer large quantities of information by analyzing the preserved venation networks of fossilized leaves, thus giving scientists a clearer view of Earth’s past vegetation, climate, and ecological structure.

“With assistance from computer vision,” the authors say, “the systematic and paleobotanical value of leaves is ready to increase significantly.”

Careful Preservation and Computing Power

The 5,063 cleared leaf images used in the study are now hosted by the Cleared Leaf Image Database, along with other collections. “Although our project was full speed ahead before the database existed, it is helping us very much by providing a cite-able image archive to make our work more easily reproducible,” said Peter Wilf, a Pennsylvania State University paleobotanist and lead author of the PNAS publication.

The cleared leaves themselves hail from an earlier era. They come from the collection of Jack A. Wolfe, a U.S. Geological Survey paleobotanist who spent years painstakingly clearing and staining over 5,000 leaves and preserving them on more than 18,000 slides. Wolfe’s collection allowed him to understand leaf vein networks on living plants, and thus better identify fossil plants.

Jack A. Wolfe, a U.S. Geological Survey paleobotanist, spent years preserving more than 5,000 tree leaves to better identify fossil plants. (Image courtesy of Scott Wing)

“When the collection was made in the late 20th century, no one conceived of the computing power and software sophistication that this study brought to bear on leaf identification,” noted Scott Wing, a co-author on the study and a Smithsonian Institution paleobiologist whose team of volunteers spent years accessioning, housing, curating, and photographing the specimens generated by Wolfe.

“Although this work required new technologies, it would also not have been possible without the time and effort that were devoted to preserving this collection and making it available to the research community,” Wing said. “Even in the wonderful world of high-speed computers and internet-available images, a lot still depends on natural history museums caring for the specimens that underlie the science.” Wing added that the collection used for this study is still in dire need of conservation, as the medium of the slides deteriorates with time.

Said Nirav Merchant, co-principal investigator of CyVerse: “It is heartwarming to know that data now available in the Cleared Leaves Database was used to develop something novel for analysis. I look forward to even larger scale research efforts using the herbarium data CyVerse enables.”

In addition to the Cleared Leaf Image Database, other herbarium data projects have taken advantage of some component of CyVerse cyberinfrastructure through the Powered by CyVerse program. These include the New England Vascular Plant (NEVP) project, which provides data to support studies of the nature and consequences of environmental change in the New England region over the last three centuries, and the SouthEast Regional Network of Expertise and Collections (SERNEC), a network of 230 herbaria in 14 southeastern states.