Macrosystems Biology: How to share, manage and cite big data and team science?

Last month, I participated in the first Principal Investigator meeting of NSF’s new Macrosystems Biology program. The NSF solicits proposals to “support quantitative, interdisciplinary, systems-oriented research on biosphere processes and their complex interactions with climate, land use, and invasive species at regional to continental scales.”

The first groups of projects cover an incredible range of topics and are embracing a wide range of research approaches. In a pre-meeting survey, projects reported using simulation models, developing new theory, fitting empirical models to multi-scaled data, analyzing paleoecological data and implementing experiments across linked networks of sites. Almost half the groups reflected the newness of the continental-scale approach by including significant educational activities.

The last half-day of the first Macrosystems Biology PI meeting took place at NEON HQ and packed our largest meeting room to the brim.

The meeting as a whole had a rough-and-tumble flavor, as groups explored new ideas, exchanged them with other groups and created still more from the fusion of each team’s perspectives. There was an unusually intense sense of intellectual ferment. I felt like I was seeing the early stages of a new approach to environmental science emerging.

Although the program is called “macrosystems”, most of the excitement was about working at multiple spatial scales. Researchers were making serious efforts to understand how influences crossed scales. Examples included how regional and global climate affected individual organisms; how local communities (present day and paleo) were affected by large-scale movement of organisms, and how those movements were in turn affected by global climate; and how the organismal biology of lakes varies across macroclimatic gradients. Although the NSF call emphasized the continental scale, most of the projects recognized that processes that play out over large areas also take a long time to play out, and so coupled time and space scales.

During the last half-day of the meeting, smaller groups met outside of NEON HQ to talk over ideas for an upcoming special issue of a journal …

Some of the commonalities were methodological. Very few of the projects could answer their questions with data and data analysis alone; most had to integrate theoretical and computational models with observations. Similarly, none of the projects could answer their questions with models alone, so some projects included large-scale data-gathering efforts, while others were harvesting vast quantities of existing data from ongoing observations, experiments and data archives. Each project brought a data manager to the meeting, and so interwoven with the scientific discussion was a rich conversation about the new informatics and computation resources that exist or are needed.

Participants also addressed the culture of science. Data sharing was a common theme: scientists interested in studying the continent need to access data across many sites, and barriers to sharing data are barriers to testing their hypotheses! The macrosystems teams spent a lot of time talking about the technology for sharing data, and also about how credit can be given and shared in enterprises where a few creative individuals have an idea that requires harvesting data from tens or even hundreds of their colleagues. Most of the teams have plans to publish data so that it can be shared, cited and included in academic reward systems.

Collaboration was a related theme, and raised similar issues. How do we build teams that include the breadth of expertise needed to address big problems, allowing each member to contribute technical knowledge and leadership while sharing the credit? How can young scientists work in these exciting teams and still be recognized and promotable? Many macrosystems projects either don’t emphasize collecting new data or plan to use data from the NEON facility. This implies a cohort of students who won’t conduct their own fieldwork in support of their dissertations. How will such students be received when they apply for jobs, and for promotion? There were no definitive answers, but a great conversation began, and some of the more senior participants were energized to raise the profile of these issues in their institutions!

Many of these issues will be discussed in a forthcoming special issue of a journal. Stay tuned for an exciting read!