Open2C: Advancing Biological Data Analysis with Open-Source Data Tools

The Open Chromosome Collective (Open2C) is an international team of bioinformaticians who collaboratively develop open-source software tools for data analysis in 3D chromosome biology and genomics. The tools developed by Open2C have led to the publication of several manuscripts and are used by other research labs and biotech companies. The Open2C initiative was recently funded by the Google Summer of Code program, which will support three internships to train the next generation of open-source software developers.

In an era where -omics research, from genomics and proteomics to metabolomics and more, produces enormous quantities of information, data analysis tools have become essential for scientific discovery. Data analysis helps scientists make sense of the data, visualize it, and translate it into observations of biological phenomena. 

But who makes these tools in the first place? And how? The Open Chromosome Collective (Open2C) is a collaborative effort that brings together several research groups in Europe and the United States, including the group of Anton Goloborodko at IMBA. The researchers, who share strong expertise in bioinformatics, jointly develop new data analysis software for 3D chromosome biology and genomic data science. Open2C was established in the fall of 2021, although its members had been working together for years before that. The tools developed by Open2C over the last few years are now used by many labs worldwide and by biotechnology companies, pushing the boundaries of data analysis in bioinformatics.

Spreading the word: From one lab to the world 

The origins of the Open2C collective are pretty humble. “The idea was born between several researchers working at the Mirny Lab at the Massachusetts Institute of Technology,” Anton Goloborodko, Group Leader at IMBA and one of the key members of Open2C, explains. “When I was a PhD student at the Mirny Lab, we started developing our own bioinformatics software for data analysis.”   

This collaborative effort continued over the years, even as lab members moved on to new positions elsewhere. Open2C now has members in European countries such as Austria, Switzerland and France, as well as in California and Boston. “Our initial members also recruited new contacts and colleagues that joined this collaborative effort, expanding our team,” Goloborodko adds.

Strength in numbers  

When asked why he and his collaborators spend a significant amount of time developing and maintaining these software tools, Goloborodko has no doubts: “After you start coding, you realize coding is just fun. It’s an amazing feeling to program something and have the machine do what you want (most times).” On top of that, the collaborative nature of Open2C means that people with different expertise can contribute and learn from each other. “With an open-source format, collaborators can expand on what you have built or help you improve it,” Goloborodko explains. “The social component of going back-and-forth on different ideas and receiving feedback on your work makes this process really engaging.”   

Global impact

The ultimate objective of Open2C is to produce data analysis tools that can help the contributing scientists answer their research questions. “In our case, we’re trying to understand how DNA is folded in the three-dimensional space,” Goloborodko explains. Research by the Goloborodko group and their collaborators provided the blueprints for discovering the cellular machines capable of folding DNA into chromosomes and maintaining chromosome structure. “The tools we developed together with Open2C help us process the massive datasets that we produce and reduce them to a form that’s easier to work with and visualize.”   

In recent months, three papers co-authored by the Goloborodko group were published in Bioinformatics and PLOS Computational Biology. These papers presented new tools developed by the team:

  • Bioframe is a library that allows researchers to efficiently analyze and manipulate genomic interval data, such as the locations of genes or other features along a reference genome. Such operations are the bread and butter of genomic analyses.

  • Cooltools and Pairtools facilitate the processing and analysis of Hi-C data. Hi-C experiments record contacts between regions of chromosomes, which allows scientists to study how DNA is folded.
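To give a feel for the two kinds of operations described above, here is a minimal sketch in plain Python. It does not use Bioframe, Cooltools or Pairtools and is not their API. The names `Interval`, `find_overlaps` and `bin_contacts` and the toy coordinates are hypothetical, chosen only for illustration. The first function finds genomic intervals that overlap. The second counts contacts between fixed-size bins of a single chromosome, which is the basic idea behind a Hi-C contact map.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical type, not a Bioframe class)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def find_overlaps(features, queries):
    """Return (feature, query) pairs that share at least one base."""
    hits = []
    for f in features:
        for q in queries:
            # Half-open intervals overlap when each starts before the other ends.
            if f.chrom == q.chrom and f.start < q.end and q.start < f.end:
                hits.append((f, q))
    return hits


def bin_contacts(pairs, bin_size):
    """Count contacts between fixed-size bins of one chromosome."""
    counts = Counter()
    for pos1, pos2 in pairs:
        # Sort the bin indices so that (i, j) and (j, i) are counted together.
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return counts


# Toy data, invented for this example.
genes = [
    Interval("chr1", 100, 500),
    Interval("chr1", 900, 1200),
    Interval("chr2", 100, 500),
]
peaks = [Interval("chr1", 450, 950), Interval("chr2", 600, 700)]

hits = find_overlaps(genes, peaks)
contacts = bin_contacts([(120, 980), (990, 150), (40, 60)], bin_size=100)
```

The nested loop is quadratic and only suits toy inputs. Real datasets contain millions of intervals and contacts, which is why dedicated, optimized libraries are needed.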

Thanks to these achievements, Open2C has been selected by the Google Summer of Code program, which supports the training of a new generation of open-source software developers. The program will fund the training of three interns, allowing Open2C members to expand their teams.

In addition, the tools produced by Open2C reach far beyond their labs. “Other research groups, as well as some biotechnological companies, are using these open-source tools for their research purposes or as a basis upon which to build their own software,” Goloborodko explains.  

In the future, Open2C hopes to expand further and continue to develop state-of-the-art solutions for data analysis. “Hopefully, our work will make a significant contribution to taking the field of DNA structure forward,” says Goloborodko.