Background
In 2011, the National Institutes of Health (NIH) issued a Request for Information (RFI) soliciting community input on priorities, gaps, and opportunities for neuroscience data sharing and infrastructure. The Open Connectome Project submitted this response outlining our vision, current capabilities, and proposed direction for open, scalable brain imaging data management.
1. What is the Open Connectome Project?
The Open Connectome Project (OCP) is a community-driven initiative to provide public access to high-resolution neuroanatomical data that can be used to explore connectomes — the complete mapping of neural connections within a brain. We launched in early 2011 when the Bock et al. electron microscopy dataset from Harvard University was made available to us, and we immediately hosted it online for worldwide access.
The project is run out of the Department of Applied Mathematics and Statistics and the Department of Computer Science at Johns Hopkins University, led by Joshua T. Vogelstein and Randal Burns. We do not collect our own data — instead, we host whatever neuroanatomical data researchers wish to share with the world.
Our core belief: the laws of physics, chemistry, and biology affect everyone on the planet equally. Scientific data should be equally accessible to all — regardless of institutional affiliation, funding, or geography.
2. The Problem: A Data Deluge Without Infrastructure
Recent technological advances, particularly serial block-face electron microscopy (pioneered by Winfried Denk in 2004), have enabled the collection of neuroanatomical data at rates previously unimaginable. Modern instruments capture on the order of 10 terabytes (TB) of data per day. The raw data from a single publication (e.g., Bock et al. 2011) alone requires ~40 TB of storage.
The field currently lacks the infrastructure to:
- Store, index, and serve petabyte-scale volumetric brain data efficiently
- Provide low-latency programmatic access to arbitrary spatial subvolumes
- Enable community-wide annotation and analysis of shared datasets
- Incentivize researchers to share raw data before or immediately after publication
- Support automated computer vision pipelines operating on live, shared data
Currently, it takes approximately one human expert workday to trace a single synapse from source to target. At that rate, it would take over 30 million years for a single person to manually examine all synapses in a human brain even once. Automated, community-scale annotation is not optional — it is essential.
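The scale of that estimate can be checked with a back-of-envelope calculation. The sketch below takes the one-workday-per-synapse figure from the text. The synapse count (on the order of 10^14) and the 250 working days per year are assumptions introduced here for illustration, not figures from this response.

```python
# Back-of-envelope estimate of manual tracing time.
# ASSUMPTIONS (not from the source text): the synapse count and the
# number of working days per year are illustrative round numbers.
ASSUMED_SYNAPSES = 1e14          # assumed order of magnitude for a human brain
WORKDAYS_PER_SYNAPSE = 1.0       # figure quoted in the text
ASSUMED_WORKDAYS_PER_YEAR = 250  # assumed working calendar


def years_to_trace(n_synapses, workdays_per_synapse=WORKDAYS_PER_SYNAPSE,
                   workdays_per_year=ASSUMED_WORKDAYS_PER_YEAR):
    """Years one person would need to trace every synapse once."""
    return n_synapses * workdays_per_synapse / workdays_per_year


def tracers_needed(n_synapses, target_years):
    """Full-time tracers needed to finish within target_years."""
    return years_to_trace(n_synapses) / target_years


if __name__ == "__main__":
    years = years_to_trace(ASSUMED_SYNAPSES)
    print(f"single tracer: {years:.1e} years")
    print(f"tracers for a 10-year effort: {tracers_needed(ASSUMED_SYNAPSES, 10):.1e}")
```

Under these assumptions the single-tracer figure comes out far above 30 million years, which is consistent with the "over 30 million years" bound stated above and reinforces the case for automation.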
3. Our Current Infrastructure & Capabilities
The OCP has already deployed a production system capable of serving large-scale connectomics data. Our architecture inherits from NoSQL scale-out and data-intensive computing paradigms:
Distributed Spatial Database
Data are distributed across cluster nodes by partitioning a spatial index. Reads are routed to parallel disk arrays and writes to solid-state storage, which maximizes throughput and minimizes I/O contention.
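To make the idea of partitioning a spatial index concrete, here is a minimal sketch that uses a Z-order (Morton) curve. This response does not say which space-filling curve or block size the production system uses, so the curve choice, the block shape, and every function name below are assumptions made for illustration.

```python
# Sketch: map 3D voxel coordinates to a 1D spatial key, then split the
# key range across cluster nodes. The Z-order (Morton) curve and the
# (128, 128, 16) block shape are ASSUMPTIONS, not details from the source.

def interleave_bits(x, y, z, bits=21):
    """Interleave the low `bits` bits of x, y, z into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def block_key(x, y, z, block=(128, 128, 16)):
    """Morton key of the fixed-size block that contains voxel (x, y, z)."""
    return interleave_bits(x // block[0], y // block[1], z // block[2])


def node_for_key(key, total_keys, n_nodes):
    """Assign contiguous, equal-sized ranges of the curve to nodes."""
    return min(key * n_nodes // total_keys, n_nodes - 1)


if __name__ == "__main__":
    key = block_key(130, 5, 20)  # block coordinates (1, 0, 1)
    print(key, node_for_key(key, total_keys=64, n_nodes=4))
```

Because nearby blocks tend to receive nearby keys, a contiguous key range keeps most of a small subvolume on one node, while a large cutout spans several nodes and can be read in parallel.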
RESTful Web Services
All interfaces are stateless RESTful APIs. They support 3D image cutouts, annotation reads and writes, batch operations, metadata queries, and bounding-box lookups at any resolution level.
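As an illustration of what "stateless" means for a cutout request, the sketch below builds a URL that carries everything the server needs: dataset, resolution level, and the three coordinate ranges. The host, the path layout, and the function names are hypothetical placeholders chosen for this example. They are not OCP's documented endpoints.

```python
# Sketch of a stateless cutout request: the URL alone fully describes the
# subvolume. BASE_URL and the path layout are HYPOTHETICAL placeholders.
from urllib.parse import quote

BASE_URL = "https://example.org/api"  # placeholder host, not a real service


def _check_ranges(ranges):
    """Reject negative or empty half-open [lo, hi) ranges."""
    for lo, hi in ranges:
        if lo < 0 or hi <= lo:
            raise ValueError(f"invalid range: {lo},{hi}")


def cutout_url(dataset, resolution, x_range, y_range, z_range):
    """Build a URL for a 3D cutout at the given resolution level."""
    if resolution < 0:
        raise ValueError("resolution level must be non-negative")
    _check_ranges((x_range, y_range, z_range))
    ranges = "/".join(f"{lo},{hi}" for lo, hi in (x_range, y_range, z_range))
    return f"{BASE_URL}/{quote(dataset, safe='')}/cutout/{resolution}/{ranges}/"


def cutout_voxels(x_range, y_range, z_range):
    """Number of voxels the request would return, useful for size limits."""
    _check_ranges((x_range, y_range, z_range))
    n = 1
    for lo, hi in (x_range, y_range, z_range):
        n *= hi - lo
    return n


if __name__ == "__main__":
    print(cutout_url("demo", 1, (0, 512), (0, 512), (0, 16)))
    print(cutout_voxels((0, 512), (0, 512), (0, 16)))
```

Since no session state is involved, any server in the cluster can answer the same URL, and responses can be cached by URL.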
Web Visualization (CATMAID)
Integration with CATMAID (Cardona et al.) enables browser-based navigation of arbitrarily large 3D neuroanatomical volumes — accessible to any researcher in the world without software installation.
RAMON Annotation Schema
Reusable Annotation Markup for Open coNnectomes (RAMON) provides a unified schema for representing neurons, synapses, organelles, and arbitrary annotation objects with full metadata support.
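A rough sketch of how a unified annotation schema of this kind might be modelled follows. The class and field names are hypothetical and chosen only for illustration. They should not be read as the actual RAMON specification.

```python
# Sketch of a unified annotation schema: a shared base record with
# free-form metadata, specialised per object type. All class and field
# names are HYPOTHETICAL, not taken from the RAMON specification.
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    id: int
    author: str = ""
    confidence: float = 1.0
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Synapse(Annotation):
    pre_neuron: Optional[int] = None   # id of the presynaptic neuron
    post_neuron: Optional[int] = None  # id of the postsynaptic neuron


@dataclass
class Neuron(Annotation):
    synapse_ids: List[int] = field(default_factory=list)


def to_record(annotation):
    """Serialise any annotation to a plain dict tagged with its type."""
    record = asdict(annotation)
    record["type"] = type(annotation).__name__
    return record


def synapses_between(synapses, pre, post):
    """Return the synapses that connect neuron `pre` to neuron `post`."""
    return [s for s in synapses if s.pre_neuron == pre and s.post_neuron == post]


if __name__ == "__main__":
    s = Synapse(id=7, author="demo", pre_neuron=1, post_neuron=2,
                metadata={"method": "manual"})
    print(to_record(s))
```

Keeping common fields on one base type lets generic tools (search, export, provenance tracking) handle every annotation the same way, while type-specific fields stay on the subclasses.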
Example API Endpoints
4. Our Response to NIH Priorities
4.1 On Data Sharing & Open Access
We believe the NIH should fund infrastructure that lowers the activation energy of data sharing rather than relying on researcher goodwill alone. Our model: data is automatically piped from the microscope to the Hopkins servers, ingested into our spatial database, and made available to a small group of collaborators. Making that data publicly accessible then requires nothing more than flipping a switch.
We propose that NIH mandate and fund the following for all supported neuroscience projects: data must be deposited in an approved open repository (such as OCP) within 12 months of collection, with programmatic API access provided immediately upon deposition.
4.2 On "Alg-Sourcing" — Community Annotation at Scale
Alg-sourcing is the analogue of crowdsourcing for algorithms: we propose funding a community effort to develop, benchmark, and deploy automated computer vision algorithms against shared connectomics datasets. Rather than each lab re-implementing the same image segmentation pipelines, the community should collaborate on a single, openly available, continuously benchmarked codebase that operates against OCP-hosted data.
This model mirrors how astrophysics has operated for more than a decade via the Sloan Digital Sky Survey (SDSS): one shared dataset, thousands of independent analyses, massive scientific leverage from a single infrastructure investment.
4.3 On Multi-Modal Integration
Connectomics must span spatial scales: from synaptic EM (~nanometers) to whole-brain fMRI (~millimeters). We have begun collaboration with Michael Milham (founder of the International Neuroimaging Data-sharing Initiative) to build scalable databases for multimodal MRI data alongside our existing EM infrastructure. Our goal is one-click upload: any researcher can deposit their MRI data, have it automatically processed and incorporated, and immediately compare against existing datasets.
The NIH should fund a unified informatics layer that treats all scales and modalities of brain data as first-class citizens within a single queryable ecosystem.
4.4 On Compute Infrastructure
We strongly support NIH investment in co-located compute alongside open data repositories. The bottleneck in connectomics is not storage — it is analysis. Raw EM data at 40 TB per paper becomes tractable only with compute infrastructure adjacent to the data, avoiding the network bottleneck of downloading to local clusters. We advocate for a national "brain compute" facility analogous to the XSEDE national supercomputing network, but purpose-built for spatial neuroscience workloads.
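The network bottleneck can be quantified with simple arithmetic. The sketch below estimates how long it would take to move the 40 TB figure quoted above over a sustained link. The link speeds and the assumption of full, uninterrupted utilisation are illustrative choices, not measurements from this response.

```python
# Sketch: time to download a dataset over a sustained network link.
# The link speeds and the efficiency factor are ASSUMPTIONS for
# illustration. Decimal units are used (1 TB = 1e12 bytes).
SECONDS_PER_DAY = 86_400


def transfer_days(n_bytes, link_bits_per_s, efficiency=1.0):
    """Days needed to move n_bytes at the given sustained link rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = n_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    dataset_bytes = 40e12  # 40 TB, the per-paper figure from the text
    for label, rate in (("100 Mbit/s", 1e8), ("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10)):
        print(f"{label}: {transfer_days(dataset_bytes, rate):.2f} days")
```

Even at an assumed sustained 1 Gbit/s, one such dataset takes several days to copy, before any analysis starts. Running the computation next to the data avoids that cost for every lab that would otherwise download its own copy.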
5. Requested NIH Actions
Fund Open Data Infrastructure
Directly fund the development and maintenance of open neuroscience data repositories, including OCP, with dedicated infrastructure grants separate from research project funding.
Mandate Data Deposition
Require all NIH-funded neuroscience projects collecting structural or functional brain imaging data to deposit raw data in an approved open repository within 12 months of acquisition.
Fund Alg-Sourcing Competitions
Create a program analogous to CASP (protein structure prediction) or the Netflix Prize for connectomics algorithm development — with shared benchmarks, open leaderboards, and funded prizes.
Co-locate Compute with Data
Fund compute facilities adjacent to major neuroscience data repositories to eliminate network-transfer bottlenecks for large-scale analysis.
Support Cross-Scale Integration
Fund development of informatics tools that integrate data across spatial and temporal scales — from synaptic EM to whole-brain MRI — within a unified, queryable platform.
Incentivize Pre-Publication Sharing
Provide supplemental funding to research groups that share data prior to publication, with priority given to projects that make data programmatically accessible via standardized APIs.
6. Why This Matters Now
The digital age has fundamentally changed the economics of scientific data. Whether instruments count photons, electrons, or bosons, the output is digital — and digital data can be transmitted and shared at near-zero marginal cost. The scientific community has an obligation to take advantage of this.
Astrophysics demonstrated this with the Sloan Digital Sky Survey. Genetics demonstrated it with GenBank. Both fields were transformed — not just accelerated — by the decision to make raw data universally accessible. Neuroscience is next.
The US President's BRAIN Initiative ($100M, announced 2013) underscores that neuroscience is a national priority. But data collection without open, scalable infrastructure for that data is like building highways with no on-ramps. The Open Connectome Project proposes to be that infrastructure.
"We hope these services will sufficiently incentivize people to share their data. Science is about continually improving our collective descriptions of how the universe works. Why would we not want to share the beautiful data and explanations that science reveals with everybody?"
Joshua T. Vogelstein, Open Connectome Project, Neural Systems & Circuits, 2011
7. Contact & Further Information
For questions about this response, our current infrastructure, or potential collaborations:
Joshua T. Vogelstein
Principal Investigator
Department of Applied Mathematics & Statistics
Johns Hopkins University
Baltimore, MD 21218
Randal Burns
Co-Investigator
Department of Computer Science
Johns Hopkins University
Baltimore, MD 21218
This response was originally published at openconnectomeproject.org/nih-rfi (c. 2011) and is cited in: Vogelstein JT. Q&A: What is the Open Connectome Project? Neural Systems & Circuits 1:16, 2011. doi:10.1186/2042-1001-1-16