📄 Official Response · 2011

OCP Response to NIH Request for Information

The Open Connectome Project's formal response to the NIH Request for Information on neuroscience data sharing infrastructure and priorities — submitted by Joshua T. Vogelstein, Johns Hopkins University.

ℹ️ Background

In 2011, the National Institutes of Health (NIH) issued a Request for Information (RFI) soliciting community input on priorities, gaps, and opportunities for neuroscience data sharing and infrastructure. The Open Connectome Project submitted this response outlining our vision, current capabilities, and proposed direction for open, scalable brain imaging data management.

1. What is the Open Connectome Project?

The Open Connectome Project (OCP) is a community-driven initiative to provide public access to high-resolution neuroanatomical data that can be used to explore connectomes — the complete mapping of neural connections within a brain. We launched in early 2011 when the Bock et al. electron microscopy dataset from Harvard University was made available to us, and we immediately hosted it online for worldwide access.

The project is run out of the Department of Applied Mathematics and Statistics and the Department of Computer Science at Johns Hopkins University, led by Joshua T. Vogelstein and Randal Burns. We do not collect our own data — instead, we host whatever neuroanatomical data researchers wish to share with the world.

Our core belief: the laws of physics, chemistry, and biology affect everyone on the planet equally. Scientific data should be equally accessible to all — regardless of institutional affiliation, funding, or geography.

2. The Problem: A Data Deluge Without Infrastructure

Recent technological advances, particularly serial block-face electron microscopy (pioneered by Winfried Denk in 2004), have enabled the collection of neuroanatomical data at rates previously unimaginable. Modern instruments capture on the order of 10 terabytes (TB) of data per day. The raw data from a single publication (e.g., Bock et al. 2011) alone requires ~40 TB of storage.
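As a rough illustration of what these figures imply, the sketch below uses only the two numbers quoted above (10 TB per day, ~40 TB per publication). The decimal conversion 1 PB = 1000 TB and the function name are assumptions made for the example.

```python
# Back-of-the-envelope acquisition times from the figures quoted above.
TB_PER_DAY = 10.0    # acquisition rate of one modern instrument (from the text)
DATASET_TB = 40.0    # raw data behind a single publication (from the text)
TB_PER_PB = 1000.0   # assumption: decimal units

def days_to_acquire(size_tb: float, rate_tb_per_day: float = TB_PER_DAY) -> float:
    """Days one instrument needs to produce size_tb of raw data."""
    return size_tb / rate_tb_per_day

print(days_to_acquire(DATASET_TB))  # 4.0 days for a 40 TB dataset
print(days_to_acquire(TB_PER_PB))   # 100.0 days to reach one petabyte
```

Under these assumptions a single instrument running continuously would reach petabyte scale in roughly three months.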

The field currently lacks the infrastructure to:

  • Store, index, and serve petabyte-scale volumetric brain data efficiently
  • Provide low-latency programmatic access to arbitrary spatial subvolumes
  • Enable community-wide annotation and analysis of shared datasets
  • Incentivize researchers to share raw data before or immediately after publication
  • Support automated computer vision pipelines operating on live, shared data

Currently, it takes approximately one human expert workday to trace a single synapse from source to target. At that rate, it would take over 30 million years for a single person to manually examine all synapses in a human brain even once. Automated, community-scale annotation is not optional — it is essential.

3. Our Current Infrastructure & Capabilities

The OCP has already deployed a production system capable of serving large-scale connectomics data. Our architecture inherits from NoSQL scale-out and data-intensive computing paradigms:

🗄️

Distributed Spatial Database

Data are distributed across cluster nodes by partitioning a spatial index. Reads are routed to parallel disk arrays and writes to solid-state storage, which maximizes throughput and minimizes I/O contention.
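A minimal sketch of what partitioning a spatial index can look like. This response does not specify the index, so the example assumes a Morton (Z-order) curve over fixed-size cuboids, which is one common choice. The cuboid shape, the function names, and the range-partitioning rule are all hypothetical.

```python
# Sketch: map voxels to cuboids, cuboids to Morton keys, keys to nodes.
CUBOID_SHAPE = (128, 128, 16)  # hypothetical cuboid size in voxels (x, y, z)

def morton3d(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of x, y, z into one integer key."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def cuboid_key(x: int, y: int, z: int, shape=CUBOID_SHAPE) -> int:
    """Morton key of the cuboid that contains voxel (x, y, z)."""
    return morton3d(x // shape[0], y // shape[1], z // shape[2])

def node_for_key(key: int, max_key: int, n_nodes: int) -> int:
    """Range-partition keys so contiguous key ranges land on the same node."""
    return min(key * n_nodes // (max_key + 1), n_nodes - 1)

key = cuboid_key(300, 40, 20)          # cuboid (2, 0, 1)
print(key, node_for_key(key, 63, 4))   # key 12 of 0..63 goes to node 0
```

Because nearby cuboids tend to receive nearby keys, a cutout request for a compact subvolume usually touches a small number of contiguous key ranges.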

🔌

RESTful Web Services

All interfaces are stateless RESTful APIs. They support 3D image cutouts, annotation reads and writes, batch operations, metadata queries, and bounding-box lookups at any resolution level.

👁️

Web Visualization (CATMAID)

Integration with CATMAID (Cardona et al.) enables browser-based navigation of arbitrarily large 3D neuroanatomical volumes — accessible to any researcher in the world without software installation.

📝

RAMON Annotation Schema

The Reusable Annotation Markup for Open coNnectomes (RAMON) provides a unified schema for representing neurons, synapses, organelles, and arbitrary annotation objects with full metadata support.
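To make the idea of a unified annotation schema concrete, here is an illustrative sketch in which every object type shares a common base carrying an identifier and free-form metadata. It is not the RAMON specification: the class layout and every field name below are hypothetical.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Annotation:
    """Fields shared by every annotation object (hypothetical names)."""
    ann_id: int
    confidence: float = 1.0
    metadata: dict = field(default_factory=dict)

    def to_record(self) -> dict:
        """Flatten to a plain dict tagged with the object type."""
        record = asdict(self)
        record["type"] = type(self).__name__.lower()
        return record

@dataclass
class Neuron(Annotation):
    pass

@dataclass
class Synapse(Annotation):
    neuron_ids: list = field(default_factory=list)  # hypothetical link to neurons

@dataclass
class Organelle(Annotation):
    organelle_class: str = "unknown"  # hypothetical category label

syn = Synapse(ann_id=75, neuron_ids=[1, 2], metadata={"author": "demo"})
print(syn.to_record()["type"])  # synapse
```

Keeping the shared fields in one base class means a generic query such as "all objects of a given type" can treat every annotation uniformly.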

Example API Endpoints

GET /token/hdf5/4/512,1024/512,1024/512,1024/ — 3D image cutout at resolution 4
GET /annoproj/75/voxels/ — read annotation voxel list
GET /annoproj/75/boundingbox/ — annotation bounding box
GET /annoproj/objects/type/synapse/ — query all synapses by type
POST /annoproj/batch/ — batch write annotations (40× throughput improvement)
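The endpoint patterns above can be assembled programmatically. The sketch below only builds URLs and makes no network calls. The base URL is a placeholder, the helper names are hypothetical, and the x, y, z ordering of the three ranges is an assumption, since the examples above do not label the axes.

```python
# Sketch: build request URLs that follow the endpoint patterns listed above.
BASE_URL = "http://example.org/ocp"  # placeholder, not the real service address

def cutout_url(token, resolution, x_range, y_range, z_range, base=BASE_URL):
    """URL for a 3D image cutout; each range is a (low, high) voxel pair."""
    for lo, hi in (x_range, y_range, z_range):
        if lo >= hi:
            raise ValueError(f"empty range: {lo},{hi}")
    ranges = "/".join(f"{lo},{hi}" for lo, hi in (x_range, y_range, z_range))
    return f"{base.rstrip('/')}/{token}/hdf5/{resolution}/{ranges}/"

def annotation_url(project, ann_id, what, base=BASE_URL):
    """URL for one annotation's 'voxels' or 'boundingbox' resource."""
    if what not in ("voxels", "boundingbox"):
        raise ValueError(f"unsupported resource: {what}")
    return f"{base.rstrip('/')}/{project}/{ann_id}/{what}/"

print(cutout_url("token", 4, (512, 1024), (512, 1024), (512, 1024)))
print(annotation_url("annoproj", 75, "boundingbox"))
```

Because the interfaces are stateless, each URL fully describes its request, so a client could issue many such cutouts in parallel without any session setup.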

4. Our Response to NIH Priorities

4.1 On Data Sharing & Open Access

We believe the NIH should fund infrastructure that lowers the activation energy for data sharing rather than relying on researcher goodwill alone. In our model, data is piped automatically from the microscope to the Hopkins servers, ingested into our spatial database, and made available to a small group of collaborators. Making that data publicly accessible then requires nothing more than flipping a switch.
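The "flipping a switch" step can be pictured as a single access flag on an already-ingested dataset. The sketch below is purely illustrative: the class, its fields, and the dataset token are hypothetical and do not describe the actual OCP access-control implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """An ingested dataset with a collaborator list and a public flag."""
    token: str                                   # hypothetical dataset identifier
    collaborators: set = field(default_factory=set)
    public: bool = False                         # the "switch"

    def can_read(self, user: str) -> bool:
        """Collaborators can always read; everyone can once public is set."""
        return self.public or user in self.collaborators

ds = Dataset(token="example_em_volume", collaborators={"alice"})
print(ds.can_read("alice"), ds.can_read("bob"))  # True False
ds.public = True                                 # flip the switch
print(ds.can_read("bob"))                        # True
```

The point of the design is that publication changes one attribute rather than triggering a new upload or conversion step.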

We propose that NIH mandate and fund the following for all supported neuroscience projects: data must be deposited in an approved open repository (such as OCP) within 12 months of collection, with programmatic API access provided immediately upon deposition.

4.2 On "Alg-Sourcing" — Community Annotation at Scale

"Alg-sourcing" is the analogue of crowdsourcing for algorithms: we propose funding a community effort to develop, benchmark, and deploy automated computer vision algorithms against shared connectomics datasets. Rather than each lab re-implementing the same image segmentation pipelines, the community should collaborate on a single, openly available, continuously benchmarked codebase that operates against OCP-hosted data.

This model mirrors how astrophysics has operated for decades via the Sloan Digital Sky Survey (SDSS): one shared dataset, thousands of independent analyses, massive scientific leverage from a single infrastructure investment.

4.3 On Multi-Modal Integration

Connectomics must span spatial scales: from synaptic EM (~nanometers) to whole-brain fMRI (~millimeters). We have begun collaborating with Michael Milham (founder of the International Neuroimaging Data-sharing Initiative) to build scalable databases for multimodal MRI data alongside our existing EM infrastructure. Our goal is one-click upload: any researcher can deposit their MRI data, have it automatically processed and incorporated, and immediately compare it against existing datasets.

The NIH should fund a unified informatics layer that treats all scales and modalities of brain data as first-class citizens within a single queryable ecosystem.

4.4 On Compute Infrastructure

We strongly support NIH investment in co-located compute alongside open data repositories. The bottleneck in connectomics is not storage — it is analysis. Raw EM data at 40 TB per paper becomes tractable only with compute infrastructure adjacent to the data, avoiding the network bottleneck of downloading to local clusters. We advocate for a national "brain compute" facility analogous to the XSEDE national supercomputing network, but purpose-built for spatial neuroscience workloads.
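A quick calculation shows the scale of the network bottleneck. The 40 TB figure comes from the text; the link speeds are assumptions chosen for illustration, and the estimate ignores protocol overhead and contention, so real transfers would be slower.

```python
# Idealized time to download a dataset over a sustained network link.
SECONDS_PER_DAY = 86_400

def transfer_days(size_tb: float, link_gbps: float) -> float:
    """Days needed to move size_tb terabytes at link_gbps gigabits per second."""
    bits = size_tb * 1e12 * 8            # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9)   # gigabits per second to bits per second
    return seconds / SECONDS_PER_DAY

for gbps in (1, 10):                     # assumed sustained link speeds
    print(f"{gbps} Gb/s: {transfer_days(40, gbps):.2f} days")
```

At an assumed sustained 1 Gb/s, moving one 40 TB dataset takes about 3.7 days before any analysis can begin, which is the cost that compute placed next to the data avoids.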

5. Requested NIH Actions

01

Fund Open Data Infrastructure

Directly fund the development and maintenance of open neuroscience data repositories, including OCP, with dedicated infrastructure grants separate from research project funding.

02

Mandate Data Deposition

Require all NIH-funded neuroscience projects collecting structural or functional brain imaging data to deposit raw data in an approved open repository within 12 months of acquisition.

03

Fund Alg-Sourcing Competitions

Create a program analogous to CASP (protein structure prediction) or the Netflix Prize for connectomics algorithm development — with shared benchmarks, open leaderboards, and funded prizes.

04

Co-locate Compute with Data

Fund compute facilities adjacent to major neuroscience data repositories to eliminate network-transfer bottlenecks for large-scale analysis.

05

Support Cross-Scale Integration

Fund development of informatics tools that integrate data across spatial and temporal scales — from synaptic EM to whole-brain MRI — within a unified, queryable platform.

06

Incentivize Pre-Publication Sharing

Provide supplemental funding to research groups that share data prior to publication, with priority given to projects that make data programmatically accessible via standardized APIs.

6. Why This Matters Now

The digital age has fundamentally changed the economics of scientific data. Whether instruments count photons, electrons, or bosons, the output is digital — and digital data can be transmitted and shared at near-zero marginal cost. The scientific community has an obligation to take advantage of this.

Astrophysics demonstrated this with the Sloan Digital Sky Survey. Genetics demonstrated it with GenBank. Both fields were transformed — not just accelerated — by the decision to make raw data universally accessible. Neuroscience is next.

The US President's BRAIN Initiative ($100M, announced 2013) underscores that neuroscience is a national priority. But data collection without open, scalable infrastructure for that data is like building highways with no on-ramps. The Open Connectome Project proposes to be that infrastructure.

"We hope these services will sufficiently incentivize people to share their data. Science is about continually improving our collective descriptions of how the universe works. Why would we not want to share the beautiful data and explanations that science reveals with everybody?"
— Joshua T. Vogelstein, Open Connectome Project, Neural Systems & Circuits 2011

7. Contact & Further Information

For questions about this response, our current infrastructure, or potential collaborations:

👤

Joshua T. Vogelstein

Principal Investigator
Department of Applied Mathematics & Statistics
Johns Hopkins University
Baltimore, MD 21218

👤

Randal Burns

Co-Investigator
Department of Computer Science
Johns Hopkins University
Baltimore, MD 21218

This response was originally published at openconnectomeproject.org/nih-rfi (c. 2011) and is cited in: Vogelstein JT. Q&A: What is the Open Connectome Project? Neural Systems & Circuits 1:16, 2011. doi:10.1186/2042-1001-1-16