The RMBL Knowledge Commons is a unified search and discovery platform for environmental research at the Rocky Mountain Biological Laboratory in Gothic, Colorado. It connects scientific publications, community documents, research datasets, news stories, and a knowledge graph of species, concepts, protocols, and places studied at one of the longest-running field biology stations in North America.
The Knowledge Commons is a search and discovery tool that brings together the scientific output of RMBL and the Gunnison Basin into one searchable platform. It includes peer-reviewed publications dating back to 1928, community and policy documents from the Sustainable Living Library, and research datasets from multiple repositories. A knowledge graph connects these resources through shared species, concepts, research methods, and geographic locations.
The Hub is designed for researchers, students, land managers, community members, and policymakers interested in the environmental research and stewardship of the Gunnison Basin. It is equally useful for scientists looking for related work and for community members exploring how research connects to local policy issues.
Knowledge Neighborhoods are research communities detected automatically by analyzing the connections in the knowledge graph. Using a community-detection algorithm (Louvain), the system identifies clusters of tightly connected authors, publications, species, concepts, and places. Each neighborhood represents a distinct research theme — from marmot behavioral ecology to watershed biogeochemistry to federal land management policy. Many neighborhoods include AI-generated research primers that summarize the key findings and cite specific publications.
Research Frontiers are synthesized boundaries between what scientists know and what they don't, with identifiable paths to push the boundary forward. The system extracts atomic gap statements from neighborhood research primers, clusters them by semantic similarity, and uses a language model to weave each cluster into a narrative with context, key questions, barriers, opportunities, and concrete actions. Actions are classified by type (data, experiment, model, synthesis, framework, etc.) and effort tier (near-term, ambitious, major, consortium). Each frontier links back to its contributing neighborhoods, source statements, and the strongest concepts, species, places, and protocols involved, so you can trace any claim back to the underlying evidence.
The Knowledge Commons can be queried by AI assistants via the REST API or the MCP (Model Context Protocol) server. This allows tools like Claude Desktop, ChatGPT, and custom scripts to search publications, explore research neighborhoods, and access the knowledge graph programmatically.
All API endpoints are at /api/v1/ and support ?format=text for LLM-friendly plain text. See /llms.txt for a complete list. Examples:
# Search for publications about alpine pollination
curl "https://rmblknowledgecommons.org/api/v1/search?q=alpine+pollination&format=text"

# Get publication details
curl "https://rmblknowledgecommons.org/api/v1/publications/13?format=text"

# Explore a research neighborhood with primer
curl "https://rmblknowledgecommons.org/api/v1/neighborhoods/620?format=text"

# Look up a species
curl "https://rmblknowledgecommons.org/api/v1/entities/species/8426?format=text"

# Find related works
curl "https://rmblknowledgecommons.org/api/v1/related/publications/13?format=text"
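For custom scripts, the same requests can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_url` and `fetch_text` are illustrative and not part of the Knowledge Commons codebase, and the response is treated as opaque plain text because its exact layout is not specified here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the examples above.
BASE_URL = "https://rmblknowledgecommons.org/api/v1"


def build_url(path: str, **params: str) -> str:
    """Build an API URL that always requests the LLM-friendly text format."""
    query = urlencode({**params, "format": "text"})
    return f"{BASE_URL}/{path.lstrip('/')}?{query}"


def fetch_text(path: str, **params: str) -> str:
    """Fetch an endpoint and return the body as a string (requires network access)."""
    with urlopen(build_url(path, **params), timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_text("search", q="alpine pollination")` requests the same URL as the first curl command, and `fetch_text("publications/13")` requests the second.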
The easiest way to connect: add the Knowledge Commons as a Custom Connector in Claude Desktop. No installation required — just a URL.
Option A: Remote connector (no install):
https://rmblknowledgecommons.org/api/mcp
Option B: Local server (for development):
git clone https://github.com/ikb-rmbl/RMBL_knowledge_hub.git
cd RMBL_knowledge_hub/mcp
npm install && npm run build
Then add to Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "rmbl-knowledge-commons": {
      "command": "node",
      "args": ["/path/to/RMBL_knowledge_hub/mcp/dist/index.js"],
      "env": {
        "RMBL_API_URL": "https://rmblknowledgecommons.org"
      }
    }
  }
}
The sections below describe how data flows into the Knowledge Commons and how the knowledge graph is constructed.
Publications are sourced from the RMBL publications database, with additional discovery via OpenAlex and CrossRef. Each record is enriched with metadata from CrossRef (authors, DOIs, abstracts, citation counts) and Unpaywall (open access links). Full text is extracted from PDFs using pdftotext with OCR fallback via Tesseract.
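The text-layer-first, OCR-second flow can be sketched as follows. This is an illustration in Python rather than the project's TypeScript pipeline. The `MIN_CHARS` threshold and the use of `pdftoppm` to rasterize pages before OCR are assumptions; the source only states that pdftotext is used with a Tesseract fallback.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

MIN_CHARS = 200  # hypothetical threshold: less text than this suggests a scanned PDF


def run_pdftotext(pdf: Path) -> str:
    """Extract the embedded text layer; the trailing '-' sends output to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def run_ocr(pdf: Path) -> str:
    """Rasterize pages with pdftoppm (assumed step), then OCR each image with Tesseract."""
    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(["pdftoppm", "-r", "300", "-png", str(pdf), prefix], check=True)
        # Sorting the generated file names keeps the pages in order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            out = subprocess.run(
                ["tesseract", str(image), "stdout"],
                capture_output=True, text=True, check=True,
            )
            pages.append(out.stdout)
    return "\n".join(pages)


def extract_text(
    pdf: Path,
    text_layer: Callable[[Path], str] = run_pdftotext,
    ocr: Callable[[Path], str] = run_ocr,
    min_chars: int = MIN_CHARS,
) -> str:
    """Return the text layer when it looks usable; otherwise fall back to OCR."""
    text = text_layer(pdf)
    if len(text.strip()) >= min_chars:
        return text
    return ocr(pdf)
```

Passing the two extractors as parameters keeps the fallback decision testable without the command-line tools installed.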
Datasets are discovered from eight repository sources: EDI, DataONE, Dryad, Zenodo, USGS ScienceBase, Pangaea, NCBI, and Figshare. Each dataset is enriched with EML/DataCite metadata including temporal and spatial coverage, creator information, and licensing.
Documents come from the Sustainable Living Library, a collection of community and policy documents relevant to the Gunnison Basin. These include management plans, environmental impact statements, water quality reports, and local planning documents.
Stories are news articles about RMBL and the Gunnison Basin from local newspapers (Crested Butte News, Gunnison Country Times) and national/international outlets via LexisNexis. Full text is stored for search indexing and entity extraction but is not displayed on detail pages to respect copyright. Each story links to its original source when available.
Authors are deduplicated across all collections using a two-phase process. First, authors with matching ORCID identifiers are merged. Then, authors sharing the same family name are compared by given name initials, with checks to prevent false merges when middle initials differ (e.g., “R. J. Smith” is kept separate from “R. A. Smith”). Author ordering on publications is repaired from CrossRef metadata to ensure correct first-author attribution.
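The two-phase comparison can be illustrated with a small pairwise check. This is a sketch under stated assumptions, not the pipeline's actual code: the `Author` type and function names are hypothetical, two different ORCIDs are assumed to block a merge, and a missing middle initial is assumed to be compatible with any middle initial.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Author:
    family: str
    given: str
    orcid: Optional[str] = None


def initials(given: str) -> list[str]:
    """Reduce a given name to initials: 'R. J.' and 'Robert James' both give ['R', 'J']."""
    return [part[0].upper() for part in re.split(r"[\s.\-]+", given) if part]


def initials_compatible(a: list[str], b: list[str]) -> bool:
    """Initials must agree position by position; a missing middle initial is tolerated."""
    if not a or not b:
        return False
    return all(x == y for x, y in zip(a, b))


def same_author(a: Author, b: Author) -> bool:
    # Phase 1: when both records carry an ORCID, the ORCID decides.
    if a.orcid and b.orcid:
        return a.orcid == b.orcid
    # Phase 2: same family name, then compare given-name initials.
    if a.family.casefold() != b.family.casefold():
        return False
    return initials_compatible(initials(a.given), initials(b.given))
```

With these rules, `Author("Smith", "R. J.")` and `Author("Smith", "R. A.")` stay separate, while `Author("Smith", "R. J.")` and `Author("Smith", "Robert James")` are treated as the same person.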
Entities (species, concepts, protocols, places, and stakeholders) are extracted from publication and document full text using Claude vision models (VLM extraction). Each entity mention is linked to its source item with a confidence score and extraction method. Entities are then deduplicated using embedding-based clustering (Voyage AI voyage-4, 1024 dimensions) with type-specific similarity thresholds.
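As an illustration of embedding-based deduplication with type-specific thresholds, the sketch below greedily assigns each entity to the first existing cluster of the same type whose representative vector is similar enough. The threshold values, the greedy strategy, and the toy three-dimensional vectors are all assumptions made for the example; the real embeddings have 1024 dimensions and the actual clustering method is not specified here.

```python
import math

# Hypothetical thresholds: the source says they are type-specific but gives no values.
THRESHOLDS = {"species": 0.95, "concept": 0.85}
DEFAULT_THRESHOLD = 0.90


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_entities(entities: list[tuple[str, str, list[float]]]) -> list[list[str]]:
    """Group (name, type, vector) records; each cluster is a list of duplicate names."""
    clusters: list[dict] = []
    for name, etype, vector in entities:
        threshold = THRESHOLDS.get(etype, DEFAULT_THRESHOLD)
        for cluster in clusters:
            # Only compare within a type, against the cluster's first member.
            if cluster["type"] == etype and cosine(cluster["vector"], vector) >= threshold:
                cluster["names"].append(name)
                break
        else:
            clusters.append({"type": etype, "vector": vector, "names": [name]})
    return [cluster["names"] for cluster in clusters]
```

Comparing only within a type means a species and a concept with similar embeddings are never merged, and a stricter species threshold reduces the risk of merging closely related taxa.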
The Knowledge Commons is an evolving platform and we welcome feedback from the community. If you notice missing publications, incorrect data, or broken links, or have ideas for new features, there are two ways to get in touch: the “Report an issue” link on each detail page, and the project's GitHub repository.
The RMBL Knowledge Commons was developed with support from the Clark Family Foundation. Built by RMBL using data from CrossRef, OpenAlex, Unpaywall, ITIS, GNIS, and multiple data repositories.
The Hub provides a REST API at /api/v1/ with endpoints for search, publication detail, entity lookup, related works, and more. Add ?format=text to any endpoint for LLM-friendly plain text. For AI assistants like Claude Desktop, an MCP server is available — see the MCP documentation for setup instructions. See /llms.txt for a machine-readable index of available endpoints.
Every detail page has a “Report an issue” link below the title. Click it to flag a record that has incorrect data, is a duplicate, is missing information, or has other problems. You can describe what’s wrong and suggest corrections — no account needed.
Flags are reviewed by RMBL administrators through the Payload CMS admin panel. You can optionally include your email address if you’d like to be notified when the issue is resolved.
For technical issues with the site itself (bugs, broken features), please submit an issue on the GitHub repository.
Note: The MCP server currently supports Claude Desktop and other clients that use the Streamable HTTP transport. OpenAI/ChatGPT requires the older SSE transport with long-lived connections, which is not compatible with our serverless hosting. We plan to add OpenAI support when they adopt the Streamable HTTP standard. In the meantime, ChatGPT users can access the same data via the REST API with ?format=text.
| Tool | Description |
|---|---|
| search_rmbl | Full-text search across all collections |
| get_publication | Publication detail with authors, abstract, entities, citations |
| get_dataset | Dataset detail with creators and entities |
| get_document | Document detail with entities and stakeholders |
| get_entity | Entity lookup (species, concept, protocol, place, stakeholder) |
| find_related | Related works via semantic similarity, shared entities, co-authorship, citations |
| explore_neighborhood | Research neighborhood detail with primer |
| list_neighborhoods | Browse or search 154 research neighborhoods |
| get_frontier | Research frontier detail: questions, actions, data gaps, source statements |
| list_frontiers | Browse or search synthesized research frontiers (sortable by breadth/leverage) |
Species names are validated against the ITIS (Integrated Taxonomic Information System) database. Places are enriched with coordinates from GNIS (Geographic Names Information System) and organized into a parent-child hierarchy.
The resulting knowledge graph has 132,104 entity mentions linking items to entities, plus 143,289 citation references with internal cross-links between publications.
Knowledge Neighborhoods are detected using the Louvain community detection algorithm on the unified knowledge graph. The graph includes all entities and items as nodes, with edges from co-occurrence in publications, co-authorship, and citations. Edge weights are boosted for structural relationships (co-authorship ×5, citations ×3) to ensure that social and citation structure drives community boundaries rather than just shared terminology.
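The edge weighting can be sketched as a small aggregation step that runs before community detection. The ×5 and ×3 boosts come from the description above; the base weight of 1 for co-occurrence and the choice to sum weights when two nodes are linked more than once are assumptions, and the node identifiers are made up for the example.

```python
from collections import defaultdict

# Boosts for structural relationships; the co-occurrence base weight of 1.0 is assumed.
EDGE_BOOST = {"cooccurrence": 1.0, "coauthorship": 5.0, "citation": 3.0}


def build_weighted_edges(relations: list[tuple[str, str, str]]) -> dict[tuple[str, str], float]:
    """Collapse (node_a, node_b, kind) relations into undirected weighted edges."""
    weights: dict[tuple[str, str], float] = defaultdict(float)
    for a, b, kind in relations:
        key = tuple(sorted((a, b)))  # undirected: (a, b) and (b, a) share one edge
        weights[key] += EDGE_BOOST[kind]
    return dict(weights)
```

The resulting dictionary could then be loaded into a graph library that implements Louvain; for instance, NetworkX provides `louvain_communities(G, weight="weight")`. Whether the project uses that library is not stated here.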
Research primers are generated for the largest neighborhoods using Claude (Opus model) with tiered context assembly: landmark papers (full abstracts + key findings), frontier papers (2020+), breadth papers (single best finding each), and entity context (species, concepts, methods, places). Each primer includes parenthetical citations linked to specific publications in the Hub. Policy-focused neighborhoods receive primers with document citations instead.
Full-text search uses PostgreSQL tsvector with weighted ranking (title > abstract > full text) and stemmed query matching. Search results include highlighted snippets via ts_headline.
Related works are found using four signals: semantic similarity (pgvector cosine distance on Voyage AI embeddings), shared entity mentions (at least 3 shared entities), co-authorship (shared authors across publications), and citation links (from the references_cited table). Signals are merged with a multi-signal bonus for items connected by multiple pathways.
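One way to picture the merge step is shown below. It is a hedged sketch: the equal signal weights, the bonus size, and the Jaccard normalization of the shared-entity score are all hypothetical, since the source names the four signals, the three-entity minimum, and the existence of a multi-signal bonus but not the formula.

```python
# Hypothetical weights and bonus; the source does not give the actual values.
SIGNAL_WEIGHTS = {"semantic": 1.0, "shared_entities": 1.0, "coauthorship": 1.0, "citation": 1.0}
MULTI_SIGNAL_BONUS = 0.25


def shared_entities_signal(
    target: set[str], candidates: dict[str, set[str]], min_shared: int = 3
) -> dict[str, float]:
    """Score candidates sharing at least `min_shared` entities (Jaccard overlap, assumed)."""
    scores = {}
    for item_id, entities in candidates.items():
        shared = target & entities
        if len(shared) >= min_shared:
            scores[item_id] = len(shared) / len(target | entities)
    return scores


def merge_signals(signals: dict[str, dict[str, float]]) -> list[tuple[str, float, list[str]]]:
    """Combine per-signal scores and reward items reached by more than one signal."""
    combined: dict[str, dict] = {}
    for name, scores in signals.items():
        for item_id, score in scores.items():
            entry = combined.setdefault(item_id, {"score": 0.0, "signals": []})
            entry["score"] += SIGNAL_WEIGHTS[name] * score
            entry["signals"].append(name)
    ranked = [
        (item_id, entry["score"] + MULTI_SIGNAL_BONUS * (len(entry["signals"]) - 1), entry["signals"])
        for item_id, entry in combined.items()
    ]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked
```

Returning the contributing signal names alongside each score makes it possible to explain to a reader why an item was suggested.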
The Knowledge Commons is built with Next.js and Payload CMS on PostgreSQL with pgvector. Graph visualizations use Sigma.js (WebGL). The data pipeline is a set of TypeScript scripts for scraping, enrichment, entity extraction, and graph construction. Vector embeddings are generated by Voyage AI (voyage-4, 1024 dimensions). The site is hosted on Vercel with the database on Neon (serverless PostgreSQL).
The project is open source at github.com/ikb-rmbl/RMBL_knowledge_hub.