An intelligent tab and bookmark management system with AI-powered content analysis, semantic search, smart suggestions, and automated archival.
Tab & Bookmark Manager is a full-stack productivity tool that transforms how users interact with their browser tabs and bookmarks. Rather than treating tabs and bookmarks as flat, unstructured lists, this system applies machine learning to understand the content behind each URL: summarizing pages, classifying them into categories, detecting duplicates, identifying stale tabs, and surfacing related content through semantic similarity. The result is a self-organizing knowledge layer on top of everyday browsing behavior.
The system ships as three coordinated services — a Chrome/Edge browser extension for capture, a Node.js REST API for orchestration and persistence, and a Python Flask ML service for natural language processing — backed by PostgreSQL with pgvector for vector similarity search, Redis for caching and job queues, and Puppeteer for full-page archival. Everything runs containerized via Docker Compose for reproducible local development and production deployment.
This repository belongs to ORGAN-III (Ergon), the Commerce organ of the organvm ecosystem, which houses SaaS products, B2B/B2C tools, and developer utilities that generate practical value.
Tab & Bookmark Manager addresses a universal problem: browser tab overload and bookmark rot. Most users accumulate dozens of open tabs and hundreds of unsorted bookmarks with no way to search by meaning, detect redundancies, or automatically archive content before it disappears. Existing browser bookmark managers are purely mechanical — they store URLs and titles, nothing more.
This system goes further by treating every captured URL as a document that can be analyzed, classified, embedded into a vector space, and cross-referenced against everything else in the user’s collection. The core value proposition is threefold:
Automatic intelligence. Every tab and bookmark is analyzed in the background — summarized, classified into one of ten content categories, tagged with extracted entities and keywords, and embedded as a 384-dimensional vector for similarity search. The user does nothing; the system does the thinking.
Proactive suggestions. A scheduled suggestion engine continuously scans the collection for duplicates (content-level, not just URL-level), identifies tabs that have gone stale (open but unvisited for configurable periods), and surfaces related content the user may have forgotten about. Each suggestion carries a confidence score and supports an accept/reject workflow.
Permanent archival. Web pages are ephemeral — links rot, content changes, sites go offline. The archive system uses Puppeteer to capture full HTML content, screenshots (PNG), and PDF renderings of any page, preserving a permanent local copy independent of the live web.
The browser is the most-used application on most computers, yet its built-in organizational tools have barely evolved in twenty years. Bookmark folders are hierarchical (forcing single-category classification), search is keyword-only (missing semantic relationships), and there is no concept of “this tab is stale” or “these three bookmarks are about the same topic.” Tab & Bookmark Manager fills that gap by layering AI-powered content understanding on top of the browser’s native primitives.
This project also serves as a technical demonstration of how to integrate a browser extension frontend, a Node.js orchestration backend, and a Python ML microservice into a cohesive product with real-time background processing, vector search, and scheduled automation — patterns that apply broadly to any content-intelligence application.
The system follows a microservices architecture with four main components communicating over HTTP and backed by shared data stores.
```text
┌──────────────────────────┐
│    Browser Extension     │
│    (Chrome/Edge MV3)     │
│       Capture + UI       │
└────────────┬─────────────┘
             │ HTTP/REST
             ▼
┌──────────────────────────┐         ┌──────────────────────────┐
│       Backend API        │◄────────┤        ML Service        │
│    Node.js / Express     │  HTTP   │      Python / Flask      │
│                          │         │                          │
│ - Auth (JWT)             │         │ - Summarization (BART)   │
│ - CRUD (tabs/bookmarks)  │         │ - Classification         │
│ - Search (text+vector)   │         │ - NER (spaCy)            │
│ - Suggestions            │         │ - Embeddings (MiniLM)    │
│ - Archive orchestration  │         │ - Keyword extraction     │
│ - Automation scheduler   │         └──────────────────────────┘
└────┬──────────┬────────┬─┘
     │          │        │
     ▼          ▼        ▼
┌──────────┐ ┌─────┐ ┌───────────┐
│PostgreSQL│ │Redis│ │ Puppeteer │
│(pgvector)│ │     │ │ (archive) │
│          │ │Cache│ │ HTML/PNG/ │
│ Tabs     │ │Queue│ │ PDF       │
│ Bookmarks│ │Bull │ └───────────┘
│ Archives │ └─────┘
│ Vectors  │
└──────────┘
```
Browser Extension (Manifest V3). Runs as a Chrome/Edge extension using the Manifest V3 service worker model. Captures tab events (creation, update, close) and bookmark events automatically. Provides a popup UI with search, suggestion display, and usage statistics. Content scripts extract page text for analysis.
Backend API (Node.js/Express). The central orchestration layer. Receives data from the extension, persists it in PostgreSQL, dispatches analysis jobs to the ML service via Redis-backed Bull queues, manages the suggestion lifecycle, and coordinates archival via Puppeteer. Exposes a full REST API with Swagger/OpenAPI documentation. Implements JWT authentication, rate limiting (100 requests per 15 minutes), Helmet security headers, CORS, Joi input validation, and Winston structured logging.
ML Service (Python/Flask). A dedicated NLP microservice that runs five analysis pipelines: text summarization using Facebook’s BART-large-CNN model, content classification across ten categories (Technology, News, Education, Entertainment, Business, Social, Shopping, Health, Science, Other), named entity recognition using spaCy’s en_core_web_sm model, 384-dimensional semantic embeddings using Sentence Transformers’ all-MiniLM-L6-v2 model, and keyword extraction using TF-IDF via scikit-learn. Each pipeline is available as an individual endpoint or through a single comprehensive analysis endpoint.
Data Stores. PostgreSQL 15 with the pgvector extension stores all tab and bookmark metadata alongside their vector embeddings, enabling efficient cosine-similarity queries for semantic search and duplicate detection. Redis 7 serves as both a cache layer and the backing store for Bull job queues that handle content analysis, archival, and suggestion generation asynchronously.
Capture flow: Browser event triggers extension listener, which POSTs tab/bookmark data to the Backend API. The API persists the record in PostgreSQL and enqueues a content-analysis job on the Bull queue. A queue worker sends the page content to the ML Service, receives analysis results (summary, category, entities, keywords, embedding), and updates the database record with enriched metadata and the vector embedding.
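The capture flow can be sketched in a single Python process. This is a minimal illustration, not the backend's code: an in-memory dict stands in for PostgreSQL, `queue.Queue` stands in for the Bull queue, and a stub `analyze` function stands in for the HTTP call to the ML service. `Record`, `capture`, and `run_worker` are hypothetical names.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Callable, Dict, List, Optional


@dataclass
class Record:
    """Hypothetical in-memory stand-in for a tab or bookmark row."""
    id: int
    url: str
    content: str
    summary: Optional[str] = None
    category: Optional[str] = None
    entities: List[dict] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    embedding: Optional[List[float]] = None


def capture(db: Dict[int, Record], jobs: "Queue[int]", record: Record) -> None:
    """Step 1: persist the record, then enqueue a content-analysis job for it."""
    db[record.id] = record
    jobs.put(record.id)


def run_worker(db: Dict[int, Record], jobs: "Queue[int]",
               analyze: Callable[[str, str], dict]) -> int:
    """Step 2: drain the queue, enriching each record with the analysis results."""
    processed = 0
    while not jobs.empty():
        record = db[jobs.get()]
        result = analyze(record.content, record.url)  # stands in for the ML service call
        record.summary = result["summary"]
        record.category = result["category"]
        record.entities = result["entities"]
        record.keywords = result["keywords"]
        record.embedding = result["embedding"]
        jobs.task_done()
        processed += 1
    return processed
```

Separating `capture` from `run_worker` mirrors why the real system uses a queue: the API can acknowledge the extension immediately while the slow ML analysis happens later.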
Search flow: User enters a query in the extension popup. The Backend API sends the query text to the ML Service to generate an embedding vector, then performs a pgvector cosine-similarity search against all stored embeddings. Results are ranked by similarity score and returned to the extension for display.
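The ranking step can be illustrated in memory. pgvector performs it inside PostgreSQL (its cosine-distance operator returns 1 minus the similarity computed here); the sketch below assumes plain Python lists as embeddings and uses toy 3-dimensional vectors in place of the 384-dimensional MiniLM output. `rank_by_similarity` is a hypothetical helper.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       items: Iterable[Tuple[str, Sequence[float]]],
                       top_k: int = 10) -> List[Tuple[str, float]]:
    """Score every (item_id, embedding) pair against the query, best match first."""
    scored = [(item_id, cosine_similarity(query, emb)) for item_id, emb in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = rank_by_similarity(
    [1.0, 0.0, 0.0],
    [("a", [1.0, 0.0, 0.0]), ("b", [0.0, 1.0, 0.0]), ("c", [1.0, 1.0, 0.0])],
    top_k=2,
)
```

Here `results` holds `"a"` (identical direction, similarity 1.0) followed by `"c"` (similarity 1/√2); the orthogonal `"b"` scores 0 and is cut by `top_k`.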
Suggestion flow: A node-cron scheduled job runs every 6 hours, triggering the suggestion service. The service queries PostgreSQL for potential duplicates (high cosine similarity between different records), stale tabs (open but unaccessed beyond a threshold), and related content clusters. Generated suggestions are stored with confidence scores and presented to the user for accept/reject decisions.
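Content-level duplicate detection reduces to comparing embeddings pairwise and keeping pairs above a similarity threshold. A minimal sketch, assuming embeddings are available as a dict of id to vector; the 0.95 threshold and the use of the raw similarity as the confidence score are illustrative assumptions, and `find_duplicates` is a hypothetical name. The pairwise loop is O(n²), so at scale this comparison belongs in pgvector rather than Python.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def find_duplicates(embeddings: Dict[int, Sequence[float]],
                    threshold: float = 0.95) -> List[Tuple[int, int, float]]:
    """Return (id_a, id_b, confidence) for every pair at or above the threshold."""
    pairs = []
    for (id_a, emb_a), (id_b, emb_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(emb_a, emb_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 4)))
    # Most confident suggestions first.
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs
```

Because the comparison uses embeddings rather than URLs, two different addresses serving the same article still land above the threshold.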
Archive flow: User triggers archival for a specific tab or bookmark (or the automation engine triggers it weekly for old tabs). The Backend API uses Puppeteer to launch a headless browser, navigate to the URL, and capture three artifacts: the full HTML content, a PNG screenshot, and a PDF rendering. These files are stored on the local filesystem, and the archive metadata is persisted in PostgreSQL.
The fastest way to get all services running.
```bash
# Clone the repository
git clone https://github.com/organvm-iii-ergon/tab-bookmark-manager.git
cd tab-bookmark-manager

# Run the setup script (starts all containers)
./scripts/setup.sh
```
This starts all four services:
| Service | URL | Purpose |
|---|---|---|
| Backend API | http://localhost:3000 | REST API + Swagger |
| ML Service | http://localhost:5000 | NLP analysis |
| PostgreSQL | localhost:5432 | Database |
| Redis | localhost:6379 | Cache + job queue |
The ML service will download pretrained models on first startup (BART-large-CNN, all-MiniLM-L6-v2, spaCy en_core_web_sm). This may take several minutes depending on your connection.
For active development, run services individually.
```bash
# Run the development setup script
./scripts/dev-setup.sh

# Start PostgreSQL and Redis via Docker
docker compose up -d postgres redis

# Start the backend (in one terminal)
cd backend
npm install
npm run dev  # Runs on http://localhost:3000

# Start the ML service (in another terminal, from the repository root)
cd ml-service
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python src/app.py  # Runs on http://localhost:5000
```
After first launch, run the migration script to set up the PostgreSQL schema and pgvector extension:
```bash
./scripts/migrate-db.sh
```
Check that all services are healthy:
```bash
# Backend health
curl http://localhost:3000/health

# ML service health
curl http://localhost:5000/health
```
Expected response from the backend health check:

```json
{
  "status": "ok",
  "services": {
    "api": "healthy",
    "mlService": "healthy"
  }
}
```
The browser extension is a Manifest V3 Chrome/Edge extension that serves as the primary user interface.
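To install it:

1. Open chrome://extensions/ (or edge://extensions/).
2. Enable Developer mode.
3. Click Load unpacked and select the extension/ directory from this repository.

The extension requests the tabs, bookmarks, storage, activeTab, and scripting permissions. Host permissions cover localhost:3000 for API communication and <all_urls> for content extraction.

Every captured URL is automatically analyzed by the ML service pipeline: it is summarized, classified into one of ten content categories, tagged with named entities and keywords, and embedded as a 384-dimensional vector for similarity search.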
Go beyond keyword matching. The semantic search endpoint embeds your query into the same vector space as your stored content, then uses pgvector’s cosine-distance operator to find the most semantically relevant results. Searching for “machine learning tutorials” will surface bookmarks about “deep learning courses” and “neural network guides” even if those pages never contain the exact words of the query.
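On the database side, a query of roughly this shape performs the search. This is a sketch under stated assumptions: a hypothetical `bookmarks` table with `id`, `url`, `title`, and an `embedding vector(384)` column, and psycopg-style `%s` placeholders; `build_semantic_search_query` is a hypothetical helper. Because pgvector’s `<=>` operator returns cosine distance, the similarity score is `1 - distance`, and sorting by the distance ascending puts the closest matches first.

```python
from typing import Sequence, Tuple


def build_semantic_search_query(embedding: Sequence[float],
                                limit: int = 10) -> Tuple[str, tuple]:
    """Return (sql, params) for a pgvector nearest-neighbour search.

    Assumes a hypothetical `bookmarks` table with an `embedding vector(384)`
    column and psycopg-style %s placeholders.
    """
    if not embedding:
        raise ValueError("embedding must be non-empty")
    # pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
    vector_literal = "[" + ",".join(repr(float(x)) for x in embedding) + "]"
    sql = (
        "SELECT id, url, title, 1 - (embedding <=> %s::vector) AS similarity "
        "FROM bookmarks "
        "ORDER BY embedding <=> %s::vector "  # cosine distance, ascending = most similar first
        "LIMIT %s"
    )
    return sql, (vector_literal, vector_literal, int(limit))


sql, params = build_semantic_search_query([0.1, 0.2, 0.3], limit=5)
```

Passing the vector as a bound parameter rather than interpolating it into the SQL string keeps the query safe from injection.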
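The suggestion engine runs on a configurable schedule (default: every 6 hours) and generates three types of actionable suggestions: duplicates (content-level matches, not just identical URLs), stale tabs (open but unaccessed beyond a configurable threshold), and related content (items on the same topic that the user may have forgotten about).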
Each suggestion supports an accept/reject workflow. Accepted suggestions can trigger automated actions (merge duplicates, close stale tabs, create bookmark groups). Rejected suggestions train the system to avoid similar false positives.
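Stale-tab detection can be expressed as a filter over last-access timestamps. A minimal sketch, assuming each tab is a dict with hypothetical `id`, `is_open`, and `last_accessed` keys; the 7-day default threshold and the linear confidence formula are illustrative, not the project’s configured values.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def find_stale_tabs(tabs: Iterable[dict], now: datetime,
                    max_idle: timedelta = timedelta(days=7)) -> List[dict]:
    """Flag open tabs whose last access is older than max_idle.

    Confidence grows linearly with idle time: 0.5 just past the threshold,
    capped at 1.0 once a tab has been idle for twice the threshold.
    `now` and each `last_accessed` must both be naive or both be timezone-aware.
    """
    stale = []
    for tab in tabs:
        idle = now - tab["last_accessed"]
        if tab["is_open"] and idle > max_idle:
            confidence = min(1.0, idle / (2 * max_idle))  # timedelta / timedelta -> float
            stale.append({"tab_id": tab["id"], "idle_days": idle.days,
                          "confidence": round(confidence, 2)})
    return sorted(stale, key=lambda s: s["confidence"], reverse=True)
```

Attaching a graded confidence rather than a yes/no flag lets the popup show the most clearly abandoned tabs first.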
Web content is ephemeral. The archive system preserves pages permanently using Puppeteer headless browser rendering, capturing three artifacts per page: the full HTML content, a PNG screenshot, and a PDF rendering.
All heavy operations run asynchronously through Redis-backed Bull queues: content analysis, page archival, and suggestion generation.
Jobs include automatic retry logic, error handling, and dead-letter queuing for failed operations.
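Retry with backoff and a dead-letter list can be sketched as follows. Bull expresses retries through per-job `attempts` and `backoff` options; this standalone Python version only illustrates the same pattern, and `process_with_retry`, the 0.5-second base delay, and the dead-letter record shape are assumptions.

```python
import time
from typing import Any, Callable, List


def process_with_retry(job: Any, handler: Callable[[Any], None],
                       dead_letter: List[dict], max_attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run handler(job), retrying failures with exponential backoff.

    Returns True on success. After the final failed attempt the job and its
    last error are appended to dead_letter and False is returned.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            handler(job)
            return True
        except Exception as exc:  # any handler failure counts as a failed attempt
            if attempt == max_attempts:
                dead_letter.append({"job": job, "error": str(exc), "attempts": attempt})
                return False
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    return False  # not reached; the loop always returns
```

Injecting `sleep` keeps the function testable without real delays, and the dead-letter list preserves failed jobs for later inspection instead of silently dropping them.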
The node-cron-powered automation engine runs five scheduled tasks:
| Task | Schedule | Description |
|---|---|---|
| Suggestion generation | Every 6 hours | Generate new AI suggestions |
| Suggestion cleanup | Daily | Remove expired or stale suggestions |
| Old tab archival | Weekly | Archive tabs open longer than threshold |
| Statistics update | Hourly | Refresh collection analytics |
| Duplicate check | Every 12 hours | Scan for new content-level duplicates |
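The five schedules in the table above can be written as standard five-field cron expressions of the kind node-cron accepts. The expressions below are assumptions chosen to match the table (the backend’s actual expressions may differ), and the small matcher supports only `*`, `*/n`, and single numbers, which is enough to check when each task fires. It does not implement cron’s OR rule for combined day-of-month and day-of-week restrictions, which none of these expressions need.

```python
from datetime import datetime
from typing import List

# Hypothetical cron expressions (minute hour day-of-month month day-of-week, 0 = Sunday).
SCHEDULES = {
    "suggestion_generation": "0 */6 * * *",   # every 6 hours
    "suggestion_cleanup": "0 0 * * *",        # daily at midnight
    "old_tab_archival": "0 0 * * 0",          # weekly, Sunday at midnight
    "statistics_update": "0 * * * *",         # hourly
    "duplicate_check": "0 */12 * * *",        # every 12 hours
}

# Smallest legal value of each field; "*/n" steps count from here.
FIELD_MINIMUMS = (0, 0, 1, 1, 0)


def _field_matches(field: str, value: int, minimum: int) -> bool:
    """Support the three forms used above: '*', '*/n', and a single number."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return (value - minimum) % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday = 0; cron: Sunday = 0
    values = (when.minute, when.hour, when.day, when.month, cron_dow)
    return all(_field_matches(f, v, m)
               for f, v, m in zip(expr.split(), values, FIELD_MINIMUMS))


def due_tasks(when: datetime) -> List[str]:
    """Names of the tasks that should fire at the given minute."""
    return sorted(name for name, expr in SCHEDULES.items() if cron_matches(expr, when))
```

At midnight on a Sunday all five tasks coincide, which is worth knowing when sizing the queue workers.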
Interactive Swagger/OpenAPI documentation is available at http://localhost:3000/api-docs when the backend is running.
The API uses JWT-based authentication. Register a user, log in to receive a token, and include it in the Authorization header of subsequent requests:
```bash
# Register
curl -X POST http://localhost:3000/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{"username":"demo","email":"demo@example.com","password":"securePass123"}'

# Login
curl -X POST http://localhost:3000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"demo@example.com","password":"securePass123"}'
# Returns: {"token": "eyJhbG..."}

# Use token in subsequent requests
curl http://localhost:3000/api/tabs \
  -H "Authorization: Bearer eyJhbG..."
```
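The token returned by the login endpoint is a JWT: three base64url segments (header, payload, signature) joined by dots. A header that begins with `{"alg"` encodes to `eyJhbG`, which is why tokens look like the one above. A minimal sketch of signing and verifying one, assuming HMAC-SHA256 (HS256) with the `JWT_SECRET` value; the backend’s actual algorithm and claims are not documented here, and `sign_jwt`/`verify_jwt` are hypothetical helpers.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding stripped during encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(payload: dict, secret: str) -> str:
    """Create an HS256-signed token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        _b64url_encode(json.dumps(header, separators=(",", ":")).encode()),
        _b64url_encode(json.dumps(payload, separators=(",", ":")).encode()),
    ]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return ".".join(segments + [_b64url_encode(signature)])


def verify_jwt(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Check the algorithm, signature, and optional exp claim; return the payload."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

The payload is only encoded, not encrypted, so anyone holding a token can read its claims; the signature prevents tampering, not disclosure.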
| Group | Method | Endpoint | Description |
|---|---|---|---|
| Auth | POST | /api/auth/register | Register new user |
| Auth | POST | /api/auth/login | Login, receive JWT |
| Auth | POST | /api/auth/logout | Logout, revoke token |
| User | GET | /api/user/profile | Get profile |
| User | PUT | /api/user/profile | Update username |
| User | PUT | /api/user/email | Update email |
| User | PUT | /api/user/password | Change password |
| User | DELETE | /api/user/account | Delete account |
| Tabs | POST | /api/tabs | Create tab |
| Tabs | POST | /api/tabs/bulk | Bulk create tabs |
| Tabs | GET | /api/tabs | List all tabs |
| Tabs | GET | /api/tabs/:id | Get specific tab |
| Tabs | PUT | /api/tabs/:id | Update tab |
| Tabs | DELETE | /api/tabs/:id | Delete tab |
| Tabs | POST | /api/tabs/:id/archive | Archive tab |
| Tabs | GET | /api/tabs/stale/detect | Detect stale tabs |
| Bookmarks | POST | /api/bookmarks | Create bookmark |
| Bookmarks | POST | /api/bookmarks/bulk | Bulk create bookmarks |
| Bookmarks | GET | /api/bookmarks | List all bookmarks |
| Bookmarks | GET | /api/bookmarks/:id | Get specific bookmark |
| Bookmarks | PUT | /api/bookmarks/:id | Update bookmark |
| Bookmarks | DELETE | /api/bookmarks/:id | Delete bookmark |
| Bookmarks | POST | /api/bookmarks/:id/archive | Archive bookmark |
| Search | POST | /api/search/semantic | Semantic vector search |
| Search | GET | /api/search/text | Text-based search |
| Search | GET | /api/search/similar/:id | Find similar items |
| Suggestions | GET | /api/suggestions | List all suggestions |
| Suggestions | GET | /api/suggestions/duplicates | Duplicate suggestions |
| Suggestions | GET | /api/suggestions/stale | Stale tab suggestions |
| Suggestions | GET | /api/suggestions/related/:id | Related content |
| Suggestions | POST | /api/suggestions/generate | Trigger suggestion generation |
| Suggestions | PUT | /api/suggestions/:id/accept | Accept suggestion |
| Suggestions | PUT | /api/suggestions/:id/reject | Reject suggestion |
| Archive | POST | /api/archive | Archive a page |
| Archive | GET | /api/archive/:id | Retrieve archived page |
| Archive | GET | /api/archive | List all archives |
| Health | GET | /health | Service health check |
The API enforces a rate limit of 100 requests per 15-minute window per IP address. Exceeding this limit returns HTTP 429 (Too Many Requests).
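A fixed-window counter is one simple way to enforce a limit like this; the README does not say which algorithm the backend uses, so treat this as an illustrative sketch. The class name `FixedWindowRateLimiter` and the injectable `clock` (used to make the window testable) are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client IP in each fixed time window."""

    def __init__(self, limit: int = 100, window_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # client IP -> (window start time, requests counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def check(self, client_ip: str) -> int:
        """Return the HTTP status to send: 200 if allowed, 429 if over the limit."""
        now = self.clock()
        start, count = self._windows.get(client_ip, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.limit:
            return 429
        self._windows[client_ip] = (start, count + 1)
        return 200
```

A fixed window is cheap (one counter per IP) but lets a client send up to twice the limit across a window boundary; sliding-window variants trade extra bookkeeping for smoother enforcement.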
The ML service is a standalone Python/Flask microservice responsible for all natural language processing. It loads models once at startup and serves inference requests over HTTP.
| Capability | Model | Output |
|---|---|---|
| Summarization | facebook/bart-large-cnn | 50-150 word summary |
| Embeddings | all-MiniLM-L6-v2 | 384-dim float vector |
| Named entities | spaCy en_core_web_sm | Entity type/value pairs |
| Classification | Keyword-based heuristics | One of 10 categories |
| Keywords | TF-IDF (scikit-learn) | Top N keywords |
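Keyword-based classification can be as simple as counting category-specific words. A minimal sketch of the idea, assuming hypothetical keyword lists (the ML service’s real lists are not reproduced here): the category with the most hits wins, ties go to the category listed first, and text with no hits falls back to Other.

```python
import re
from collections import Counter
from typing import Dict, Set

# Hypothetical keyword lists; "Other" is the fallback, giving ten categories in total.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "Technology": {"software", "python", "javascript", "api", "cloud", "programming"},
    "News": {"breaking", "report", "election", "government", "headline"},
    "Education": {"course", "tutorial", "lesson", "university", "learn"},
    "Entertainment": {"movie", "music", "game", "celebrity", "tv"},
    "Business": {"market", "startup", "revenue", "finance", "investor"},
    "Social": {"friends", "community", "forum", "profile", "followers"},
    "Shopping": {"price", "cart", "discount", "shipping", "buy"},
    "Health": {"fitness", "nutrition", "medical", "doctor", "wellness"},
    "Science": {"research", "physics", "biology", "experiment", "climate"},
}


def classify(text: str) -> str:
    """Pick the category whose keywords occur most often in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {category: sum(counts[word] for word in words)
              for category, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=lambda category: scores[category])  # ties: first listed wins
    return best if scores[best] > 0 else "Other"
```

Heuristics like this need no model download and are fully predictable, at the cost of missing synonyms and word forms (for example, "learning" does not match "learn").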
```bash
# Comprehensive analysis (runs all pipelines)
curl -X POST http://localhost:5000/api/analyze \
  -H "Content-Type: application/json" \
  -d '{"text":"Your page content here","url":"https://example.com"}'
```

Individual endpoints:

```text
POST /api/summarize   # Text summarization
POST /api/classify    # Content classification
POST /api/entities    # Named entity extraction
POST /api/embed       # Embedding generation
POST /api/keywords    # Keyword extraction
```
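TF-IDF ranks a page’s terms by how often they appear in that page, discounted by how many pages in the collection contain them. The service uses scikit-learn for this; the dependency-free sketch below reproduces scikit-learn’s default smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, but its tokenizer and stop-word list are simplified assumptions, so its output will not match the service exactly. `top_keywords` is a hypothetical name.

```python
import math
import re
from collections import Counter
from typing import List, Sequence

# A deliberately tiny stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of", "on",
              "or", "the", "this", "to", "with"}


def tokenize(text: str) -> List[str]:
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(doc: str, corpus: Sequence[str], n: int = 5) -> List[str]:
    """Return the n highest TF-IDF terms of doc, scored against corpus."""
    tokens = tokenize(doc)
    if not tokens:
        return []
    term_counts = Counter(tokens)
    doc_sets = [set(tokenize(d)) for d in corpus]
    total_docs = len(doc_sets)
    scores = {}
    for term, count in term_counts.items():
        df = sum(1 for terms in doc_sets if term in terms)
        idf = math.log((1 + total_docs) / (1 + df)) + 1  # smoothed IDF, as in scikit-learn
        scores[term] = (count / len(tokens)) * idf
    # Highest score first; alphabetical order breaks ties deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]
```

A term repeated in one page but rare elsewhere (such as "search" below) outranks a term spread across many pages, which is what makes TF-IDF keywords distinctive rather than merely frequent.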
Create backend/.env from the provided example:
```bash
PORT=3000
NODE_ENV=development
JWT_SECRET=your-secret-key

# Database
DB_HOST=localhost
DB_PORT=5432
DB_NAME=tab_bookmark_manager
DB_USER=postgres
DB_PASSWORD=postgres

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_URL=redis://localhost:6379

# Services
ML_SERVICE_URL=http://localhost:5000
ARCHIVE_DIR=./archives
LOG_LEVEL=info
```
Create ml-service/.env from the provided example:
```bash
PORT=5000
DEBUG=False
LOG_LEVEL=INFO
```
The backend includes Jest test suites covering authentication, bulk operations, and core API functionality.
```bash
cd backend

# Run all tests with coverage
npm test

# Run specific test suites
npm run test:auth   # Authentication tests
npm run test:bulk   # Bulk operation tests

# Lint
npm run lint
```
Tests use SQLite as an in-memory database substitute and ioredis-mock for Redis, ensuring fast execution without external service dependencies.
The ML service tests run with pytest:

```bash
cd ml-service
source venv/bin/activate
pytest
```
```bash
# Update environment variables for production
# backend/.env: NODE_ENV=production, strong passwords, real JWT secret
# ml-service/.env: DEBUG=False

# Start all services
docker compose up -d

# Verify
docker compose ps
docker compose logs -f
```
The containerized architecture supports deployment to any Docker-compatible platform.
```text
tab-bookmark-manager/
├── backend/                # Node.js/Express REST API
│   ├── src/
│   │   ├── config/         # Database, Redis, queue, Swagger config
│   │   ├── controllers/    # Route handlers (auth, tabs, bookmarks, search, etc.)
│   │   ├── middleware/     # Auth middleware, error handler
│   │   ├── routes/         # Express route definitions
│   │   ├── services/       # Business logic (archive, automation, suggestions)
│   │   ├── utils/          # Logger, error classes, ML client
│   │   ├── __tests__/      # Jest test suites
│   │   └── index.js        # Application entry point
│   ├── Dockerfile
│   └── package.json
├── ml-service/             # Python/Flask NLP microservice
│   ├── src/
│   │   ├── services/       # Classification, embeddings, NLP pipelines
│   │   └── app.py          # Flask application entry point
│   ├── Dockerfile
│   └── requirements.txt
├── extension/              # Chrome/Edge browser extension (MV3)
│   ├── background/         # Service worker for event capture
│   ├── content/            # Content script for page extraction
│   ├── popup/              # Extension popup UI (HTML/CSS/JS)
│   ├── icons/              # Extension icons
│   └── manifest.json
├── infrastructure/
│   └── docker/             # Production Docker Compose
├── scripts/                # Setup, migration, and dev scripts
├── docs/                   # Architecture, API, ML, deployment docs
├── docker-compose.yml      # Development Docker Compose
├── CONTRIBUTING.md
├── LICENSE                 # MIT
└── README.md
```
This repository is part of the ORGAN-III (Ergon) organization, which houses the commerce and product layer of the organvm ecosystem. Tab & Bookmark Manager sits at the intersection of developer productivity tooling and applied machine learning.
Related repositories:

- public-record-data-scrapper: data collection and scraping infrastructure that shares patterns with this project’s content extraction pipeline.
- a-i-chat--exporter: AI conversation export tooling, another knowledge-management utility in the Ergon portfolio.

Further enhancements are planned for future development.
Contributions are welcome. Please see CONTRIBUTING.md for contribution guidelines.
For substantial changes, open an issue first to discuss the approach.
This project is licensed under the MIT License. See LICENSE for the full text.
Built as part of the organvm ecosystem — an eight-organ creative-institutional system spanning theory, art, commerce, orchestration, public process, community, and distribution.