🧱 Core Concepts

Understanding Curiositi’s core concepts will help you make the most of the platform.

Workspaces

Workspaces are the top-level containers in Curiositi. They represent teams or companies.

Key Features

Multi-tenancy — Each workspace's data is isolated from every other workspace
Member Management — Invite users with different roles (owner, admin, member)
Session Scoping — Users select an active workspace, and all queries are scoped to it

Workspace Structure

```mermaid
mindmap
  root((Workspace))
    Members
      Owner
      Admin
      Member
    Spaces
      Marketing
      Engineering
      Finance
    Files
      document.pdf
      report.csv
      photo.png
```

Workspace Data Model

Workspaces are stored as organization records in the database (via Better Auth):

Field	Type	Description
`id`	text	Primary key
`name`	text	Workspace name
`slug`	text	URL-friendly identifier (unique)
`logo`	text	Optional logo URL
`metadata`	text	Optional metadata
`createdAt`	timestamp	Creation time
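
The `slug` must be URL-friendly and unique, but this page does not document how it is generated. The sketch below assumes a common approach (lowercase ASCII letters and digits joined by hyphens) and uses hypothetical `toSlug` and `uniqueSlug` helpers that append a numeric suffix on collision.

```typescript
// Hypothetical helpers: Curiositi's actual slug rules are not documented here.

/** Convert a workspace name into a lowercase, hyphen-separated slug. */
function toSlug(name: string): string {
  return name
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse each run of other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

/** Append -2, -3, ... until the slug is not already taken. */
function uniqueSlug(name: string, taken: ReadonlySet<string>): string {
  const base = toSlug(name) || "workspace"; // fallback when the name has no usable characters
  let candidate = base;
  for (let n = 2; taken.has(candidate); n++) {
    candidate = `${base}-${n}`;
  }
  return candidate;
}
```

A unique constraint on the `slug` column should still back this check, since two concurrent requests could both pick the same free slug.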

Spaces

Spaces are Curiositi’s way of organizing content. They work like folders with hierarchical nesting.

Space Hierarchy

Spaces are nested through the `parentSpaceId` field; a top-level space has `parentSpaceId` set to null:

```mermaid
mindmap
  root((Marketing))
    Campaigns
      Q1 2025
      Q2 2025
    Brand Assets
```
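
Because each space stores only a pointer to its parent, a client that receives a flat list of spaces has to rebuild the hierarchy itself. A minimal sketch, assuming the list is already scoped to one workspace; `buildSpaceTree` and the `SpaceNode` type are illustrative, not part of Curiositi's API.

```typescript
// Illustrative shapes; field names follow the Space Data Model table.
interface Space {
  id: string;
  name: string;
  parentSpaceId: string | null;
}

interface SpaceNode extends Space {
  children: SpaceNode[];
}

/** Rebuild the nested hierarchy from a flat list of spaces. */
function buildSpaceTree(spaces: Space[]): SpaceNode[] {
  // First pass: create a node for every space, keyed by id.
  const nodes = new Map<string, SpaceNode>();
  for (const space of spaces) {
    nodes.set(space.id, { ...space, children: [] });
  }

  // Second pass: attach each node to its parent, or treat it as a root.
  const roots: SpaceNode[] = [];
  for (const node of nodes.values()) {
    const parent = node.parentSpaceId ? nodes.get(node.parentSpaceId) : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node); // top-level space, or its parent is not in this list
    }
  }
  return roots;
}

// The Marketing hierarchy from the diagram above, as flat rows.
const tree = buildSpaceTree([
  { id: "m", name: "Marketing", parentSpaceId: null },
  { id: "c", name: "Campaigns", parentSpaceId: "m" },
  { id: "q1", name: "Q1 2025", parentSpaceId: "c" },
  { id: "q2", name: "Q2 2025", parentSpaceId: "c" },
  { id: "b", name: "Brand Assets", parentSpaceId: "m" },
]);
```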

Space Data Model

Field	Type	Description
`id`	UUID	Primary key (auto-generated)
`name`	text	Display name
`description`	text	Optional description
`icon`	text	Optional icon (e.g., emoji)
`organizationId`	text	Owning workspace
`parentSpaceId`	UUID / null	Parent space reference for nesting
`createdAt`	timestamp	Creation time
`updatedAt`	timestamp	Last modification time
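
Walking `parentSpaceId` upward yields a space's full path, for example to render breadcrumbs. This sketch assumes the spaces are held in an in-memory `Map`, and `spacePath` is a hypothetical helper. Because nesting depth is unlimited, it also guards against a cycle in case bad data ever makes a space its own ancestor.

```typescript
interface SpaceRecord {
  id: string;
  name: string;
  parentSpaceId: string | null;
}

/** Return space names from the root down to `spaceId` (empty if the id is unknown). */
function spacePath(spaceId: string, byId: Map<string, SpaceRecord>): string[] {
  const path: string[] = [];
  const seen = new Set<string>();
  let current = byId.get(spaceId);
  while (current) {
    if (seen.has(current.id)) {
      throw new Error(`Cycle detected at space ${current.id}`);
    }
    seen.add(current.id);
    path.push(current.name);
    current = current.parentSpaceId ? byId.get(current.parentSpaceId) : undefined;
  }
  return path.reverse(); // collected leaf-first, so flip to root-first
}
```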

Spaces vs Traditional Folders

Feature	Traditional Folders	Curiositi Spaces
Nesting	Limited depth	Unlimited hierarchy
Search	Filename only	Semantic search across all content
File location	Each file lives in one folder	A file can be in multiple spaces
Organization scope	Per-user	Per-workspace

Files

Files are the core content in Curiositi. Each file goes through a processing pipeline from upload to searchable content.

File Lifecycle

```mermaid
stateDiagram-v2
    [*] --> Upload
    Upload --> Pending
    Pending --> Processing
    Processing --> Completed
    Processing --> Failed
    Completed --> [*]
    Failed --> [*]
```
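
One way to enforce the transitions in this diagram is a lookup table keyed by the current status. This is a sketch with hypothetical `canTransition` and `assertTransition` helpers; the page does not say whether a `failed` file can be retried, so `failed` is treated as terminal, matching the diagram.

```typescript
type FileStatus = "pending" | "processing" | "completed" | "failed";

// Allowed next states, mirroring the state diagram (the upload step creates the record as "pending").
const transitions: Record<FileStatus, readonly FileStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  completed: [], // terminal
  failed: [], // terminal in the diagram
};

function canTransition(from: FileStatus, to: FileStatus): boolean {
  return transitions[from].includes(to);
}

/** Throw instead of silently writing an impossible status. */
function assertTransition(from: FileStatus, to: FileStatus): void {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid file status transition: ${from} -> ${to}`);
  }
}
```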

File Statuses

Status	Description
`pending`	File uploaded, waiting for worker to process
`processing`	Worker is extracting content and generating embeddings
`completed`	File is fully processed and searchable
`failed`	Processing encountered an error

File Data Model

Field	Type	Description
`id`	UUID	Primary key (auto-generated)
`name`	text	Original filename
`path`	text	S3 storage path
`size`	integer	File size in bytes
`type`	text	MIME type
`organizationId`	text	Owning workspace
`uploadedById`	text	User who uploaded the file
`status`	enum	`pending`, `processing`, `completed`, `failed`
`tags`	jsonb	Optional tags (default: `{ tags: [] }`)
`processedAt`	timestamp	When processing completed
`createdAt`	timestamp	Upload time
`updatedAt`	timestamp	Last modification time

File Processing Pipeline

When a file is uploaded:

Upload — The file streams to S3 storage
Metadata — A file record is created in the database with status `pending`
Queue — A processing job is dispatched via Upstash QStash
Content Extraction — The worker extracts text (documents) or generates descriptions (images)
Chunking — The content is split into chunks (800 tokens, with a 100-token overlap)
Embedding — Each chunk is converted to a 1536-dimension vector
Storage — Chunks and embeddings are saved to the `fileContents` table
Complete — The file status is updated to `completed` (or to `failed` if processing errors)
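
The worker-side steps, from content extraction through completion, can be sketched as a single job handler. Every dependency in `PipelineDeps` is a hypothetical interface standing in for the real S3, model, and database calls, which this page does not specify.

```typescript
type FileStatus = "pending" | "processing" | "completed" | "failed";

interface ChunkRow {
  content: string;
  embedding: number[];
}

// Hypothetical dependencies; a real worker would wire these to S3, an embedding model, and PostgreSQL.
interface PipelineDeps {
  download(path: string): Promise<Uint8Array>;
  extractText(bytes: Uint8Array, mimeType: string): Promise<string>;
  chunk(text: string): string[];
  embed(chunk: string): Promise<number[]>;
  saveChunks(fileId: string, rows: ChunkRow[]): Promise<void>;
  setStatus(fileId: string, status: FileStatus): Promise<void>; // could also set processedAt
}

interface FileJob {
  fileId: string;
  path: string; // S3 storage path
  type: string; // MIME type
}

async function processFile(job: FileJob, deps: PipelineDeps): Promise<FileStatus> {
  await deps.setStatus(job.fileId, "processing");
  try {
    const bytes = await deps.download(job.path);
    const text = await deps.extractText(bytes, job.type);
    const rows: ChunkRow[] = [];
    for (const content of deps.chunk(text)) {
      const embedding = await deps.embed(content);
      if (embedding.length !== 1536) {
        throw new Error(`Expected 1536 dimensions, got ${embedding.length}`);
      }
      rows.push({ content, embedding });
    }
    await deps.saveChunks(job.fileId, rows);
    await deps.setStatus(job.fileId, "completed");
    return "completed";
  } catch {
    // Any failing step moves the file to the terminal "failed" state.
    await deps.setStatus(job.fileId, "failed");
    return "failed";
  }
}
```

Marking the file `processing` before doing any work makes a stuck or crashed job visible in the file list instead of leaving it indistinguishable from one still waiting in the queue.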

Content Chunks

Files are broken into chunks for precise semantic search.

Why Chunking?

Precision — Find the exact relevant section, not just the file
Context — Overlapping chunks preserve context across boundaries
Token Limits — Each chunk fits within the embedding model's input limit
Focus — Each vector represents one short passage, so a match is not diluted by unrelated content elsewhere in the file

Chunk Data Model (fileContents table)

Field	Type	Description
`id`	UUID	Primary key (auto-generated)
`fileId`	UUID	Reference to parent file
`content`	text	The text content of the chunk
`embeddedContent`	vector(1536)	Vector embedding for similarity search
`metadata`	json	Optional metadata about the chunk
`createdAt`	timestamp	Creation time
`updatedAt`	timestamp	Last modification time

Chunking Parameters

Chunk size: 800 tokens
Overlap: 100 tokens
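
With an 800-token window and 100 tokens of overlap, each chunk starts 700 tokens after the previous one. The sketch below illustrates that sliding window. It splits on whitespace as a stand-in for a real tokenizer; that is an assumption, since the tokenizer Curiositi uses is not named here.

```typescript
const CHUNK_SIZE = 800; // tokens per chunk
const CHUNK_OVERLAP = 100; // tokens shared with the previous chunk

/** Split a token array into overlapping windows. */
function chunkTokens<T>(tokens: T[], size: number = CHUNK_SIZE, overlap: number = CHUNK_OVERLAP): T[][] {
  if (overlap >= size) throw new Error("overlap must be smaller than chunk size");
  const stride = size - overlap; // 700 with the defaults
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window already reaches the end
  }
  return chunks;
}

/** Whitespace "tokenizer", used only for illustration. */
function chunkText(text: string): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  return chunkTokens(words).map((chunk) => chunk.join(" "));
}
```

For a 2,000-token input this yields windows starting at tokens 0, 700, and 1,400; the last chunk holds the remaining 600 tokens.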

The Junction Pattern

Files can exist in multiple spaces simultaneously using the filesInSpace junction table.

Many-to-Many Relationship

```mermaid
erDiagram
    Files ||--o{ filesInSpace : "appears in"
    Spaces ||--o{ filesInSpace : "contains"

    Files {
        uuid id
        string name
        string path
    }

    Spaces {
        uuid id
        string name
        uuid parentSpaceId
    }

    filesInSpace {
        uuid id
        uuid fileId
        uuid spaceId
    }
```

filesInSpace Data Model

Field	Type	Description
`id`	UUID	Primary key
`fileId`	UUID	Reference to file
`spaceId`	UUID	Reference to space
`createdAt`	timestamp	When the link was created
`updatedAt`	timestamp	Last modification time
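
In SQL terms, listing a space's files is a join through `filesInSpace`. The in-memory sketch below shows the same idea with a hypothetical `FileSpaceLinks` class: a link is only a `(fileId, spaceId)` pair, so adding a file to another space never copies the file or its chunks. Treating a duplicate link as a no-op is an assumption; this page does not say whether the table enforces a unique pair.

```typescript
interface FileInSpace {
  id: string;
  fileId: string;
  spaceId: string;
}

class FileSpaceLinks {
  private links: FileInSpace[] = [];
  private nextId = 1;

  /** Add a file to a space; linking the same pair twice is a no-op. */
  linkFile(fileId: string, spaceId: string): void {
    const exists = this.links.some((l) => l.fileId === fileId && l.spaceId === spaceId);
    if (!exists) {
      this.links.push({ id: String(this.nextId++), fileId, spaceId });
    }
  }

  /** Remove a file from one space; the file and its other links are untouched. */
  unlinkFile(fileId: string, spaceId: string): void {
    this.links = this.links.filter((l) => !(l.fileId === fileId && l.spaceId === spaceId));
  }

  /** All files in a space. */
  filesIn(spaceId: string): string[] {
    return this.links.filter((l) => l.spaceId === spaceId).map((l) => l.fileId);
  }

  /** All spaces a file appears in. */
  spacesOf(fileId: string): string[] {
    return this.links.filter((l) => l.fileId === fileId).map((l) => l.spaceId);
  }
}
```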

Benefits

No file duplication in storage
Single source of truth for file content and embeddings
Flexible organization — add a file to any number of spaces
Easy reorganization without moving data

Semantic Search

The heart of Curiositi is semantic search — finding files by meaning, not just keywords.

How It Works

Query Embedding — Your search text is converted to a 1536-dimension vector
Similarity Search — pgvector finds the closest matching content chunks using cosine similarity
Ranking — Results ranked by similarity score
Aggregation — Matching chunks grouped by source file
Response — Files returned with relevance scores
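
The similarity, ranking, and aggregation steps can be sketched in memory. This is an illustration only: in Curiositi the similarity search runs inside PostgreSQL via pgvector, and the tiny 3-dimension vectors in the example stand in for real 1536-dimension embeddings. Scoring each file by its best-matching chunk is an assumption; the page says only that chunks are grouped by source file.

```typescript
interface Chunk {
  fileId: string;
  content: string;
  embedding: number[];
}

interface FileHit {
  fileId: string;
  score: number; // similarity of this file's best chunk
  matches: string[]; // matching chunk texts, best first
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid NaN for zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function searchFiles(query: number[], chunks: Chunk[], topK = 10): FileHit[] {
  // Score every chunk and keep the top K, best first.
  const scored = chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);

  // Group by source file; the first (highest) score seen becomes the file's score.
  const byFile = new Map<string, FileHit>();
  for (const { chunk, score } of scored) {
    const hit = byFile.get(chunk.fileId);
    if (hit) {
      hit.matches.push(chunk.content);
    } else {
      byFile.set(chunk.fileId, { fileId: chunk.fileId, score, matches: [chunk.content] });
    }
  }
  return Array.from(byFile.values()); // insertion order is already descending by score
}
```

In SQL, pgvector's `<=>` operator returns cosine distance, which is 1 minus cosine similarity, so ordering by `<=>` ascending gives the same ranking as sorting similarity descending.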

Vector Embeddings

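Curiositi uses 1536-dimension embeddings. Text with similar meaning maps to nearby vectors, as in this illustration (the values are made up, and only the first three of the 1,536 dimensions are shown):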

```mermaid
mindmap
  root((Embeddings))
    "Quarterly sales report"
      0.023
      -0.156
      0.892
      ...
    "Q4 revenue summary"
      0.019
      -0.142
      0.887
      ...
```

Authentication and Authorization

Curiositi uses Better Auth for authentication.

Supported Methods

Email/Password — Standard credential-based login
Google OAuth — Sign in with Google

Session Management

Sessions are stored in PostgreSQL. Each session tracks the user’s active workspace (via activeOrganizationId), which scopes all subsequent queries.
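
A sketch of what this scoping means for data access: every query filters on the session's `activeOrganizationId`. The `Session` shape and the `requireWorkspace` and `listFiles` helpers are hypothetical, and the in-memory array stands in for a database table.

```typescript
interface Session {
  userId: string;
  activeOrganizationId: string | null;
}

interface FileRow {
  id: string;
  name: string;
  organizationId: string;
}

/** Resolve the workspace for this request, refusing to run unscoped queries. */
function requireWorkspace(session: Session): string {
  if (!session.activeOrganizationId) {
    throw new Error("No active workspace selected");
  }
  return session.activeOrganizationId;
}

/** Only rows belonging to the active workspace are ever returned. */
function listFiles(session: Session, table: FileRow[]): FileRow[] {
  const orgId = requireWorkspace(session);
  return table.filter((row) => row.organizationId === orgId);
}
```

Taking the workspace from the session rather than from a request parameter means a client cannot read another workspace's data simply by sending a different ID.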

Permission Model

Role	Capabilities
Owner	Full control, member management
Admin	Create spaces, upload files, manage content
Member	Upload files, search, read access
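
The role table can be expressed as a capability map. The capability names below are illustrative, not Curiositi's actual identifiers, and the sketch assumes each role includes the capabilities of the role below it, which the table implies but does not state.

```typescript
type Role = "owner" | "admin" | "member";
type Capability = "read" | "search" | "upload" | "createSpace" | "manageContent" | "manageMembers";

// Assumed hierarchy: admin extends member, owner extends admin.
const memberCaps: Capability[] = ["read", "search", "upload"];
const adminCaps: Capability[] = [...memberCaps, "createSpace", "manageContent"];
const ownerCaps: Capability[] = [...adminCaps, "manageMembers"];

const capabilities: Record<Role, ReadonlySet<Capability>> = {
  member: new Set(memberCaps),
  admin: new Set(adminCaps),
  owner: new Set(ownerCaps),
};

function can(role: Role, capability: Capability): boolean {
  return capabilities[role].has(capability);
}
```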

Data Flow

How data moves through Curiositi:

```mermaid
flowchart TB
    User["User<br/>(Browser)"] -->|"Upload"| Platform["Platform<br/>(TanStack Start)"]
    Platform -->|"Enqueue job"| Queue["Queue<br/>(QStash/bunqueue)"]
    Platform -->|"Store metadata"| DB[(PostgreSQL<br/>+ pgvector)]
    Queue -->|"Process"| Worker["Worker<br/>(Hono)"]
    Worker -->|"Download/Upload"| S3["S3 Storage"]
    Worker -->|"Store embeddings"| DB
    User -->|"Query"| Platform
    Platform -->|"Search"| DB
    DB -->|"Results"| Platform
```

Next Steps

Uploading Files — Learn the file upload process
AI Search — Master semantic search
Spaces — Organize your content