🧱 Core Concepts

Understanding Curiositi’s core concepts will help you make the most of the platform.

Workspaces are the top-level containers in Curiositi. They represent teams or companies.

  • Multi-tenancy — Each workspace is completely isolated
  • Member Management — Invite users with different roles (owner, admin, member)
  • Session Scoping — Users select an active workspace, and all queries are scoped to it
mindmap
  root((Workspace))
    Members
      Owner
      Admin
      Members
    Spaces
      Marketing
      Engineering
      Finance
    Files
      document.pdf
      report.csv
      photo.png

Workspaces are stored as organization records in the database (via Better Auth):

| Field | Type | Description |
| --- | --- | --- |
| id | text | Primary key |
| name | text | Workspace name |
| slug | text | URL-friendly identifier (unique) |
| logo | text | Optional logo URL |
| metadata | text | Optional metadata |
| createdAt | timestamp | Creation time |

Workspaces support role-based access through the organizationRoles table:

| Field | Type | Description |
| --- | --- | --- |
| id | text | Primary key |
| organizationId | text | Reference to workspace |
| role | text | Role name |
| permission | text | Permission granted to this role |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |
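
A role check against these rows could look like the sketch below. It assumes each row holds a single permission string for one role, which the table layout suggests but the source does not confirm. The permission names (`space:create`, `file:upload`) and the `roleHasPermission` helper are hypothetical.

```typescript
// Shape mirrors the organizationRoles fields listed above.
interface OrganizationRole {
  id: string;
  organizationId: string;
  role: string;
  permission: string;
}

// Returns true when some row grants `permission` to `role` in the workspace.
// Assumption: one row per (role, permission) pair.
function roleHasPermission(
  rows: OrganizationRole[],
  organizationId: string,
  role: string,
  permission: string,
): boolean {
  return rows.some(
    (r) =>
      r.organizationId === organizationId &&
      r.role === role &&
      r.permission === permission,
  );
}

// Hypothetical sample rows; the permission names are illustrative only.
const sampleRoles: OrganizationRole[] = [
  { id: "1", organizationId: "org_1", role: "admin", permission: "space:create" },
  { id: "2", organizationId: "org_1", role: "member", permission: "file:upload" },
];
```

Filtering on `organizationId` as well as `role` keeps the check consistent with workspace isolation: a role defined in one workspace grants nothing in another.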

Users can be invited to workspaces via the invitation table:

| Field | Type | Description |
| --- | --- | --- |
| id | text | Primary key |
| email | text | Invitee email address |
| inviterId | text | Reference to inviting user |
| organizationId | text | Reference to workspace |
| role | text | Role to assign |
| status | text | Invitation status |
| createdAt | timestamp | Creation time |
| expiresAt | timestamp | Expiration time |
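
The `status` and `expiresAt` fields together determine whether an invitation can still be accepted. The following is a minimal sketch of that check. The source does not list the status values, so `"pending"` is an assumption, and `canAcceptInvitation` is a hypothetical helper rather than part of Better Auth.

```typescript
// Shape mirrors the invitation fields listed above.
interface Invitation {
  id: string;
  email: string;
  inviterId: string;
  organizationId: string;
  role: string;
  status: string;
  createdAt: Date;
  expiresAt: Date;
}

// Assumption: "pending" marks an invitation that has not been accepted,
// rejected, or cancelled. The source does not enumerate status values.
function canAcceptInvitation(invite: Invitation, email: string, now: Date): boolean {
  const sameRecipient = invite.email.toLowerCase() === email.toLowerCase();
  const notExpired = now.getTime() < invite.expiresAt.getTime();
  return invite.status === "pending" && sameRecipient && notExpired;
}

// Hypothetical invitation used for illustration.
const sampleInvite: Invitation = {
  id: "inv_1",
  email: "new.user@example.com",
  inviterId: "user_1",
  organizationId: "org_1",
  role: "member",
  status: "pending",
  createdAt: new Date("2025-01-01T00:00:00Z"),
  expiresAt: new Date("2025-01-08T00:00:00Z"),
};
```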

Spaces are Curiositi’s way of organizing content. They work like folders with hierarchical nesting.

Spaces can be nested using the parentSpaceId field:

mindmap
  root((Marketing))
    Campaigns
      Q1 2025
      Q2 2025
    Brand Assets

Each space record has the following fields:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key (auto-generated) |
| name | text | Display name |
| description | text | Optional description |
| icon | text | Optional icon (e.g., emoji) |
| organizationId | text | Owning workspace |
| parentSpaceId | UUID / null | Parent space reference for nesting |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |

Spaces compared with traditional folders:

| Feature | Traditional Folders | Curiositi Spaces |
| --- | --- | --- |
| Nesting | Limited depth | Unlimited hierarchy |
| Search | Filename only | Semantic search across all content |
| File location | Files in one folder | Files can be in multiple spaces |
| Organization scope | Per-user | Per-workspace |
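
Because nesting is stored as a flat `parentSpaceId` reference, a client has to rebuild the hierarchy itself. The sketch below shows one way to do that in memory. `buildSpaceTree` and `SpaceNode` are hypothetical names, and only the fields needed for nesting are modelled.

```typescript
// Subset of the space fields needed for nesting.
interface Space {
  id: string;
  name: string;
  parentSpaceId: string | null;
}

interface SpaceNode extends Space {
  children: SpaceNode[];
}

// Builds a forest from a flat list of spaces using parentSpaceId.
// Spaces whose parent is null (or missing from the list) become roots.
function buildSpaceTree(spaces: Space[]): SpaceNode[] {
  const nodes = new Map<string, SpaceNode>();
  spaces.forEach((s) => nodes.set(s.id, { ...s, children: [] }));

  const roots: SpaceNode[] = [];
  nodes.forEach((node) => {
    const parent = node.parentSpaceId ? nodes.get(node.parentSpaceId) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node);
  });
  return roots;
}

// The Marketing example from the diagram above, as flat rows.
const sampleSpaces: Space[] = [
  { id: "s1", name: "Marketing", parentSpaceId: null },
  { id: "s2", name: "Campaigns", parentSpaceId: "s1" },
  { id: "s3", name: "Q1 2025", parentSpaceId: "s2" },
  { id: "s4", name: "Q2 2025", parentSpaceId: "s2" },
  { id: "s5", name: "Brand Assets", parentSpaceId: "s1" },
];
```

Indexing all nodes first and linking them in a second pass means the input rows can arrive in any order, including children before their parents.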

Files are the core content in Curiositi. Each file goes through a processing pipeline from upload to searchable content.

stateDiagram-v2
    [*] --> Upload
    Upload --> Pending
    Pending --> Processing
    Processing --> Completed
    Processing --> Failed
    Completed --> [*]
    Failed --> [*]

Each file has one of four statuses:

| Status | Description |
| --- | --- |
| pending | File uploaded, waiting for worker to process |
| processing | Worker is extracting content and generating embeddings |
| completed | File is fully processed and searchable |
| failed | Processing encountered an error |

File records have the following fields:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key (auto-generated) |
| name | text | Original filename |
| path | text | S3 storage path |
| size | integer | File size in bytes |
| type | text | MIME type |
| organizationId | text | Owning workspace |
| uploadedById | text | User who uploaded the file |
| status | enum | pending, processing, completed, failed |
| tags | jsonb | Optional tags (default: { tags: [] }) |
| processedAt | timestamp | When processing completed |
| createdAt | timestamp | Upload time |
| updatedAt | timestamp | Last modification time |
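
The status lifecycle can be encoded as a small transition table, which makes illegal updates (for example, moving a completed file back to pending) easy to reject. This is a sketch read off the state diagram above. `canTransition` is a hypothetical helper, not a documented Curiositi function.

```typescript
type FileStatus = "pending" | "processing" | "completed" | "failed";

// Allowed transitions, taken from the state diagram:
// pending -> processing -> completed | failed.
const fileTransitions: Record<FileStatus, FileStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  completed: [],
  failed: [],
};

// Returns true when a file may move from `from` to `to`.
function canTransition(from: FileStatus, to: FileStatus): boolean {
  return fileTransitions[from].indexOf(to) !== -1;
}
```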

When a file is uploaded:

  1. Upload — File streams to S3 storage
  2. Metadata — File record created in database with status pending
  3. Queue — Processing job dispatched via Upstash QStash or bunqueue
  4. Content Extraction — Worker extracts text based on file type:
    • PDF, text, markdown, HTML, CSV, JSON, XML: Direct text extraction
    • Word (.docx): mammoth library extraction with AI fallback
    • Word (.doc): AI-powered extraction
    • Excel (.xlsx): Sheet-aware extraction with header-aware chunking
    • Excel (.xls): AI-powered extraction
    • PowerPoint (.pptx): Slide text extraction with AI fallback
    • PowerPoint (.ppt): AI-powered extraction
    • Images: AI vision model generates description
  5. Chunking — Content split into chunks (300 tokens, 60 token overlap) with context prefix (file name, type, page numbers, section titles)
  6. Embedding — Each chunk converted to a 1536-dimension vector
  7. Storage — Chunks and embeddings saved to fileContents table
  8. Complete — File status updated to completed
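
Step 4 amounts to a dispatch on file type. The sketch below routes by filename extension purely for illustration. The real worker may dispatch on the stored MIME type instead, the strategy names are hypothetical labels, and the image extensions beyond `png` are assumptions.

```typescript
// Hypothetical labels for the extraction approaches listed in step 4.
type ExtractionStrategy =
  | "direct-text"
  | "docx-mammoth-with-ai-fallback"
  | "xlsx-sheet-aware"
  | "pptx-slides-with-ai-fallback"
  | "ai-text"
  | "ai-vision";

// Maps a filename to an extraction approach; returns null for unknown types.
function pickStrategy(filename: string): ExtractionStrategy | null {
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
  switch (ext) {
    case "pdf":
    case "txt":
    case "md":
    case "html":
    case "csv":
    case "json":
    case "xml":
      return "direct-text";
    case "docx":
      return "docx-mammoth-with-ai-fallback";
    case "xlsx":
      return "xlsx-sheet-aware";
    case "pptx":
      return "pptx-slides-with-ai-fallback";
    case "doc":
    case "xls":
    case "ppt":
      return "ai-text"; // legacy Office formats go straight to AI extraction
    case "png":
    case "jpg":
    case "jpeg":
      return "ai-vision"; // assumption: which image extensions are supported
    default:
      return null;
  }
}
```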

Files are broken into chunks for precise semantic search.

  • Precision: Find the exact relevant section, not just the file
  • Context: Overlapping chunks preserve context across boundaries
  • Token Limits: Each chunk fits within the embedding model's input limit
  • Performance: Every chunk maps to a fixed-size vector, so each similarity comparison costs the same regardless of file length

Chunks are stored in the fileContents table:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key (auto-generated) |
| fileId | UUID | Reference to parent file |
| content | text | The text content of the chunk |
| embeddedContent | vector(1536) | Vector embedding for similarity search |
| metadata | json | Optional metadata about the chunk |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |

Chunking parameters:

  • Chunk size: 300 tokens
  • Overlap: 60 tokens

Each chunk also includes a context prefix with metadata (file name, file type, page numbers, section titles, CSV headers) prepended to the content before embedding. This improves search relevance by providing contextual signals alongside the raw text.
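
To make the 300/60 windowing concrete, here is a minimal sketch of overlapping chunking with a context prefix. It treats whitespace-separated words as tokens, which is a simplification of whatever tokenizer the worker uses. The prefix format and the function names are hypothetical.

```typescript
// Splits tokens into windows of `size`, each sharing `overlap` tokens
// with the previous window.
function chunkTokens(tokens: string[], size = 300, overlap = 60): string[][] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const step = size - overlap;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Assumption: words stand in for tokens, and the prefix is a plain header line.
function chunkText(text: string, prefix: string): string[] {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  return chunkTokens(tokens).map((c) => prefix + "\n" + c.join(" "));
}
```

With these defaults each window starts 240 tokens after the previous one, so a 700-token file yields three chunks of 300, 300, and 220 tokens.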

Files can exist in multiple spaces simultaneously using the filesInSpace junction table.

erDiagram
    Files ||--o{ filesInSpace : "many-to-many"
    Spaces ||--o{ filesInSpace : "many-to-many"

    Files {
        uuid id
        string name
        string path
    }

    Spaces {
        uuid id
        string name
        uuid parentSpaceId
    }

    filesInSpace {
        uuid id
        uuid fileId
        uuid spaceId
    }

The filesInSpace table has the following fields:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| fileId | UUID | Reference to file |
| spaceId | UUID | Reference to space |
| createdAt | timestamp | When the link was created |
| updatedAt | timestamp | Last modification time |

Benefits of this design:

  • No file duplication in storage
  • Single source of truth for file content and embeddings
  • Flexible organization: add a file to any number of spaces
  • Easy reorganization without moving data
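
The junction-table idea can be illustrated in memory: adding a file to a space creates a link row, never a copy of the file. The helpers below are hypothetical. The source does not say whether the real table enforces uniqueness on (fileId, spaceId), so the duplicate check is an assumption.

```typescript
// Subset of the filesInSpace fields needed to model a link.
interface FileInSpace {
  fileId: string;
  spaceId: string;
}

// Returns a new link list containing (fileId, spaceId).
// Adding the same pair twice leaves the list unchanged.
function linkFileToSpace(links: FileInSpace[], fileId: string, spaceId: string): FileInSpace[] {
  const exists = links.some((l) => l.fileId === fileId && l.spaceId === spaceId);
  return exists ? links : links.concat([{ fileId, spaceId }]);
}

// Lists every space a file appears in.
function spacesForFile(links: FileInSpace[], fileId: string): string[] {
  return links.filter((l) => l.fileId === fileId).map((l) => l.spaceId);
}
```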

Curiositi supports intelligent agentic workflows, enabling you to converse with your data and perform actions via chat.

Agents are AI entities powered by LLMs (e.g., OpenAI, Anthropic, Google, Ollama) configured with specific system prompts and tool access limits. They exist within a workspace and can use various tools to fulfill requests.

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key (auto-generated) |
| name | text | Agent display name |
| description | text | Optional description |
| organizationId | text | Owning workspace |
| createdById | text | User who created the agent |
| systemPrompt | text | System prompt for the agent |
| maxToolCalls | integer | Maximum tool calls per conversation turn (default: 10) |
| isDefault | boolean | Whether this is the default agent |
| isActive | boolean | Whether the agent is active |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |

Curiositi provides two built-in system agents:

| Agent | ID | Description | Max Tool Calls |
| --- | --- | --- | --- |
| Ask | system:ask | General-purpose assistant for everyday questions | 10 |
| Deep Research | system:deep-research | Thorough research agent that explores topics in depth | 100 |
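
The tool-call limit acts as a per-turn budget. One way to enforce such a limit is a small counter that the agent loop consults before each tool invocation. The `ToolCallBudget` class is a hypothetical sketch, not Curiositi's implementation. The two constants repeat the limits from the table above.

```typescript
// Limits from the system agents table above.
const ASK_MAX_TOOL_CALLS = 10;
const DEEP_RESEARCH_MAX_TOOL_CALLS = 100;

// Tracks how many tool calls remain in the current conversation turn.
class ToolCallBudget {
  private used = 0;
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  // Records one call and returns true if budget remains, otherwise returns false.
  tryConsume(): boolean {
    if (this.used >= this.max) return false;
    this.used += 1;
    return true;
  }

  remaining(): number {
    return this.max - this.used;
  }
}
```

An agent loop would create a fresh budget at the start of each turn and stop requesting tools once `tryConsume()` returns false.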

Tools expand an agent’s capabilities:

  • Built-in Tools: Foundational actions available to agents:
    • File Search (fileSearch): Semantic search across uploaded documents and files
    • Web Search (webSearch): Search the web using Firecrawl for current information
    • Web Fetch (webFetch): Fetch and extract content from a specific URL
  • Model Context Protocol (MCP): Curiositi integrates with MCP servers to bring external capabilities, data, and context directly to your agents without custom integrations.

Each tool record has the following fields:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| toolKey | text | Unique key identifier (e.g., fileSearch, webSearch) |
| name | text | Internal name |
| displayName | text | Human-readable name |
| description | text | Tool description |
| type | enum | builtin or mcp |
| mcpServerId | UUID / null | Reference to MCP server (for MCP tools) |
| organizationId | text | Owning workspace |
| config | jsonb | Tool configuration (default: {}) |
| isActive | boolean | Whether the tool is active |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |

Tools are linked to agents through per-agent assignment records:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| agentId | UUID | Reference to agent |
| toolId | UUID | Reference to tool |
| enabled | boolean | Whether the tool is enabled for this agent |
| priority | integer | Tool priority (default: 0) |
| config | jsonb | Agent-specific tool configuration |
| createdAt | timestamp | Creation time |
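
Resolving which tools an agent may use means joining these two record types: the tool must be active, and the per-agent link must be enabled. The sketch below shows that join in memory. The source does not say whether a higher or lower `priority` value wins, so sorting higher values first is an assumption, and `toolKeysForAgent` is a hypothetical helper.

```typescript
// Subsets of the tool and agent-tool fields listed above.
interface Tool {
  id: string;
  toolKey: string;
  isActive: boolean;
}

interface AgentTool {
  agentId: string;
  toolId: string;
  enabled: boolean;
  priority: number;
}

// Returns the tool keys an agent can use.
// Assumption: a higher priority value is listed first.
function toolKeysForAgent(agentId: string, tools: Tool[], links: AgentTool[]): string[] {
  const activeById = new Map<string, Tool>();
  tools.filter((t) => t.isActive).forEach((t) => activeById.set(t.id, t));

  return links
    .filter((l) => l.agentId === agentId && l.enabled && activeById.has(l.toolId))
    .sort((a, b) => b.priority - a.priority)
    .map((l) => (activeById.get(l.toolId) as Tool).toolKey);
}
```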

Conversations capture the interactions (messages and tool call context) between users and an agent, providing a persistent history of queries and analysis.

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| externalId | text / null | External identifier (unique) |
| title | text / null | Conversation title |
| source | enum | web or slack |
| organizationId | text | Owning workspace |
| createdById | text | User who started the conversation |
| metadata | jsonb / null | Optional metadata |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |

Each message in a conversation has the following fields:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| conversationId | UUID | Reference to conversation |
| role | enum | user, assistant, system, or tool |
| content | text | Message content |
| attachments | jsonb / null | File attachments |
| toolCalls | jsonb / null | Tool call data |
| tokenCount | integer / null | Token usage count |
| costUSD | numeric / null | Cost in USD |
| agentId | UUID / null | Reference to agent (set null on agent deletion) |
| metadata | jsonb / null | Optional metadata |
| createdAt | timestamp | Creation time |
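
Since each message carries its own `tokenCount` and `costUSD`, per-conversation usage is a simple aggregation. The sketch below assumes numeric columns arrive as strings, as many PostgreSQL drivers return them, and treats null values as zero. `conversationUsage` is a hypothetical helper.

```typescript
// Subset of the message fields listed above.
interface Message {
  conversationId: string;
  role: "user" | "assistant" | "system" | "tool";
  content: string;
  tokenCount: number | null;
  costUSD: string | null; // assumption: numeric values arrive as strings
}

interface Usage {
  messages: number;
  tokens: number;
  costUSD: number;
}

// Sums message count, tokens, and cost for one conversation.
function conversationUsage(messages: Message[], conversationId: string): Usage {
  const usage: Usage = { messages: 0, tokens: 0, costUSD: 0 };
  messages.forEach((m) => {
    if (m.conversationId !== conversationId) return;
    usage.messages += 1;
    usage.tokens += m.tokenCount === null ? 0 : m.tokenCount;
    usage.costUSD += m.costUSD === null ? 0 : Number(m.costUSD);
  });
  return usage;
}
```

Summing costs as floating-point numbers is fine for display. Billing-grade totals would be better computed in the database or with a decimal library.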

MCP servers provide external tools and context to agents:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| name | text | Server display name |
| url | text | MCP server endpoint URL |
| headers | jsonb / null | Custom headers for authentication |
| headersEncrypted | text / null | Encrypted headers |
| isActive | boolean | Whether the server is active |
| organizationId | text | Owning workspace |
| discoveredTools | integer | Number of tools discovered (default: 0) |
| lastConnectedAt | timestamp / null | Last successful connection time |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |

Workspace-level settings are stored in the organizationSettings table:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary key |
| organizationId | text | Reference to workspace |
| key | text | Setting key |
| value | jsonb | Setting value |
| updatedAt | timestamp | Last modification time |

The heart of Curiositi is semantic search — finding files by meaning, not just keywords.

  1. Query Embedding — Your search text is converted to a 1536-dimension vector
  2. Similarity Search — pgvector finds the closest matching content chunks using cosine similarity
  3. Ranking — Results ranked by similarity score
  4. Aggregation — Matching chunks grouped by source file
  5. Response — Files returned with relevance scores
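
Steps 2 to 5 can be sketched in memory with a plain cosine-similarity scan. This stands in for the pgvector query and is not how Curiositi runs it. The query is assumed to be already embedded (step 1), the example uses 3-dimension vectors in place of 1536, and scoring a file by its best-matching chunk is an assumption about the aggregation step.

```typescript
interface Chunk {
  fileId: string;
  content: string;
  embedding: number[];
}

interface FileHit {
  fileId: string;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function searchFiles(query: number[], chunks: Chunk[], topK: number): FileHit[] {
  // Steps 2 and 3: score every chunk, rank by similarity, keep the top K.
  const ranked = chunks
    .map((c) => ({ fileId: c.fileId, score: cosineSimilarity(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);

  // Step 4: group by file, keeping the best chunk score per file (assumption).
  const best = new Map<string, number>();
  ranked.forEach((hit) => {
    const current = best.get(hit.fileId);
    if (current === undefined || hit.score > current) best.set(hit.fileId, hit.score);
  });

  // Step 5: return files ordered by relevance.
  const files: FileHit[] = [];
  best.forEach((score, fileId) => files.push({ fileId, score }));
  return files.sort((a, b) => b.score - a.score);
}
```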

Curiositi uses 1536-dimension embeddings:

mindmap
  root((Embeddings))
    "Quarterly sales report"
      0.023
      -0.156
      0.892
      ...
    "Q4 revenue summary"
      0.019
      -0.142
      0.887
      ...

Curiositi uses Better Auth for authentication.

  • Email/Password — Standard credential-based login
  • Google OAuth — Sign in with Google

Sessions are stored in PostgreSQL. Each session tracks the user’s active workspace (via activeOrganizationId), which scopes all subsequent queries.
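
Scoping by `activeOrganizationId` means every data access first resolves the workspace from the session and then filters on it. The sketch below shows that pattern over an in-memory list. The `Session` shape is reduced to the fields used here, and both helper functions are hypothetical rather than Better Auth APIs.

```typescript
// Minimal session shape; a real session carries more fields.
interface Session {
  userId: string;
  activeOrganizationId: string | null;
}

interface FileRecord {
  id: string;
  name: string;
  organizationId: string;
}

// Resolves the active workspace or fails if the user has not selected one.
function requireActiveWorkspace(session: Session): string {
  if (!session.activeOrganizationId) {
    throw new Error("No active workspace selected");
  }
  return session.activeOrganizationId;
}

// Returns only the files that belong to the session's active workspace.
function listFilesForSession(session: Session, files: FileRecord[]): FileRecord[] {
  const organizationId = requireActiveWorkspace(session);
  return files.filter((f) => f.organizationId === organizationId);
}
```

Resolving the workspace in one place keeps the isolation rule out of individual queries, so a handler cannot forget to apply it.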

| Role | Capabilities |
| --- | --- |
| Owner | Full control, member management |
| Admin | Create spaces, upload files, manage content |
| Member | Upload files, search, read access |

How data moves through Curiositi:

flowchart TB
    User["User<br/>(Browser)"] -->|"Upload"| Platform["Platform<br/>(TanStack Start)"]
    Platform -->|"Enqueue job"| Queue["Queue<br/>(QStash/bunqueue)"]
    Platform -->|"Store metadata"| DB[(PostgreSQL<br/>+ pgvector)]
    Queue -->|"Process"| Worker["Worker<br/>(Hono)"]
    Worker -->|"Download/Upload"| S3["S3 Storage"]
    Worker -->|"Store embeddings"| DB
    User -->|"Query"| Platform
    Platform -->|"Search"| DB
    DB -->|"Results"| Platform