🧱 Core Concepts
Understanding Curiositi's core concepts will help you make the most of the platform.
Workspaces
Workspaces are the top-level containers in Curiositi. They represent teams or companies.
Key Features
- Multi-tenancy: Each workspace's data is isolated from every other workspace
- Member management: Invite users with different roles (owner, admin, member)
- Session scoping: Users select an active workspace, and all queries are scoped to it
Workspace Structure
```mermaid
mindmap
  root((Workspace))
    Members
      Owner
      Admin
      Members
    Spaces
      Marketing
      Engineering
      Finance
    Files
      document.pdf
      report.csv
      photo.png
```
Workspace Data Model
Workspaces are stored as organization records in the database (via Better Auth):
| Field | Type | Description |
|---|---|---|
| id | text | Primary key |
| name | text | Workspace name |
| slug | text | URL-friendly identifier (unique) |
| logo | text | Optional logo URL |
| metadata | text | Optional metadata |
| createdAt | timestamp | Creation time |
Spaces
Spaces are Curiositi's way of organizing content. They work like folders and can be nested hierarchically.
Space Hierarchy
Spaces are nested through the parentSpaceId field: each space points to its parent, and a top-level space has no parent. For example:
```mermaid
mindmap
  root((Marketing))
    Campaigns
      Q1 2025
      Q2 2025
    Brand Assets
```
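A space's full path (its breadcrumb) can be recovered by following parentSpaceId links up to the root. The sketch below assumes the space rows are already loaded into memory; the Space interface and the spacePath helper are hypothetical, and a production version would more likely use a recursive SQL query.

```typescript
// Hypothetical in-memory shape of a space row; only the fields needed here.
interface Space {
  id: string;
  name: string;
  parentSpaceId: string | null; // null for a top-level space
}

// Hypothetical helper: walk parentSpaceId links from a space up to the root
// and return the names ordered from root to leaf.
function spacePath(spaces: Space[], spaceId: string): string[] {
  const byId = new Map(spaces.map((s) => [s.id, s] as const));
  const path: string[] = [];
  const seen = new Set<string>();
  let current = byId.get(spaceId);
  while (current) {
    // Guard against bad data that links a space back to one of its descendants.
    if (seen.has(current.id)) throw new Error(`Cycle detected at space ${current.id}`);
    seen.add(current.id);
    path.unshift(current.name);
    current = current.parentSpaceId ? byId.get(current.parentSpaceId) : undefined;
  }
  return path;
}

const spaces: Space[] = [
  { id: "s1", name: "Marketing", parentSpaceId: null },
  { id: "s2", name: "Campaigns", parentSpaceId: "s1" },
  { id: "s3", name: "Q1 2025", parentSpaceId: "s2" },
  { id: "s4", name: "Q2 2025", parentSpaceId: "s2" },
  { id: "s5", name: "Brand Assets", parentSpaceId: "s1" },
];

console.log(spacePath(spaces, "s3").join(" / ")); // Marketing / Campaigns / Q1 2025
```

Because nesting depth is not capped, the walk simply continues until it reaches a space whose parentSpaceId is null.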
Space Data Model
| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key (auto-generated) |
| name | text | Display name |
| description | text | Optional description |
| icon | text | Optional icon (e.g., emoji) |
| organizationId | text | Owning workspace |
| parentSpaceId | UUID (nullable) | Parent space reference for nesting; null for a top-level space |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |
Spaces vs Traditional Folders
| Feature | Traditional Folders | Curiositi Spaces |
|---|---|---|
| Nesting | Limited depth | Unlimited hierarchy |
| Search | Filename only | Semantic search across all content |
| File location | Files in one folder | Files can be in multiple spaces |
| Organization scope | Per-user | Per-workspace |
Files
Files are the core content in Curiositi. Each file goes through a processing pipeline that takes it from upload to searchable content.
File Lifecycle
```mermaid
stateDiagram-v2
    [*] --> Upload
    Upload --> Pending
    Pending --> Processing
    Processing --> Completed
    Processing --> Failed
    Completed --> [*]
    Failed --> [*]
```
File Statuses
| Status | Description |
|---|---|
| pending | File uploaded, waiting for a worker to process it |
| processing | Worker is extracting content and generating embeddings |
| completed | File is fully processed and searchable |
| failed | Processing encountered an error |
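The lifecycle diagram implies a small set of legal status changes. A minimal sketch of a guard that enforces them, assuming status updates go through one helper; canTransition and transition are hypothetical names, and because the diagram shows no retry edge, completed and failed are treated as terminal here.

```typescript
type FileStatus = "pending" | "processing" | "completed" | "failed";

// Allowed status changes, taken from the lifecycle diagram. Assumption: no
// retry path exists, so "completed" and "failed" have no outgoing edges.
const allowedTransitions: Record<FileStatus, readonly FileStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: FileStatus, to: FileStatus): boolean {
  return allowedTransitions[from].includes(to);
}

// Hypothetical helper: returns the new status, or throws if the change
// skips or reverses a step in the lifecycle.
function transition(from: FileStatus, to: FileStatus): FileStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid status change: ${from} -> ${to}`);
  }
  return to;
}

console.log(canTransition("pending", "processing")); // true
console.log(canTransition("pending", "completed")); // false
```

If Curiositi supports reprocessing failed files, the table would need an extra edge such as failed to pending.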
File Data Model
| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key (auto-generated) |
| name | text | Original filename |
| path | text | S3 storage path |
| size | integer | File size in bytes |
| type | text | MIME type |
| organizationId | text | Owning workspace |
| uploadedById | text | User who uploaded the file |
| status | enum | pending, processing, completed, failed |
| tags | jsonb | Optional tags (default: { tags: [] }) |
| processedAt | timestamp | When processing completed |
| createdAt | timestamp | Upload time |
| updatedAt | timestamp | Last modification time |
File Processing Pipeline
When a file is uploaded:
- Upload: The file streams to S3 storage
- Metadata: A file record is created in the database with status pending
- Queue: A processing job is dispatched via Upstash QStash
- Content extraction: The worker extracts text (documents) or generates descriptions (images)
- Chunking: Content is split into chunks (800 tokens, 100-token overlap)
- Embedding: Each chunk is converted to a 1536-dimension vector
- Storage: Chunks and embeddings are saved to the fileContents table
- Complete: The file status is updated to completed
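The worker-side steps (content extraction through completion) can be sketched as one function that drives the file's status. This is a minimal sketch, not Curiositi's worker: processFile, PipelineDeps, and the fake dependencies are hypothetical, and where the real worker would call S3, the embedding model, and the database, this sketch takes injected functions.

```typescript
type FileStatus = "pending" | "processing" | "completed" | "failed";

interface FileRecord {
  id: string;
  status: FileStatus;
  processedAt: Date | null;
}

interface Chunk {
  content: string;
  embedding: number[];
}

// Hypothetical dependencies standing in for the content extractor, the
// chunker, the embedding model, and the fileContents writer.
interface PipelineDeps {
  extract(file: FileRecord): Promise<string>;
  chunk(text: string): string[];
  embed(text: string): Promise<number[]>;
  storeChunks(fileId: string, chunks: Chunk[]): Promise<void>;
}

// Runs the worker steps for one queued file and records the outcome,
// following pending -> processing -> completed or failed.
async function processFile(file: FileRecord, deps: PipelineDeps): Promise<FileRecord> {
  file.status = "processing";
  try {
    const text = await deps.extract(file);
    const chunks: Chunk[] = [];
    for (const content of deps.chunk(text)) {
      chunks.push({ content, embedding: await deps.embed(content) });
    }
    await deps.storeChunks(file.id, chunks);
    file.status = "completed";
    file.processedAt = new Date();
  } catch (err) {
    // Any failing step marks the whole file as failed.
    file.status = "failed";
    console.error(`Processing failed for ${file.id}:`, err);
  }
  return file;
}

// Fake dependencies for illustration only.
const stored: Chunk[] = [];
const fakeDeps: PipelineDeps = {
  extract: async () => "alpha beta gamma",
  chunk: (text) => text.split(" "),
  embed: async (text) => [text.length], // stand-in for a 1536-dimension vector
  storeChunks: async (_fileId, chunks) => {
    stored.push(...chunks);
  },
};

processFile({ id: "f1", status: "pending", processedAt: null }, fakeDeps).then((f) =>
  console.log(f.status, stored.length), // completed 3
);
```

Injecting the steps keeps the status bookkeeping testable without S3 or a database.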
Content Chunks
Files are broken into chunks for precise semantic search.
Why Chunking?
- Precision: Search finds the exact relevant section, not just the file
- Context: Overlapping chunks preserve context across boundaries
- Token limits: Each chunk fits within the embedding model's input limit
- Performance: Each chunk maps to one fixed-size vector, which pgvector can index for fast similarity search
Chunk Data Model (fileContents table)
| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key (auto-generated) |
| fileId | UUID | Reference to parent file |
| content | text | The text content of the chunk |
| embeddedContent | vector(1536) | Vector embedding for similarity search |
| metadata | json | Optional metadata about the chunk |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |
Chunking Parameters
- Chunk size: 800 tokens
- Overlap: 100 tokens
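With an 800-token window and a 100-token overlap, consecutive chunks start 700 tokens apart. A minimal sliding-window sketch, assuming the text has already been tokenized upstream; chunkTokens is a hypothetical helper, and Curiositi's actual chunker may also respect sentence or paragraph boundaries.

```typescript
// Hypothetical helper: split a token sequence into windows of `size` tokens,
// where each window repeats the last `overlap` tokens of the previous one.
function chunkTokens<T>(tokens: T[], size = 800, overlap = 100): T[][] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const stride = size - overlap; // 700 with the defaults
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// 2000 placeholder "tokens" numbered 0..1999.
const tokens = Array.from({ length: 2000 }, (_, i) => i);
const chunks = chunkTokens(tokens);
console.log(chunks.map((c) => `${c[0]}-${c[c.length - 1]}`).join(", "));
// 0-799, 700-1499, 1400-1999
```

Each boundary region (for example tokens 700 to 799) appears in two chunks, which is what preserves context across the split.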
The Junction Pattern
Files can exist in multiple spaces simultaneously through the filesInSpace junction table.
Many-to-Many Relationship
```mermaid
erDiagram
    Files ||--o{ filesInSpace : "many-to-many"
    Spaces ||--o{ filesInSpace : "many-to-many"
    Files {
        uuid id
        string name
        string path
    }
    Spaces {
        uuid id
        string name
        uuid parentSpaceId
    }
    filesInSpace {
        uuid id
        uuid fileId
        uuid spaceId
    }
```
filesInSpace Data Model
| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| fileId | UUID | Reference to file |
| spaceId | UUID | Reference to space |
| createdAt | timestamp | When the link was created |
| updatedAt | timestamp | Last modification time |
Benefits
- No file duplication in storage
- Single source of truth for file content and embeddings
- Flexible organization: add a file to any number of spaces
- Easy reorganization without moving data
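The junction pattern means that adding a file to a space, or removing it, only touches link rows, never the file or its embeddings. A minimal in-memory sketch; the FilesInSpace class is a hypothetical stand-in for the table, and its duplicate check stands in for what would typically be a unique constraint on (fileId, spaceId), which this page does not confirm Curiositi's schema has.

```typescript
// Hypothetical in-memory stand-in for one filesInSpace row.
interface FileInSpace {
  fileId: string;
  spaceId: string;
}

class FilesInSpace {
  private links: FileInSpace[] = [];

  // Linking a file to a space adds a row; the file itself is never copied.
  add(fileId: string, spaceId: string): void {
    const exists = this.links.some((l) => l.fileId === fileId && l.spaceId === spaceId);
    if (!exists) this.links.push({ fileId, spaceId });
  }

  // Unlinking from one space leaves the file's other links untouched.
  remove(fileId: string, spaceId: string): void {
    this.links = this.links.filter((l) => !(l.fileId === fileId && l.spaceId === spaceId));
  }

  filesIn(spaceId: string): string[] {
    return this.links.filter((l) => l.spaceId === spaceId).map((l) => l.fileId);
  }

  spacesOf(fileId: string): string[] {
    return this.links.filter((l) => l.fileId === fileId).map((l) => l.spaceId);
  }
}

const junction = new FilesInSpace();
junction.add("report.csv", "marketing");
junction.add("report.csv", "finance");
junction.add("report.csv", "finance"); // duplicate link is ignored
console.log(junction.spacesOf("report.csv").join(", ")); // marketing, finance
```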
Semantic Search
The heart of Curiositi is semantic search: finding files by meaning, not just keywords.
How It Works
- Query embedding: Your search text is converted to a 1536-dimension vector
- Similarity search: pgvector finds the closest matching content chunks using cosine similarity
- Ranking: Results are ranked by similarity score
- Aggregation: Matching chunks are grouped by source file
- Response: Files are returned with relevance scores
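pgvector's `<=>` operator returns cosine distance, so a chunk's similarity is commonly computed as `1 - (embedding <=> query)`. The ranking and aggregation steps can then be sketched over the returned rows. This is a minimal sketch: the ChunkHit and FileResult shapes, the aggregateByFile helper, and the sample hits are hypothetical, and scoring a file by its best chunk is an assumption, since this page does not say how file relevance is derived.

```typescript
// One row per matching chunk, roughly what a pgvector query could return.
interface ChunkHit {
  fileId: string;
  content: string;
  similarity: number; // 1 - cosine distance; higher means closer
}

interface FileResult {
  fileId: string;
  score: number; // assumed: the best similarity among the file's chunks
  matches: ChunkHit[]; // supporting chunks, best first
}

// Hypothetical helper: group chunk hits by source file and rank files by their best chunk.
function aggregateByFile(hits: ChunkHit[], limit = 10): FileResult[] {
  const byFile = new Map<string, FileResult>();
  for (const hit of hits) {
    const entry = byFile.get(hit.fileId);
    if (entry) {
      entry.matches.push(hit);
      entry.score = Math.max(entry.score, hit.similarity);
    } else {
      byFile.set(hit.fileId, { fileId: hit.fileId, score: hit.similarity, matches: [hit] });
    }
  }
  const results = Array.from(byFile.values());
  for (const r of results) r.matches.sort((a, b) => b.similarity - a.similarity);
  return results.sort((a, b) => b.score - a.score).slice(0, limit);
}

const hits: ChunkHit[] = [
  { fileId: "sales.pdf", content: "Q4 revenue grew...", similarity: 0.91 },
  { fileId: "notes.txt", content: "Sales kickoff agenda", similarity: 0.74 },
  { fileId: "sales.pdf", content: "Regional breakdown...", similarity: 0.83 },
];

console.log(aggregateByFile(hits).map((r) => `${r.fileId}: ${r.score}`).join(", "));
// sales.pdf: 0.91, notes.txt: 0.74
```

Other aggregation choices, such as averaging the top few chunks, would reward files with several relevant passages instead of one strong match.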
Vector Embeddings
Curiositi uses 1536-dimension embeddings. Texts with similar meanings map to nearby vectors, so their components look alike:
```mermaid
mindmap
  root((Embeddings))
    "Quarterly sales report"
      0.023
      -0.156
      0.892
      ...
    "Q4 revenue summary"
      0.019
      -0.142
      0.887
      ...
```
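"Nearby" here means high cosine similarity: the dot product of two vectors divided by the product of their lengths. The sketch below computes it over only the three components shown in the diagram, as a toy illustration; real Curiositi vectors have 1536 components, and the cosineSimilarity function is a hypothetical helper, not part of pgvector.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1; values
// near 1 mean the vectors point in nearly the same direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for zero vectors; treat that case as no similarity.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Only the first three components from the diagram; real vectors have 1536.
const salesReport = [0.023, -0.156, 0.892];
const revenueSummary = [0.019, -0.142, 0.887];
console.log(cosineSimilarity(salesReport, revenueSummary).toFixed(4)); // 0.9999
```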
Authentication and Authorization
Curiositi uses Better Auth for authentication.
Supported Methods
- Email/password: Standard credential-based login
- Google OAuth: Sign in with Google
Session Management
Sessions are stored in PostgreSQL. Each session tracks the user's active workspace (via activeOrganizationId), which scopes all subsequent queries.
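Scoping by the active workspace amounts to resolving activeOrganizationId from the session before every query and filtering on organizationId. A minimal sketch over in-memory rows; the Session and FileRow shapes and the requireActiveWorkspace and listFiles helpers are hypothetical, and in the real platform the filter would be a WHERE clause in the database query.

```typescript
// Hypothetical shapes; only the fields relevant to scoping are modeled.
interface Session {
  userId: string;
  activeOrganizationId: string | null; // set when the user picks a workspace
}

interface FileRow {
  id: string;
  name: string;
  organizationId: string; // owning workspace
}

// Hypothetical helper: resolve the workspace every query must be scoped to.
function requireActiveWorkspace(session: Session): string {
  if (!session.activeOrganizationId) {
    throw new Error("No active workspace selected");
  }
  return session.activeOrganizationId;
}

// In SQL this would be a WHERE organizationId = $1 clause; here it is a filter.
function listFiles(session: Session, allFiles: FileRow[]): FileRow[] {
  const orgId = requireActiveWorkspace(session);
  return allFiles.filter((f) => f.organizationId === orgId);
}

const allFiles: FileRow[] = [
  { id: "1", name: "document.pdf", organizationId: "org-a" },
  { id: "2", name: "report.csv", organizationId: "org-b" },
  { id: "3", name: "photo.png", organizationId: "org-a" },
];

const session: Session = { userId: "u1", activeOrganizationId: "org-a" };
console.log(listFiles(session, allFiles).map((f) => f.name).join(", ")); // document.pdf, photo.png
```

Failing loudly when no workspace is selected avoids silently returning rows from every workspace.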
Permission Model
| Role | Capabilities |
|---|---|
| Owner | Full control, member management |
| Admin | Create spaces, upload files, manage content |
| Member | Upload files, search, read access |
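The role table can be read as a capability lookup. A minimal sketch, assuming each role includes the read and search access of the roles below it and that "full control" means the owner may perform every action; the action names and the can helper are hypothetical, and the sketch does not use Better Auth's own access-control helpers.

```typescript
type Role = "owner" | "admin" | "member";

// Hypothetical action names derived from the capabilities table.
type Action = "manageMembers" | "createSpace" | "manageContent" | "uploadFile" | "search" | "read";

// Assumption: higher roles include everything lower roles can do.
const permissions: Record<Role, ReadonlySet<Action>> = {
  owner: new Set<Action>(["manageMembers", "createSpace", "manageContent", "uploadFile", "search", "read"]),
  admin: new Set<Action>(["createSpace", "manageContent", "uploadFile", "search", "read"]),
  member: new Set<Action>(["uploadFile", "search", "read"]),
};

// Hypothetical check used before performing an action.
function can(role: Role, action: Action): boolean {
  return permissions[role].has(action);
}

console.log(can("member", "createSpace")); // false
console.log(can("admin", "uploadFile")); // true
```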
Data Flow
How data moves through Curiositi:
```mermaid
flowchart TB
    User["User<br/>(Browser)"] -->|"Upload"| Platform["Platform<br/>(TanStack Start)"]
    Platform -->|"Enqueue job"| Queue["Queue<br/>(QStash/bunqueue)"]
    Platform -->|"Store metadata"| DB[(PostgreSQL<br/>+ pgvector)]
    Queue -->|"Process"| Worker["Worker<br/>(Hono)"]
    Worker -->|"Download/Upload"| S3["S3 Storage"]
    Worker -->|"Store embeddings"| DB
    User -->|"Query"| Platform
    Platform -->|"Search"| DB
    DB -->|"Results"| Platform
```
Next Steps
- Uploading Files: Learn the file upload process
- AI Search: Master semantic search
- Spaces: Organize your content