Tax Practice AI - Backlog¶
Last updated: 2024-12-28 (v0.14 - S13 Complete, 72% progress)
This document tracks priority items, technical debt, and pending decisions.
V1 Back-Office AI Companion (Tax Season 2025)¶
Status: Requirements Complete, Implementation In Progress Target: January 2025 deployment for Tax Season 2025
V1 deploys as a back-office companion tool - staff use AI analysis alongside existing workflows without disrupting clients.
V1 Scope¶
| Feature | Status | Description |
|---|---|---|
| Quick Client Entry | Defined | Minimal form: name, tax year, legacy account link |
| Drag-Drop Upload | Defined | Upload documents directly into viewer |
| Folder Import | Defined | Import entire folder of documents |
| AI Classification | Defined | Automatic document type identification |
| AI Analysis | Defined | Prior year comparison, anomaly detection, missing docs |
| Q&A Assistant | Defined | Ask questions during review with source citations |
| Annotations | Defined | Notes, flags, questions on documents |
| Worksheet Export | Defined | PDF/Excel with full source citations |
| S3 Fallback | Defined | AI works when cloud storage unavailable |
V1 Documentation¶
| Document | Purpose |
|---|---|
| V1_COMPANION_REQUIREMENTS.md | Full requirements |
| V1_USE_CASES.md | Detailed use cases |
| V1_UI_CHANGES.md | UI specifications |
| ARCHITECTURE.md Section 16 | Deployment model |
V1 Testing¶
- BDD feature files created (Gherkin syntax)
- 16 BDD scenarios passing (quick client, document upload)
- Test data generator available: `python scripts/generate_test_data.py`
- Sample documents generated: W-2s, 1099s, bank statements, receipts, CSVs
V1 Philosophy¶
Augment, don't replace. Zero client disruption.
- Clients continue using SmartVault for uploads
- UltraTax remains the tax prep tool
- Legacy system remains source of truth
- New account numbers prefixed with 'A' (pending client confirmation)
Client Decisions (Resolved)¶
TAX-001: Tax Software Selection¶
Status: ✅ Resolved (2024-12-23) Decision: UltraTax CS (Thomson Reuters) Integration: Via SurePrep CS Connect bridge (UltraTax has no direct API)
TAX-002: Volume Projections¶
Status: ✅ Resolved (2024-12-23) Decision: ~1,000 returns/year with 30% annual growth expectation
| Timeframe | Returns per Year |
|---|---|
| Year 1 | 1,000 |
| Year 2 | 1,300 |
| Year 3 | 1,700 |
| Year 5 | 2,850 |
Note: Growth may accelerate due to system efficiency gains.
TAX-003: Entity Type Mix¶
Status: ✅ Resolved (2024-12-23) Decision: Equal priority for individuals and businesses. Business clients are primarily small businesses (effectively advanced individual returns). No differentiation needed in V1.
TAX-004: State Coverage¶
Status: ✅ Resolved (2024-12-23) Decision: Florida + surrounding states (GA, AL, SC, NC) initially. Design for all 50 states from the start - full coverage expected soon.
Phase 0: Data Migration (Pre-Launch Prerequisite)¶
Data migration must be completed before go-live. Depends on Sequence 1 infrastructure.
MIG-001: Client Data Import Tool¶
Status: ✅ COMPLETE Priority: P0 (Pre-Launch Blocker)
- CLI tool: `tax-migrate clients <file>`
- CSV/Excel parsing with column mapping
- Duplicate detection and handling
- Account number generation for imports
- Dry-run mode for validation
- Import summary report
- Audit logging
Files:
- scripts/tax_migrate.py - CLI entry point (per MIG-080)
- src/migration/__init__.py - Module exports
- src/migration/client_importer.py - Main import logic (MIG-001 through MIG-008)
- src/migration/column_mapper.py - Flexible column mapping (MIG-002)
- src/migration/duplicate_detector.py - Duplicate detection (MIG-010 through MIG-013)
- src/migration/import_report.py - Report generation (MIG-021, MIG-110 through MIG-114)
- tests/unit/test_migration_column_mapper.py - Unit tests (16 tests)
Requirements: See MIGRATION_REQUIREMENTS.md Section 2
MIG-002: Bulk Document Import Tool¶
Status: ✅ COMPLETE Priority: P0 (Pre-Launch Blocker)
- CLI tool: `tax-migrate documents <folder>` (with `--preview` mode)
- Folder structure pattern matching (client-name-first, account-first, year-first)
- Document classification by filename (W-2, 1099, 1098, K-1, identity docs, etc.)
- Client matching with fuzzy name support (configurable threshold)
- Malware scanning integration (placeholder for ClamAV)
- Unmatched document quarantine (--quarantine-dir option)
- Import report with match statistics (classification stats, match confidence)
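The filename-based classification step above can be sketched with a small pattern table. This is illustrative only: the real rules live in src/migration/document_classifier.py and cover more document types, and these patterns are assumptions, not the shipped ones.

```python
import re

# Illustrative filename patterns -- the shipped rules in
# src/migration/document_classifier.py are more extensive.
FILENAME_PATTERNS = [
    (re.compile(r"w[-_ ]?2", re.IGNORECASE), "W-2"),
    (re.compile(r"1099", re.IGNORECASE), "1099"),
    (re.compile(r"1098", re.IGNORECASE), "1098"),
    (re.compile(r"k[-_ ]?1", re.IGNORECASE), "K-1"),
    (re.compile(r"passport|license|id[-_ ]card", re.IGNORECASE), "identity"),
]

def classify_by_filename(filename: str) -> str:
    """Return a document type label for a filename, or 'unknown'."""
    for pattern, doc_type in FILENAME_PATTERNS:
        if pattern.search(filename):
            return doc_type
    return "unknown"
```

Unmatched filenames fall through to "unknown", which is where the quarantine option (`--quarantine-dir`) would pick them up.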
Files:
- src/migration/document_classifier.py - Document type classification (MIG-040, MIG-042, MIG-043)
- src/migration/client_matcher.py - Client matching with fuzzy support (MIG-050 through MIG-054)
- src/migration/document_importer.py - Main import logic (MIG-030 through MIG-036)
- scripts/tax_migrate.py - CLI entry point (documents subcommand)
- tests/unit/test_migration_document_classifier.py - Unit tests (22 tests)
Requirements: See MIGRATION_REQUIREMENTS.md Section 3
MIG-003: Historical Return Data Import¶
Status: ✅ COMPLETE Priority: P0 (Pre-Launch Blocker)
- CLI tool: `tax-migrate history <file>` (with `--preview`, `--dry-run` modes)
- UltraTax export format support (per MIG-064)
- Generic CSV format support (per MIG-065)
- Prior year AGI import and client update (per MIG-060)
- Filing status normalization (per MIG-061)
- Refund/balance due import (per MIG-062)
- Return history record creation (per MIG-070)
- Client matching by external_id and SSN-4
Files:
- src/migration/history_importer.py - History import logic (MIG-060 through MIG-070)
- scripts/tax_migrate.py - CLI entry point (history subcommand)
- tests/unit/test_migration_history_importer.py - Unit tests (27 tests)
Requirements: See MIGRATION_REQUIREMENTS.md Section 4
MIG-004: Migration Validation & Rollback¶
Status: ✅ COMPLETE Priority: P1
- Migration summary report generation (per MIG-110)
- Source vs imported count comparison (per MIG-111)
- Error and warning listing (per MIG-112)
- Sample client report for spot-checking (per MIG-113)
- Rollback preview (per MIG-140)
- Rollback CLI: `tax-migrate rollback <batch-id>` (per MIG-142)
- List recent migration batches: `tax-migrate rollback --list`
- Validate batch: `tax-migrate rollback <batch-id> --validate`
- Generate report: `tax-migrate rollback <batch-id> --report`
Files:
- src/migration/migration_validator.py - Validation and rollback logic
- scripts/tax_migrate.py - CLI entry point (rollback subcommand)
- tests/unit/test_migration_validator.py - Unit tests (16 tests)
Requirements: See MIGRATION_REQUIREMENTS.md Sections 6, 8
Phase 1: Foundation (Complete)¶
FOUND-001: Project Structure Setup¶
Status: Complete Priority: P0
- Create ARCHITECTURE.md
- Create CLAUDE.md
- Create RUNBOOK.md
- Create backlog.md
- Create src/ directory structure
- Create config.yaml with commented parameters
- Set up requirements.txt with core dependencies
- Create .env.example template
FOUND-002: Service Centralization Framework¶
Status: Complete Priority: P0
- Create src/services/base_service.py
- Create src/services/__init__.py (ServiceRegistry)
- Create src/config/settings.py
- Create src/config/secrets.py (AWS Secrets Manager - future)
FOUND-003: Snowflake Service Implementation¶
Status: Not Started Priority: P1
- Create src/services/snowflake_service.py
- Implement connection pooling
- Add query logging for audit
- Add health check method
FOUND-004: Aurora Service Implementation¶
Status: Complete Priority: P1
- Create src/services/aurora_service.py
- Implement connection pooling
- Add transaction support
- Add health check method
- Add database error mapping (ConflictError, ValidationError, etc.)
Phase 2-7: Client, Document & Tax Preparation (Complete)¶
Sequence 2: Client Identity (Complete)¶
- S2-001: Client Self-Registration (Done)
- S2-002: Identity Verification with Persona (Done)
- S2-003: Returning Client Authentication (Done)
- S2-004: Profile Management (Done)
Sequence 3: Engagement (Complete)¶
- S3-001: Engagement Letter Generation (Done)
- S3-002: E-Signature via Google Docs (Done)
- S3-003: Form 7216 Consent Management (Done)
Sequence 4: Document Management (Complete)¶
- S4-001: Document Upload via Portal (Done)
- S4-002: Document Upload via Email (Done)
- S4-003: Malware Scanning (Done)
- S4-004: Document Classification and Extraction (Done)
- S4-005: SmartVault Integration (Done)
- S4-006: SurePrep Integration (Done)
- S4-007: Document Checklist Management (Done)
- S4-008: Manual Extraction Correction (Done)
Sequence 5: AI Analysis (Complete)¶
- S5-001: Preliminary Return Analysis (Done)
- S5-002: Prior Year Comparison (Done)
- S5-003: Missing Document Detection (Done)
- S5-004: AI-Powered Q&A (Done)
- S5-005: Extraction Corrections (Done)
- S5-006: Analysis Dashboard (Done)
Sequence 6: Tax Preparation Workflow (Complete)¶
- S6-001: Workflow State Machine (Done)
- S6-002: Preparer Assignment (Done)
- S6-003: Reviewer Assignment (Done)
- S6-004: Progress Tracking (Done)
- S6-005: Dashboard Views (Done)
- S6-006: Time Tracking (Done)
- S6-007: Priority Management (Done)
- S6-008: Batch Operations (Done)
- S6-009: Analytics (Done)
Sequence 7: Preparer & Reviewer Interface (Complete)¶
- S7-001: Interactive Review Interface (Done)
- S7-002: AI Q&A Assistant (Done)
- S7-003: Change Tracking (Done)
- S7-004: Final Review Package (Done)
Sequence 8: Client Communication (Complete)¶
- S8-001: Secure Portal Messaging (Done)
- S8-002: Email Notifications (Done)
- S8-003: SMS Notifications (Done)
- S8-004: Callback Scheduling (Done)
- S8-005: Notification Preferences (Done)
Sequence 9: Client Delivery (Complete)¶
- S9-001: Tax Package Generation (Done)
- S9-002: Google Workspace Signature Integration (Done)
- S9-003: Payment Authorization (Done)
Sequence 10: E-Filing Status Tracking (Complete)¶
- S10-001: E-File Status Monitoring (Done)
- S10-002: Mark Return Ready for Filing (Done)
- S10-003: Filing Ready Check (Done)
- S10-004: Rejection Management (Done)
Sequence 11: Billing & Payments (Complete)¶
- S11-001: Stripe Service Implementation (Done)
- S11-002: Invoice Generation (Done)
- S11-003: Payment Collection (Done)
- S11-004: Payment Reminders (Done)
Sequence 12: Estimated Tax Management (Complete - 3 of 4 stories)¶
- S12-001: Estimated Tax Calculation (Done)
- S12-002: Voucher Generation (Done)
- S12-003: Calendar Event Generation (Done)
- S12-004: Estimated Tax Reminders (DEFERRED - see note below)
S12-004 Deferral Note: By sending estimated tax reminders, the firm implies responsibility for notifying clients. When emails or SMS are missed (spam filters, wrong number, etc.), clients blame the firm for their missed payments and penalties. This inappropriately shifts liability. Clients should use calendar events (S12-003) instead.
Implementation Files:
- Domain: src/domain/estimated_tax.py
- Repository: src/repositories/estimated_tax_repository.py
- Workflows: src/workflows/estimated_tax/ (calculation, voucher, calendar)
- API: src/api/routes/estimated_tax.py
- Schemas: src/api/schemas/estimated_tax_schemas.py
Sequence 13: AI Chat (Complete)¶
- S13-001: Chat Domain & Repository (Done)
- S13-002: Chat Service with CLI/API Modes (Done)
- S13-003: Chat API Routes (Done)
- S13-004: Staff-App Chat UI (Done)
- S13-005: Chat Integration Tests (Pending)
Features:
- Tax-focused system prompt with off-topic redirection
- Single-client scope enforcement (no cross-client queries)
- CLI mode for local dev (uses developer's Claude subscription)
- API mode for production (AWS Bedrock)
- Extended context building (client, return, documents, prior year)
- Floating drawer UI in staff-app ClientDetailPage
- Token/cost tracking per session
Implementation Files:
- Domain: src/domain/chat.py
- Repository: src/repositories/chat_repository.py
- Service: src/services/chat_service.py
- API: src/api/routes/chat.py
- Schemas: src/api/schemas/chat_schemas.py
- Frontend: frontend/apps/staff-app/src/components/chat/
- Hook: frontend/apps/staff-app/src/hooks/useChat.ts
Future (Backlog):
- S13-006: Client Portal Chat
- S13-007: WebSocket Streaming
- S13-008: Cross-Client Queries (after security/cost analysis)
- S13-009: Opus Escalation ("Ask the Expert" button)
Technical Debt¶
TD-001: Java Build Configuration¶
Status: Not Started Description: Need to establish Maven/Gradle configuration for Java components Notes: Should mirror ingestion engine patterns for consistency
TD-002: CI/CD Pipeline¶
Status: Complete
Description: Set up GitHub Actions for automated testing and deployment
Implementation: .github/workflows/ci.yml
Pipeline jobs:
- [x] lint - Ruff linting and formatting checks
- [x] unit-tests - Fast tests with coverage reporting (~30s)
- [x] integration-tests - PostgreSQL service container (~2m)
- [x] e2e-tests - PostgreSQL + LocalStack service containers (~2m)
Triggers: Push to main, pull requests to main
TD-003: Testing Framework¶
Status: Complete Description: Establish pytest structure for Python, JUnit for Java Progress: Full test pyramid implemented with 1,522 tests passing
| Test Type | Count | Percent | Target |
|---|---|---|---|
| Unit | 1,290 | 85% | 80% |
| Integration | 182 | 12% | 15% |
| E2E | 50 | 3% | 5% |
Coverage:
- Unit: Exceptions, middleware, domain entities, services, workflows (S2-S11)
- Integration: Repositories (Client, Document, Engagement, Consent, Extraction, Checklist, Workflow, Review, Messaging, Delivery, EFiling, Invoice), S3Service with LocalStack
- E2E: Client, Verification, Engagement, Consent, Document, Messaging, Delivery, EFiling, Invoice API endpoints
TD-004: UAT Script Creation¶
Status: Not Started Priority: P1 (Pre-Launch) Description: Create User Acceptance Testing scripts for client validation
Traceability:
- [ ] Create requirements traceability matrix (RTM) linking UAT scripts to USER_STORIES.md
- [ ] Each UAT test case references specific story ID (e.g., S2-001, S10-003)
- [ ] Track coverage percentage against requirements
- [ ] Bug tracking with requirement linkage for root cause analysis
UAT Scripts:
- [ ] Define UAT scenarios for each sequence (S2-S18)
- [ ] Create step-by-step testing scripts with requirement references
- [ ] Define expected outcomes and acceptance criteria per story
- [ ] Create UAT reporting templates with pass/fail per requirement
- [ ] Document rollback procedures for failed UAT
TD-005: Test Data Generator (TDG-001)¶
Status: Complete (2024-12-26) Priority: P1 (Pre-Launch) Description: Comprehensive test data generator with realistic scenarios and document quality variations
User Personas (each with complete document sets):
- [x] Individual (Simple): Single W-2, standard deductions, single state
- [x] Individual (Heavy Investor): Multiple 1099-DIVs, 1099-Bs, K-1s
- [x] Business Owner: Schedule C, 1099-NECs, business expenses, mileage
- [x] Sub-Contractor: Multiple 1099-NECs, 1099-Ks, mileage logs
- [x] Complex Individual: Multi-state, K-1s, 1098 mortgage
- [x] S-Corp Owner: K-1 from S-Corp, W-2 from own company, distributions
- [x] Retiree: 1099-R, SSA-1099, investment income
Document Types Generated:
- [x] W-2s (single/multiple employers)
- [x] 1099 series (DIV, INT, B, NEC, MISC, R, SSA, K)
- [x] 1098 (mortgage interest)
- [x] K-1s (partnership 1065, S-Corp 1120S)
- [x] PDF bank statements
- [x] PDF credit card statements
- [x] Receipt images (PNG with quality variations)
- [x] Mileage logs
- [ ] Prior year tax returns (future enhancement)
- [ ] ID documents (future enhancement)
Image Quality Variations:
- [x] Excellent quality (clean generation)
- [x] Medium quality (slight blur)
- [x] Poor quality (blur, rotation)
- [x] Terrible quality (heavy blur, rotation, JPEG artifacts)
Multi-Batch Scenarios:
- [x] Batch assignment based on persona configuration
- [x] Delayed documents (K-1s, corrected 1099s) arrive in later batches
- [x] 1-4 batch scenarios per persona
Generator Implementation:
- [x] CLI tool: python scripts/generate_test_data.py --persona business_owner
- [x] Configurable output directory (--output)
- [x] Reproducible with seed (--seed 42)
- [x] Generate realistic but fake PII (987-65-xxxx SSN range)
- [x] 69 unit tests passing
Files:
- scripts/generate_test_data.py - CLI entry point
- src/testing/data_generator.py - Core generation logic
- src/testing/document_renderer.py - PDF/image rendering
- src/testing/utils.py - PII generation utilities
- src/testing/personas/personas.yaml - Persona configurations
- tests/unit/testing/ - Unit tests
TD-006: Placeholder and Assumption Audit¶
Status: In Progress Priority: P1 (Pre-Launch) Description: Review and address "for now", "placeholder", and "TODO" items throughout codebase Plan: See docs/plans/TECH_DEBT_CLEANUP.md
Audit Complete (2024-12-24): Found 89 items, categorized as:
- 35 Production Blockers: External service stubs (Email, SMS, Persona, SmartVault, SurePrep, Google)
- 12 Development Conveniences: Acceptable placeholders (PDF generation, placeholder citations)
- 42 Not Issues: Documentation/expected behavior (SQL placeholders, template variables)
Non-API Fixes Complete (2024-12-24): 4 items fixed without external API credentials:
- [x] Consent route client lookup (fetches real client data from ClientRepository)
- [x] AI QA citation resolution (resolves document names to actual IDs via DocumentRepository)
- [x] EFiling ready checks (checks Form 8879 signatures and document checklist)
- [x] Placeholder PDF generation (ReportLab implementation)
- [x] Fixed ChecklistRepository abstract method implementation
Production blockers organized into 6 phases:
- [x] Phase 0: Audit complete
- [x] Phase 0b: Non-API fixes complete (4 items)
- [ ] Phase 1: EmailService + SMSService (5-7 hrs) - requires API credentials
- [ ] Phase 2: PersonaService (4-5 hrs) - requires API credentials
- [ ] Phase 3: SmartVaultService (6-8 hrs) - requires API credentials
- [ ] Phase 4: SurePrepService (8-10 hrs) - requires API credentials
- [ ] Phase 5: GoogleService (6-8 hrs) - requires API credentials
- [ ] Phase 6: Webhook Security (2-3 hrs)
Lesson learned: "Simpler" shortcuts that mask real requirements create hidden bugs. All assumptions should be explicitly documented and validated.
TD-007: Code Audit Findings (AUDIT-001 through AUDIT-008)¶
Status: Documented Priority: P2 (Post-MVP) Audit Date: 2024-12-27 Report: docs/audits/CODE_AUDIT_2024-12-27.md
Overall Rating: 8.5/10 - Production-ready architecture in src/, development-only duplication in local_api.py
Ratings by Area:
- Architecture: 9/10 (excellent layering, services → repositories → domain)
- Service Centralization: 9/10 (25 services inherit BaseService)
- API Client Centralization: 10/10 (single api.ts for all calls)
- Connection Pooling: 6/10 (src/ good, local_api.py needs work)
- Maintainability: 8/10 (some large files need splitting)
- Scalability: 8/10 (async patterns ready, needs pagination for large batches)
Action Items:
- [x] AUDIT-001: Document local_api.py as dev-only in ARCHITECTURE.md (15 min) ✅ 2024-12-27
- [ ] AUDIT-002: Add connection pooling to local_api.py if used for extended demos (2 hrs)
- [x] AUDIT-003: ~~Split repository files exceeding 1,000 lines~~ - Cancelled (single developer, no merge conflict risk)
- [ ] AUDIT-004: Add inline comments to complex workflow logic (2 hrs) - Comments improve Claude's accuracy and speed
- [ ] AUDIT-005: Move hardcoded values to environment variables (1 hr)
- [ ] AUDIT-006: Consider migrating local_api.py to use src/ modules (8 hrs)
- [ ] AUDIT-007: Add request/response logging middleware (2 hrs)
- [ ] AUDIT-008: Implement caching layer for read-heavy endpoints (4 hrs)
TD-008: Security Audit Findings (SEC-001 through SEC-016)¶
Status: Documented Priority: P0 (Critical), P1 (High/Medium), P2 (Low) Audit Date: 2024-12-27 Report: docs/audits/SECURITY_AUDIT_2024-12-27.md
Overall Security Rating: 7/10 - Solid foundation, needs hardening before production
Critical (P0 - Fix Before Production):
- [ ] SEC-001: XSS via template injection - src/services/template_service.py:276-301
- [ ] SEC-002: dangerouslySetInnerHTML without sanitization - TourOverlay.tsx:187-217
- [ ] SEC-003: Missing security headers - src/api/main.py
- [x] SEC-004: Dev API weak auth - Already documented as dev-only per AUDIT-001
High (P0 - Fix Before Production):
- [ ] SEC-005: ORDER BY SQL injection risk - base_repository.py:131
- [ ] SEC-006: CORS overly permissive - src/api/main.py:91-98
- [ ] SEC-007: Missing role enforcement on list endpoints - clients.py:66
- [ ] SEC-008: Stripe test keys in plain text - .env:44-45
Medium (P1 - Fix in V1.1):
- [ ] SEC-009: No token revocation mechanism - auth.py
- [ ] SEC-010: Payment bypass flag needs compliance doc - efiling.py:31-35
- [ ] SEC-011: Search parameter unbounded - clients.py:70
- [ ] SEC-012: Filename not sanitized in S3 path - local_api.py:879
- [ ] SEC-013: Request size limits not enforced - main.py, nginx.conf
Low (P2 - Address When Convenient):
- [ ] SEC-014: In-memory rate limiting (needs Redis for prod) - rate_limiting.py:68
- [ ] SEC-015: Error messages may leak info - local_api.py:1513
- [ ] SEC-016: Email query parameter not validated - registration.py:252
TD-009: Compliance Audit Findings (COMP-001 through COMP-022)¶
Status: Documented Priority: P0 (Critical), P1 (High), P2 (Medium) Audit Date: 2024-12-27 Report: docs/audits/COMPLIANCE_AUDIT_2024-12-27.md
Overall Compliance Rating: 70/100 - Strong foundation with critical implementation gaps
Critical (P0 - Fix Before Production):
- [ ] COMP-001: AI/cloud processing consent not implemented - Form 7216 violation risk
- [ ] COMP-002: Form 7216 consent not enforced at e-filing - efiling_workflow.py
- [ ] COMP-003: Authentication events not logged - audit_service.log_auth() never called
- [ ] COMP-004: Document/client access not logged - only modifications tracked
- [ ] COMP-005: Account lockout not implemented - SEC-005 design only
- [ ] COMP-006: Field-level encryption not implemented - ENC-004 design only
- [ ] COMP-007: Conflict of interest checks missing - Circular 230 requirement
- [ ] COMP-008: Form 2848 (POA) workflow missing - domain model only
High (P1 - Fix in V1.1):
- [ ] COMP-009: PTIN expiration not enforced in workflows
- [ ] COMP-010: Competency/credential tracking missing - CIR-004
- [ ] COMP-011: MFA not implemented - framework only
- [ ] COMP-012: Employee training tracking missing - WISP requirement
- [ ] COMP-013: Database immutability not enforced - REVOKE statements commented
- [ ] COMP-014: Incident detection/alerting not implemented
- [ ] COMP-015: Persona integration in dry-run mode
- [ ] COMP-016: Authorization denials not logged to audit
Medium (P2 - Address When Convenient):
- [ ] COMP-017: Legal hold not implemented - RET-004
- [ ] COMP-018: Secure deletion not implemented - RET-005
- [ ] COMP-019: Vendor security assessments missing
- [ ] COMP-020: Password policy not enforced
- [ ] COMP-021: IP/device context not auto-captured
- [ ] COMP-022: Consent table schema mismatch
Effort Estimate: 85-120 hours total (P0: 40-60 hrs, P1: 30-40 hrs, P2: 15-20 hrs)
TD-010: Remove Frontend Mock Mode Code¶
Status: Not Started Priority: P3 (Low - Technical Debt) Added: 2024-12-28
Description: Remove all DEV_MODE conditional branches and MOCK_* data arrays from frontend/packages/ui/src/lib/api.ts. Mock mode was disabled on 2024-12-28 in favor of using real API with seeded database.
Scope:
- Remove ~30 `if (DEV_MODE) { ... }` conditional blocks
- Remove `MOCK_CLIENTS`, `MOCK_RETURNS`, `MOCK_DOCUMENTS`, `MOCK_ANALYSES`, `MOCK_CHAT_HISTORY` arrays (~300 lines)
- Remove `DEV_MODE` constant and related comments
Effort Estimate: 1 hour
P0 Master Priority List (Pre-Production Blockers)¶
Status: Consolidated 2024-12-27 Total Items: 15 (Security: 6, Compliance: 8, Code: 1 conditional) Total Effort: 45-50 hours
Implementation order optimized for dependencies and quick wins:
Phase 1: Quick Wins (2.5 hrs) - Parallelize¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| SEC-003 | Missing security headers | 1 hr | Add middleware for X-Frame-Options, CSP, HSTS |
| SEC-005 | ORDER BY SQL injection risk | 1 hr | Whitelist allowed column names |
| SEC-006 | CORS overly permissive | 30 min | Restrict methods/headers in main.py |
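The SEC-005 fix amounts to a column whitelist. A minimal sketch, assuming the repository currently builds ORDER BY clauses from caller-supplied strings; the allowed column names here are illustrative, not the actual base_repository.py schema:

```python
# Sketch of the SEC-005 fix: never interpolate caller-supplied sort columns
# into SQL directly; map them through an explicit whitelist instead.
ALLOWED_SORT_COLUMNS = {"created_at", "updated_at", "last_name", "status"}

def safe_order_by(column: str, direction: str = "asc") -> str:
    """Return a safe ORDER BY fragment, or raise on unrecognized input."""
    if column not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"Unsupported sort column: {column!r}")
    if direction.lower() not in ("asc", "desc"):
        raise ValueError(f"Unsupported sort direction: {direction!r}")
    return f"ORDER BY {column} {direction.upper()}"
```

Rejecting unknown input outright (rather than falling back to a default) keeps injection attempts visible in logs.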
Phase 2: XSS Fixes (3 hrs) - Parallelize¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| SEC-001 | XSS via template injection | 1 hr | Use html.escape() or Jinja2 autoescape |
| SEC-002 | dangerouslySetInnerHTML | 2 hrs | Add DOMPurify or refactor to React components |
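For SEC-001, the essential change is escaping user-controlled values before substitution. A stdlib sketch; the actual fix in template_service.py may use Jinja2 autoescape instead, and the template syntax here is illustrative:

```python
import html
import string

def render_template(template: str, values: dict) -> str:
    """Substitute values into a template, HTML-escaping every value.

    Sketch of the SEC-001 fix: user-controlled values cannot inject
    markup because they are escaped before substitution.
    """
    escaped = {k: html.escape(str(v)) for k, v in values.items()}
    return string.Template(template).safe_substitute(escaped)
```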
Phase 3: Access Control (5 hrs) - Sequential¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| SEC-007 | Missing role enforcement | 2 hrs | Filter list endpoints by user role |
| COMP-005 | Account lockout missing | 3 hrs | Add lockout fields, check before auth |
Phase 4: Audit Logging (5 hrs) - Sequential after Phase 3¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-003 | Auth events not logged | 2 hrs | Call audit_service.log_auth() in auth routes |
| COMP-004 | Access not logged | 3 hrs | Add log_access() to all GET endpoints |
Phase 5: Consent System (6 hrs) - Sequential¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-001 | AI processing consent missing | 4 hrs | Add USE_AI_PROCESSING, check in BedrockService |
| COMP-002 | E-filing consent not enforced | 2 hrs | Validate Form 7216 before mark_ready_for_filing |
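Both Phase 5 items reduce to a single gate checked before the guarded action. A sketch; the consent type names are illustrative placeholders, not confirmed schema values:

```python
# Sketch of the consent gate for COMP-001/COMP-002. Consent type names
# ("FORM_7216_DISCLOSURE", "USE_AI_PROCESSING") are illustrative.
class ConsentMissingError(Exception):
    pass

def require_consent(granted_consents: set[str], required: str) -> None:
    """Raise before the guarded action if the client has not consented."""
    if required not in granted_consents:
        raise ConsentMissingError(
            f"Client has not granted required consent: {required}"
        )

def mark_ready_for_filing(granted_consents: set[str]) -> str:
    # COMP-002: validate Form 7216 before the return can be marked ready.
    require_consent(granted_consents, "FORM_7216_DISCLOSURE")
    return "ready_for_filing"
```

The same `require_consent` call, with an AI-processing consent type, would sit in front of BedrockService for COMP-001.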
Phase 6: New Workflows (16 hrs) - Parallelize¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-007 | COI checks missing | 8 hrs | COI table, check workflow, API, audit log |
| COMP-008 | Form 2848 POA missing | 8 hrs | Form generation, signature, access control |
Phase 7: Data Protection (8 hrs)¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-006 | Field encryption missing | 8 hrs | encryption_service.py with pgcrypto wrapper |
Conditional¶
| ID | Issue | Effort | Condition |
|---|---|---|---|
| AUDIT-002 | local_api.py connection pooling | 2 hrs | Only if used for extended demos |
Delegation Strategy¶
- Phases 1-2: Can parallelize entirely (quick wins + XSS)
- Phases 3-4: Must sequence (access control before audit logging)
- Phase 5: Must sequence (consent type before e-filing check)
- Phase 6: Can parallelize (COI and POA are independent)
- Phase 7: Independent (can run anytime)
Expedited Analysis Pricing (PRICE-001)¶
Status: Backlog Priority: P1 (Revenue Feature) Target: Post-V1 Launch
Business Model¶
Tiered document analysis with freemium expedited processing:
| Processing Tier | Timing | Cost |
|---|---|---|
| Batch (Default) | Overnight | Included |
| Expedited | Immediate (~30 sec/doc) | Free quota, then $1.50/doc |
Freemium Model:
- All clients receive 25 free expedited analyses per month
- After quota exhausted: $1.50 per document OR wait for overnight batch
- Counter resets on 1st of each month
User Experience¶
Document Upload Flow:
1. Documents uploaded show status: "Pending Analysis"
2. Banner displays: "3 documents pending. Overnight batch included, or analyze now."
3. Show quota: "15 of 25 expedited analyses remaining this month"
4. When quota exhausted, button changes to "Analyze Now ($1.50)" or "Queue for Overnight"
Recording Should Demonstrate:
- Instant expedited analysis (within quota)
- "Queued for overnight" state (quota exhausted or user choice)
- Quota counter display
Implementation¶
Database Changes:
- Add expedited_analyses_used counter to client/account table (resets monthly)
- Add expedited_analyses_limit field (default: 25)
- Add analysis_status enum to documents: pending → queued → processing → complete
Backend:
- Quota check before expedited processing
- Airflow DAG for overnight batch processing
- Stripe integration for overage billing
Frontend:
- Status badges on documents
- Quota counter in header/sidebar
- Expedited vs batch choice dialog
- Overage payment confirmation
Files to Create:
- src/domain/analysis_quota.py
- src/repositories/analysis_quota_repository.py
- src/workflows/document_analysis/batch_processor.py
- dags/overnight_analysis_dag.py
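The quota/overage arithmetic behind the backend check can be sketched as follows. Field names mirror the database changes listed above; this is an illustration under those assumptions, not the planned `src/domain/analysis_quota.py` code:

```python
from dataclasses import dataclass

EXPEDITED_OVERAGE_PRICE = 1.50  # $/doc after the free quota is exhausted

@dataclass
class AnalysisQuota:
    used: int = 0    # expedited_analyses_used; resets on the 1st
    limit: int = 25  # expedited_analyses_limit (default 25)

def expedite_cost(quota: AnalysisQuota, doc_count: int) -> float:
    """Overage charge for expediting doc_count documents right now."""
    free_remaining = max(quota.limit - quota.used, 0)
    billable = max(doc_count - free_remaining, 0)
    return billable * EXPEDITED_OVERAGE_PRICE
```

A zero result means "analyze now" stays free; a positive result drives the "$1.50/doc" button and the Stripe overage flow.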
Margin Analysis¶
At current AI costs (~$0.05/doc for analysis):
- Expedited fee: $1.50/doc
- Cost: $0.05/doc
- Margin: 97%
Free quota (25/month) costs ~$1.25/month per client - acceptable customer acquisition cost.
Duplicate Document Detection (DUP-001)¶
Status: ✅ Implemented (2024-12-27) Priority: P2 (Data Quality) Target: V1.1
Problem¶
Users may accidentally upload the same document multiple times, leading to:
- Duplicate data in AI analysis
- Confusion about which version is current
- Wasted storage and processing
Solution¶
Detect duplicates on upload using file hash at two levels:
- On Upload: Calculate SHA-256 hash of file content
- Check Same Client: Query documents for this client with matching hash
- Check Cross-Client: Query all documents with matching hash (different client)
- Warn/Error: Show appropriate message based on match type
User Experience¶
Same Client Duplicate:
⚠️ Duplicate Document
"W2_Global_2023.pdf" matches a document uploaded on Dec 15, 2024.
[View Original] [Cancel]
Cross-Client Match (likely wrong client selected):
🚨 Document Belongs to Another Client
This file is already linked to: John Smith
(uploaded Dec 10, 2024)
Did you select the wrong client?
[View in John Smith] [Cancel]
Implementation (Completed)¶
Database Changes:
- Added file_hash column to documents table (VARCHAR(64))
- Added index: idx_documents_file_hash
API Changes:
- Calculate SHA-256 hash on upload
- Single query to check for existing hash, returns client_id
- Returns 409 Conflict with duplicate info if found:
- error: "duplicate_document"
- is_same_client: boolean
- existing_document: { id, client_id, client_name, filename, uploaded_at }
Files Modified:
- scripts/local_api.py - Hash calculation and duplicate check
- scripts/bootstrap.py - file_hash column and index
- frontend/.../DocumentUploadPage.tsx - Duplicate dialog with View Original / Cancel
- frontend/.../AnalysisDashboardPage.tsx - Inline drag-drop upload with same duplicate detection
Tests Added:
- tests/bdd/features/duplicate_detection.feature - 9 BDD scenarios
- tests/bdd/step_defs/test_duplicate_detection.py - Step definitions
- frontend/tests/features/duplicate-detection.feature - 12 Gherkin scenarios
- frontend/tests/duplicate-detection.spec.ts - Playwright test stubs
Name Mismatch Detection (DUP-002)¶
Status: Backlog Priority: P3 (Data Quality) Target: V1.2
Problem¶
Document uploaded to wrong client - e.g., W-2 for "John Smith" uploaded to client "Jane Doe".
Solution¶
Post-processing check after document extraction:
1. Extract name from document metadata (W-2, 1099, etc.)
2. Fuzzy match against client name
3. If mismatch, flag for review
Considerations¶
- Requires OCR/extraction to complete first (not instant like hash check)
- Fuzzy matching needed ("John Smith" vs "John A. Smith" vs "J. Smith")
- Some docs have no name (receipts, bank statements) - skip check
- Joint returns - both spouse names are valid matches
- Threshold for fuzzy match confidence
User Experience¶
After document processing completes:
⚠️ Name Mismatch
This W-2 shows "John Smith" but client is "Jane Doe".
[View Document] [Confirm Correct Client] [Move to Different Client]
Concurrent Edit Handling (CONC-001)¶
Status: Planning Complete Priority: P2 (Data Integrity) Target: V1.1 Plan: docs/plans/OPTIMISTIC_LOCKING.md
Problem¶
Two users editing the same record simultaneously results in silent data loss (Last Write Wins). No conflict detection or user notification.
Solution: Optimistic Locking¶
Add version field to mutable entities. On update, verify version matches; if stale, return 409 Conflict with current data for user resolution.
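In sketch form (in production the same check is a conditional `UPDATE ... WHERE id = %s AND version = %s` with a rowcount test, not an in-memory dict):

```python
class StaleVersionError(Exception):
    """Maps to 409 Conflict; carries current data for user resolution."""
    def __init__(self, current: dict):
        self.current = current

def apply_update(record: dict, changes: dict, expected_version: int) -> dict:
    """Apply changes only if the caller saw the latest version.

    Sketch of CONC-001 optimistic locking: a stale expected_version means
    someone else wrote first, so the caller gets the current data back
    instead of silently overwriting it.
    """
    if record["version"] != expected_version:
        raise StaleVersionError(current=dict(record))
    record.update(changes)
    record["version"] += 1
    return record
```

The frontend keeps the version it last fetched, sends it with each update, and opens the ConflictDialog from `StaleVersionError.current` on a 409.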
Scope¶
| Entity | Risk | Implementation |
|---|---|---|
| Clients | High | Version column, API check, conflict dialog |
| Tax Returns | High | Version column, API check, conflict dialog |
| Documents | Medium | Version column, API check, conflict dialog |
| Users | Low | Version column, API check |
Implementation Summary¶
| Component | Changes | Effort |
|---|---|---|
| Database | Add version column to 4 tables | 30 min |
| API | Version in responses, check on updates, 409 handling | 2 hrs |
| Frontend | State tracking, ConflictDialog component | 2 hrs |
| Tests | Unit (5), Integration (3), E2E (3), BDD (3) | 3 hrs |
| Total | 8-10 hrs |
Delegation Strategy¶
- Haiku: Database migration, API endpoint updates, unit tests
- Sonnet: Frontend state management, conflict dialog, integration/E2E tests
Test Cases¶
- Update with correct version succeeds, version increments
- Update with stale version returns 409 Conflict
- Missing version returns 400 Bad Request
- Conflict response includes current data
- Conflict dialog appears and functions correctly
Acceptance Criteria¶
- All mutable entities have a version column
- All update endpoints require and verify version
- 409 Conflict returned with current data on mismatch
- Frontend displays conflict resolution dialog
- User can discard changes or retry (overwrite)
- All tests passing
Field-Level Merging (CONC-002)¶
Status: Planned Priority: P3 (UX Enhancement) Target: V1.2 Depends on: CONC-001 Plan: docs/plans/OPTIMISTIC_LOCKING.md Section 10
Problem¶
Record-level locking (CONC-001) forces all-or-nothing conflict resolution. If User A changes phone and User B changes email, they shouldn't conflict.
Solution¶
Track which fields were changed. Auto-merge non-overlapping changes. Only show conflict dialog for same-field edits.
Example - Auto-merge:

- User A changes phone
- User B changes email
- Result: Both saved, no conflict

Example - Partial conflict:

- User A changes phone + email
- User B changes email
- Result: Phone auto-merged, email conflict shown
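The merge rule can be sketched as a three-way dict merge against the common base snapshot (a sketch only; real records would carry the base via CONC-001's version field):

```python
def merge_changes(base: dict, a: dict, b: dict):
    """Field-level merge of two concurrent edits against a shared base.
    Non-overlapping changes auto-merge; only same-field edits with
    differing values are reported as conflicts."""
    changed_a = {k for k in base if a[k] != base[k]}
    changed_b = {k for k in base if b[k] != base[k]}
    conflicts = {k for k in changed_a & changed_b if a[k] != b[k]}
    merged = dict(base)
    merged.update({k: a[k] for k in changed_a - conflicts})
    merged.update({k: b[k] for k in changed_b - conflicts})
    return merged, conflicts
```

Both examples above fall out of this rule: disjoint field sets yield an empty conflict set, while overlapping edits surface only the contested field in the dialog.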
Effort Estimate¶
4-6 hours (incremental on CONC-001)
Special Conflict Scenarios (CONC-003)¶
Status: Planned Priority: P3 (Data Integrity) Target: V1.2 Depends on: CONC-001 Plan: docs/plans/OPTIMISTIC_LOCKING.md Section 10
Scenarios¶
Delete vs Update: User A deletes record while User B is editing. → Block delete or warn "record was deleted"
Status Race: Two users change return status simultaneously. → State machine validates transitions
Assignment Collision: Two preparers claim same return. → "Already assigned to X" message
Bulk vs Single: Bulk update while individual edit in progress. → Per-record version check, report partial failures
Parent-Child Constraints: Delete client with active returns. → "Cannot delete: has N active returns"
Effort Estimate¶
6-8 hours
Client Actions (UI-001, UI-002)¶
UI-001: Send Portal Invite¶
Status: Backlog Priority: P2 (Client Experience) Target: V1.1
Send invitation email to client with portal access link. Triggers welcome email with secure login instructions.
UI-002: Create Engagement Letter¶
Status: Backlog Priority: P2 (Workflow) Target: V1.1
Generate engagement letter from template, pre-filled with client info. Integrates with S3-001/S3-002 (Engagement Letter Generation and E-Signature).
AI-Support Ticketing System (SUP-001)¶
Status: Backlog Priority: P2 (Client Experience) Target: V1.2
Problem¶
Client questions and issues currently handled via email/phone without centralized tracking. No AI assistance in triaging or responding to common questions.
Solution¶
Ticketing system with AI-powered triage and response assistance:
- Ticket Intake: Clients submit questions via portal, email, or phone (staff creates ticket)
- AI Triage: Auto-classify ticket type, priority, and route to appropriate staff
- AI Draft Response: Generate suggested response for common questions
- Staff Review: Staff reviews AI draft, edits if needed, and sends
- Resolution Tracking: Track time to resolution, client satisfaction
Features¶
Ticket Types:

- Document request (missing W-2, need copy of return)
- Status inquiry (where is my refund, when will return be filed)
- Tax question (can I deduct X, how does Y work)
- Technical support (portal access, password reset)
- Billing inquiry (invoice questions, payment issues)
- Appointment request (schedule call, meeting)

AI Capabilities:

- Auto-categorize incoming tickets by type and urgency
- Suggest priority based on content analysis
- Draft responses using client context (return status, documents, prior communications)
- Flag tickets requiring senior staff attention
- Detect sentiment (frustrated, urgent, routine)

Staff Interface:

- Unified inbox with AI-suggested priorities
- One-click approve AI draft or edit
- Internal notes and escalation
- Response templates with merge fields
- Time tracking per ticket

Client Interface:

- Submit new ticket from portal
- View ticket status and history
- Receive notifications on updates
- Rate satisfaction after resolution
Implementation¶
Database:
- tickets table (id, client_id, type, priority, status, created_at, resolved_at)
- ticket_messages table (id, ticket_id, sender_type, content, ai_draft, created_at)
- ticket_templates table (id, type, subject, body)
Services:
- src/services/ticket_service.py - Core ticketing logic
- src/services/ticket_ai_service.py - AI triage and response generation
API:
- src/api/routes/tickets.py - CRUD endpoints
- src/api/schemas/ticket_schemas.py - Request/response schemas
Frontend:
- frontend/apps/staff-app/src/pages/TicketsPage.tsx - Staff ticket queue
- frontend/apps/staff-app/src/components/tickets/TicketDetail.tsx - Individual ticket view
- frontend/apps/client-portal/src/pages/SupportPage.tsx - Client ticket submission
AI Prompts:
- src/prompts/tickets/triage.txt - Classify and prioritize
- src/prompts/tickets/draft_response.txt - Generate response draft
Effort Estimate¶
- Backend: 16-20 hours
- Frontend: 12-16 hours
- AI Integration: 8-12 hours
- Testing: 8-10 hours
- Total: 44-58 hours
Metrics¶
- Average response time
- First-contact resolution rate
- AI draft acceptance rate
- Client satisfaction scores
- Tickets per client per season
Future Phases¶
PHASE-BKP: Bookkeeping Module¶
Status: Requirements Drafted Priority: Post-MVP Requirements: bookkeeping_requirements.md
Phased implementation:

- Phase 1: Bank statement upload, AI categorization, QuickBooks export
- Phase 2: Reconciliation, recurring transaction detection
- Phase 3: Full bookkeeping (chart of accounts, P&L, Balance Sheet, two-way sync)

Pending Client Input:

- Service level (tax-ready categorization vs full bookkeeping)
- Target clients (business entities only vs all clients)
- Pricing model (monthly retainer vs per-transaction)
AI Cost Optimization (SaaS Profitability)¶
Critical for SaaS business model. See COST_DETAIL.md for full analysis.
OPT-001: Metadata Caching¶
Status: ✅ Complete (2025-12-28) Priority: P0 (Core to SaaS margins) Estimated Impact: 60-75% token reduction, $0.24/return savings
Problem: Every AI query re-reads source documents. A W-2 referenced 10 times costs 10× the tokens.
Solution: Extract once, cache as markdown, reference forever.
Implementation:
- [x] Create metadata MD file on first document scan (<350 lines each)
- [x] Per-document: {client_id}/{return_year}/{doc_id}_metadata.md
- [x] Per-return: {client_id}/{return_year}/return_summary.md
- [x] Store in S3 alongside original documents
- [x] Index metadata location in Aurora (document_metadata table)
- [x] AI reads cache first; only re-scan if stale or confidence < 90%
- [x] Refresh trigger when document updated (mark_document_stale)
Metadata file contents:

- Document type, source, upload date, confidence score
- Extracted values (wages, withholding, etc.) in structured tables
- AI notes (prior year comparison, anomalies)
- Flags and questions

Savings:

| Metric | Without Cache | With Cache |
|---|---|---|
| Tokens/return | 126,500 | ~50,000 |
| AI cost/return | $0.40 | $0.16 |
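The cache-first read path described above can be sketched as follows; the loaders are injected stand-ins for the S3 and extraction services, and the 90% confidence floor comes from the implementation notes:

```python
from dataclasses import dataclass
from typing import Callable, Optional

CONFIDENCE_FLOOR = 0.90  # re-scan below this, per the implementation notes

@dataclass
class Metadata:
    content: str        # the cached markdown
    confidence: float
    stale: bool = False  # set by mark_document_stale on update

def get_metadata(key: str,
                 cache_read: Callable[[str], Optional[Metadata]],
                 cache_write: Callable[[str, Metadata], None],
                 extract: Callable[[], Metadata]) -> Metadata:
    """Serve the cached markdown unless it is missing, marked stale, or
    low-confidence; otherwise re-extract and write through."""
    cached = cache_read(key)
    if cached and not cached.stale and cached.confidence >= CONFIDENCE_FLOOR:
        return cached               # cache hit: no AI tokens spent
    fresh = extract()               # full AI scan
    cache_write(key, fresh)         # write-through to S3
    return fresh
```

The key follows the per-document layout above ({client_id}/{return_year}/{doc_id}_metadata.md); the Aurora document_metadata table only stores where the file lives.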
OPT-002: Batch API Default¶
Status: Not Started Priority: P0 (Core to SaaS margins) Estimated Impact: 50-60% cheaper on batch-eligible tasks
Problem: Interactive API calls cost more than batch.
Solution: Default to batch processing with opt-in for real-time.
Implementation:

- [ ] UX: "I'll have results ready tomorrow morning. [Start Live Session]"
- [ ] Queue document processing overnight via Airflow
- [ ] Pre-generate worksheets for morning review
- [ ] Only Preparer Q&A requires real-time
- [ ] Track batch vs interactive usage for cost analysis

Batch-eligible tasks:

- Document classification and extraction
- Prior year comparison
- Missing document detection
- Worksheet generation
- Rejection analysis
- Tax reminders
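A routing sketch for the batch-by-default policy; the task identifiers are hypothetical keys matching the eligible tasks listed above:

```python
# Tasks safe to queue overnight; everything else stays interactive.
BATCH_ELIGIBLE = {
    "classification", "extraction", "prior_year_comparison",
    "missing_doc_detection", "worksheet_generation",
    "rejection_analysis", "tax_reminders",
}

def route_request(task: str, live_session: bool = False) -> str:
    """Default to the cheaper batch queue; only real-time preparer Q&A,
    or an explicit [Start Live Session] opt-in, uses the interactive API."""
    if live_session or task not in BATCH_ELIGIBLE:
        return "interactive"
    return "batch"
```

Logging the return value per call gives the batch-vs-interactive usage split called for in the implementation checklist.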
Savings: at 80% batch acceptance, cost drops from $0.40 to $0.28/return
OPT-003: Model Delegation Strategy¶
Status: Not Started Priority: P1 Estimated Impact: 38% AI cost reduction
Solution: Use cheapest model capable for each task.
SONNET (Orchestrator) → HAIKU (60% - extraction)
→ SONNET (35% - analysis)
→ OPUS (5% - expert review)
Implementation:

- [ ] Haiku for document extraction and classification
- [ ] Sonnet for Q&A, comparisons, worksheets
- [ ] Opus only for tax code interpretation, audit risk, complex scenarios
- [ ] Auto-escalation when confidence < 80% or complex entity types
- [ ] User option: "Ask the expert" to force Opus
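The routing rules above can be sketched as a lookup table plus a one-tier escalation step; the task names are hypothetical, the model names are tiers rather than real model IDs, and the 80% threshold comes from the auto-escalation rule:

```python
from typing import Optional

TASK_MODEL = {
    "extraction": "haiku",
    "classification": "haiku",
    "qa": "sonnet",
    "comparison": "sonnet",
    "worksheet": "sonnet",
    "tax_code_interpretation": "opus",
    "audit_risk": "opus",
}
ESCALATE = {"haiku": "sonnet", "sonnet": "opus"}

def pick_model(task: str, confidence: Optional[float] = None,
               ask_expert: bool = False) -> str:
    if ask_expert:
        return "opus"                       # "Ask the expert" override
    model = TASK_MODEL.get(task, "sonnet")  # unknown tasks default to mid tier
    if confidence is not None and confidence < 0.80:
        model = ESCALATE.get(model, model)  # auto-escalate one tier
    return model
```

Defaulting unknown tasks to Sonnet keeps new task types safe until they are explicitly profiled into a tier.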
Savings: All-Sonnet ($0.64) → Delegation ($0.40) = 38% reduction
OPT-004: Combined Optimization Target¶
Priority: P0 Target: $0.12/return AI cost (vs $0.40 baseline)
Combined impact:

1. Batch processing (80%): $0.40 → $0.28
2. Metadata caching (75%): $0.28 → $0.12
3. Model delegation: Further optimization within each tier

At scale (10,000 returns):

- Baseline: $4,000/year AI cost
- Optimized: $1,200/year AI cost
- Annual savings: $2,800
OPT-005: Prompt Compression¶
Status: Not Started Priority: P2 (Quick Win) Estimated Impact: 10-20% additional token reduction
Problem: System prompts, tax code references, and instructions repeat on every call.
Solution: Use prompt compression tools (LLMLingua) to reduce prompt size while preserving meaning.
Implementation:

- [ ] Evaluate LLMLingua for system prompt compression
- [ ] Compress static instructions and tax code references
- [ ] Benchmark quality vs compression ratio
- [ ] A/B test compressed vs full prompts
OPT-006: Output Token Limits¶
Status: Not Started Priority: P2 (Quick Win) Estimated Impact: 5-15% output token reduction
Problem: AI generates verbose explanations when structured data is sufficient.
Solution: Force shorter, structured responses for extraction and classification tasks.
Implementation:

- [ ] Set max_tokens limits per task type
- [ ] Use structured JSON outputs for extraction
- [ ] Reserve verbose mode for Q&A only
- [ ] Track output token usage by task type
OPT-007: Semantic Caching¶
Status: Not Started Priority: P2 Estimated Impact: 15-30% reduction on repeated queries
Problem: Similar questions hit the AI repeatedly. "How do I report crypto?" and "Where do cryptocurrency gains go?" are the same question.
Solution: Cache responses by meaning, not exact match. Return cached answer for semantically similar queries.
Implementation:

- [ ] Implement vector embeddings for query similarity
- [ ] Set similarity threshold for cache hits
- [ ] Focus on client Q&A and preparer questions
- [ ] Track cache hit rate and quality
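A minimal semantic cache sketch: cosine similarity over query embeddings, with an injected (hypothetical) `embed` function and an assumed 0.9 threshold to be tuned on real query logs:

```python
import math

SIM_THRESHOLD = 0.9  # assumption; tune against real query logs

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class SemanticCache:
    """Answer cache keyed by meaning: a lookup hits when the query's
    embedding is close enough to a previously answered query."""
    def __init__(self, embed):
        self.embed = embed      # query text -> vector (injected)
        self.entries = []       # list of (vector, answer)

    def get(self, query):
        v = self.embed(query)
        best = max(self.entries, key=lambda e: cosine(v, e[0]), default=None)
        if best is not None and cosine(v, best[0]) >= SIM_THRESHOLD:
            return best[1]      # semantic hit: skip the AI call
        return None

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))
```

Production use would back this with a vector index rather than a linear scan, but the hit/miss rule is the same.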
OPT-008: RAG Optimization¶
Status: Not Started Priority: P2 Estimated Impact: 20-30% context token reduction
Problem: Sending entire prior-year returns when only specific sections are relevant.
Solution: Retrieve only relevant document sections based on the question.
Implementation:

- [ ] Chunk documents into semantic sections
- [ ] Index chunks with vector embeddings
- [ ] Retrieve top-k relevant chunks per query
- [ ] Benchmark quality vs full-document approach
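The retrieval step can be sketched as a top-k ranking over pre-embedded chunks; the `(embedding, text)` pair format and k=3 default are assumptions for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_chunks(query_vec, chunks, k=3):
    """Return the k most relevant chunk texts instead of the whole
    prior-year return. `chunks` is a list of (embedding, text) pairs
    from a hypothetical chunk index."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```

Only the returned chunks go into the prompt context, which is where the 20-30% context token reduction comes from.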
OPT-009: Confidence-Based Escalation¶
Status: Not Started Priority: P3 Estimated Impact: 10-20% model cost reduction
Problem: Pre-routing tasks to specific models may over-allocate expensive models.
Solution: Let Haiku attempt everything first, escalate dynamically based on confidence scores.
Implementation:

- [ ] Define confidence thresholds per task type
- [ ] Implement escalation pipeline (Haiku → Sonnet → Opus)
- [ ] Track escalation rates and quality
- [ ] Tune thresholds based on error rates
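The escalation pipeline can be sketched as a cheapest-first loop; `call_model` is a hypothetical injected wrapper returning `(answer, confidence)`, and the per-tier thresholds are assumptions to tune:

```python
def run_with_escalation(task, call_model,
                        tiers=(("haiku", 0.80), ("sonnet", 0.80), ("opus", 0.0))):
    """Let the cheapest model attempt everything first; keep a tier's
    answer only if its confidence clears that tier's threshold. The last
    tier (threshold 0.0) always answers."""
    for model, threshold in tiers:
        answer, confidence = call_model(model, task)
        if confidence >= threshold:
            return answer, model
    raise RuntimeError("no tiers configured")
```

Logging which tier ultimately answered gives the escalation-rate metric from the checklist above.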
OPT-010: Fine-Tuning (Future)¶
Status: Not Started Priority: P4 (Volume-dependent) Estimated Impact: 30-50% cost reduction on fine-tuned tasks
Problem: Using general-purpose models for repetitive, domain-specific tasks.
Solution: Train smaller models on our specific tasks (document classification, common Q&A).
Implementation:

- [ ] Collect 6+ months of production data
- [ ] Identify high-volume, repetitive tasks
- [ ] Fine-tune Haiku or open-source model (LoRA/QLoRA)
- [ ] A/B test fine-tuned vs general model
OPT-011: Knowledge Distillation (Future)¶
Status: Not Started Priority: P4 (Volume-dependent) Estimated Impact: 50-85% cost reduction on distilled tasks
Problem: Want Opus-quality responses at Haiku prices.
Solution: Use Opus to generate training data, teach Haiku to mimic Opus responses.
Implementation:

- [ ] Generate "gold standard" responses with Opus
- [ ] Create training dataset from Opus outputs
- [ ] Train Haiku to reproduce Opus-quality responses
- [ ] Deploy distilled model for production use
Design Decisions Log¶
| ID | Date | Decision | Rationale |
|---|---|---|---|
| DD-001 | 2024-12-22 | Service centralization as core pattern | Single point of access for all external services - easier maintenance, testing, audit |
| DD-002 | 2024-12-22 | Mixed Java/Python architecture | Java for performance-critical processing, Python for API/orchestration/AI |
| DD-003 | 2024-12-22 | Shared config.yaml for both languages | Single source of truth, environment variable substitution |
| DD-004 | 2024-12-22 | Aurora-only starting architecture | Aurora handles all data including 7+ years retention (via table partitioning). Tiered storage (Athena/Snowflake) adds complexity not justified at small scale. S3 for document storage only, not as query layer. Add analytics tier only when Aurora partitioning insufficient. Both Aurora and Snowflake meet security/compliance requirements for tax data. |
| DD-005 | 2024-12-23 | Self-hosted Airflow orchestration | Use self-hosted Airflow on EC2 t3.medium (~$23/mo reserved) instead of MWAA ($360+/mo) or Step Functions. Full control, Python DAGs, built-in UI/monitoring. Acceptable maintenance overhead for small practice. |
| DD-006 | 2024-12-23 | Bookkeeping as separate requirements document | Bookkeeping has different cadence (monthly vs annual), different workflow, and could serve non-tax clients. Separate document allows independent prioritization and phasing. Shares infrastructure with tax system. |
| DD-007 | 2024-12-23 | Bookkeeping phased approach | Phase 1: tax-ready categorization + QuickBooks export. Phase 2: reconciliation. Phase 3: full bookkeeping. Start light, design for full. QuickBooks is system of record initially. |
| DD-008 | 2024-12-23 | V1 integration strategy: integrate, don't replace | Integrate with SmartVault (client portal), SurePrep (OCR/extraction), and UltraTax (via SurePrep CS Connect). Don't replace industry standard tools in V1. Future versions may replace SurePrep to capture per-return fees, but UltraTax lacks API (blocker). |
| DD-009 | 2024-12-23 | Dual integration pattern: Services + Skills | Services handle API calls (how to call). Skills provide AI context (how to understand). SmartVault, SurePrep, UltraTax each get skills for AI to interpret their data. UltraTax has skill but no service (no API). |
| DD-010 | 2024-12-23 | Multi-tenant SaaS: Separate databases per tenant | Deploy as SaaS with separate database per tenant firm within one Aurora cluster. Strongest isolation for tax compliance, shared tiered pricing, easy to migrate growing tenants to dedicated clusters. Requires tenant routing middleware and dynamic connection management. |
| DD-011 | 2024-12-24 | Frontend: React + Vite two-app architecture | Two separate apps (client-portal, staff-app) with shared component library (@tax-practice/ui). React 18, Vite, TypeScript, Tailwind, shadcn/ui, React Query, Zustand. Production-grade, sellable stack. HTMX considered but React chosen for richer UX and market perception. |
Completed Items¶
DOC-001: Pre-Implementation Documentation¶
Completed: 2024-12-23
- DATABASE_SCHEMA.sql - Complete 35-table schema with 50 enums
- DATA_MODEL.md - Logical data model with ER diagrams
- API_SPECIFICATION.md - Full REST API contract
- INTEGRATION_CONTRACTS.md - External service integrations
- SECURITY_DESIGN.md - Security architecture and controls
- PROCESS_FLOWS.md - State machines and workflows
- USER_STORIES.md - 82 prioritized user stories (77 MVP, 5 post-MVP)
DOC-002: Documentation Reconciliation¶
Completed: 2024-12-23
All pre-implementation specifications validated and reconciled:

- Enum values aligned across all documents
- Missing endpoints added to API specification
- ER diagram updated with all 35 entities
- Webhook handlers aligned with integration contracts
- Cross-references added between documents
Tax Reference System¶
Tax code reference system created 2024-12-27. Serves both Claude Code /tax skill and Bedrock chatbot (RAG).
Structure: docs/tax-reference/ with federal/, states/, common/ subdirectories organized by tax year.
Maintenance Model: Tax staff direct updates via conversation with Claude. Claude makes the actual file edits. No direct staff editing of markdown files.
TAX-REF-001: Senior Tax Administrator Role (RBAC)¶
Status: Backlog Priority: P2 Dependency: RBAC system implementation
Add RBAC role "Senior Tax Administrator" with permission to direct tax reference updates:

- Can instruct Claude to update tax brackets, thresholds, rules
- Can approve changes before Claude commits them
- Audit log of all tax reference modifications
- Year-over-year update workflow (annual review process)
TAX-REF-002: Firm-Specific Tax Skills/Overrides¶
Status: Backlog Priority: P2 Dependency: TAX-REF-001
Allow firms to customize tax reference with firm-specific guidance:

- Override default rules with firm interpretations
- Add firm-specific notes/cautions on tax positions
- Custom worksheets and checklists per firm
- Inheritance model: firm overrides layer on top of base tax reference
- Storage: docs/tax-reference/overrides/{firm_id}/ or similar
TAX-REF-003: Bedrock Knowledge Base Integration¶
Status: Backlog Priority: P1 Dependency: BedrockService implementation
Integrate tax reference files with client-facing chatbot via hybrid approach:
Bedrock Knowledge Base (RAG):

- Index docs/tax-reference/ into Bedrock Knowledge Base
- Automatic chunking and embedding of markdown files
- Semantic search for open-ended questions ("tell me about NY taxes")
- Re-index on file updates (triggered by Claude commits)

Direct File Injection:

- BedrockService.get_tax_context(jurisdiction, topic, year) method
- Loads specific markdown file for deterministic context
- More token-efficient for specific lookups ("NYC MFJ rate")
- Routing logic maps question intent to file path

Implementation Steps:

- Create Bedrock Knowledge Base with S3 data source pointing to docs/tax-reference/
- Add sync trigger when tax reference files are updated
- Implement get_tax_context() in BedrockService
- Add intent detection to route between KB search vs direct injection
- Include source citations in chatbot responses (file path + section)

Benefits:

- Single source of truth (same files serve Claude Code skill and chatbot)
- Granular file structure enables precise context injection
- Citations provide audit trail for tax advice
TAX-REF-004: Per-Client Tax Context Metadata¶
Status: Backlog Priority: P1 Dependency: AI Analysis workflow, TAX-REF-003
Auto-generated per-client tax context that primes the chatbot with relevant jurisdictions, skills, and flags.
File Structure:
s3://tax-practice-documents/{client_id}/{return_year}/
├── tax_context.md # Current state (~50 lines, always loaded)
├── tax_context_pending.md # Chatbot-flagged updates awaiting preparer review
└── tax_context_history.md # Full changelog/audit trail (on-demand)
tax_context.md Contents:

- Jurisdictions (states, localities detected from documents)
- Tax skills required (references to docs/tax-reference/ files)
- Flags (part-year, multi-state, audit risks, special situations)
- Document-derived values (W-2 states, 1099 sources, property locations)

Creation - Part of AI Analysis (automatic, not user-triggered):

- Analysis prompt includes jurisdiction detection and skill mapping
- Analysis output structure includes tax_context section
- Workflow writes tax_context.md as required output
- No analysis is "complete" without tax context file

Updates - Delta Detection:

- New document uploaded → re-analyze, update if jurisdictions/flags change
- Chatbot conversation reveals new info → flag in tax_context_pending.md
- Preparer approves pending → applied to tax_context.md, logged to history

Chatbot Flagging:

- Chatbot detects tax-relevant statements (new state, life events, corrections)
- Creates entry in tax_context_pending.md with suggested changes
- Pending items surface in preparer queue with badge count
- Preparer approves/rejects/modifies before changes apply
- Chatbot cannot directly modify tax_context.md (audit trail integrity)

Version History (tax_context_history.md):

- Every change logged with: timestamp, trigger, source (doc ID or conversation ID), actor, diff
- S3 versioning enabled as backup
- Supports dispute resolution: "On Dec 30 you told us X, here's the conversation"

Chatbot Loading Strategy:

- tax_context.md: Always injected as system context
- tax_context_pending.md: Loaded when preparer is in session
- tax_context_history.md: On-demand ("What did I say about my move date?")

Tiered Context Loading:

- Layer 1: tax_context.md (always, ~50 lines)
- Layer 2: Specific tax skill files from docs/tax-reference/ (on-demand based on question)
- Layer 3: Full Bedrock KB search (if question exceeds listed skills)
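The tiered loading strategy could look roughly like this; every loader here is a hypothetical injected dependency, since the ChatService integration is still backlog:

```python
def build_chat_context(question: str, load_layer1, detect_skills,
                       load_skill, kb_search) -> str:
    """Assemble chatbot context in three layers: always load
    tax_context.md; add specific tax-reference files when the question
    maps to listed skills; fall back to a Knowledge Base search when no
    skill matches."""
    parts = [load_layer1()]                    # Layer 1: ~50 lines, always
    skills = detect_skills(question, parts[0])
    parts += [load_skill(s) for s in skills]   # Layer 2: on-demand files
    if not skills:
        parts.append(kb_search(question))      # Layer 3: full KB search
    return "\n\n".join(parts)
```

Because the loaders are injected, the same function works for the Claude Code skill (local files) and the Bedrock chatbot (S3 plus Knowledge Base).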
Templates: See docs/tax-reference/templates/ for file structure definitions.
Impact Analysis (2024-12-27):
Files to Modify (10):

- src/workflows/analysis/preliminary_analysis_workflow.py - Add tax context generation after analysis
- src/workflows/documents/classification_workflow.py - Add delta detection after classification
- src/services/chat_service.py - Load tax context into chatbot, add flagging logic
- src/workflows/review/ai_qa_workflow.py - Include tax context in Q&A prompts
- frontend/apps/staff-app/src/pages/ReviewPage.tsx - Add pending review section
- frontend/apps/staff-app/src/pages/ClientDetailPage.tsx - Add badge for pending updates
- frontend/apps/staff-app/src/components/chat/ChatDrawer.tsx - Show flagged items indicator
- src/services/audit_service.py - Log tax context events

New Files to Create (11):

- src/services/tax_context_service.py - Core service: generate, update, manage tax context files
- src/prompts/tax_context/generate_initial.txt - Prompt for initial tax context generation
- src/prompts/tax_context/detect_delta.txt - Prompt for comparing new docs against existing context
- src/prompts/tax_context/flag_statement.txt - Prompt for chatbot to detect tax-relevant statements
- src/api/routes/tax_context.py - CRUD endpoints for tax context
- src/api/schemas/tax_context_schemas.py - Request/response schemas
- frontend/apps/staff-app/src/components/tax-context/PendingReviewPanel.tsx - Review/approve UI
- frontend/apps/staff-app/src/components/tax-context/TaxContextViewer.tsx - Display current context
- tests/unit/services/test_tax_context_service.py
- tests/integration/test_tax_context_workflow.py
- tests/e2e/test_tax_context_ui.py

No Changes Needed:

- src/services/s3_service.py - Existing methods sufficient for read/write

Effort Estimate (Claude-only development with human guidance):

- Backend: 15-20 hours of review/approval time
- Frontend: 8-12 hours of review/approval time
- Testing: 8-10 hours of review/approval time
- Documentation: 2-3 hours of review/approval time
- Total: 33-45 hours of human time (~1-1.5 weeks)
- Note: Generic industry estimate for a human developer would be 75-98 hours

Critical Path:

1. TaxContextService creation (foundational)
2. Prompt templates (generate_initial, detect_delta, flag_statement)
3. Integration into preliminary_analysis_workflow (initial generation)
4. Integration into classification_workflow (delta detection)
5. ChatService integration (context loading + flagging)
6. API routes creation
7. UI components (PendingReviewPanel, TaxContextViewer)

Risk Areas:

- Delta detection accuracy (false positives/negatives) - mitigate with prompt tuning
- Chatbot flagging noise - add confidence thresholds, allow preparer to disable
- S3 file conflicts on concurrent updates - use S3 versioning, queue-based updates
Notes¶
- Tax requirements: client_facing_docs/tax_practice_ai_requirements.md
- Bookkeeping requirements: bookkeeping_requirements.md
- User stories: USER_STORIES.md (82 stories in 18 sequences)
- Development timeline target: January 2027 filing season (per requirements Section 11)
- Critical path items have lead times (vendor selection, account setup) - see Section 11.4 of requirements
Document Inventory¶
| Category | Document | Purpose |
|---|---|---|
| Requirements | tax_practice_ai_requirements.md | Full business requirements |
| Requirements | bookkeeping_requirements.md | Bookkeeping module (post-MVP) |
| Requirements | MIGRATION_REQUIREMENTS.md | Data migration (pre-launch) |
| Planning | USER_STORIES.md | 87 prioritized implementation stories (4 in S0, 10 in S2) |
| Planning | backlog.md | Priority tracking and decisions |
| Technical | ARCHITECTURE.md | System design and patterns |
| Technical | DATABASE_SCHEMA.sql | PostgreSQL schema definition |
| Technical | DATA_MODEL.md | Logical data model |
| Technical | API_SPECIFICATION.md | REST API contract |
| Technical | INTEGRATION_CONTRACTS.md | External service integrations |
| Technical | SECURITY_DESIGN.md | Security controls |
| Technical | PROCESS_FLOWS.md | State machines and workflows |
| Operations | RUNBOOK.md | Operational procedures |