Tax Practice AI - Backlog¶
Last updated: 2024-12-28 (v0.14 - S13 Complete, 72% progress)
This document tracks priority items, technical debt, and pending decisions.
V1 Back-Office AI Companion (Tax Season 2025)¶
Status: Requirements Complete, Implementation In Progress Target: January 2025 deployment for Tax Season 2025
V1 deploys as a back-office companion tool - staff use AI analysis alongside existing workflows without disrupting clients.
V1 Scope¶
| Feature | Status | Description |
|---|---|---|
| Quick Client Entry | Defined | Minimal form: name, tax year, legacy account link |
| Drag-Drop Upload | Defined | Upload documents directly into viewer |
| Folder Import | Defined | Import entire folder of documents |
| AI Classification | Defined | Automatic document type identification |
| AI Analysis | Defined | Prior year comparison, anomaly detection, missing docs |
| Q&A Assistant | Defined | Ask questions during review with source citations |
| Annotations | Defined | Notes, flags, questions on documents |
| Worksheet Export | Defined | PDF/Excel with full source citations |
| S3 Fallback | Defined | AI works when cloud storage unavailable |
V1 Documentation¶
| Document | Purpose |
|---|---|
| V1_COMPANION_REQUIREMENTS.md | Full requirements |
| V1_USE_CASES.md | Detailed use cases |
| V1_UI_CHANGES.md | UI specifications |
| ARCHITECTURE.md Section 16 | Deployment model |
V1 Testing¶
- BDD feature files created (Gherkin syntax)
- 16 BDD scenarios passing (quick client, document upload)
- Test data generator available: `python scripts/generate_test_data.py`
- Sample documents generated: W-2s, 1099s, bank statements, receipts, CSVs
V1 Philosophy¶
Augment, don't replace. Zero client disruption.
- Clients continue using SmartVault for uploads
- UltraTax remains the tax prep tool
- Legacy system remains source of truth
- New account numbers prefixed with 'A' (pending client confirmation)
Client Decisions (Resolved)¶
TAX-001: Tax Software Selection¶
Status: ✅ Resolved (2024-12-23) Decision: UltraTax CS (Thomson Reuters) Integration: Via SurePrep CS Connect bridge (UltraTax has no direct API)
TAX-002: Volume Projections¶
Status: ✅ Resolved (2024-12-23) Decision: ~1,000 returns/year with 30% annual growth expectation
| Timeframe | Returns per Year |
|---|---|
| Year 1 | 1,000 |
| Year 2 | 1,300 |
| Year 3 | 1,700 |
| Year 5 | 2,850 |
Note: Growth may accelerate due to system efficiency gains.
TAX-003: Entity Type Mix¶
Status: ✅ Resolved (2024-12-23) Decision: Equal priority for individuals and businesses. Business clients are primarily small businesses (effectively advanced individual returns). No differentiation needed in V1.
TAX-004: State Coverage¶
Status: ✅ Resolved (2024-12-23) Decision: Florida + surrounding states (GA, AL, SC, NC) initially. Design for all 50 states from the start - full coverage expected soon.
Phase 0: Data Migration (Pre-Launch Prerequisite)¶
Data migration must be completed before go-live. Depends on Sequence 1 infrastructure.
MIG-001: Client Data Import Tool¶
Status: ✅ COMPLETE Priority: P0 (Pre-Launch Blocker)
- CLI tool: `tax-migrate clients <file>`
- CSV/Excel parsing with column mapping
- Duplicate detection and handling
- Account number generation for imports
- Dry-run mode for validation
- Import summary report
- Audit logging
Files:
- scripts/tax_migrate.py - CLI entry point (per MIG-080)
- src/migration/__init__.py - Module exports
- src/migration/client_importer.py - Main import logic (MIG-001 through MIG-008)
- src/migration/column_mapper.py - Flexible column mapping (MIG-002)
- src/migration/duplicate_detector.py - Duplicate detection (MIG-010 through MIG-013)
- src/migration/import_report.py - Report generation (MIG-021, MIG-110 through MIG-114)
- tests/unit/test_migration_column_mapper.py - Unit tests (16 tests)
Requirements: See MIGRATION_REQUIREMENTS.md Section 2
MIG-002: Bulk Document Import Tool¶
Status: ✅ COMPLETE Priority: P0 (Pre-Launch Blocker)
- CLI tool: `tax-migrate documents <folder>` (with `--preview` mode)
- Folder structure pattern matching (client-name-first, account-first, year-first)
- Document classification by filename (W-2, 1099, 1098, K-1, identity docs, etc.)
- Client matching with fuzzy name support (configurable threshold)
- Malware scanning integration (placeholder for ClamAV)
- Unmatched document quarantine (--quarantine-dir option)
- Import report with match statistics (classification stats, match confidence)
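The filename-based classification step above can be sketched with a small pattern table. This is illustrative only: the real rules live in src/migration/document_classifier.py and cover more document types, and these patterns are assumptions, not the shipped ones.

```python
import re

# Illustrative filename patterns -- the shipped rules in
# src/migration/document_classifier.py are more extensive.
FILENAME_PATTERNS = [
    (re.compile(r"w[-_ ]?2", re.IGNORECASE), "W-2"),
    (re.compile(r"1099", re.IGNORECASE), "1099"),
    (re.compile(r"1098", re.IGNORECASE), "1098"),
    (re.compile(r"k[-_ ]?1", re.IGNORECASE), "K-1"),
    (re.compile(r"passport|license|id[-_ ]card", re.IGNORECASE), "identity"),
]

def classify_by_filename(filename: str) -> str:
    """Return a document type label for a filename, or 'unknown'."""
    for pattern, doc_type in FILENAME_PATTERNS:
        if pattern.search(filename):
            return doc_type
    return "unknown"
```

Unmatched filenames fall through to "unknown", which is where the quarantine option (`--quarantine-dir`) would pick them up.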
Files:
- src/migration/document_classifier.py - Document type classification (MIG-040, MIG-042, MIG-043)
- src/migration/client_matcher.py - Client matching with fuzzy support (MIG-050 through MIG-054)
- src/migration/document_importer.py - Main import logic (MIG-030 through MIG-036)
- scripts/tax_migrate.py - CLI entry point (documents subcommand)
- tests/unit/test_migration_document_classifier.py - Unit tests (22 tests)
Requirements: See MIGRATION_REQUIREMENTS.md Section 3
MIG-003: Historical Return Data Import¶
Status: ✅ COMPLETE Priority: P0 (Pre-Launch Blocker)
- CLI tool: `tax-migrate history <file>` (with `--preview`, `--dry-run` modes)
- UltraTax export format support (per MIG-064)
- Generic CSV format support (per MIG-065)
- Prior year AGI import and client update (per MIG-060)
- Filing status normalization (per MIG-061)
- Refund/balance due import (per MIG-062)
- Return history record creation (per MIG-070)
- Client matching by external_id and SSN-4
Files:
- src/migration/history_importer.py - History import logic (MIG-060 through MIG-070)
- scripts/tax_migrate.py - CLI entry point (history subcommand)
- tests/unit/test_migration_history_importer.py - Unit tests (27 tests)
Requirements: See MIGRATION_REQUIREMENTS.md Section 4
MIG-004: Migration Validation & Rollback¶
Status: ✅ COMPLETE Priority: P1
- Migration summary report generation (per MIG-110)
- Source vs imported count comparison (per MIG-111)
- Error and warning listing (per MIG-112)
- Sample client report for spot-checking (per MIG-113)
- Rollback preview (per MIG-140)
- Rollback CLI: `tax-migrate rollback <batch-id>` (per MIG-142)
- List recent migration batches: `tax-migrate rollback --list`
- Validate batch: `tax-migrate rollback <batch-id> --validate`
- Generate report: `tax-migrate rollback <batch-id> --report`
Files:
- src/migration/migration_validator.py - Validation and rollback logic
- scripts/tax_migrate.py - CLI entry point (rollback subcommand)
- tests/unit/test_migration_validator.py - Unit tests (16 tests)
Requirements: See MIGRATION_REQUIREMENTS.md Sections 6, 8
Phase 1: Foundation (Complete)¶
FOUND-001: Project Structure Setup¶
Status: Complete Priority: P0
- Create ARCHITECTURE.md
- Create CLAUDE.md
- Create RUNBOOK.md
- Create backlog.md
- Create src/ directory structure
- Create config.yaml with commented parameters
- Set up requirements.txt with core dependencies
- Create .env.example template
FOUND-002: Service Centralization Framework¶
Status: Complete Priority: P0
- Create src/services/base_service.py
- Create src/services/__init__.py (ServiceRegistry)
- Create src/config/settings.py
- Create src/config/secrets.py (AWS Secrets Manager - future)
FOUND-003: Snowflake Service Implementation¶
Status: Not Started Priority: P1
- Create src/services/snowflake_service.py
- Implement connection pooling
- Add query logging for audit
- Add health check method
FOUND-004: Aurora Service Implementation¶
Status: Complete Priority: P1
- Create src/services/aurora_service.py
- Implement connection pooling
- Add transaction support
- Add health check method
- Add database error mapping (ConflictError, ValidationError, etc.)
Phase 2-7: Client, Document & Tax Preparation (Complete)¶
Sequence 2: Client Identity (Complete)¶
- S2-001: Client Self-Registration (Done)
- S2-002: Identity Verification with Persona (Done)
- S2-003: Returning Client Authentication (Done)
- S2-004: Profile Management (Done)
Sequence 3: Engagement (Complete)¶
- S3-001: Engagement Letter Generation (Done)
- S3-002: E-Signature via Google Docs (Done)
- S3-003: Form 7216 Consent Management (Done)
Sequence 4: Document Management (Complete)¶
- S4-001: Document Upload via Portal (Done)
- S4-002: Document Upload via Email (Done)
- S4-003: Malware Scanning (Done)
- S4-004: Document Classification and Extraction (Done)
- S4-005: SmartVault Integration (Done)
- S4-006: SurePrep Integration (Done)
- S4-007: Document Checklist Management (Done)
- S4-008: Manual Extraction Correction (Done)
Sequence 5: AI Analysis (Complete)¶
- S5-001: Preliminary Return Analysis (Done)
- S5-002: Prior Year Comparison (Done)
- S5-003: Missing Document Detection (Done)
- S5-004: AI-Powered Q&A (Done)
- S5-005: Extraction Corrections (Done)
- S5-006: Analysis Dashboard (Done)
Sequence 6: Tax Preparation Workflow (Complete)¶
- S6-001: Workflow State Machine (Done)
- S6-002: Preparer Assignment (Done)
- S6-003: Reviewer Assignment (Done)
- S6-004: Progress Tracking (Done)
- S6-005: Dashboard Views (Done)
- S6-006: Time Tracking (Done)
- S6-007: Priority Management (Done)
- S6-008: Batch Operations (Done)
- S6-009: Analytics (Done)
Sequence 7: Preparer & Reviewer Interface (Complete)¶
- S7-001: Interactive Review Interface (Done)
- S7-002: AI Q&A Assistant (Done)
- S7-003: Change Tracking (Done)
- S7-004: Final Review Package (Done)
Sequence 8: Client Communication (Complete)¶
- S8-001: Secure Portal Messaging (Done)
- S8-002: Email Notifications (Done)
- S8-003: SMS Notifications (Done)
- S8-004: Callback Scheduling (Done)
- S8-005: Notification Preferences (Done)
Sequence 9: Client Delivery (Complete)¶
- S9-001: Tax Package Generation (Done)
- S9-002: Google Workspace Signature Integration (Done)
- S9-003: Payment Authorization (Done)
Sequence 10: E-Filing Status Tracking (Complete)¶
- S10-001: E-File Status Monitoring (Done)
- S10-002: Mark Return Ready for Filing (Done)
- S10-003: Filing Ready Check (Done)
- S10-004: Rejection Management (Done)
Sequence 11: Billing & Payments (Complete)¶
- S11-001: Stripe Service Implementation (Done)
- S11-002: Invoice Generation (Done)
- S11-003: Payment Collection (Done)
- S11-004: Payment Reminders (Done)
Sequence 12: Estimated Tax Management (Complete - 3 of 4 stories)¶
- S12-001: Estimated Tax Calculation (Done)
- S12-002: Voucher Generation (Done)
- S12-003: Calendar Event Generation (Done)
- S12-004: Estimated Tax Reminders (DEFERRED - see note below)
S12-004 Deferral Note: By sending estimated tax reminders, the firm implies responsibility for notifying clients. When emails or SMS are missed (spam filters, wrong number, etc.), clients blame the firm for their missed payments and penalties. This inappropriately shifts liability. Clients should use calendar events (S12-003) instead.
Implementation Files:
- Domain: src/domain/estimated_tax.py
- Repository: src/repositories/estimated_tax_repository.py
- Workflows: src/workflows/estimated_tax/ (calculation, voucher, calendar)
- API: src/api/routes/estimated_tax.py
- Schemas: src/api/schemas/estimated_tax_schemas.py
Sequence 13: AI Chat (Complete)¶
- S13-001: Chat Domain & Repository (Done)
- S13-002: Chat Service with CLI/API Modes (Done)
- S13-003: Chat API Routes (Done)
- S13-004: Staff-App Chat UI (Done)
- S13-005: Chat Integration Tests (Pending)
Features:
- Tax-focused system prompt with off-topic redirection
- Single-client scope enforcement (no cross-client queries)
- CLI mode for local dev (uses developer's Claude subscription)
- API mode for production (AWS Bedrock)
- Extended context building (client, return, documents, prior year)
- Floating drawer UI in staff-app ClientDetailPage
- Token/cost tracking per session
Implementation Files:
- Domain: src/domain/chat.py
- Repository: src/repositories/chat_repository.py
- Service: src/services/chat_service.py
- API: src/api/routes/chat.py
- Schemas: src/api/schemas/chat_schemas.py
- Frontend: frontend/apps/staff-app/src/components/chat/
- Hook: frontend/apps/staff-app/src/hooks/useChat.ts
Future (Backlog):
- S13-006: Client Portal Chat
- S13-007: WebSocket Streaming
- S13-008: Cross-Client Queries (after security/cost analysis)
- S13-009: Opus Escalation ("Ask the Expert" button)
Technical Debt¶
TD-001: Java Build Configuration¶
Status: Not Started Description: Need to establish Maven/Gradle configuration for Java components Notes: Should mirror ingestion engine patterns for consistency
TD-002: CI/CD Pipeline¶
Status: Complete
Description: Set up GitHub Actions for automated testing and deployment
Implementation: .github/workflows/ci.yml
Pipeline jobs:
- [x] lint - Ruff linting and formatting checks
- [x] unit-tests - Fast tests with coverage reporting (~30s)
- [x] integration-tests - PostgreSQL service container (~2m)
- [x] e2e-tests - PostgreSQL + LocalStack service containers (~2m)
Triggers: Push to main, pull requests to main
TD-003: Testing Framework¶
Status: Complete Description: Establish pytest structure for Python, JUnit for Java Progress: Full test pyramid implemented with 1,522 tests passing
| Test Type | Count | Percent | Target |
|---|---|---|---|
| Unit | 1,290 | 85% | 80% |
| Integration | 182 | 12% | 15% |
| E2E | 50 | 3% | 5% |
Coverage:
- Unit: Exceptions, middleware, domain entities, services, workflows (S2-S11)
- Integration: Repositories (Client, Document, Engagement, Consent, Extraction, Checklist, Workflow, Review, Messaging, Delivery, EFiling, Invoice), S3Service with LocalStack
- E2E: Client, Verification, Engagement, Consent, Document, Messaging, Delivery, EFiling, Invoice API endpoints
TD-004: UAT Script Creation¶
Status: Not Started Priority: P1 (Pre-Launch) Description: Create User Acceptance Testing scripts for client validation
Traceability:
- [ ] Create requirements traceability matrix (RTM) linking UAT scripts to USER_STORIES.md
- [ ] Each UAT test case references specific story ID (e.g., S2-001, S10-003)
- [ ] Track coverage percentage against requirements
- [ ] Bug tracking with requirement linkage for root cause analysis
UAT Scripts:
- [ ] Define UAT scenarios for each sequence (S2-S18)
- [ ] Create step-by-step testing scripts with requirement references
- [ ] Define expected outcomes and acceptance criteria per story
- [ ] Create UAT reporting templates with pass/fail per requirement
- [ ] Document rollback procedures for failed UAT
TD-005: Test Data Generator (TDG-001)¶
Status: Complete (2024-12-26) Priority: P1 (Pre-Launch) Description: Comprehensive test data generator with realistic scenarios and document quality variations
User Personas (each with complete document sets):
- [x] Individual (Simple): Single W-2, standard deductions, single state
- [x] Individual (Heavy Investor): Multiple 1099-DIVs, 1099-Bs, K-1s
- [x] Business Owner: Schedule C, 1099-NECs, business expenses, mileage
- [x] Sub-Contractor: Multiple 1099-NECs, 1099-Ks, mileage logs
- [x] Complex Individual: Multi-state, K-1s, 1098 mortgage
- [x] S-Corp Owner: K-1 from S-Corp, W-2 from own company, distributions
- [x] Retiree: 1099-R, SSA-1099, investment income
Document Types Generated:
- [x] W-2s (single/multiple employers)
- [x] 1099 series (DIV, INT, B, NEC, MISC, R, SSA, K)
- [x] 1098 (mortgage interest)
- [x] K-1s (partnership 1065, S-Corp 1120S)
- [x] PDF bank statements
- [x] PDF credit card statements
- [x] Receipt images (PNG with quality variations)
- [x] Mileage logs
- [ ] Prior year tax returns (future enhancement)
- [ ] ID documents (future enhancement)
Image Quality Variations:
- [x] Excellent quality (clean generation)
- [x] Medium quality (slight blur)
- [x] Poor quality (blur, rotation)
- [x] Terrible quality (heavy blur, rotation, JPEG artifacts)
Multi-Batch Scenarios:
- [x] Batch assignment based on persona configuration
- [x] Delayed documents (K-1s, corrected 1099s) arrive in later batches
- [x] 1-4 batch scenarios per persona
Generator Implementation:
- [x] CLI tool: python scripts/generate_test_data.py --persona business_owner
- [x] Configurable output directory (--output)
- [x] Reproducible with seed (--seed 42)
- [x] Generate realistic but fake PII (987-65-xxxx SSN range)
- [x] 69 unit tests passing
Files:
- scripts/generate_test_data.py - CLI entry point
- src/testing/data_generator.py - Core generation logic
- src/testing/document_renderer.py - PDF/image rendering
- src/testing/utils.py - PII generation utilities
- src/testing/personas/personas.yaml - Persona configurations
- tests/unit/testing/ - Unit tests
TD-006: Placeholder and Assumption Audit¶
Status: In Progress Priority: P1 (Pre-Launch) Description: Review and address "for now", "placeholder", and "TODO" items throughout codebase Plan: See docs/plans/TECH_DEBT_CLEANUP.md
Audit Complete (2024-12-24): Found 89 items, categorized as:
- 35 Production Blockers: External service stubs (Email, SMS, Persona, SmartVault, SurePrep, Google)
- 12 Development Conveniences: Acceptable placeholders (PDF generation, placeholder citations)
- 42 Not Issues: Documentation/expected behavior (SQL placeholders, template variables)
Non-API Fixes Complete (2024-12-24): 4 items fixed without external API credentials:
- [x] Consent route client lookup (fetches real client data from ClientRepository)
- [x] AI QA citation resolution (resolves document names to actual IDs via DocumentRepository)
- [x] EFiling ready checks (checks Form 8879 signatures and document checklist)
- [x] Placeholder PDF generation (ReportLab implementation)
- [x] Fixed ChecklistRepository abstract method implementation
Production blockers organized into 6 phases:
- [x] Phase 0: Audit complete
- [x] Phase 0b: Non-API fixes complete (4 items)
- [ ] Phase 1: EmailService + SMSService (5-7 hrs) - requires API credentials
- [ ] Phase 2: PersonaService (4-5 hrs) - requires API credentials
- [ ] Phase 3: SmartVaultService (6-8 hrs) - requires API credentials
- [ ] Phase 4: SurePrepService (8-10 hrs) - requires API credentials
- [ ] Phase 5: GoogleService (6-8 hrs) - requires API credentials
- [ ] Phase 6: Webhook Security (2-3 hrs)
Lesson learned: "Simpler" shortcuts that mask real requirements create hidden bugs. All assumptions should be explicitly documented and validated.
TD-007: Code Audit Findings (AUDIT-001 through AUDIT-008)¶
Status: Documented Priority: P2 (Post-MVP) Audit Date: 2024-12-27 Report: docs/audits/CODE_AUDIT_2024-12-27.md
Overall Rating: 8.5/10 - Production-ready architecture in src/, development-only duplication in local_api.py
Ratings by Area:
- Architecture: 9/10 (excellent layering, services → repositories → domain)
- Service Centralization: 9/10 (25 services inherit BaseService)
- API Client Centralization: 10/10 (single api.ts for all calls)
- Connection Pooling: 6/10 (src/ good, local_api.py needs work)
- Maintainability: 8/10 (some large files need splitting)
- Scalability: 8/10 (async patterns ready, needs pagination for large batches)
Action Items:
- [x] AUDIT-001: Document local_api.py as dev-only in ARCHITECTURE.md (15 min) ✅ 2024-12-27
- [ ] AUDIT-002: Add connection pooling to local_api.py if used for extended demos (2 hrs)
- [x] AUDIT-003: ~~Split repository files exceeding 1,000 lines~~ - Cancelled (single developer, no merge conflict risk)
- [ ] AUDIT-004: Add inline comments to complex workflow logic (2 hrs) - Comments improve Claude's accuracy and speed
- [ ] AUDIT-005: Move hardcoded values to environment variables (1 hr)
- [ ] AUDIT-006: Consider migrating local_api.py to use src/ modules (8 hrs)
- [ ] AUDIT-007: Add request/response logging middleware (2 hrs)
- [ ] AUDIT-008: Implement caching layer for read-heavy endpoints (4 hrs)
TD-008: Security Audit Findings (SEC-001 through SEC-016)¶
Status: Documented Priority: P0 (Critical), P1 (High/Medium), P2 (Low) Audit Date: 2024-12-27 Report: docs/audits/SECURITY_AUDIT_2024-12-27.md
Overall Security Rating: 7/10 - Solid foundation, needs hardening before production
Critical (P0 - Fix Before Production):
- [ ] SEC-001: XSS via template injection - src/services/template_service.py:276-301
- [ ] SEC-002: dangerouslySetInnerHTML without sanitization - TourOverlay.tsx:187-217
- [ ] SEC-003: Missing security headers - src/api/main.py
- [x] SEC-004: Dev API weak auth - Already documented as dev-only per AUDIT-001
High (P0 - Fix Before Production):
- [ ] SEC-005: ORDER BY SQL injection risk - base_repository.py:131
- [ ] SEC-006: CORS overly permissive - src/api/main.py:91-98
- [ ] SEC-007: Missing role enforcement on list endpoints - clients.py:66
- [ ] SEC-008: Stripe test keys in plain text - .env:44-45
Medium (P1 - Fix in V1.1):
- [ ] SEC-009: No token revocation mechanism - auth.py
- [ ] SEC-010: Payment bypass flag needs compliance doc - efiling.py:31-35
- [ ] SEC-011: Search parameter unbounded - clients.py:70
- [ ] SEC-012: Filename not sanitized in S3 path - local_api.py:879
- [ ] SEC-013: Request size limits not enforced - main.py, nginx.conf
Low (P2 - Address When Convenient):
- [ ] SEC-014: In-memory rate limiting (needs Redis for prod) - rate_limiting.py:68
- [ ] SEC-015: Error messages may leak info - local_api.py:1513
- [ ] SEC-016: Email query parameter not validated - registration.py:252
TD-009: Compliance Audit Findings (COMP-001 through COMP-022)¶
Status: Documented Priority: P0 (Critical), P1 (High), P2 (Medium) Audit Date: 2024-12-27 Report: docs/audits/COMPLIANCE_AUDIT_2024-12-27.md
Overall Compliance Rating: 70/100 - Strong foundation with critical implementation gaps
Critical (P0 - Fix Before Production):
- [ ] COMP-001: AI/cloud processing consent not implemented - Form 7216 violation risk
- [ ] COMP-002: Form 7216 consent not enforced at e-filing - efiling_workflow.py
- [ ] COMP-003: Authentication events not logged - audit_service.log_auth() never called
- [ ] COMP-004: Document/client access not logged - only modifications tracked
- [ ] COMP-005: Account lockout not implemented - SEC-005 design only
- [ ] COMP-006: Field-level encryption not implemented - ENC-004 design only
- [ ] COMP-007: Conflict of interest checks missing - Circular 230 requirement
- [ ] COMP-008: Form 2848 (POA) workflow missing - domain model only
High (P1 - Fix in V1.1):
- [ ] COMP-009: PTIN expiration not enforced in workflows
- [ ] COMP-010: Competency/credential tracking missing - CIR-004
- [ ] COMP-011: MFA not implemented - framework only
- [ ] COMP-012: Employee training tracking missing - WISP requirement
- [ ] COMP-013: Database immutability not enforced - REVOKE statements commented
- [ ] COMP-014: Incident detection/alerting not implemented
- [ ] COMP-015: Persona integration in dry-run mode
- [ ] COMP-016: Authorization denials not logged to audit
Medium (P2 - Address When Convenient):
- [ ] COMP-017: Legal hold not implemented - RET-004
- [ ] COMP-018: Secure deletion not implemented - RET-005
- [ ] COMP-019: Vendor security assessments missing
- [ ] COMP-020: Password policy not enforced
- [ ] COMP-021: IP/device context not auto-captured
- [ ] COMP-022: Consent table schema mismatch
Effort Estimate: 85-120 hours total (P0: 40-60 hrs, P1: 30-40 hrs, P2: 15-20 hrs)
TD-010: Remove Frontend Mock Mode Code¶
Status: Not Started Priority: P3 (Low - Technical Debt) Added: 2024-12-28
Description: Remove all DEV_MODE conditional branches and MOCK_* data arrays from frontend/packages/ui/src/lib/api.ts. Mock mode was disabled on 2024-12-28 in favor of using real API with seeded database.
Scope:
- Remove ~30 `if (DEV_MODE) { ... }` conditional blocks
- Remove `MOCK_CLIENTS`, `MOCK_RETURNS`, `MOCK_DOCUMENTS`, `MOCK_ANALYSES`, `MOCK_CHAT_HISTORY` arrays (~300 lines)
- Remove `DEV_MODE` constant and related comments
Effort Estimate: 1 hour
P0 Master Priority List (Pre-Production Blockers)¶
Status: Consolidated 2024-12-27 Total Items: 15 (Security: 6, Compliance: 8, Code: 1 conditional) Total Effort: 45-50 hours
Implementation order optimized for dependencies and quick wins:
Phase 1: Quick Wins (2.5 hrs) - Parallelize¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| SEC-003 | Missing security headers | 1 hr | Add middleware for X-Frame-Options, CSP, HSTS |
| SEC-005 | ORDER BY SQL injection risk | 1 hr | Whitelist allowed column names |
| SEC-006 | CORS overly permissive | 30 min | Restrict methods/headers in main.py |
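The SEC-005 fix amounts to a column whitelist. A minimal sketch, assuming the repository currently builds ORDER BY clauses from caller-supplied strings; the allowed column names here are illustrative, not the actual base_repository.py schema:

```python
# Sketch of the SEC-005 fix: never interpolate caller-supplied sort columns
# into SQL directly; map them through an explicit whitelist instead.
ALLOWED_SORT_COLUMNS = {"created_at", "updated_at", "last_name", "status"}

def safe_order_by(column: str, direction: str = "asc") -> str:
    """Return a safe ORDER BY fragment, or raise on unrecognized input."""
    if column not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"Unsupported sort column: {column!r}")
    if direction.lower() not in ("asc", "desc"):
        raise ValueError(f"Unsupported sort direction: {direction!r}")
    return f"ORDER BY {column} {direction.upper()}"
```

Rejecting unknown input outright (rather than falling back to a default) keeps injection attempts visible in logs.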
Phase 2: XSS Fixes (3 hrs) - Parallelize¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| SEC-001 | XSS via template injection | 1 hr | Use html.escape() or Jinja2 autoescape |
| SEC-002 | dangerouslySetInnerHTML | 2 hrs | Add DOMPurify or refactor to React components |
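For SEC-001, the essential change is escaping user-controlled values before substitution. A stdlib sketch; the actual fix in template_service.py may use Jinja2 autoescape instead, and the template syntax here is illustrative:

```python
import html
import string

def render_template(template: str, values: dict) -> str:
    """Substitute values into a template, HTML-escaping every value.

    Sketch of the SEC-001 fix: user-controlled values cannot inject
    markup because they are escaped before substitution.
    """
    escaped = {k: html.escape(str(v)) for k, v in values.items()}
    return string.Template(template).safe_substitute(escaped)
```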
Phase 3: Access Control (5 hrs) - Sequential¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| SEC-007 | Missing role enforcement | 2 hrs | Filter list endpoints by user role |
| COMP-005 | Account lockout missing | 3 hrs | Add lockout fields, check before auth |
Phase 4: Audit Logging (5 hrs) - Sequential after Phase 3¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-003 | Auth events not logged | 2 hrs | Call audit_service.log_auth() in auth routes |
| COMP-004 | Access not logged | 3 hrs | Add log_access() to all GET endpoints |
Phase 5: Consent System (6 hrs) - Sequential¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-001 | AI processing consent missing | 4 hrs | Add USE_AI_PROCESSING, check in BedrockService |
| COMP-002 | E-filing consent not enforced | 2 hrs | Validate Form 7216 before mark_ready_for_filing |
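Both Phase 5 items reduce to a single gate checked before the guarded action. A sketch; the consent type names are illustrative placeholders, not confirmed schema values:

```python
# Sketch of the consent gate for COMP-001/COMP-002. Consent type names
# ("FORM_7216_DISCLOSURE", "USE_AI_PROCESSING") are illustrative.
class ConsentMissingError(Exception):
    pass

def require_consent(granted_consents: set[str], required: str) -> None:
    """Raise before the guarded action if the client has not consented."""
    if required not in granted_consents:
        raise ConsentMissingError(
            f"Client has not granted required consent: {required}"
        )

def mark_ready_for_filing(granted_consents: set[str]) -> str:
    # COMP-002: validate Form 7216 before the return can be marked ready.
    require_consent(granted_consents, "FORM_7216_DISCLOSURE")
    return "ready_for_filing"
```

The same `require_consent` call, with an AI-processing consent type, would sit in front of BedrockService for COMP-001.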
Phase 6: New Workflows (16 hrs) - Parallelize¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-007 | COI checks missing | 8 hrs | COI table, check workflow, API, audit log |
| COMP-008 | Form 2848 POA missing | 8 hrs | Form generation, signature, access control |
Phase 7: Data Protection (8 hrs)¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-006 | Field encryption missing | 8 hrs | encryption_service.py with pgcrypto wrapper |
Conditional¶
| ID | Issue | Effort | Condition |
|---|---|---|---|
| AUDIT-002 | local_api.py connection pooling | 2 hrs | Only if used for extended demos |
Delegation Strategy¶
- Phases 1-2: Can parallelize entirely (quick wins + XSS)
- Phases 3-4: Must sequence (access control before audit logging)
- Phase 5: Must sequence (consent type before e-filing check)
- Phase 6: Can parallelize (COI and POA are independent)
- Phase 7: Independent (can run anytime)
Expedited Analysis Pricing (PRICE-001)¶
Status: Backlog Priority: P1 (Revenue Feature) Target: Post-V1 Launch
Business Model¶
Tiered document analysis with freemium expedited processing:
| Processing Tier | Timing | Cost |
|---|---|---|
| Batch (Default) | Overnight | Included |
| Expedited | Immediate (~30 sec/doc) | Free quota, then $1.50/doc |
Freemium Model:
- All clients receive 25 free expedited analyses per month
- After quota exhausted: $1.50 per document OR wait for overnight batch
- Counter resets on 1st of each month
User Experience¶
Document Upload Flow:
1. Documents uploaded show status: "Pending Analysis"
2. Banner displays: "3 documents pending. Overnight batch included, or analyze now."
3. Show quota: "15 of 25 expedited analyses remaining this month"
4. When quota exhausted, button changes to "Analyze Now ($1.50)" or "Queue for Overnight"
Recording Should Demonstrate:
- Instant expedited analysis (within quota)
- "Queued for overnight" state (quota exhausted or user choice)
- Quota counter display
Implementation¶
Database Changes:
- Add expedited_analyses_used counter to client/account table (resets monthly)
- Add expedited_analyses_limit field (default: 25)
- Add analysis_status enum to documents: pending → queued → processing → complete
Backend:
- Quota check before expedited processing
- Airflow DAG for overnight batch processing
- Stripe integration for overage billing
Frontend:
- Status badges on documents
- Quota counter in header/sidebar
- Expedited vs batch choice dialog
- Overage payment confirmation
Files to Create:
- src/domain/analysis_quota.py
- src/repositories/analysis_quota_repository.py
- src/workflows/document_analysis/batch_processor.py
- dags/overnight_analysis_dag.py
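The quota/overage arithmetic behind the backend check can be sketched as follows. Field names mirror the database changes listed above; this is an illustration under those assumptions, not the planned `src/domain/analysis_quota.py` code:

```python
from dataclasses import dataclass

EXPEDITED_OVERAGE_PRICE = 1.50  # $/doc after the free quota is exhausted

@dataclass
class AnalysisQuota:
    used: int = 0    # expedited_analyses_used; resets on the 1st
    limit: int = 25  # expedited_analyses_limit (default 25)

def expedite_cost(quota: AnalysisQuota, doc_count: int) -> float:
    """Overage charge for expediting doc_count documents right now."""
    free_remaining = max(quota.limit - quota.used, 0)
    billable = max(doc_count - free_remaining, 0)
    return billable * EXPEDITED_OVERAGE_PRICE
```

A zero result means "analyze now" stays free; a positive result drives the "$1.50/doc" button and the Stripe overage flow.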
Margin Analysis¶
At current AI costs (~$0.05/doc for analysis):
- Expedited fee: $1.50/doc
- Cost: $0.05/doc
- Margin: 97%
Free quota (25/month) costs ~$1.25/month per client - acceptable customer acquisition cost.
Duplicate Document Detection (DUP-001)¶
Status: ✅ Implemented (2024-12-27) Priority: P2 (Data Quality) Target: V1.1
Problem¶
Users may accidentally upload the same document multiple times, leading to:
- Duplicate data in AI analysis
- Confusion about which version is current
- Wasted storage and processing
Solution¶
Detect duplicates on upload using file hash at two levels:
- On Upload: Calculate SHA-256 hash of file content
- Check Same Client: Query documents for this client with matching hash
- Check Cross-Client: Query all documents with matching hash (different client)
- Warn/Error: Show appropriate message based on match type
User Experience¶
Same Client Duplicate:
⚠️ Duplicate Document
"W2_Global_2023.pdf" matches a document uploaded on Dec 15, 2024.
[View Original] [Cancel]
Cross-Client Match (likely wrong client selected):
🚨 Document Belongs to Another Client
This file is already linked to: John Smith
(uploaded Dec 10, 2024)
Did you select the wrong client?
[View in John Smith] [Cancel]
Implementation (Completed)¶
Database Changes:
- Added file_hash column to documents table (VARCHAR(64))
- Added index: idx_documents_file_hash
API Changes:
- Calculate SHA-256 hash on upload
- Single query to check for existing hash, returns client_id
- Returns 409 Conflict with duplicate info if found:
- error: "duplicate_document"
- is_same_client: boolean
- existing_document: { id, client_id, client_name, filename, uploaded_at }
Files Modified:
- scripts/local_api.py - Hash calculation and duplicate check
- scripts/bootstrap.py - file_hash column and index
- frontend/.../DocumentUploadPage.tsx - Duplicate dialog with View Original / Cancel
- frontend/.../AnalysisDashboardPage.tsx - Inline drag-drop upload with same duplicate detection
Tests Added:
- tests/bdd/features/duplicate_detection.feature - 9 BDD scenarios
- tests/bdd/step_defs/test_duplicate_detection.py - Step definitions
- frontend/tests/features/duplicate-detection.feature - 12 Gherkin scenarios
- frontend/tests/duplicate-detection.spec.ts - Playwright test stubs
Name Mismatch Detection (DUP-002)¶
Status: Backlog Priority: P3 (Data Quality) Target: V1.2
Problem¶
Document uploaded to wrong client - e.g., W-2 for "John Smith" uploaded to client "Jane Doe".
Solution¶
Post-processing check after document extraction:
1. Extract name from document metadata (W-2, 1099, etc.)
2. Fuzzy match against client name
3. If mismatch, flag for review
Considerations¶
- Requires OCR/extraction to complete first (not instant like hash check)
- Fuzzy matching needed ("John Smith" vs "John A. Smith" vs "J. Smith")
- Some docs have no name (receipts, bank statements) - skip check
- Joint returns - both spouse names are valid matches
- Threshold for fuzzy match confidence
User Experience¶
After document processing completes:
⚠️ Name Mismatch
This W-2 shows "John Smith" but client is "Jane Doe".
[View Document] [Confirm Correct Client] [Move to Different Client]
Concurrent Edit Handling (CONC-001)¶
Status: Planning Complete Priority: P2 (Data Integrity) Target: V1.1 Plan: docs/plans/OPTIMISTIC_LOCKING.md
Problem¶
Two users editing the same record simultaneously results in silent data loss (Last Write Wins). No conflict detection or user notification.
Solution: Optimistic Locking¶
Add version field to mutable entities. On update, verify version matches; if stale, return 409 Conflict with current data for user resolution.
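In sketch form (in production the same check is a conditional `UPDATE ... WHERE id = %s AND version = %s` with a rowcount test, not an in-memory dict):

```python
class StaleVersionError(Exception):
    """Maps to 409 Conflict; carries current data for user resolution."""
    def __init__(self, current: dict):
        self.current = current

def apply_update(record: dict, changes: dict, expected_version: int) -> dict:
    """Apply changes only if the caller saw the latest version.

    Sketch of CONC-001 optimistic locking: a stale expected_version means
    someone else wrote first, so the caller gets the current data back
    instead of silently overwriting it.
    """
    if record["version"] != expected_version:
        raise StaleVersionError(current=dict(record))
    record.update(changes)
    record["version"] += 1
    return record
```

The frontend keeps the version it last fetched, sends it with each update, and opens the ConflictDialog from `StaleVersionError.current` on a 409.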
Scope¶
| Entity | Risk | Implementation |
|---|---|---|
| Clients | High | Version column, API check, conflict dialog |
| Tax Returns | High | Version column, API check, conflict dialog |
| Documents | Medium | Version column, API check, conflict dialog |
| Users | Low | Version column, API check |
Implementation Summary¶
| Component | Changes | Effort |
|---|---|---|
| Database | Add version column to 4 tables | 30 min |
| API | Version in responses, check on updates, 409 handling | 2 hrs |
| Frontend | State tracking, ConflictDialog component | 2 hrs |
| Tests | Unit (5), Integration (3), E2E (3), BDD (3) | 3 hrs |
| Total | 8-10 hrs |
Delegation Strategy¶
- Haiku: Database migration, API endpoint updates, unit tests
- Sonnet: Frontend state management, conflict dialog, integration/E2E tests
Test Cases¶
- Update with correct version succeeds, version increments
- Update with stale version returns 409 Conflict
- Missing version returns 400 Bad Request
- Conflict response includes current data
- Conflict dialog appears and functions correctly
Acceptance Criteria¶
- All mutable entities have a version column
- All update endpoints require and verify version
- 409 Conflict returned with current data on mismatch
- Frontend displays conflict resolution dialog
- User can discard changes or retry (overwrite)
- All tests passing
Field-Level Merging (CONC-002)¶
Status: Planned Priority: P3 (UX Enhancement) Target: V1.2 Depends on: CONC-001 Plan: docs/plans/OPTIMISTIC_LOCKING.md Section 10
Problem¶
Record-level locking (CONC-001) forces all-or-nothing conflict resolution. If User A changes phone and User B changes email, they shouldn't conflict.
Solution¶
Track which fields were changed. Auto-merge non-overlapping changes. Only show conflict dialog for same-field edits.
Example - Auto-merge:

- User A changes phone
- User B changes email
- Result: Both saved, no conflict

Example - Partial conflict:

- User A changes phone + email
- User B changes email
- Result: Phone auto-merged, email conflict shown
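The merge rule can be sketched as a three-way dict merge against the common base snapshot (a sketch only; real records would carry the base via CONC-001's version field):

```python
def merge_changes(base: dict, a: dict, b: dict):
    """Field-level merge of two concurrent edits against a shared base.
    Non-overlapping changes auto-merge; only same-field edits with
    differing values are reported as conflicts."""
    changed_a = {k for k in base if a[k] != base[k]}
    changed_b = {k for k in base if b[k] != base[k]}
    conflicts = {k for k in changed_a & changed_b if a[k] != b[k]}
    merged = dict(base)
    merged.update({k: a[k] for k in changed_a - conflicts})
    merged.update({k: b[k] for k in changed_b - conflicts})
    return merged, conflicts
```

Both examples above fall out of this rule: disjoint field sets yield an empty conflict set, while overlapping edits surface only the contested field in the dialog.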
Effort Estimate¶
4-6 hours (incremental on CONC-001)
Special Conflict Scenarios (CONC-003)¶
Status: Planned Priority: P3 (Data Integrity) Target: V1.2 Depends on: CONC-001 Plan: docs/plans/OPTIMISTIC_LOCKING.md Section 10
Scenarios¶
Delete vs Update: User A deletes record while User B is editing. → Block delete or warn "record was deleted"
Status Race: Two users change return status simultaneously. → State machine validates transitions
Assignment Collision: Two preparers claim same return. → "Already assigned to X" message
Bulk vs Single: Bulk update while individual edit in progress. → Per-record version check, report partial failures
Parent-Child Constraints: Delete client with active returns. → "Cannot delete: has N active returns"
Effort Estimate¶
6-8 hours
Client Actions (UI-001, UI-002)¶
UI-001: Send Portal Invite¶
Status: Backlog Priority: P2 (Client Experience) Target: V1.1
Send invitation email to client with portal access link. Triggers welcome email with secure login instructions.
UI-002: Create Engagement Letter¶
Status: Backlog Priority: P2 (Workflow) Target: V1.1
Generate engagement letter from template, pre-filled with client info. Integrates with S3-001/S3-002 (Engagement Letter Generation and E-Signature).
AI-Support Ticketing System (SUP-001)¶
Status: Backlog Priority: P2 (Client Experience) Target: V1.2
Problem¶
Client questions and issues currently handled via email/phone without centralized tracking. No AI assistance in triaging or responding to common questions.
Solution¶
Ticketing system with AI-powered triage and response assistance:
- Ticket Intake: Clients submit questions via portal, email, or phone (staff creates ticket)
- AI Triage: Auto-classify ticket type, priority, and route to appropriate staff
- AI Draft Response: Generate suggested response for common questions
- Staff Review: Staff reviews AI draft, edits if needed, and sends
- Resolution Tracking: Track time to resolution, client satisfaction
Features¶
Ticket Types:

- Document request (missing W-2, need copy of return)
- Status inquiry (where is my refund, when will return be filed)
- Tax question (can I deduct X, how does Y work)
- Technical support (portal access, password reset)
- Billing inquiry (invoice questions, payment issues)
- Appointment request (schedule call, meeting)

AI Capabilities:

- Auto-categorize incoming tickets by type and urgency
- Suggest priority based on content analysis
- Draft responses using client context (return status, documents, prior communications)
- Flag tickets requiring senior staff attention
- Detect sentiment (frustrated, urgent, routine)

Staff Interface:

- Unified inbox with AI-suggested priorities
- One-click approve AI draft or edit
- Internal notes and escalation
- Response templates with merge fields
- Time tracking per ticket

Client Interface:

- Submit new ticket from portal
- View ticket status and history
- Receive notifications on updates
- Rate satisfaction after resolution
Implementation¶
Database:
- tickets table (id, client_id, type, priority, status, created_at, resolved_at)
- ticket_messages table (id, ticket_id, sender_type, content, ai_draft, created_at)
- ticket_templates table (id, type, subject, body)
Services:
- src/services/ticket_service.py - Core ticketing logic
- src/services/ticket_ai_service.py - AI triage and response generation
API:
- src/api/routes/tickets.py - CRUD endpoints
- src/api/schemas/ticket_schemas.py - Request/response schemas
Frontend:
- frontend/apps/staff-app/src/pages/TicketsPage.tsx - Staff ticket queue
- frontend/apps/staff-app/src/components/tickets/TicketDetail.tsx - Individual ticket view
- frontend/apps/client-portal/src/pages/SupportPage.tsx - Client ticket submission
AI Prompts:
- src/prompts/tickets/triage.txt - Classify and prioritize
- src/prompts/tickets/draft_response.txt - Generate response draft
Effort Estimate¶
- Backend: 16-20 hours
- Frontend: 12-16 hours
- AI Integration: 8-12 hours
- Testing: 8-10 hours
- Total: 44-58 hours
Metrics¶
- Average response time
- First-contact resolution rate
- AI draft acceptance rate
- Client satisfaction scores
- Tickets per client per season
Future Phases¶
PHASE-BKP: Bookkeeping Module¶
Status: Requirements Drafted Priority: Post-MVP Requirements: bookkeeping_requirements.md
Phased implementation:

- Phase 1: Bank statement upload, AI categorization, QuickBooks export
- Phase 2: Reconciliation, recurring transaction detection
- Phase 3: Full bookkeeping (chart of accounts, P&L, Balance Sheet, two-way sync)

Pending Client Input:

- Service level (tax-ready categorization vs full bookkeeping)
- Target clients (business entities only vs all clients)
- Pricing model (monthly retainer vs per-transaction)
AI Cost Optimization (SaaS Profitability)¶
Critical for SaaS business model. See COST_DETAIL.md for full analysis.
OPT-001: Metadata Caching¶
Status: ✅ Complete (2025-12-28) Priority: P0 (Core to SaaS margins) Estimated Impact: 60-75% token reduction, $0.24/return savings
Problem: Every AI query re-reads source documents. A W-2 referenced 10 times costs 10× the tokens.
Solution: Extract once, cache as markdown, reference forever.
Implementation:
- [x] Create metadata MD file on first document scan (<350 lines each)
- [x] Per-document: {client_id}/{return_year}/{doc_id}_metadata.md
- [x] Per-return: {client_id}/{return_year}/return_summary.md
- [x] Store in S3 alongside original documents
- [x] Index metadata location in Aurora (document_metadata table)
- [x] AI reads cache first; only re-scan if stale or confidence < 90%
- [x] Refresh trigger when document updated (mark_document_stale)
Metadata file contents:

- Document type, source, upload date, confidence score
- Extracted values (wages, withholding, etc.) in structured tables
- AI notes (prior year comparison, anomalies)
- Flags and questions

Savings:

| Metric | Without Cache | With Cache |
|---|---|---|
| Tokens/return | 126,500 | ~50,000 |
| AI cost/return | $0.40 | $0.16 |
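The cache-first read path described above can be sketched as follows; the loaders are injected stand-ins for the S3 and extraction services, and the 90% confidence floor comes from the implementation notes:

```python
from dataclasses import dataclass
from typing import Callable, Optional

CONFIDENCE_FLOOR = 0.90  # re-scan below this, per the implementation notes

@dataclass
class Metadata:
    content: str        # the cached markdown
    confidence: float
    stale: bool = False  # set by mark_document_stale on update

def get_metadata(key: str,
                 cache_read: Callable[[str], Optional[Metadata]],
                 cache_write: Callable[[str, Metadata], None],
                 extract: Callable[[], Metadata]) -> Metadata:
    """Serve the cached markdown unless it is missing, marked stale, or
    low-confidence; otherwise re-extract and write through."""
    cached = cache_read(key)
    if cached and not cached.stale and cached.confidence >= CONFIDENCE_FLOOR:
        return cached               # cache hit: no AI tokens spent
    fresh = extract()               # full AI scan
    cache_write(key, fresh)         # write-through to S3
    return fresh
```

The key follows the per-document layout above ({client_id}/{return_year}/{doc_id}_metadata.md); the Aurora document_metadata table only stores where the file lives.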
OPT-002: Batch API Default¶
Status: Not Started Priority: P0 (Core to SaaS margins) Estimated Impact: 50-60% cheaper on batch-eligible tasks
Problem: Interactive API calls cost more than batch.
Solution: Default to batch processing with opt-in for real-time.
Implementation:

- [ ] UX: "I'll have results ready tomorrow morning. [Start Live Session]"
- [ ] Queue document processing overnight via Airflow
- [ ] Pre-generate worksheets for morning review
- [ ] Only Preparer Q&A requires real-time
- [ ] Track batch vs interactive usage for cost analysis

Batch-eligible tasks:

- Document classification and extraction
- Prior year comparison
- Missing document detection
- Worksheet generation
- Rejection analysis
- Tax reminders
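A routing sketch for the batch-by-default policy; the task identifiers are hypothetical keys matching the eligible tasks listed above:

```python
# Tasks safe to queue overnight; everything else stays interactive.
BATCH_ELIGIBLE = {
    "classification", "extraction", "prior_year_comparison",
    "missing_doc_detection", "worksheet_generation",
    "rejection_analysis", "tax_reminders",
}

def route_request(task: str, live_session: bool = False) -> str:
    """Default to the cheaper batch queue; only real-time preparer Q&A,
    or an explicit [Start Live Session] opt-in, uses the interactive API."""
    if live_session or task not in BATCH_ELIGIBLE:
        return "interactive"
    return "batch"
```

Logging the return value per call gives the batch-vs-interactive usage split called for in the implementation checklist.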
Savings: at 80% batch acceptance, cost drops from $0.40 to $0.28/return
OPT-003: Model Delegation Strategy¶
Status: Not Started Priority: P1 Estimated Impact: 38% AI cost reduction
Solution: Use cheapest model capable for each task.
SONNET (Orchestrator) → HAIKU (60% - extraction)
→ SONNET (35% - analysis)
→ OPUS (5% - expert review)
Implementation:

- [ ] Haiku for document extraction and classification
- [ ] Sonnet for Q&A, comparisons, worksheets
- [ ] Opus only for tax code interpretation, audit risk, complex scenarios
- [ ] Auto-escalation when confidence < 80% or complex entity types
- [ ] User option: "Ask the expert" to force Opus
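The routing rules above can be sketched as a lookup table plus a one-tier escalation step; the task names are hypothetical, the model names are tiers rather than real model IDs, and the 80% threshold comes from the auto-escalation rule:

```python
from typing import Optional

TASK_MODEL = {
    "extraction": "haiku",
    "classification": "haiku",
    "qa": "sonnet",
    "comparison": "sonnet",
    "worksheet": "sonnet",
    "tax_code_interpretation": "opus",
    "audit_risk": "opus",
}
ESCALATE = {"haiku": "sonnet", "sonnet": "opus"}

def pick_model(task: str, confidence: Optional[float] = None,
               ask_expert: bool = False) -> str:
    if ask_expert:
        return "opus"                       # "Ask the expert" override
    model = TASK_MODEL.get(task, "sonnet")  # unknown tasks default to mid tier
    if confidence is not None and confidence < 0.80:
        model = ESCALATE.get(model, model)  # auto-escalate one tier
    return model
```

Defaulting unknown tasks to Sonnet keeps new task types safe until they are explicitly profiled into a tier.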
Savings: All-Sonnet ($0.64) → Delegation ($0.40) = 38% reduction
OPT-004: Combined Optimization Target¶
Priority: P0 Target: $0.12/return AI cost (vs $0.40 baseline)
Combined impact:

1. Batch processing (80%): $0.40 → $0.28
2. Metadata caching (75%): $0.28 → $0.12
3. Model delegation: Further optimization within each tier

At scale (10,000 returns):

- Baseline: $4,000/year AI cost
- Optimized: $1,200/year AI cost
- Annual savings: $2,800
OPT-005: Prompt Compression¶
Status: Not Started Priority: P2 (Quick Win) Estimated Impact: 10-20% additional token reduction
Problem: System prompts, tax code references, and instructions repeat on every call.
Solution: Use prompt compression tools (LLMLingua) to reduce prompt size while preserving meaning.
Implementation:

- [ ] Evaluate LLMLingua for system prompt compression
- [ ] Compress static instructions and tax code references
- [ ] Benchmark quality vs compression ratio
- [ ] A/B test compressed vs full prompts
OPT-006: Output Token Limits¶
Status: Not Started Priority: P2 (Quick Win) Estimated Impact: 5-15% output token reduction
Problem: AI generates verbose explanations when structured data is sufficient.
Solution: Force shorter, structured responses for extraction and classification tasks.
Implementation:

- [ ] Set max_tokens limits per task type
- [ ] Use structured JSON outputs for extraction
- [ ] Reserve verbose mode for Q&A only
- [ ] Track output token usage by task type
OPT-007: Semantic Caching¶
Status: Not Started Priority: P2 Estimated Impact: 15-30% reduction on repeated queries
Problem: Similar questions hit the AI repeatedly. "How do I report crypto?" and "Where do cryptocurrency gains go?" are the same question.
Solution: Cache responses by meaning, not exact match. Return cached answer for semantically similar queries.
Implementation:

- [ ] Implement vector embeddings for query similarity
- [ ] Set similarity threshold for cache hits
- [ ] Focus on client Q&A and preparer questions
- [ ] Track cache hit rate and quality
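A minimal semantic cache sketch: cosine similarity over query embeddings, with an injected (hypothetical) `embed` function and an assumed 0.9 threshold to be tuned on real query logs:

```python
import math

SIM_THRESHOLD = 0.9  # assumption; tune against real query logs

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class SemanticCache:
    """Answer cache keyed by meaning: a lookup hits when the query's
    embedding is close enough to a previously answered query."""
    def __init__(self, embed):
        self.embed = embed      # query text -> vector (injected)
        self.entries = []       # list of (vector, answer)

    def get(self, query):
        v = self.embed(query)
        best = max(self.entries, key=lambda e: cosine(v, e[0]), default=None)
        if best is not None and cosine(v, best[0]) >= SIM_THRESHOLD:
            return best[1]      # semantic hit: skip the AI call
        return None

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))
```

Production use would back this with a vector index rather than a linear scan, but the hit/miss rule is the same.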
OPT-008: RAG Optimization¶
Status: Not Started Priority: P2 Estimated Impact: 20-30% context token reduction
Problem: Sending entire prior-year returns when only specific sections are relevant.
Solution: Retrieve only relevant document sections based on the question.
Implementation:

- [ ] Chunk documents into semantic sections
- [ ] Index chunks with vector embeddings
- [ ] Retrieve top-k relevant chunks per query
- [ ] Benchmark quality vs full-document approach
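The retrieval step can be sketched as a top-k ranking over pre-embedded chunks; the `(embedding, text)` pair format and k=3 default are assumptions for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_chunks(query_vec, chunks, k=3):
    """Return the k most relevant chunk texts instead of the whole
    prior-year return. `chunks` is a list of (embedding, text) pairs
    from a hypothetical chunk index."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```

Only the returned chunks go into the prompt context, which is where the 20-30% context token reduction comes from.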
OPT-009: Confidence-Based Escalation¶
Status: Not Started Priority: P3 Estimated Impact: 10-20% model cost reduction
Problem: Pre-routing tasks to specific models may over-allocate expensive models.
Solution: Let Haiku attempt everything first, escalate dynamically based on confidence scores.
Implementation:

- [ ] Define confidence thresholds per task type
- [ ] Implement escalation pipeline (Haiku → Sonnet → Opus)
- [ ] Track escalation rates and quality
- [ ] Tune thresholds based on error rates
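The escalation pipeline can be sketched as a cheapest-first loop; `call_model` is a hypothetical injected wrapper returning `(answer, confidence)`, and the per-tier thresholds are assumptions to tune:

```python
def run_with_escalation(task, call_model,
                        tiers=(("haiku", 0.80), ("sonnet", 0.80), ("opus", 0.0))):
    """Let the cheapest model attempt everything first; keep a tier's
    answer only if its confidence clears that tier's threshold. The last
    tier (threshold 0.0) always answers."""
    for model, threshold in tiers:
        answer, confidence = call_model(model, task)
        if confidence >= threshold:
            return answer, model
    raise RuntimeError("no tiers configured")
```

Logging which tier ultimately answered gives the escalation-rate metric from the checklist above.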
OPT-010: Fine-Tuning (Future)¶
Status: Not Started Priority: P4 (Volume-dependent) Estimated Impact: 30-50% cost reduction on fine-tuned tasks
Problem: Using general-purpose models for repetitive, domain-specific tasks.
Solution: Train smaller models on our specific tasks (document classification, common Q&A).
Implementation:

- [ ] Collect 6+ months of production data
- [ ] Identify high-volume, repetitive tasks
- [ ] Fine-tune Haiku or open-source model (LoRA/QLoRA)
- [ ] A/B test fine-tuned vs general model
OPT-011: Knowledge Distillation (Future)¶
Status: Not Started Priority: P4 (Volume-dependent) Estimated Impact: 50-85% cost reduction on distilled tasks
Problem: Want Opus-quality responses at Haiku prices.
Solution: Use Opus to generate training data, teach Haiku to mimic Opus responses.
Implementation:

- [ ] Generate "gold standard" responses with Opus
- [ ] Create training dataset from Opus outputs
- [ ] Train Haiku to reproduce Opus-quality responses
- [ ] Deploy distilled model for production use
Design Decisions Log¶
| ID | Date | Decision | Rationale |
|---|---|---|---|
| DD-001 | 2024-12-22 | Service centralization as core pattern | Single point of access for all external services - easier maintenance, testing, audit |
| DD-002 | 2024-12-22 | Mixed Java/Python architecture | Java for performance-critical processing, Python for API/orchestration/AI |
| DD-003 | 2024-12-22 | Shared config.yaml for both languages | Single source of truth, environment variable substitution |
| DD-004 | 2024-12-22 | Aurora-only starting architecture | Aurora handles all data including 7+ years retention (via table partitioning). Tiered storage (Athena/Snowflake) adds complexity not justified at small scale. S3 for document storage only, not as query layer. Add analytics tier only when Aurora partitioning insufficient. Both Aurora and Snowflake meet security/compliance requirements for tax data. |
| DD-005 | 2024-12-23 | Self-hosted Airflow orchestration | Use self-hosted Airflow on EC2 t3.medium (~$23/mo reserved) instead of MWAA ($360+/mo) or Step Functions. Full control, Python DAGs, built-in UI/monitoring. Acceptable maintenance overhead for small practice. |
| DD-006 | 2024-12-23 | Bookkeeping as separate requirements document | Bookkeeping has different cadence (monthly vs annual), different workflow, and could serve non-tax clients. Separate document allows independent prioritization and phasing. Shares infrastructure with tax system. |
| DD-007 | 2024-12-23 | Bookkeeping phased approach | Phase 1: tax-ready categorization + QuickBooks export. Phase 2: reconciliation. Phase 3: full bookkeeping. Start light, design for full. QuickBooks is system of record initially. |
| DD-008 | 2024-12-23 | V1 integration strategy: integrate, don't replace | Integrate with SmartVault (client portal), SurePrep (OCR/extraction), and UltraTax (via SurePrep CS Connect). Don't replace industry standard tools in V1. Future versions may replace SurePrep to capture per-return fees, but UltraTax lacks API (blocker). |
| DD-009 | 2024-12-23 | Dual integration pattern: Services + Skills | Services handle API calls (how to call). Skills provide AI context (how to understand). SmartVault, SurePrep, UltraTax each get skills for AI to interpret their data. UltraTax has skill but no service (no API). |
| DD-010 | 2024-12-23 | Multi-tenant SaaS: Separate databases per tenant | Deploy as SaaS with separate database per tenant firm within one Aurora cluster. Strongest isolation for tax compliance, shared tiered pricing, easy to migrate growing tenants to dedicated clusters. Requires tenant routing middleware and dynamic connection management. |
| DD-011 | 2024-12-24 | Frontend: React + Vite two-app architecture | Two separate apps (client-portal, staff-app) with shared component library (@tax-practice/ui). React 18, Vite, TypeScript, Tailwind, shadcn/ui, React Query, Zustand. Production-grade, sellable stack. HTMX considered but React chosen for richer UX and market perception. |
Completed Items¶
DOC-001: Pre-Implementation Documentation¶
Completed: 2024-12-23
- DATABASE_SCHEMA.sql - Complete 35-table schema with 50 enums
- DATA_MODEL.md - Logical data model with ER diagrams
- API_SPECIFICATION.md - Full REST API contract
- INTEGRATION_CONTRACTS.md - External service integrations
- SECURITY_DESIGN.md - Security architecture and controls
- PROCESS_FLOWS.md - State machines and workflows
- USER_STORIES.md - 82 prioritized user stories (77 MVP, 5 post-MVP)
DOC-002: Documentation Reconciliation¶
Completed: 2024-12-23
All pre-implementation specifications validated and reconciled:

- Enum values aligned across all documents
- Missing endpoints added to API specification
- ER diagram updated with all 35 entities
- Webhook handlers aligned with integration contracts
- Cross-references added between documents
Tax Reference System¶
Tax code reference system created 2024-12-27. Serves both Claude Code /tax skill and Bedrock chatbot (RAG).
Structure: docs/tax-reference/ with federal/, states/, common/ subdirectories organized by tax year.
Maintenance Model: Tax staff direct updates via conversation with Claude. Claude makes the actual file edits. No direct staff editing of markdown files.
TAX-REF-001: Senior Tax Administrator Role (RBAC)¶
Status: Backlog Priority: P2 Dependency: RBAC system implementation
Add RBAC role "Senior Tax Administrator" with permission to direct tax reference updates:

- Can instruct Claude to update tax brackets, thresholds, rules
- Can approve changes before Claude commits them
- Audit log of all tax reference modifications
- Year-over-year update workflow (annual review process)
TAX-REF-002: Firm-Specific Tax Skills/Overrides¶
Status: Backlog Priority: P2 Dependency: TAX-REF-001
Allow firms to customize tax reference with firm-specific guidance:

- Override default rules with firm interpretations
- Add firm-specific notes/cautions on tax positions
- Custom worksheets and checklists per firm
- Inheritance model: firm overrides layer on top of base tax reference
- Storage: docs/tax-reference/overrides/{firm_id}/ or similar
TAX-REF-003: Bedrock Knowledge Base Integration¶
Status: Backlog Priority: P1 Dependency: BedrockService implementation
Integrate tax reference files with client-facing chatbot via hybrid approach:
Bedrock Knowledge Base (RAG):

- Index docs/tax-reference/ into Bedrock Knowledge Base
- Automatic chunking and embedding of markdown files
- Semantic search for open-ended questions ("tell me about NY taxes")
- Re-index on file updates (triggered by Claude commits)

Direct File Injection:

- BedrockService.get_tax_context(jurisdiction, topic, year) method
- Loads specific markdown file for deterministic context
- More token-efficient for specific lookups ("NYC MFJ rate")
- Routing logic maps question intent to file path

Implementation Steps:

- Create Bedrock Knowledge Base with S3 data source pointing to docs/tax-reference/
- Add sync trigger when tax reference files are updated
- Implement get_tax_context() in BedrockService
- Add intent detection to route between KB search vs direct injection
- Include source citations in chatbot responses (file path + section)

Benefits:

- Single source of truth (same files serve Claude Code skill and chatbot)
- Granular file structure enables precise context injection
- Citations provide audit trail for tax advice
TAX-REF-004: Per-Client Tax Context Metadata¶
Status: Backlog Priority: P1 Dependency: AI Analysis workflow, TAX-REF-003
Auto-generated per-client tax context that primes the chatbot with relevant jurisdictions, skills, and flags.
File Structure:
s3://tax-practice-documents/{client_id}/{return_year}/
├── tax_context.md # Current state (~50 lines, always loaded)
├── tax_context_pending.md # Chatbot-flagged updates awaiting preparer review
└── tax_context_history.md # Full changelog/audit trail (on-demand)
tax_context.md Contents:

- Jurisdictions (states, localities detected from documents)
- Tax skills required (references to docs/tax-reference/ files)
- Flags (part-year, multi-state, audit risks, special situations)
- Document-derived values (W-2 states, 1099 sources, property locations)

Creation - Part of AI Analysis (automatic, not user-triggered):

- Analysis prompt includes jurisdiction detection and skill mapping
- Analysis output structure includes tax_context section
- Workflow writes tax_context.md as required output
- No analysis is "complete" without tax context file

Updates - Delta Detection:

- New document uploaded → re-analyze, update if jurisdictions/flags change
- Chatbot conversation reveals new info → flag in tax_context_pending.md
- Preparer approves pending → applied to tax_context.md, logged to history

Chatbot Flagging:

- Chatbot detects tax-relevant statements (new state, life events, corrections)
- Creates entry in tax_context_pending.md with suggested changes
- Pending items surface in preparer queue with badge count
- Preparer approves/rejects/modifies before changes apply
- Chatbot cannot directly modify tax_context.md (audit trail integrity)

Version History (tax_context_history.md):

- Every change logged with: timestamp, trigger, source (doc ID or conversation ID), actor, diff
- S3 versioning enabled as backup
- Supports dispute resolution: "On Dec 30 you told us X, here's the conversation"

Chatbot Loading Strategy:

- tax_context.md: Always injected as system context
- tax_context_pending.md: Loaded when preparer is in session
- tax_context_history.md: On-demand ("What did I say about my move date?")

Tiered Context Loading:

- Layer 1: tax_context.md (always, ~50 lines)
- Layer 2: Specific tax skill files from docs/tax-reference/ (on-demand based on question)
- Layer 3: Full Bedrock KB search (if question exceeds listed skills)
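The tiered loading strategy could look roughly like this; every loader here is a hypothetical injected dependency, since the ChatService integration is still backlog:

```python
def build_chat_context(question: str, load_layer1, detect_skills,
                       load_skill, kb_search) -> str:
    """Assemble chatbot context in three layers: always load
    tax_context.md; add specific tax-reference files when the question
    maps to listed skills; fall back to a Knowledge Base search when no
    skill matches."""
    parts = [load_layer1()]                    # Layer 1: ~50 lines, always
    skills = detect_skills(question, parts[0])
    parts += [load_skill(s) for s in skills]   # Layer 2: on-demand files
    if not skills:
        parts.append(kb_search(question))      # Layer 3: full KB search
    return "\n\n".join(parts)
```

Because the loaders are injected, the same function works for the Claude Code skill (local files) and the Bedrock chatbot (S3 plus Knowledge Base).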
Templates: See docs/tax-reference/templates/ for file structure definitions.
Impact Analysis (2024-12-27):
Files to Modify (10):

- src/workflows/analysis/preliminary_analysis_workflow.py - Add tax context generation after analysis
- src/workflows/documents/classification_workflow.py - Add delta detection after classification
- src/services/chat_service.py - Load tax context into chatbot, add flagging logic
- src/workflows/review/ai_qa_workflow.py - Include tax context in Q&A prompts
- frontend/apps/staff-app/src/pages/ReviewPage.tsx - Add pending review section
- frontend/apps/staff-app/src/pages/ClientDetailPage.tsx - Add badge for pending updates
- frontend/apps/staff-app/src/components/chat/ChatDrawer.tsx - Show flagged items indicator
- src/services/audit_service.py - Log tax context events

New Files to Create (11):

- src/services/tax_context_service.py - Core service: generate, update, manage tax context files
- src/prompts/tax_context/generate_initial.txt - Prompt for initial tax context generation
- src/prompts/tax_context/detect_delta.txt - Prompt for comparing new docs against existing context
- src/prompts/tax_context/flag_statement.txt - Prompt for chatbot to detect tax-relevant statements
- src/api/routes/tax_context.py - CRUD endpoints for tax context
- src/api/schemas/tax_context_schemas.py - Request/response schemas
- frontend/apps/staff-app/src/components/tax-context/PendingReviewPanel.tsx - Review/approve UI
- frontend/apps/staff-app/src/components/tax-context/TaxContextViewer.tsx - Display current context
- tests/unit/services/test_tax_context_service.py
- tests/integration/test_tax_context_workflow.py
- tests/e2e/test_tax_context_ui.py

No Changes Needed:

- src/services/s3_service.py - Existing methods sufficient for read/write

Effort Estimate (Claude-only development with human guidance):

- Backend: 15-20 hours of review/approval time
- Frontend: 8-12 hours of review/approval time
- Testing: 8-10 hours of review/approval time
- Documentation: 2-3 hours of review/approval time
- Total: 33-45 hours of human time (~1-1.5 weeks)
- Note: Generic industry estimate for a human developer would be 75-98 hours

Critical Path:

1. TaxContextService creation (foundational)
2. Prompt templates (generate_initial, detect_delta, flag_statement)
3. Integration into preliminary_analysis_workflow (initial generation)
4. Integration into classification_workflow (delta detection)
5. ChatService integration (context loading + flagging)
6. API routes creation
7. UI components (PendingReviewPanel, TaxContextViewer)

Risk Areas:

- Delta detection accuracy (false positives/negatives) - mitigate with prompt tuning
- Chatbot flagging noise - add confidence thresholds, allow preparer to disable
- S3 file conflicts on concurrent updates - use S3 versioning, queue-based updates
Notes¶
- Tax requirements: client_facing_docs/tax_practice_ai_requirements.md
- Bookkeeping requirements: bookkeeping_requirements.md
- User stories: USER_STORIES.md (82 stories in 18 sequences)
- Development timeline target: January 2027 filing season (per requirements Section 11)
- Critical path items have lead times (vendor selection, account setup) - see Section 11.4 of requirements
Document Inventory¶
| Category | Document | Purpose |
|---|---|---|
| Requirements | tax_practice_ai_requirements.md | Full business requirements |
| Requirements | bookkeeping_requirements.md | Bookkeeping module (post-MVP) |
| Requirements | MIGRATION_REQUIREMENTS.md | Data migration (pre-launch) |
| Planning | USER_STORIES.md | 87 prioritized implementation stories (4 in S0, 10 in S2) |
| Planning | backlog.md | Priority tracking and decisions |
| Technical | ARCHITECTURE.md | System design and patterns |
| Technical | DATABASE_SCHEMA.sql | PostgreSQL schema definition |
| Technical | DATA_MODEL.md | Logical data model |
| Technical | API_SPECIFICATION.md | REST API contract |
| Technical | INTEGRATION_CONTRACTS.md | External service integrations |
| Technical | SECURITY_DESIGN.md | Security controls |
| Technical | PROCESS_FLOWS.md | State machines and workflows |
| Operations | RUNBOOK.md | Operational procedures |