{"openapi":"3.1.0","info":{"title":"DocParser API","description":"\nGPU-accelerated PDF parsing service with two engines and distributed task processing.\n\n> **Status: Closed Beta** — This service is currently in testing. API access\n> is restricted to internal use only. Public registration is not yet available.\n\n## Engines\n\n| Engine | Strengths |\n|--------|-----------|\n| **struct** | Document layout analysis, formula / equation recognition, high-quality output on academic papers |\n| **polyglot** | Chinese and multilingual support, robust text extraction across writing systems |\n\n## Workflow\n\n1. `POST /parse` — upload a PDF, receive a `task_id` immediately (202 Accepted)\n2. `GET /status/{task_id}` — poll until `status` is `success` or `failure`\n3. `GET /result/{task_id}` — retrieve the parsed Markdown\n\n## Authentication\n\nProtected endpoints require an `X-API-Key` header. Create keys via the\n[Admin Panel](http://127.0.0.1:8888/admin) on the GPU machine (only accessible\nfrom localhost).\n\nPublic endpoints (no key required): `/health`, `/gpustatus`.\n\n## Architecture\n\nUser → Nginx (SSL) → SSH tunnel → FastAPI → Redis → Celery Workers → GPU Engines\n","contact":{"name":"DocParser Admin","url":"https://docparser.deconbear.cn/"},"license":{"name":"MIT"},"version":"0.0.1"},"servers":[{"url":"https://docparser.deconbear.cn","description":"Testing"}],"paths":{"/admin/keys":{"get":{"tags":["admin"],"summary":"List API keys","operationId":"get_keys_admin_keys_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}}}},"post":{"tags":["admin"],"summary":"Create an API key","operationId":"post_key_admin_keys_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/CreateKeyRequest"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/admin/keys/{key_id}/revoke":{"put":{"tags":["admin"],"summary":"Revoke an API key","operationId":"put_revoke_admin_keys__key_id__revoke_put","parameters":[{"name":"key_id","in":"path","required":true,"schema":{"type":"string","title":"Key Id"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/admin/keys/{key_id}/activate":{"put":{"tags":["admin"],"summary":"Activate an API key","operationId":"put_activate_admin_keys__key_id__activate_put","parameters":[{"name":"key_id","in":"path","required":true,"schema":{"type":"string","title":"Key Id"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/admin/keys/{key_id}":{"delete":{"tags":["admin"],"summary":"Delete an API key","operationId":"delete_key_endpoint_admin_keys__key_id__delete","parameters":[{"name":"key_id","in":"path","required":true,"schema":{"type":"string","title":"Key Id"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/health":{"get":{"tags":["Monitoring"],"summary":"Service health check","description":"Liveness/readiness probe for load balancers and monitoring.\n\nReturns per-GPU metrics (utilization, memory), whether Redis is reachable,\nand the number of active Celery workers. 
No authentication required.\n\nA `status` of `\"degraded (no redis)\"` means the API can still serve\nexisting results but cannot accept new parse tasks.","operationId":"health_health_get","responses":{"200":{"description":"GPU status, Redis connectivity, and active worker count.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HealthResponse"}}}}}}},"/gpustatus":{"get":{"tags":["Monitoring"],"summary":"GPU status only","description":"Quick GPU load snapshot — lighter than /health (skips Redis & worker checks).\n\nUseful for scripts that need to check GPU availability before submitting\nlarge batches. No authentication required.","operationId":"gpu_status_gpustatus_get","responses":{"200":{"description":"Per-GPU utilization and memory snapshot.","content":{"application/json":{"schema":{}}}}}}},"/parse":{"post":{"tags":["Parsing"],"summary":"Submit a PDF for parsing","description":"Submit a PDF document for GPU-accelerated parsing.\n\nThe file is saved and enqueued for asynchronous processing by a Celery\nworker. 
The response returns immediately with a `task_id` — use\n`GET /status/{task_id}` to track progress and `GET /result/{task_id}`\nto retrieve the parsed Markdown once complete.\n\n**Engine choice:**\n- `struct` — structure-aware engine, best for English academic papers\n  with complex layouts and formulas.\n- `polyglot` — multilingual engine, best for Chinese or mixed-language documents.","operationId":"submit_parse_parse_post","parameters":[{"name":"engine","in":"query","required":false,"schema":{"$ref":"#/components/schemas/EngineChoice","default":"struct"}}],"requestBody":{"required":true,"content":{"multipart/form-data":{"schema":{"$ref":"#/components/schemas/Body_submit_parse_parse_post"}}}},"responses":{"202":{"description":"Task accepted and enqueued for GPU processing.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ParseResponse"}}}},"400":{"description":"File is not a PDF or is empty."},"401":{"description":"Missing or invalid X-API-Key header."},"413":{"description":"PDF exceeds 100 MB size limit."},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/status/{task_id}":{"get":{"tags":["Parsing"],"summary":"Check parse task status","description":"Poll the lifecycle state of a previously submitted parse task.\n\n**Status values:**\n- `queued` — waiting for an available GPU worker.\n- `started` — a worker is actively processing the PDF.\n- `retrying` — the worker's GPU was busy, task re-queued for another attempt.\n- `success` — parsing complete, result available at `GET /result/{task_id}`.\n- `failure` — parsing failed, check the `error` field for details.\n\nPoll every 5 seconds until the status is terminal (`success` or `failure`).","operationId":"task_status_status__task_id__get","parameters":[{"name":"task_id","in":"path","required":true,"schema":{"type":"string","title":"Task Id"}}],"responses":{"200":{"description":"Current task 
status.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/TaskStatus"}}}},"401":{"description":"Missing or invalid X-API-Key header."},"404":{"description":"Task not found (never existed or result expired)."},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/result/{task_id}":{"get":{"tags":["Parsing"],"summary":"Retrieve parsed Markdown result","description":"Fetch the final parsed output for a completed task.\n\nReturns 409 Conflict if the task is still processing — poll\n`GET /status/{task_id}` until it reaches `success` or `failure`\nbefore calling this endpoint.","operationId":"task_result_result__task_id__get","parameters":[{"name":"task_id","in":"path","required":true,"schema":{"type":"string","title":"Task Id"}}],"responses":{"200":{"description":"Parsed result.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/TaskResult"}}}},"401":{"description":"Missing or invalid X-API-Key header."},"404":{"description":"Task not found."},"409":{"description":"Task exists but is not yet complete — poll /status first."},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/task/{task_id}":{"delete":{"tags":["Parsing"],"summary":"Cancel a queued or running task","description":"Cancel a task that is queued or currently being processed.\n\nIf the task is already running, the worker process is terminated.\nIf the task has already completed, this is a no-op.","operationId":"revoke_task_task__task_id__delete","parameters":[{"name":"task_id","in":"path","required":true,"schema":{"type":"string","title":"Task Id"}}],"responses":{"200":{"description":"Task revoked.","content":{"application/json":{"schema":{}}}},"401":{"description":"Missing or invalid X-API-Key header."},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}}},"components":{"schemas":{"Body_submit_parse_parse_post":{"properties":{"file":{"type":"string","contentMediaType":"application/octet-stream","title":"File","description":"PDF file to parse (max 100 MB)."}},"type":"object","required":["file"],"title":"Body_submit_parse_parse_post"},"CreateKeyRequest":{"properties":{"name":{"type":"string","title":"Name"}},"type":"object","required":["name"],"title":"CreateKeyRequest"},"EngineChoice":{"type":"string","enum":["struct","polyglot"],"title":"EngineChoice","description":"PDF parsing engine selection.\n\n- **struct**: Structure-aware engine — excels at document layout analysis,\n  formula / equation recognition, and high-quality output on academic papers.\n- **polyglot**: Multilingual engine — excels at Chinese and mixed-language\n  documents with robust text extraction across writing systems."},"GPUInfo":{"properties":{"index":{"type":"integer","title":"Index","description":"Zero-based GPU device index.","examples":[0]},"name":{"type":"string","title":"Name","description":"GPU model name as reported by the driver.","examples":["NVIDIA GeForce RTX 4090"]},"utilization_pct":{"type":"number","title":"Utilization Pct","description":"GPU compute utilization percentage (0–100).","examples":[15.0]},"memory_used_mb":{"type":"integer","title":"Memory Used Mb","description":"GPU memory currently in use, in megabytes.","examples":[8192]},"memory_total_mb":{"type":"integer","title":"Memory Total Mb","description":"Total GPU memory capacity, in megabytes.","examples":[24576]},"memory_pct":{"type":"number","title":"Memory Pct","description":"GPU memory usage percentage (0–100).","examples":[33.3]}},"type":"object","required":["index","name","utilization_pct","memory_used_mb","memory_total_mb","memory_pct"],"title":"GPUInfo","description":"Per-GPU metrics sampled at request time via 
NVML."},"HTTPValidationError":{"properties":{"detail":{"items":{"$ref":"#/components/schemas/ValidationError"},"type":"array","title":"Detail"}},"type":"object","title":"HTTPValidationError"},"HealthResponse":{"properties":{"status":{"type":"string","title":"Status","description":"'healthy' if all dependencies are up, 'degraded (no redis)' if Redis is unreachable.","examples":["healthy"]},"gpus":{"items":{"$ref":"#/components/schemas/GPUInfo"},"type":"array","title":"Gpus","description":"Per-GPU status snapshot. Empty list if no GPUs detected."},"redis_connected":{"type":"boolean","title":"Redis Connected","description":"Whether the Redis broker is reachable.","examples":[true]},"workers":{"type":"integer","title":"Workers","description":"Number of Celery workers currently active.","default":0,"examples":[2]}},"type":"object","required":["status","gpus","redis_connected"],"title":"HealthResponse","description":"Liveness/readiness probe response. No authentication required."},"ParseResponse":{"properties":{"task_id":{"type":"string","title":"Task Id","description":"Unique task identifier. 
Use this to poll /status/{task_id} and fetch /result/{task_id}.","examples":["f3a1b9c0d2e4"]},"status":{"type":"string","title":"Status","description":"Initial task status — always 'queued' on submission.","default":"queued","examples":["queued"]},"message":{"type":"string","title":"Message","description":"Human-readable confirmation message.","default":"Task submitted","examples":["Task submitted"]}},"type":"object","required":["task_id"],"title":"ParseResponse","description":"Response returned immediately after submitting a PDF for parsing."},"TaskResult":{"properties":{"task_id":{"type":"string","title":"Task Id","description":"Task identifier.","examples":["f3a1b9c0d2e4"]},"status":{"type":"string","title":"Status","description":"'success' or 'failure'.","examples":["success"]},"markdown":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Markdown","description":"Parsed document as Markdown text. null if the task failed.","examples":["# Paper Title\n\n## Abstract\n\nThis paper presents..."]},"engine":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Engine","description":"Engine that produced this result (struct or polyglot).","examples":["struct"]},"parse_time_s":{"anyOf":[{"type":"number"},{"type":"null"}],"title":"Parse Time S","description":"Wall-clock parsing time in seconds.","examples":[42.3]},"error":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Error","description":"Error message if status is 'failure', otherwise null."}},"type":"object","required":["task_id","status"],"title":"TaskResult","description":"Final parsed output. 
Available once the task reaches a terminal status ('success' or 'failure')."},"TaskStatus":{"properties":{"task_id":{"type":"string","title":"Task Id","description":"Task identifier (same as returned by POST /parse).","examples":["f3a1b9c0d2e4"]},"status":{"type":"string","title":"Status","description":"Task lifecycle state: one of queued, started, retrying, success, failure. Terminal states are success and failure; retrying means the task was re-queued because the worker's GPU was busy.","examples":["started"]},"engine":{"type":"string","title":"Engine","description":"Engine used for this task (struct or polyglot). Empty string if unknown.","default":"","examples":["struct"]},"created_at":{"anyOf":[{"type":"string","format":"date-time"},{"type":"null"}],"title":"Created At","description":"ISO-8601 timestamp when the task was submitted.","examples":["2025-06-15T14:32:00Z"]},"started_at":{"anyOf":[{"type":"string","format":"date-time"},{"type":"null"}],"title":"Started At","description":"ISO-8601 timestamp when a worker began processing."},"completed_at":{"anyOf":[{"type":"string","format":"date-time"},{"type":"null"}],"title":"Completed At","description":"ISO-8601 timestamp when processing finished (success or failure)."},"error":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Error","description":"Error message if status is 'failure', otherwise null."}},"type":"object","required":["task_id","status"],"title":"TaskStatus","description":"Current state of a parse task. 
Poll this to know when the result is ready."},"ValidationError":{"properties":{"loc":{"items":{"anyOf":[{"type":"string"},{"type":"integer"}]},"type":"array","title":"Location"},"msg":{"type":"string","title":"Message"},"type":{"type":"string","title":"Error Type"},"input":{"title":"Input"},"ctx":{"type":"object","title":"Context"}},"type":"object","required":["loc","msg","type"],"title":"ValidationError"}}},"tags":[{"name":"Parsing","description":"Submit PDFs, poll progress, and retrieve parsed Markdown results."},{"name":"Monitoring","description":"Health checks and GPU status — no authentication required."},{"name":"admin","description":"API key management — only accessible from localhost on the GPU machine."}]}