API Documentation
Extraction API
Programmatically extract tables from PDF and image files. Requires an active paid subscription. Manage API keys in your account dashboard.
Endpoint
Send a POST request with multipart/form-data to the following URL:
POST https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions
Note: The base URL (https://dmitokp270.execute-api.eu-central-1.amazonaws.com) depends on your specific API Gateway deployment stage.
Authentication
Include your API key as a Bearer token in the Authorization header.
Authorization: Bearer sk_live_YOUR_API_KEY_HERE
Generate and manage your keys in the API Keys section of your account.
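As a minimal sketch of the header described above, the Python helper below builds the Authorization header from a key. The function name build_auth_headers and the environment variable EXTRACTION_API_KEY are hypothetical names chosen for illustration; they are not part of the API.

```python
import os


def build_auth_headers(api_key=None):
    """Return the Authorization header for the Extraction API.

    If no key is passed, fall back to the EXTRACTION_API_KEY environment
    variable (a hypothetical name used only in this sketch).
    """
    if api_key is None:
        api_key = os.environ.get("EXTRACTION_API_KEY", "")
    key = api_key.strip()
    if not key:
        raise ValueError("API key must not be empty")
    return {"Authorization": f"Bearer {key}"}


print(build_auth_headers("sk_live_YOUR_API_KEY_HERE"))
```

Reading the key from the environment rather than hard-coding it keeps it out of source control.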
Request Body
The request must be sent as multipart/form-data and include the following parts:
file (Required): The file to process.
- Supported types: application/pdf, image/jpeg, image/png.
- Maximum size: 5 MB.
pageNumber or pageNumbers (Optional, relevant for PDF only):
- To process a single specific page, send pageNumber with the page number as a string (e.g., pageNumber="1").
- To process multiple specific pages, send pageNumbers with a JSON array of page numbers as a string (e.g., pageNumbers="[1, 3, 5]").
- Page numbers must be positive integers.
- If omitted for a PDF, the API defaults to processing page 1.
- This field is ignored for image files.
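The rules above can be checked on the client before sending anything. The following Python sketch is one way to do that; the helper names check_file and build_page_fields are hypothetical, and treating "5 MB" as 5 * 1024 * 1024 bytes is an assumption, since the exact byte limit is not stated here.

```python
import json

# Assumption: "5 MB" is read as 5 MiB; the server's exact limit may differ.
MAX_FILE_BYTES = 5 * 1024 * 1024
SUPPORTED_TYPES = {"application/pdf", "image/jpeg", "image/png"}


def check_file(size_bytes, content_type):
    """Raise ValueError if the file would be rejected by the documented rules."""
    if content_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type: {content_type}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError(f"file is {size_bytes} bytes, over the 5 MB limit")


def build_page_fields(pages=None):
    """Return the optional page-selection form field as a dict.

    None            -> {} (the API defaults to page 1 for PDFs)
    an int          -> {"pageNumber": "<n>"}
    a list of ints  -> {"pageNumbers": "[a, b, c]"} (JSON array as a string)
    """
    if pages is None:
        return {}
    single = isinstance(pages, int)
    page_list = [pages] if single else list(pages)
    if not page_list:
        raise ValueError("at least one page number is required")
    for p in page_list:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(p, bool) or not isinstance(p, int) or p < 1:
            raise ValueError(f"page numbers must be positive integers, got {p!r}")
    if single:
        return {"pageNumber": str(page_list[0])}
    return {"pageNumbers": json.dumps(page_list)}


print(build_page_fields(1))
print(build_page_fields([1, 3, 5]))
```

Validating locally avoids spending a request on input that would be rejected with a 400 or 413 response.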
Success Response (200 OK)
Returns a JSON object containing the extracted results, grouped by page.
{
  "results": [
    {
      "page": 1, // Page number (integer) or "image"
      "tables": [
        {
          "id": "textract-table-block-id-1", // Unique ID from Textract
          "data": [
            ["Header 1", "Header 2"],
            ["Row 1 Cell 1", "Row 1 Cell 2"],
            ["Row 2 Cell 1", "Row 2 Cell 2"]
          ]
        },
        // ... other tables found on this page
      ]
    },
    {
      "page": 3,
      "tables": [ // May be empty if no tables found on page 3
        // ... tables from page 3 ...
      ]
    }
    // ... other processed pages ...
  ]
}
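To illustrate how a client might consume this structure, the Python sketch below groups tables by page and converts one table to CSV. The function names tables_by_page and table_to_csv are hypothetical, and the sample body is a comment-free version of the example above, since the // comments there are annotations rather than valid JSON.

```python
import csv
import io
import json


def tables_by_page(response_body):
    """Map each page value (an integer, or "image") to its list of tables."""
    payload = json.loads(response_body)
    grouped = {}
    for result in payload.get("results", []):
        grouped.setdefault(result["page"], []).extend(result.get("tables", []))
    return grouped


def table_to_csv(table):
    """Render one table's "data" rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table["data"])
    return buffer.getvalue()


SAMPLE_BODY = """
{
  "results": [
    {"page": 1, "tables": [
      {"id": "textract-table-block-id-1",
       "data": [["Header 1", "Header 2"],
                ["Row 1 Cell 1", "Row 1 Cell 2"],
                ["Row 2 Cell 1", "Row 2 Cell 2"]]}
    ]},
    {"page": 3, "tables": []}
  ]
}
"""

pages = tables_by_page(SAMPLE_BODY)
print(table_to_csv(pages[1][0]))
```

A page with no tables still appears in the mapping with an empty list, so callers can tell "processed, nothing found" apart from "not processed".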
Error Responses
The API uses standard HTTP status codes for errors:
- 400 Bad Request: Invalid input (e.g., missing file, invalid page numbers, file too large but under gateway limit). Check the `error` message in the response body.
- 401 Unauthorized: API key is missing, invalid, or inactive.
- 403 Forbidden: API key is valid, but the user is on the free plan or has an unknown subscription status.
- 413 Payload Too Large: File size exceeds the 5 MB limit enforced by the Lambda function.
- 415 Unsupported Media Type: The `Content-Type` header was not `multipart/form-data`.
- 429 Too Many Requests: Usage limit reached (based on user plan) or API Gateway/WAF rate limit exceeded.
- 500 Internal Server Error: An unexpected error occurred during processing (e.g., S3 upload failure, Textract error, PDF processing error).
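A client could turn these status codes into a small, uniform error summary, as in the Python sketch below. The name summarize_error is hypothetical. Two assumptions are made: that error bodies are JSON objects carrying the `error` field mentioned above, and that 429 and 500 are worth retrying after a delay, which is a common client convention rather than something stated here.

```python
import json

STATUS_MEANINGS = {
    400: "Bad Request: invalid input",
    401: "Unauthorized: API key is missing, invalid, or inactive",
    403: "Forbidden: plan does not include API access",
    413: "Payload Too Large: file exceeds the size limit",
    415: "Unsupported Media Type: request was not multipart/form-data",
    429: "Too Many Requests: usage or rate limit reached",
    500: "Internal Server Error: processing failed",
}

# Assumption: only these statuses are treated as transient in this sketch.
RETRYABLE_STATUSES = frozenset({429, 500})


def summarize_error(status, body=""):
    """Return a dict describing an error response.

    The "detail" value comes from the body's `error` field when the body
    is a JSON object; otherwise it is None.
    """
    try:
        parsed = json.loads(body) if body else {}
    except json.JSONDecodeError:
        parsed = {}
    detail = parsed.get("error") if isinstance(parsed, dict) else None
    return {
        "status": status,
        "meaning": STATUS_MEANINGS.get(status, "Unexpected status"),
        "detail": detail,
        "retry": status in RETRYABLE_STATUSES,
    }


print(summarize_error(400, '{"error": "Missing file"}'))
```

Statuses such as 401, 403, and 415 indicate a problem with the request itself, so repeating the same request unchanged is unlikely to help.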
Example (cURL)
Example of extracting tables from page 2 of a PDF:
curl -X POST -H "Authorization: Bearer sk_live_YOUR_API_KEY_HERE" -F "file=@path/to/your/document.pdf;type=application/pdf" -F "pageNumber=2" https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions
Example of extracting tables from pages 1 and 5 of a PDF:
curl -X POST -H "Authorization: Bearer sk_live_YOUR_API_KEY_HERE" -F "file=@path/to/your/document.pdf;type=application/pdf" -F "pageNumbers=[1, 5]" https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions
Example of extracting tables from an image:
curl -X POST -H "Authorization: Bearer sk_live_YOUR_API_KEY_HERE" -F "file=@path/to/your/image.png;type=image/png" https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions
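For callers who prefer Python, the same requests can be sketched with the standard library alone. The helper names encode_multipart, build_request, and extract_tables are hypothetical, and the multipart body is assembled by hand because urllib has no built-in form encoder. This is an untested illustration of the request shape, not an official client.

```python
import json
import urllib.request
import uuid

API_URL = "https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions"


def encode_multipart(fields, filename, file_bytes, content_type, boundary=None):
    """Return (body, content_type_header) for a multipart/form-data request."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # Plain text fields such as pageNumber or pageNumbers.
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The required "file" part, with its own Content-Type.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, filename, file_bytes, content_type, fields=None):
    """Build the POST request without sending it."""
    body, header = encode_multipart(fields or {}, filename, file_bytes, content_type)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": header},
    )


def extract_tables(api_key, path, content_type, fields=None):
    """Send the file at `path` and return the decoded JSON response."""
    with open(path, "rb") as handle:
        request = build_request(api_key, path, handle.read(), content_type, fields)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, mirroring the first cURL example (performs a real network call):
# extract_tables("sk_live_YOUR_API_KEY_HERE", "path/to/your/document.pdf",
#                "application/pdf", {"pageNumber": "2"})
```

Keeping build_request separate from extract_tables makes it possible to inspect the headers and body without contacting the API.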