API Documentation

Extraction API

Programmatically extract tables from PDF and image files. Requires an active paid subscription. Manage API keys in your account dashboard.

Endpoint

Send a POST request with multipart/form-data to the following URL:

POST https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions

Note: The base URL (https://dmitokp270.execute-api.eu-central-1.amazonaws.com) is specific to an API Gateway deployment and stage, so the URL for your deployment may differ.

Authentication

Include your API key as a Bearer token in the Authorization header.

Authorization: Bearer sk_live_YOUR_API_KEY_HERE

Generate and manage your keys in the API Keys section of your account.

Request Body

The request must be sent as multipart/form-data and include the following parts:

file (Required): The file to process.
- Supported types: application/pdf, image/jpeg, image/png.
- Maximum size: 5 MB.
pageNumber or pageNumbers (Optional, relevant for PDF only):
- To process a single specific page, send pageNumber with the page number as a string (e.g., pageNumber="1").
- To process multiple specific pages, send pageNumbers with a JSON array of page numbers as a string (e.g., pageNumbers="[1, 3, 5]").
- Page numbers must be positive integers.
- If omitted for a PDF, the API defaults to processing page 1.
- This field is ignored for image files.
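The page-selection rules above can be sketched as a small Python helper that decides which form field to send. This is only an illustration under stated assumptions: the function name `page_fields` is hypothetical, and the client-side validation mirrors the rules listed here rather than the server's actual implementation.

```python
import json
from typing import Dict, Optional, Sequence

# Content types listed as supported on this page.
SUPPORTED_TYPES = {"application/pdf", "image/jpeg", "image/png"}


def page_fields(content_type: str, pages: Optional[Sequence[int]] = None) -> Dict[str, str]:
    """Return the optional form fields for a request (hypothetical helper).

    - One page      -> pageNumber="2"
    - Several pages -> pageNumbers="[1, 3, 5]" (JSON array sent as a string)
    - No pages, or an image file -> no field (the API defaults to page 1 for PDFs)
    """
    if content_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    if content_type != "application/pdf" or not pages:
        return {}
    for page in pages:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(page, bool) or not isinstance(page, int) or page < 1:
            raise ValueError(f"page numbers must be positive integers, got {page!r}")
    if len(pages) == 1:
        return {"pageNumber": str(pages[0])}
    return {"pageNumbers": json.dumps(list(pages))}


print(page_fields("application/pdf", [2]))
print(page_fields("application/pdf", [1, 3, 5]))
print(page_fields("image/png", [1]))
```

Validating on the client avoids spending a request on input the API would reject with 400 Bad Request.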

Success Response (200 OK)

Returns a JSON object containing the extracted results, grouped by page.

{
  "results": [
    {
      "page": 1, // Page number (integer) or "image"
      "tables": [
        {
          "id": "textract-table-block-id-1", // Unique ID from Textract
          "data": [
            ["Header 1", "Header 2"],
            ["Row 1 Cell 1", "Row 1 Cell 2"],
            ["Row 2 Cell 1", "Row 2 Cell 2"]
          ]
        },
        // ... other tables found on this page
      ]
    },
    {
      "page": 3,
      "tables": [ // May be empty if no tables found on page 3
        // ... tables from page 3 ...
      ]
    }
    // ... other processed pages ...
  ]
}
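As a rough sketch of consuming this response, the Python below converts each extracted table into CSV text keyed by page and table ID. The function name `tables_to_csv` is hypothetical, and the sample payload is a trimmed, comment-free version of the structure shown above (the `//` comments in the listing are annotations and would not parse as JSON).

```python
import csv
import io
import json
from typing import Dict, Tuple, Union

# Trimmed sample following the documented response shape.
SAMPLE_RESPONSE = """
{
  "results": [
    {
      "page": 1,
      "tables": [
        {
          "id": "textract-table-block-id-1",
          "data": [
            ["Header 1", "Header 2"],
            ["Row 1 Cell 1", "Row 1 Cell 2"]
          ]
        }
      ]
    },
    {"page": 3, "tables": []}
  ]
}
"""


def tables_to_csv(response_text: str) -> Dict[Tuple[Union[int, str], str], str]:
    """Map (page, table id) to the table rendered as CSV text."""
    payload = json.loads(response_text)
    rendered = {}
    for page_result in payload.get("results", []):
        page = page_result["page"]  # integer page number, or "image"
        for table in page_result.get("tables", []):  # may be empty
            buffer = io.StringIO()
            csv.writer(buffer, lineterminator="\n").writerows(table["data"])
            rendered[(page, table["id"])] = buffer.getvalue()
    return rendered


for key, text in tables_to_csv(SAMPLE_RESPONSE).items():
    print(key)
    print(text)
```

Pages with an empty `tables` array simply contribute no entries, so callers that need to distinguish "no tables found" from "page not processed" should inspect `results` directly.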

Error Responses

The API uses standard HTTP status codes for errors:

400 Bad Request: Invalid input (e.g., a missing file, invalid page numbers, or a file that is too large but still under the gateway limit). Check the `error` message in the response body.
401 Unauthorized: API key is missing, invalid, or inactive.
403 Forbidden: API key is valid, but the user is on the free plan or has an unknown subscription status.
413 Payload Too Large: File size exceeds the 5 MB limit enforced by the Lambda function.
415 Unsupported Media Type: The `Content-Type` header was not `multipart/form-data`.
429 Too Many Requests: Usage limit reached (based on user plan) or API Gateway/WAF rate limit exceeded.
500 Internal Server Error: An unexpected error occurred during processing (e.g., S3 upload failure, Textract error, PDF processing error).
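One way a client might act on these status codes is sketched below in Python. Several things here are assumptions rather than documented behavior: the helper name `summarize_error` is hypothetical, the error body is assumed to be JSON with an `error` field (the page only mentions an `error` message for 400 responses), and treating 429 and 500 as retryable is a client-side policy choice, not something the API specifies.

```python
import json
from typing import Any, Dict

# Short hints paraphrasing the status codes listed above.
ERROR_HINTS = {
    400: "Invalid input; fix the request before resending.",
    401: "API key is missing, invalid, or inactive.",
    403: "The account's plan does not include API access.",
    413: "File exceeds the 5 MB limit.",
    415: "Content-Type must be multipart/form-data.",
    429: "Usage or rate limit reached; wait before retrying.",
    500: "Server-side processing error.",
}

# Assumption: only these are worth retrying automatically.
RETRYABLE_STATUSES = frozenset({429, 500})


def summarize_error(status: int, body: str) -> Dict[str, Any]:
    """Combine the status hint with the body's `error` message, if any."""
    message = None
    try:
        parsed = json.loads(body)
        if isinstance(parsed, dict):
            message = parsed.get("error")
    except ValueError:
        # Body was not JSON (e.g. a gateway-generated response); keep message as None.
        pass
    return {
        "status": status,
        "hint": ERROR_HINTS.get(status, "Unexpected status code."),
        "message": message,
        "retry": status in RETRYABLE_STATUSES,
    }


print(summarize_error(400, '{"error": "Missing file"}'))
print(summarize_error(429, "Too Many Requests"))
```

When retrying a 429, backing off between attempts is advisable, since both the plan-based usage limit and the gateway rate limit can produce this code.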

Example (cURL)

Example of extracting tables from page 2 of a PDF:

curl -X POST \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY_HERE" \
  -F "file=@path/to/your/document.pdf;type=application/pdf" \
  -F "pageNumber=2" \
  https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions

Example of extracting tables from pages 1 and 5 of a PDF:

curl -X POST \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY_HERE" \
  -F "file=@path/to/your/document.pdf;type=application/pdf" \
  -F "pageNumbers=[1, 5]" \
  https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions

Example of extracting tables from an image:

curl -X POST \
  -H "Authorization: Bearer sk_live_YOUR_API_KEY_HERE" \
  -F "file=@path/to/your/image.png;type=image/png" \
  https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions
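For callers who prefer not to shell out to cURL, a roughly equivalent request can be assembled with Python's standard library. The sketch below builds the multipart/form-data body by hand and prepares, but does not send, the request. The helper names `encode_multipart` and `build_extraction_request` are hypothetical, the file bytes are a placeholder, and the URL is the one shown on this page, which may differ for your deployment.

```python
import urllib.request
import uuid
from typing import Dict, Tuple

API_URL = "https://dmitokp270.execute-api.eu-central-1.amazonaws.com/v1/extractions"


def encode_multipart(fields: Dict[str, str], filename: str,
                     file_bytes: bytes, file_type: str) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields such as pageNumber or pageNumbers.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The required "file" part, with its own content type.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: {file_type}\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_extraction_request(api_key: str, filename: str, file_bytes: bytes,
                             file_type: str, fields: Dict[str, str]) -> urllib.request.Request:
    """Prepare an unsent POST request mirroring the cURL examples."""
    body, content_type = encode_multipart(fields, filename, file_bytes, file_type)
    request = urllib.request.Request(API_URL, data=body, method="POST")
    request.add_header("Authorization", f"Bearer {api_key}")
    request.add_header("Content-Type", content_type)
    return request


# Placeholder bytes stand in for a real PDF read with open(path, "rb").read().
request = build_extraction_request(
    "sk_live_YOUR_API_KEY_HERE",
    "document.pdf",
    b"%PDF-1.4 placeholder",
    "application/pdf",
    {"pageNumber": "2"},
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request) and read the JSON response body.
```

A third-party HTTP client would shorten this considerably; the standard-library version is shown only to keep the sketch dependency-free.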