# Universal Scraper

## Endpoint

```
POST https://api.crawlbyte.ai/api/tasks
```

## Basic Configuration (Required)

```json
{
  "type": "universal",
  "input": [
    "https://example.com/page"
  ],
  "multithread": false
}
```

### Parameters

| Field       | Type    | Description                                            |
| ----------- | ------- | ------------------------------------------------------ |
| type        | string  | Always `"universal"`                                   |
| input       | array   | Array of valid URLs                                    |
| multithread | boolean | Use `true` for faster processing with multiple threads |
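The basic configuration above could be submitted from Python roughly as follows. This is a minimal sketch using only the standard library, not an official client: the helper names `build_basic_payload` and `create_task` are hypothetical, and because this page does not describe authentication, any credentials are passed through an assumed `extra_headers` argument.

```python
import json
import urllib.request

API_URL = "https://api.crawlbyte.ai/api/tasks"


def build_basic_payload(urls, multithread=False):
    """Return the required request body for a universal task."""
    return {
        "type": "universal",
        "input": list(urls),
        "multithread": bool(multithread),
    }


def create_task(payload, extra_headers=None):
    """POST the payload and return the decoded JSON response.

    extra_headers is an assumption: supply whatever authentication
    headers your Crawlbyte account requires.
    """
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `create_task(build_basic_payload(["https://example.com/page"]))` would send the same body shown in the JSON example.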

## Advanced Configuration (Optional)

```json
{
  "type": "universal",
  "input": [
    "https://example.com/page"
  ],
  "multithread": true,
  "jsRendering": true,
  "customSelector": "#main-content",
  "method": "GET",
  "headers": "{\"X-Test\":\"abc\"}",
  "cookie": "session=xyz",
  "proxy": "http://username:password@ip:port"
}
```

### Optional Parameters

| Field               | Type    | Description                                                                                                          |
| ------------------- | ------- | -------------------------------------------------------------------------------------------------------------------- |
| jsRendering         | boolean | Enable full-page JavaScript rendering.                                                                               |
| customSelector      | string  | CSS selector (e.g., `#main-content`) to extract specific HTML.                                                       |
| method              | string  | HTTP method (`GET`, `POST`, `PUT`, `PATCH`)                                                                          |
| user\_agent\_preset | string  | Preset user-agent. Options: `chrome`, `firefox`, `edge`, `opera`, `safari`, `ios-safari`, `android-chrome`, `custom` |
| user\_agent\_custom | string  | Custom user-agent string, used when `user_agent_preset` is `custom`.                                                 |
| headers             | string  | JSON-formatted string of headers.                                                                                    |
| cookie              | string  | Cookie string in `key=value;` form, sent as-is (e.g., `session=xyz`).                                                |
| proxy               | string  | Proxy URL in the form `http://username:password@ip:port`.                                                            |
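Because `headers` is listed as a JSON-formatted string rather than an object, it is easy to send the wrong type. The sketch below shows one way to assemble the optional fields in Python and serialize a headers dict with `json.dumps`. The helper name `build_advanced_payload` is hypothetical; the field names come from the table above.

```python
import json


def build_advanced_payload(urls, *, headers=None, **options):
    """Build a universal task body with optional fields.

    `headers` is given as a dict and serialized to a JSON string,
    since the table above lists the `headers` field as a string.
    """
    payload = {"type": "universal", "input": list(urls), "multithread": False}
    # Drop options left as None so only explicitly set fields are sent.
    payload.update({k: v for k, v in options.items() if v is not None})
    if headers is not None:
        payload["headers"] = json.dumps(headers, separators=(",", ":"))
    return payload


payload = build_advanced_payload(
    ["https://example.com/page"],
    headers={"X-Test": "abc"},
    jsRendering=True,
    customSelector="#main-content",
    method="GET",
    cookie="session=xyz",
)
print(json.dumps(payload, indent=2))
```

Passing `multithread=True` as a keyword option overrides the default, and fields that are not supplied (here `proxy`) are simply omitted from the body.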

## Pricing

* **$0.005 per successful task**\
  This is a pay-as-you-go pricing model — you're only charged when a Universal task successfully returns HTML or JSON content.

You can view your current credit balance and usage history in the [Crawlbyte Dashboard](https://dash.crawlbyte.ai/).
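As a rough worked example of the pricing above, the following sketch estimates spend from a count of successful tasks. It uses `Decimal` to avoid floating-point rounding. The function name `estimate_cost` is hypothetical, and how a multi-URL request maps to billed tasks is not stated here, so the function takes a task count directly.

```python
from decimal import Decimal

# USD per successful task, taken from the pricing section above.
PRICE_PER_SUCCESSFUL_TASK = Decimal("0.005")


def estimate_cost(successful_tasks):
    """Return the estimated charge in USD for a number of successful tasks."""
    return PRICE_PER_SUCCESSFUL_TASK * successful_tasks


print(estimate_cost(1000))  # 5.000
```

Failed tasks are left out of the count, since only successful tasks are charged.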

## Response

The response contains metadata about the task. For the `universal` type, the most important fields are `status` and `result`.

```json
{
  "id": "af3e12f2-8f45-43b0-8a7b-cabbbb94c1e9",
  "status": "completed",
  "result": "HTML or JSON RESULT_HERE"
}
```

* If `result` is **raw HTML** or a **JSON object**, no further polling is needed — this is the final data.

### Status Types

| Status     | Meaning                                                        |
| ---------- | -------------------------------------------------------------- |
| queued     | Task was accepted and added to the processing queue.           |
| processing | Task is currently running.                                     |
| completed  | Task finished successfully; `result` holds the HTML or JSON.   |
| failed     | Task encountered an error (e.g. bad proxy, invalid URL, etc.). |
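The four statuses above split into two pending states and two terminal states. A minimal sketch of handling them in Python might look like this. The helper names are hypothetical, and the error types are an illustrative choice, not something the API prescribes.

```python
TERMINAL_STATUSES = {"completed", "failed"}
PENDING_STATUSES = {"queued", "processing"}


def is_terminal(task):
    """True when the task dict needs no further polling."""
    return task.get("status") in TERMINAL_STATUSES


def extract_result(task):
    """Return `result` for a completed task, raise otherwise."""
    status = task.get("status")
    if status == "completed":
        return task["result"]
    if status == "failed":
        raise RuntimeError(f"task {task.get('id')} failed")
    raise ValueError(f"task {task.get('id')} is not finished (status={status!r})")
```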

## Polling

If the initial `status` is `queued` or `processing`, you should poll for task completion.

```
GET https://api.crawlbyte.ai/api/tasks/:id
```

* You’ll receive the same structure with an updated `status`.
* Stop polling once `status` is `completed` or `failed`.
* Rendered pages (with `jsRendering: true`) may take slightly longer.
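The polling rules above can be sketched as a small loop. This is an illustration under stated assumptions rather than the official SDK behaviour: `fetch_task` and `wait_for_task` are hypothetical names, the 2-second interval and 120-second timeout are arbitrary defaults, and `extra_headers` stands in for whatever authentication your account requires.

```python
import json
import time
import urllib.request

TASK_URL = "https://api.crawlbyte.ai/api/tasks/{task_id}"


def fetch_task(task_id, extra_headers=None):
    """GET the current task state. extra_headers (e.g. auth) is an assumption."""
    request = urllib.request.Request(
        TASK_URL.format(task_id=task_id), headers=extra_headers or {}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(task_id, fetch=fetch_task, interval=2.0, timeout=120.0,
                  sleep=time.sleep):
    """Poll until the task is completed or failed, or raise on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        if task.get("status") in ("completed", "failed"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still {task.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access. For tasks with `jsRendering: true`, a longer `timeout` may be appropriate.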

## SDK Usage

You can run this task using any official **Crawlbyte SDK**:

* [Go SDK](https://github.com/crawlbyte/crawlbyte-sdk-go)
* [TypeScript SDK](https://github.com/crawlbyte/crawlbyte-sdk-ts)
* [Python SDK](https://github.com/crawlbyte/crawlbyte-sdk-py)

Each SDK provides a simple way to:

* Create the task
* Poll for status
* Handle the final result

Refer to the [SDKs section](/sdks.md) for installation, examples, and setup instructions.

## Notes

* The universal scraper is ideal for **public webpages** that don’t require a login.
* Crawlbyte handles fingerprinting, headers, JS execution, bot detection, and more automatically.
* For complex or highly protected pages, the task may fail. If this happens:

  [**Book a free setup call**](https://calendly.com/kristian-crawlbyte/crawlbyte-demo-call?preview_source=et_card) — we’ll configure a custom scraper template for your target site at no cost.
* `jsRendering` is optional but may be required for dynamic sites (e.g., React, Vue, Angular).
* If a `cookie` string is provided, it’s used as-is — **you are fully responsible for its usage** in accordance with our [Terms of Service](https://www.crawlbyte.ai/terms-of-service).

