# Yelp Scraper

## Endpoint

```
POST https://api.crawlbyte.ai/api/tasks
```
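As a rough illustration of calling this endpoint from Python's standard library, the sketch below prepares and sends a JSON task payload of the shape described in the sections that follow. This page does not cover authentication, so `auth_headers` is a hypothetical placeholder for whatever credential header your account requires. The `Content-Type` header and the helper names `build_request` and `submit_task` are likewise assumptions, not part of the Crawlbyte API.

```python
import json
import urllib.request

TASKS_URL = "https://api.crawlbyte.ai/api/tasks"


def build_request(payload, auth_headers=None):
    """Prepare (but do not send) a POST request carrying a JSON task payload."""
    # Assumption: the API accepts a JSON body labelled with this content type.
    headers = {"Content-Type": "application/json"}
    # Assumption: auth_headers stands in for the credential header your
    # account needs; authentication is not documented on this page.
    headers.update(auth_headers or {})
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(TASKS_URL, data=body, headers=headers, method="POST")


def submit_task(payload, auth_headers=None, timeout=30):
    """Send the task and return the decoded JSON response."""
    request = build_request(payload, auth_headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect before any network call is made.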

## Basic Configuration (Required)

#### **Business Profile**

```json
{
  "type": "yelp",
  "input": [
    "https://www.yelp.com/biz/some-restaurant"
  ],
  "dataType": "profiles",
  "multithread": false
}
```

#### **Reviews**

```json
{
  "type": "yelp",
  "input": [
    "https://www.yelp.com/biz/some-restaurant"
  ],
  "dataType": "reviews",
  "sortBy": "NEWEST_FIRST",
  "multithread": false
}
```

### Parameters

| Field       | Type    | Description                                                                                                                                                                                                     |
| ----------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| type        | string  | Always `"yelp"`                                                                                                                                                                                                 |
| input       | array   | Array of Yelp business URLs                                                                                                                                                                                     |
| dataType    | string  | `"profiles"` or `"reviews"`                                                                                                                                                                                     |
| sortBy      | string  | Applies only when `dataType` is `"reviews"`, where it is required. Controls review order: `NEWEST_FIRST`, `OLDEST_FIRST`, `HIGHEST_RATED`, or `LOWEST_RATED`                                                   |
| multithread | boolean | Use `true` for faster processing with multiple threads                                                                                                                                                          |
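The parameter rules above can be checked on the client before a task is sent. The following is a minimal sketch of a payload builder; the function name `build_yelp_task` and its validation behaviour are illustrative choices, not part of the Crawlbyte API.

```python
SORT_ORDERS = {"NEWEST_FIRST", "OLDEST_FIRST", "HIGHEST_RATED", "LOWEST_RATED"}
DATA_TYPES = {"profiles", "reviews"}


def build_yelp_task(urls, data_type, sort_by=None, multithread=False):
    """Build a Yelp task payload, rejecting combinations the docs rule out."""
    urls = list(urls)
    if not urls:
        raise ValueError("input needs at least one Yelp business URL")
    if data_type not in DATA_TYPES:
        raise ValueError(f"dataType must be one of {sorted(DATA_TYPES)}")

    payload = {
        "type": "yelp",
        "input": urls,
        "dataType": data_type,
        "multithread": bool(multithread),
    }
    if data_type == "reviews":
        # sortBy is required for review tasks.
        if sort_by not in SORT_ORDERS:
            raise ValueError(f"sortBy must be one of {sorted(SORT_ORDERS)}")
        payload["sortBy"] = sort_by
    # sortBy is documented only for reviews, so this sketch leaves it out
    # of profile payloads even if a value was passed in.
    return payload
```

For example, `build_yelp_task(["https://www.yelp.com/biz/some-restaurant"], "reviews", sort_by="NEWEST_FIRST")` produces the same structure as the Reviews example above.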

## Advanced Configuration (Optional)

```json
{
  "type": "yelp",
  "input": [
    "https://www.yelp.com/biz/some-restaurant"
  ],
  "dataType": "reviews",
  "sortBy": "NEWEST_FIRST",
  "user_agent_preset": "chrome",
  "headers": "{\"X-Test\":\"abc\"}",
  "cookie": "",
  "proxy": "http://username:password@ip:port"
}
```

### Optional Parameters

| Field               | Type   | Description                                                                                                          |
| ------------------- | ------ | -------------------------------------------------------------------------------------------------------------------- |
| user\_agent\_preset | string | Preset user-agent. Options: `chrome`, `firefox`, `edge`, `opera`, `safari`, `ios-safari`, `android-chrome`, `custom` |
| user\_agent\_custom | string | Custom user-agent string. Used only if `user_agent_preset` is `custom`.                                              |
| headers             | string | JSON-formatted string of headers.                                                                                    |
| cookie              | string | Cookie string in the form `key=value;`                                                                               |
| proxy               | string | Proxy URL in the form `http://username:password@ip:port`                                                             |
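Because `headers` is sent as a JSON-formatted string rather than a nested object, it is easy to get the escaping wrong by hand. The sketch below shows one way to fill in the optional fields from ordinary Python values. The helper name `with_advanced_options` is hypothetical, and joining several cookies as consecutive `key=value;` pairs is an assumption, since only the single-pair form is documented here.

```python
import json

USER_AGENT_PRESETS = {
    "chrome", "firefox", "edge", "opera", "safari",
    "ios-safari", "android-chrome", "custom",
}


def with_advanced_options(payload, *, user_agent_preset=None, user_agent_custom=None,
                          headers=None, cookies=None, proxy=None):
    """Return a copy of payload with the optional fields filled in."""
    task = dict(payload)
    if user_agent_preset is not None:
        if user_agent_preset not in USER_AGENT_PRESETS:
            raise ValueError(f"unknown user_agent_preset: {user_agent_preset!r}")
        if user_agent_preset == "custom" and not user_agent_custom:
            raise ValueError("user_agent_custom is needed when the preset is 'custom'")
        task["user_agent_preset"] = user_agent_preset
        if user_agent_preset == "custom":
            task["user_agent_custom"] = user_agent_custom
    if headers:
        # headers travels as a JSON-formatted string, not a nested object.
        task["headers"] = json.dumps(headers, separators=(",", ":"))
    if cookies:
        # Assumption: several cookies are concatenated as key=value; pairs.
        task["cookie"] = "".join(f"{name}={value};" for name, value in cookies.items())
    if proxy:
        task["proxy"] = proxy
    return task
```

Passing `headers={"X-Test": "abc"}` yields the same escaped string that appears in the advanced example above.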

## Pricing

* **$0.005 per successful task**\
  This is a pay-as-you-go model: you are charged only when a Yelp task successfully returns profile or review data.

You can view your current credit balance and usage history in the [Crawlbyte Dashboard](https://dash.crawlbyte.ai/).
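For budgeting, the per-task rate above can be turned into a quick estimate. This is a small sketch only: the function name `estimate_cost` is illustrative, and it counts successful tasks as supplied by the caller, since how tasks are counted for multi-URL requests is not specified on this page.

```python
from decimal import Decimal

# Rate quoted in the Pricing section, in US dollars.
PRICE_PER_SUCCESSFUL_TASK = Decimal("0.005")


def estimate_cost(successful_tasks):
    """Estimated charge in USD for a given number of successful tasks."""
    if successful_tasks < 0:
        raise ValueError("successful_tasks cannot be negative")
    # Decimal avoids the rounding noise of binary floating point.
    return PRICE_PER_SUCCESSFUL_TASK * successful_tasks
```

Under this rate, 2,000 successful tasks come to $10.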

## Response

The response contains metadata about the task. For the `yelp` type, the key fields in the response are `status` and `result`.

```json
{
  "id": "af3e12f2-8f45-43b0-8a7b-cabbbb94c1e9",
  "status": "completed",
  "result": "JSON_RESULT_HERE"
}
```

* If `result` is a **hash**, you must poll `/api/tasks/:id` to retrieve the full data.
* If `result` is a **JSON object**, the data is already available — no polling needed.
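The two cases above can be told apart with a simple type check once the response is decoded. The sketch below assumes that a hash arrives as a plain string and that inline data arrives as a JSON object or array; the function name `interpret_task` and its return labels are hypothetical.

```python
def interpret_task(task):
    """Return a (state, value) pair describing what to do with a task response."""
    status = task.get("status")
    result = task.get("result")

    if status == "failed":
        return ("failed", None)
    if status in ("queued", "processing"):
        # Not finished yet: keep polling.
        return ("pending", None)
    if isinstance(result, (dict, list)):
        # Data is already inline; no further request is needed.
        return ("ready", result)
    # Assumption: anything else is a hash, so the full data must be
    # retrieved from /api/tasks/:id using the task id.
    return ("fetch", task.get("id"))
```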

### Status Types

| Status     | Meaning                                                        |
| ---------- | -------------------------------------------------------------- |
| queued     | Task was accepted and added to the processing queue.           |
| processing | Task is currently running.                                     |
| completed  | Task finished successfully and data was collected from Yelp.   |
| failed     | Task encountered an error (e.g. bad proxy, invalid URL, etc.). |

## Polling

If the initial `status` is `queued` or `processing`, you should poll for task completion.

```
GET https://api.crawlbyte.ai/api/tasks/:id
```

* You’ll receive the same structure with an updated `status`.
* Stop polling once `status` is `completed` or `failed`.
* Typical completion time is **2–4 seconds**; review tasks take longer because every review page is scraped.
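The polling rules above might be sketched as a loop that stops on a terminal status and gives up after a bounded number of attempts. The 2-second interval, the 30-attempt cap, and the names `poll_task` and `fetch_task_http` are assumptions chosen for illustration; `auth_headers` is again a placeholder, since authentication is not described on this page.

```python
import json
import time
import urllib.request

TASK_URL_TEMPLATE = "https://api.crawlbyte.ai/api/tasks/{task_id}"
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_task_http(task_id, auth_headers=None, timeout=30):
    """GET the current state of a task and return the decoded JSON."""
    request = urllib.request.Request(
        TASK_URL_TEMPLATE.format(task_id=task_id),
        headers=dict(auth_headers or {}),
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def poll_task(task_id, fetch_task=fetch_task_http, interval=2.0,
              max_attempts=30, sleep=time.sleep):
    """Poll until the task is completed or failed, or raise TimeoutError."""
    for _ in range(max_attempts):
        task = fetch_task(task_id)
        if task.get("status") in TERMINAL_STATUSES:
            return task
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

Review tasks on busy listings may need a larger `max_attempts` or `interval` than profile tasks. Passing `fetch_task` and `sleep` as arguments also makes the loop easy to exercise without network access.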

## SDK Usage

You can run this task using any official **Crawlbyte SDK**:

* [Go SDK](https://github.com/crawlbyte/crawlbyte-sdk-go)
* [TypeScript SDK](https://github.com/crawlbyte/crawlbyte-sdk-ts)
* [Python SDK](https://github.com/crawlbyte/crawlbyte-sdk-py)

Each SDK provides a simple way to:

* Create the task
* Poll for status
* Handle the final result

Refer to the [SDKs section](https://developers.crawlbyte.ai/sdks) for installation, examples, and setup instructions.

## Notes

* This task supports **public Yelp business pages and review listings**.
* If `dataType` is set to `"reviews"`, the `sortBy` field is **required**. Accepted values: `NEWEST_FIRST`, `OLDEST_FIRST`, `HIGHEST_RATED`, `LOWEST_RATED`.
* When scraping reviews, Crawlbyte fetches **all available reviews across all pages**, which may take longer for listings with high volume.
* Crawlbyte handles fingerprinting, bot detection, and pagination internally — no need to configure anything manually.
* You can batch multiple business URLs using `multithread: true`.

