# Yelp Scraper

## Endpoint

```
POST https://api.crawlbyte.ai/api/tasks
```

## Basic Configuration (Required)

#### **Business Profile**

```json
{
  "type": "yelp",
  "input": [
    "https://www.yelp.com/biz/some-restaurant"
  ],
  "dataType": "profiles",
  "multithread": false
}
```

#### **Reviews**

```json
{
  "type": "yelp",
  "input": [
    "https://www.yelp.com/biz/some-restaurant"
  ],
  "dataType": "reviews",
  "sortBy": "NEWEST_FIRST",
  "multithread": false
}

```

### Parameters

| Field       | Type    | Description                                                                                                                                                                                                     |
| ----------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| type        | string  | Always `"yelp"`                                                                                                                                                                                                 |
| input       | array   | Array of Yelp business URLs                                                                                                                                                                                     |
| dataType    | string  | `"profiles"` or `"reviews"`                                                                                                                                                                                     |
| sortBy      | string  | <p><strong>Only for <code>reviews</code></strong> — controls review order:<br>• <code>NEWEST\_FIRST</code><br>• <code>OLDEST\_FIRST</code><br>• <code>HIGHEST\_RATED</code><br>• <code>LOWEST\_RATED</code></p> |
| multithread | boolean | Use `true` for faster processing with multiple threads                                                                                                                                                          |

## Advanced Configuration (Optional)

```json
{
  "type": "yelp",
  "input": [
    "https://www.yelp.com/biz/some-restaurant"
  ],
  "dataType": "reviews",
  "sortBy": "NEWEST_FIRST",
  "user_agent_preset": "chrome",
  "headers": "{\"X-Test\":\"abc\"}",
  "cookie": "",
  "proxy": "http://username:password@ip:port"
}
```

### Optional Parameters

| Field               | Type   | Description                                                                                                          |
| ------------------- | ------ | -------------------------------------------------------------------------------------------------------------------- |
| user\_agent\_preset | string | Preset user-agent. Options: `chrome`, `firefox`, `edge`, `opera`, `safari`, `ios-safari`, `android-chrome`, `custom` |
| user\_agent\_custom | string | Used if `user_agent_preset` is `custom.`                                                                             |
| headers             | string | JSON-formatted string of headers.                                                                                    |
| cookie              | string | `key=value;`                                                                                                         |
| proxy               | string | `http://username:password@ip:port`                                                                                   |

## Pricing

* **$0.005 per successful task**\
  This is a pay-as-you-go pricing model — you're only charged when a Yelp task successfully returns listings or reviews.

You can view your current credit balance and usage history in the [Crawlbyte Dashboard](https://dash.crawlbyte.ai/).

## Response

The response contains metadata about the task. For the `yelp` type, the key fields in the response are `status` and `result`.

```json
{
  "id": "af3e12f2-8f45-43b0-8a7b-cabbbb94c1e9",
  "status": "completed",
  "result": "JSON_RESULT_HERE"
}
```

* If `result` is a **hash**, you must poll `/api/tasks/:id` to retrieve the full data.
* If `result` is a **JSON object**, the data is already available — no polling needed.

### Status Types

| Status     | Meaning                                                        |
| ---------- | -------------------------------------------------------------- |
| queued     | Task was accepted and added to the processing queue.           |
| processing | Task is currently running.                                     |
| completed  | Task finished successfully and data was collected from Yelp.   |
| failed     | Task encountered an error (e.g. bad proxy, invalid URL, etc.). |

## Polling

If the initial `status` is `queued` or `processing`, you should poll for task completion.

```
GET https://api.crawlbyte.ai/api/tasks/:id
```

* You’ll receive the same structure with an updated `status`.
* Only poll until you receive `completed` or `failed`.
* Average time: **2–4 seconds**, but longer for reviews (full scraping).

## SDK Usage

You can run this task using any official **Crawlbyte SDK**:

* [Go SDK](https://github.com/crawlbyte/crawlbyte-sdk-go)
* [TypeScript SDK](https://github.com/crawlbyte/crawlbyte-sdk-ts)
* [Python SDK](https://github.com/crawlbyte/crawlbyte-sdk-py)

Each SDK provides a simple way to:

* Create the task
* Poll for status
* Handle the final result

Refer to the [SDKs section](https://developers.crawlbyte.ai/sdks) for installation, examples, and setup instructions.

## Notes

* This task supports **public Yelp business pages and review listings**.
* If `dataType` is set to `"reviews"`, the `sortBy` field is **required**. Accepted values: `NEWEST_FIRST`, `OLDEST_FIRST`, `HIGHEST_RATED`, `LOWEST_RATED`.
* When scraping reviews, Crawlbyte fetches **all available reviews across all pages**, which may take longer for listings with high volume.
* Crawlbyte handles fingerprinting, bot detection, and pagination internally — no need to configure anything manually.
* You can batch multiple business URLs using `multithread: true`.
