Amtrak Scraper

Scrapes train availability and pricing from Amtrak’s booking system using structured input parameters like route and travel dates.

Endpoint

POST https://api.crawlbyte.ai/api/tasks

Basic Configuration (Required)

{
  "type": "amtrak",
  "input": [
    "{\"origin\":\"NYP\",\"destination\":\"PHL\",\"departure\":\"2025-08-01\",\"return\":\"2025-08-08\",\"passengers\":{\"adult\":1,\"seniors\":1,\"youth\":1,\"child\":1,\"infant\":1}}"
  ],
  "multithread": false
}
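As a minimal sketch, the request for the basic configuration above could be assembled in Python as follows. The helper name build_create_request is hypothetical, the JSON Content-Type header is an assumption, and this page does not describe authentication, so any auth header your account needs would have to be added. The request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.crawlbyte.ai/api/tasks"


def build_create_request(trip: dict, multithread: bool = False) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an amtrak task.

    Hypothetical helper for illustration; not part of any Crawlbyte SDK.
    """
    payload = {
        "type": "amtrak",
        # Each input entry is a JSON *string*, so the trip dict is serialized first.
        "input": [json.dumps(trip, separators=(",", ":"))],
        "multithread": multithread,
    }
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the API expects a JSON body. Authentication is not covered on
    # this page, so no auth header is set here.
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


trip = {
    "origin": "NYP",
    "destination": "PHL",
    "departure": "2025-08-01",
    "passengers": {"adult": 1},
}
request = build_create_request(trip)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```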

Parameters

Field | Type | Description
type | string | Always "amtrak"
input | array | Array of JSON strings containing route, date, and passenger details (e.g., origin, destination, departure, return, and passenger breakdown)
multithread | boolean | Set to true for faster processing with multiple threads

Input Builder Notes

You must structure the input as a JSON string and insert it into the input array. Use the following fields:

Parameter | Meaning | Example
origin | Origin station code | NYP
destination | Destination station code | PHL
departure | Departure date (YYYY-MM-DD) | 2025-08-01
return | Return date (optional) | 2025-08-08
passengers | Passenger count by type | {"adult":1,"seniors":1,"youth":1,"child":1,"infant":1}

Each entry in the input array must be a JSON-encoded string, not a nested JSON object.
return is optional; omit it for one-way trips.
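The rules above can be sketched as a small Python helper that checks the date format, leaves out return for one-way trips, and serializes the trip to the string form the input array expects. The function names are hypothetical, and the checks cover only what this page states (the YYYY-MM-DD format); station codes are not validated.

```python
import json
import re
from datetime import date


def _check_date(value: str) -> str:
    """Return value unchanged if it is a real calendar date in YYYY-MM-DD form."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    date.fromisoformat(value)  # rejects impossible dates such as 2025-02-30
    return value


def build_input(origin, destination, departure, passengers, return_date=None):
    """Serialize one trip into a JSON string for the input array.

    Hypothetical helper for illustration; not part of any Crawlbyte SDK.
    """
    trip = {
        "origin": origin,
        "destination": destination,
        "departure": _check_date(departure),
    }
    if return_date is not None:
        # Only round trips carry a "return" key; one-way trips omit it.
        trip["return"] = _check_date(return_date)
    trip["passengers"] = passengers
    return json.dumps(trip, separators=(",", ":"))


one_way = build_input("NYP", "PHL", "2025-08-01", {"adult": 1})
round_trip = build_input("NYP", "PHL", "2025-08-01", {"adult": 1}, return_date="2025-08-08")
payload = {"type": "amtrak", "input": [one_way, round_trip], "multithread": False}
```

Passing the dict through json.dumps rather than writing the escaped string by hand avoids quoting mistakes in the nested passengers object.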

Advanced Configuration (Optional)

{
  "type": "amtrak",
  "input": [
    "{\"origin\":\"NYP\",\"destination\":\"PHL\",\"departure\":\"2025-08-01\",\"return\":\"2025-08-08\",\"passengers\":{\"adult\":1,\"seniors\":1,\"youth\":1,\"child\":1,\"infant\":1}}"
  ],
  "user_agent_preset": "chrome",
  "user_agent_custom": "",
  "headers": "{\"X-Test\":\"abc\"}",
  "cookie": "session=xyz",
  "proxy": "username:password@ip:port"
}

Optional Parameters

Field | Type | Description
user_agent_preset | string | Preset user-agent. Options: chrome, firefox, edge, opera, safari, ios-safari, android-chrome, custom
user_agent_custom | string | Used if user_agent_preset is custom.
headers | string | JSON-formatted string of headers.
cookie | string | key=value;
proxy | string | username:password@ip:port
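A rough sketch of how these optional fields might be assembled in Python is shown here. The helper name advanced_options is hypothetical. Joining several cookies with "; " is an assumption based on the usual Cookie header syntax, since this page only shows a single key=value pair.

```python
import json

# Preset names as listed in the optional parameters.
PRESETS = {
    "chrome", "firefox", "edge", "opera", "safari",
    "ios-safari", "android-chrome", "custom",
}


def advanced_options(preset="chrome", custom_ua="", headers=None, cookies=None, proxy=None):
    """Return the optional fields to merge into a task payload.

    Hypothetical helper for illustration; not part of any Crawlbyte SDK.
    """
    if preset not in PRESETS:
        raise ValueError(f"unknown user_agent_preset: {preset!r}")
    if preset == "custom" and not custom_ua:
        raise ValueError("user_agent_custom is required when the preset is 'custom'")
    options = {
        "user_agent_preset": preset,
        # The custom string is only used when the preset is "custom".
        "user_agent_custom": custom_ua if preset == "custom" else "",
    }
    if headers:
        # headers is sent as a JSON-formatted string, not as a nested object.
        options["headers"] = json.dumps(headers)
    if cookies:
        # Assumption: multiple cookies are separated by "; ".
        options["cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    if proxy:
        options["proxy"] = proxy  # expected form: username:password@ip:port
    return options


payload = {"type": "amtrak", "input": []}  # fill input with JSON strings as described earlier
payload.update(advanced_options(headers={"X-Test": "abc"}, cookies={"session": "xyz"}))
```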

Pricing

$0.01 per successful task. This is a pay-as-you-go pricing model: you're only charged when an Amtrak task successfully returns train availability and fare data.

You can view your current credit balance and usage history in the Crawlbyte Dashboard.
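For budgeting, the per-task price can be turned into a quick estimate. This is only arithmetic on the figure quoted above; the function name is hypothetical, and the dashboard remains the reference for actual charges.

```python
from decimal import Decimal

PRICE_PER_SUCCESS = Decimal("0.01")  # USD per successful task, as quoted above


def estimate_cost(successful_tasks: int) -> Decimal:
    """Estimated charge in USD; failed tasks are not billed and are not counted."""
    return PRICE_PER_SUCCESS * successful_tasks


# Example: 2,500 tasks submitted, 2,400 of them succeed.
print(estimate_cost(2400))  # prints 24.00
```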

Response

The response contains metadata about the task. For the amtrak type, the most important fields are status and result.

{
  "id": "bd3e89ed-815e-4395-98a3-521ede71cc4d",
  "status": "completed",
  "result": {
    // Parsed availability and pricing data
  }
}

Once status is completed, result is a JSON object containing the scraped train availability and fare data, and no further polling is needed.
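One way to read such a response in Python is sketched below. The function name is hypothetical, and the shape of result is not specified on this page, so it is returned as-is without inspecting its keys.

```python
def extract_result(task: dict):
    """Return the result of a finished task, or None if it is still running.

    Hypothetical helper for illustration; not part of any Crawlbyte SDK.
    """
    status = task.get("status")
    if status == "completed":
        return task["result"]
    if status == "failed":
        raise RuntimeError(f"task {task.get('id')} failed")
    # Any other status means the task has not finished yet.
    return None


response = {
    "id": "bd3e89ed-815e-4395-98a3-521ede71cc4d",
    "status": "completed",
    "result": {},  # placeholder; the real object holds availability and fares
}
data = extract_result(response)
```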

Status Types

Status | Meaning
queued | Task was accepted and added to the processing queue.
processing | Task is currently running.
completed | Task finished and train data was successfully collected.
failed | Task encountered an error (e.g., invalid input, no results, or system issue).

Polling

If status is queued or processing, continue polling the task until it's completed or failed.

GET https://api.crawlbyte.ai/api/tasks/:id

You’ll receive the same structure with an updated status.
Only poll until you receive completed or failed.
Recommended interval: every 2–4 seconds.
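The polling rules above could look roughly like this in Python. It is a sketch under a few assumptions: fetch_task issues an unauthenticated GET (this page does not describe authentication), the 3-second interval is one value within the recommended 2–4 second range, and the 120-second overall timeout is an arbitrary safety limit that does not come from the API.

```python
import json
import time
import urllib.request

TASK_URL = "https://api.crawlbyte.ai/api/tasks/{id}"
FINAL_STATUSES = ("completed", "failed")


def fetch_task(task_id: str) -> dict:
    """GET the task once and decode the JSON body.

    Assumption: no auth header is shown; add whatever your account requires.
    """
    with urllib.request.urlopen(TASK_URL.format(id=task_id)) as resp:
        return json.load(resp)


def poll_task(task_id, fetch=fetch_task, interval=3.0, timeout=120.0, sleep=time.sleep):
    """Poll until the task is completed or failed, then return the task dict.

    fetch and sleep are injectable so the loop can be exercised without network
    access. timeout is a hypothetical client-side limit, not an API setting.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_id)
        if task.get("status") in FINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {task.get('status')!r} after {timeout}s")
        sleep(interval)
```

A caller would typically pass the id returned by the create request, for example poll_task(task_id), and then branch on the returned status.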

SDK Usage

You can run this task using any official Crawlbyte SDK.

Each SDK provides a simple way to:

Create the task
Poll for status
Handle the final result

Refer to the SDKs section for installation, examples, and setup instructions.

Notes

Only valid input objects with correct station codes and date formats will return results.
Crawlbyte handles retries, rendering, fingerprinting, and anti-bot logic internally — no need to manage it yourself.
Set multithread to true when running large volumes.
Ensure all required fields like origin, destination, departure, and passengers are properly structured.
The Amtrak response includes all relevant train data, including schedule and fare breakdowns.
Average task duration is ~8 seconds, primarily due to Amtrak’s slower API response — this is expected and fully supported.

