Nimble Web API

Description

This job triggers an asynchronous Web API scraping job. It uses this endpoint: https://api.webit.live/api/v1/async/web

Use Cases

For each Web API scraping job you wish to create, you must create a separate Orchestra task. This way, you can use Orchestra to trigger your data ingestion on a manual or event-based schedule. This has a number of advantages:

  • You can co-ordinate tasks outside of Nimble. Once the Nimble scraping jobs are complete, you can run any data transformation or cleaning jobs.

  • Nimble credits are consumed whenever Nimble web scraping jobs run. Running these operations on a schedule you set explicitly keeps these costs from getting out of hand.

  • We aggregate metadata from the Nimble task in the same place as the metadata from other operations in your Pipeline.

Job Information

With Nimble, you can provide a URL that Nimble will scrape using its AI scraper. This will extract the relevant information from the web page and return it to you in a structured format.

As Orchestra cannot accept the data directly from Nimble, you will need to configure a delivery method in Nimble to store the data. This can be an AWS S3 bucket or a GCP storage bucket (see below for more information on delivery methods).

Determine the request you wish to make

Orchestra uses the asynchronous task flow to make a Web API request to Nimble.

Before using Orchestra to trigger a Nimble task, we recommend building the request you wish to make in the Nimble playground. This lets you test the request and confirm it works as expected. The playground can be found here.
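As a rough illustration of what an asynchronous request to the endpoint above might look like, here is a minimal Python sketch. It only builds the request and does not send it. The JSON field names (`url`, `storage_type`, `storage_url`), the `Basic` authorization scheme, and the helper name `build_async_request` are assumptions made for this example; check them against the request you built in the Nimble playground.

```python
import json
import urllib.request

# Endpoint taken from the Description section of this page.
NIMBLE_ASYNC_ENDPOINT = "https://api.webit.live/api/v1/async/web"


def build_async_request(url, storage_type, storage_url, credentials):
    """Build (but do not send) a POST request for the async Web API endpoint.

    The payload field names and the auth scheme are assumptions for
    illustration, not a confirmed description of Nimble's API.
    """
    payload = {
        "url": url,                    # page to scrape
        "storage_type": storage_type,  # assumed field name
        "storage_url": storage_url,    # assumed field name
    }
    return urllib.request.Request(
        NIMBLE_ASYNC_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {credentials}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job; keeping construction separate from sending makes the request easy to inspect before any Nimble credits are consumed.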

Delivery methods

Orchestra supports the following delivery methods for Nimble:

  • AWS S3 bucket

  • GCP storage

You must configure a delivery method before you can use Nimble in Orchestra. Instructions on how to configure a delivery method can be found here. Be sure to configure your storage with the correct permissions; instructions for this can be found here.

Parameters

These parameters are required to run the Nimble Web API task.

| Name | Data type | Restrictions | Example |
| --- | --- | --- | --- |
| Request URL | String | URL | https://www.example.com |
| Storage type | Enum | AWS - S3 or GCP - Google Storage | AWS - S3 |
| Storage URL | String | n/a | s3://bucket/prefix |
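The restrictions in the table can be checked before a task is triggered. The sketch below is one possible way to do that in Python; the class `NimbleTaskParams` and the function `validate_params` are hypothetical names for this example and are not part of Orchestra or Nimble.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Allowed values for the "Storage type" parameter, as listed in the table.
STORAGE_TYPES = ("AWS - S3", "GCP - Google Storage")


@dataclass(frozen=True)
class NimbleTaskParams:
    """Hypothetical container for the three required task parameters."""
    request_url: str
    storage_type: str
    storage_url: str


def validate_params(params):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    parsed = urlparse(params.request_url)
    # "Request URL" must be a URL: require an http(s) scheme and a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("Request URL must be an absolute http(s) URL")
    if params.storage_type not in STORAGE_TYPES:
        errors.append("Storage type must be one of: " + ", ".join(STORAGE_TYPES))
    # "Storage URL" has no listed restrictions, but it is required.
    if not params.storage_url:
        errors.append("Storage URL is required")
    return errors
```

For instance, `validate_params(NimbleTaskParams("https://www.example.com", "AWS - S3", "s3://bucket/prefix"))` returns an empty list, since all three values match the examples in the table.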

Error handling

API Requests

If we receive the following error codes from Nimble, we'll raise an error and the task will move to a failed state.

| Code | Description | Handling |
| --- | --- | --- |
| 401 | Unauthorised | We will raise an error and parse the raw error message from the Nimble response as the Orchestra message |
| Other error code | | We will raise an error with the HTTP Reason as the Orchestra message |
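The handling rules above can be pictured with a small Python sketch. It is not Orchestra's implementation: the names `NimbleTaskError`, `orchestra_message`, and `raise_for_nimble_error` are hypothetical, "error code" is assumed to mean any status of 400 or above, and the raw response body is used unchanged as the 401 message because the shape of Nimble's error payload is not described here.

```python
class NimbleTaskError(Exception):
    """Hypothetical error that would move the task to a failed state."""


def orchestra_message(status_code, reason, body):
    """Pick the message to surface, following the table above."""
    if status_code == 401:
        # 401: use the raw error message from the Nimble response.
        return body
    # Any other error code: use the HTTP reason phrase.
    return reason


def raise_for_nimble_error(status_code, reason, body):
    """Raise NimbleTaskError for error responses; do nothing otherwise."""
    if status_code >= 400:  # assumption: 4xx and 5xx count as error codes
        raise NimbleTaskError(orchestra_message(status_code, reason, body))
```

Separating message selection from raising keeps the mapping easy to test without making any HTTP calls.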
