How to automate WooCommerce exports with scheduled scraping and webhooks

Running the scraper by hand once is fine for a single migration. Keeping a price file, a stock feed or a second store in sync is not a job for clicking a button every morning. This guide shows how to automate WooCommerce exports end to end: launch asynchronous scraping jobs through the API, poll or wait for webhook callbacks, schedule recurring runs with cron, and stay inside the rate limits while you do it.

Why automate exports instead of scraping by hand

A one-off scrape answers a one-off question: what does this catalog look like right now? Most real work is not one-off. Prices move, stock changes, products get added and retired, and the downstream systems that depend on that data (a spreadsheet, a Google Merchant feed, a second storefront, a backup archive) need to reflect those changes without a human in the loop. Automating WooCommerce exports turns a manual chore into a background process that runs on a schedule and tells you when it is done.

A few concrete reasons teams automate their exports:

  • Price and stock monitoring. Pull a competitor or supplier catalog every morning, compare against yesterday, and flag the deltas. Manual scraping cannot keep up with a catalog that changes hourly.
  • Feed refresh. Google Merchant Center, Meta catalogs and affiliate networks expect fresh data. A scheduled export keeps the source file current so the feed never goes stale.
  • Catalog sync. If you mirror a WooCommerce catalog into another store, a CMS or a search index, you want the export to fire automatically and push the result downstream.
  • Backups and history. A nightly export gives you a versioned snapshot of the catalog you can diff, restore or audit later.

If you are new to the API, start with the WooCommerce Scraper API guide for authentication and endpoints, and the step-by-step how-to for a first request. This page assumes you already have a key and focuses on running exports unattended.

The async job model

Scraping a full WooCommerce catalog can take seconds for a small shop or several minutes for a store with tens of thousands of products and variations. Holding an HTTP connection open for that long is fragile: timeouts, proxies and load balancers all conspire against it. The API therefore uses an asynchronous job model. You ask for an export, you get back a job id immediately, and the work happens in the background where it can survive a large store without your client waiting on the wire.

You create a job with a single POST /api/v1/jobs call. The JSON body declares what to export (type), which store (store), and optionally where to send the result when the job finishes (webhook_url). The base URL is https://woocommerce-scraper.com and every call is authenticated with a Bearer token that starts with wcs_live_.

curl -X POST https://woocommerce-scraper.com/api/v1/jobs \
  -H "Authorization: Bearer wcs_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "products",
    "store": "https://example-store.com",
    "webhook_url": "https://your-app.com/hooks/wcs"
  }'

The response is an HTTP 202 Accepted, which means the request was received and the export is now queued. The body carries the job id you will use everywhere else:

{
  "id": "job_8f2a1c9d",
  "status": "queued",
  "type": "products",
  "store": "https://example-store.com"
}

The webhook_url field is optional. Leave it out and you will track the job by polling (next section); include it and the API will call you back when the export finishes (the section after that). Both patterns can be combined, and both are valid ways to automate WooCommerce exports.
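If you would rather create jobs from a script than from curl, the same call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names build_job_request and create_job are made up for illustration, and the only facts taken from this guide are the endpoint, the Bearer header and the three body fields.

```python
import json
import urllib.request

API_BASE = "https://woocommerce-scraper.com"


def build_job_request(api_key, store, job_type="products", webhook_url=None):
    """Build (but do not send) the POST /api/v1/jobs request."""
    body = {"type": job_type, "store": store}
    if webhook_url is not None:
        # webhook_url is optional; leave it out to track the job by polling.
        body["webhook_url"] = webhook_url
    return urllib.request.Request(
        f"{API_BASE}/api/v1/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_job(api_key, store, **kwargs):
    """Send the request and return the parsed job object from the response."""
    request = build_job_request(api_key, store, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling create_job("wcs_live_your_key_here", "https://example-store.com") should return the job object shown above; keep its id, because every later call needs it.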

Polling job status and reading results

Once a job is queued you can check on it with GET /api/v1/jobs/{id}. The status field walks through a small, predictable set of states:

  • queued - accepted and waiting for a worker.
  • running - the worker is reading the catalog right now.
  • succeeded - every product was scraped and results are ready.
  • partial - results are available but some pages or products could not be read; useful data is still there.
  • failed - the export could not complete (bad store URL, unreachable site, or the store is not WooCommerce).

curl https://woocommerce-scraper.com/api/v1/jobs/job_8f2a1c9d \
  -H "Authorization: Bearer wcs_live_your_key_here"

When the status reaches succeeded or partial, read the products from GET /api/v1/jobs/{id}/results. Results are paginated so a large catalog does not arrive in a single huge payload. Follow the pagination cursor or page parameter until you have read every page.

curl "https://woocommerce-scraper.com/api/v1/jobs/job_8f2a1c9d/results?page=1" \
  -H "Authorization: Bearer wcs_live_your_key_here"
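Reading every page can be sketched as a small generator in Python. Two things here are assumptions, because this guide does not show the shape of a results page: that each page carries its items under a "products" key, and that an empty page marks the end of the catalog. Check both against the developer reference before relying on them.

```python
import json
import urllib.request


def iter_products(fetch_page, max_pages=10_000):
    """Yield every product by requesting page 1, 2, 3, ... in turn.

    fetch_page(page) must return the list of products on that page.
    Assumption: an empty list means there are no more pages.
    """
    for page in range(1, max_pages + 1):
        products = fetch_page(page)
        if not products:
            return
        yield from products


def make_fetcher(api_key, job_id):
    """Return a fetch_page function bound to one finished job."""
    def fetch_page(page):
        url = (
            "https://woocommerce-scraper.com"
            f"/api/v1/jobs/{job_id}/results?page={page}"
        )
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {api_key}"}
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            payload = json.load(response)
        # The "products" key is a guess at the response shape.
        return payload.get("products", [])
    return fetch_page
```

Passing the fetcher in as an argument keeps the paging logic separate from the HTTP call, so you can test it with canned pages and swap in cursor-based paging if that is what the API returns.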

Polling is the simplest pattern to reason about and works from any environment, including a cron job that cannot expose a public endpoint. The only rule is to poll politely: check every few seconds rather than in a tight loop, and lengthen the interval if the job runs long or you receive a rate-limit response. That keeps you well clear of the rate limit. If you can receive inbound HTTP, webhooks remove the polling loop entirely.
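A polite polling loop might look like the Python sketch below. The function name wait_for_job, the 3-second starting interval, the 1.5x growth factor and the 15-minute timeout are illustrative choices, not values prescribed by the API; get_status stands for whatever code you use to call GET /api/v1/jobs/{id} and return the status string.

```python
import time

# The three states after which a job will not change again.
TERMINAL_STATES = {"succeeded", "partial", "failed"}


def wait_for_job(get_status, interval=3.0, max_interval=30.0,
                 timeout=900.0, sleep=time.sleep):
    """Poll get_status() until the job reaches a terminal state.

    The delay starts at `interval` seconds and grows by 1.5x after each
    check, capped at `max_interval`. Raises TimeoutError once the next
    wait would exceed `timeout` seconds in total.
    """
    waited = 0.0
    delay = interval
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 1.5, max_interval)
```

With these defaults a job that takes two minutes costs about a dozen status requests, compared with 40 at a fixed 3-second interval.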

Webhook callbacks

Polling means your code asks "are you done yet?" on a loop. Webhooks flip that around: you tell the API where to reach you, and it calls you once when the job is finished. This is the cleaner pattern for automation because there is no idle loop and no wasted requests. Provide a webhook_url when you create the job and, the moment the export reaches a terminal state, the API sends an HTTP POST to that URL with the finished job payload.

Your endpoint receives a body that looks like the job object, including its final status and a link or inline reference to the results:

POST /hooks/wcs  (sent by the API to your server)
Content-Type: application/json

{
  "id": "job_8f2a1c9d",
  "status": "succeeded",
  "type": "products",
  "store": "https://example-store.com",
  "results_url": "https://woocommerce-scraper.com/api/v1/jobs/job_8f2a1c9d/results",
  "product_count": 1842
}

To consume a webhook safely:

  • Acknowledge fast. Return a 200 as soon as you have stored the payload. Do the heavy work (fetching pages, writing to your database) in a background task so the callback does not time out.
  • Verify the source. Confirm the request really comes from the API before acting on it, and check that the id matches a job you actually created.
  • Be idempotent. Networks retry. If the same job id arrives twice, the second delivery should be a no-op rather than a duplicate import.
  • Then process. On succeeded or partial, page through the results URL and push the data into your catalog, feed or store.
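The checklist above can be sketched with Python's built-in http.server. Treat it as an outline under stated assumptions rather than a production receiver: the in-memory sets stand in for your database, the 404 for unknown ids is one possible choice, and the source check is limited to matching the job id because this guide does not describe a signature scheme.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

known_jobs = {"job_8f2a1c9d"}  # ids recorded when you created the jobs
processed = set()              # ids already accepted once
pending = []                   # payloads waiting for a background worker


def accept_delivery(payload):
    """Classify one webhook payload as 'accepted', 'duplicate' or 'unknown'."""
    job_id = payload.get("id") if isinstance(payload, dict) else None
    if job_id not in known_jobs:
        return "unknown"
    if job_id in processed:
        return "duplicate"  # a retry: do nothing the second time
    processed.add(job_id)
    if payload.get("status") in ("succeeded", "partial"):
        pending.append(payload)  # fetch the result pages later, off-request
    return "accepted"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        outcome = accept_delivery(payload)
        # Answer straight away; a duplicate still gets 200 so retries stop.
        self.send_response(404 if outcome == "unknown" else 200)
        self.end_headers()


def serve(port=8000):
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

The handler only records the payload and answers; a separate worker would drain pending and page through the results, which keeps the callback fast.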

With webhooks in place a full export becomes a single fire-and-forget POST: you launch the job and your endpoint does the rest when the data is ready.

Scheduling recurring exports with cron

Automation needs a trigger. The simplest, most portable trigger is cron. Because creating a job is one HTTP call, a recurring export is one line in your crontab that fires that call on a schedule. The example below launches a products export every day at 06:00 and asks the API to call your webhook when it finishes:

# m h dom mon dow  command
# A crontab entry must fit on one line: cron does not support backslash line continuation.
0 6 * * * curl -s -X POST https://woocommerce-scraper.com/api/v1/jobs -H "Authorization: Bearer wcs_live_your_key_here" -H "Content-Type: application/json" -d '{"type":"products","store":"https://example-store.com","webhook_url":"https://your-app.com/hooks/wcs"}'

That single entry is enough for a daily price and stock refresh. A few ideas for schedules worth running:

  • Daily price refresh at a quiet hour, then diff against the previous run to surface price changes.
  • Hourly stock check on a small, fast-moving catalog to keep availability accurate downstream.
  • Weekly full snapshot archived with a date in the filename so you keep a history you can roll back to.
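The "diff against the previous run" step from the first idea can be sketched in a few lines of Python. The field names sku and price are assumptions about the exported product shape, and catalog_diff is a hypothetical helper name.

```python
def catalog_diff(previous, current):
    """Compare two runs of the same catalog.

    Both arguments are lists of product dicts with "sku" and "price" keys
    (assumed field names). Returns changed prices plus added and removed SKUs.
    """
    old = {p["sku"]: p["price"] for p in previous}
    new = {p["sku"]: p["price"] for p in current}
    return {
        "changed": {
            sku: (old[sku], new[sku])
            for sku in old.keys() & new.keys()
            if old[sku] != new[sku]
        },
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
    }
```

Feed it yesterday's and today's product lists and alert on whatever comes back in changed.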

If you are not on a server with cron, the same idea works with any scheduler: a CI cron job, a serverless timer, or a task queue. The contract is identical: on a schedule, send one POST /api/v1/jobs and let the async model and your webhook handle the rest. For the full set of endpoints and limits, keep the developer reference open while you build.

Keeping a catalog or feed in sync

A scheduled export is only half the value. The other half is what you do with the result. The same job output feeds several sync targets:

  • A spreadsheet or internal catalog. Transform the paginated results into rows and upsert them by SKU so the file always mirrors the live store.
  • A Google Merchant feed. Map the exported fields (title, price, availability, image, link) onto the Merchant Center schema and republish the feed after each run, so your shopping ads never carry stale prices.
  • Another store. Pipe the export into a second storefront to keep two catalogs aligned.
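As an illustration of the Merchant feed mapping, the Python sketch below turns exported products into a tab-delimited feed. The output column names are Merchant Center attributes; the input keys (sku, title, price, in_stock, image, link) are assumed names for the exported fields, and the default currency is a placeholder you would set per store.

```python
import csv
import io
from decimal import Decimal

FEED_COLUMNS = ["id", "title", "price", "availability", "image_link", "link"]


def to_merchant_row(product, currency="USD"):
    """Map one exported product onto Merchant Center feed attributes."""
    return {
        "id": product["sku"],
        "title": product["title"],
        # Merchant Center expects a number followed by a currency code.
        "price": f"{Decimal(str(product['price'])):.2f} {currency}",
        "availability": "in_stock" if product.get("in_stock") else "out_of_stock",
        "image_link": product["image"],
        "link": product["link"],
    }


def render_feed(products, currency="USD"):
    """Return the whole feed as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FEED_COLUMNS, delimiter="\t")
    writer.writeheader()
    for product in products:
        writer.writerow(to_merchant_row(product, currency))
    return buffer.getvalue()
```

Write the returned text to the file your feed is fetched from after each run; because the id column is the SKU, each run updates existing items instead of duplicating them.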

You do not have to build every transform yourself. If your target is a flat file, the CSV export turns a catalog into a clean, column-mapped spreadsheet you can drop straight into a feed pipeline. If your target is Shopify, the WooCommerce to Shopify migration guide shows how the same export moves a full catalog into a Shopify import template. Combine a scheduled job with one of these outputs and the whole sync runs without you touching it.

Quota and rate limits

Automation can quietly multiply your request volume, so design the schedule with the limits in mind. Two numbers matter on the live tier:

  • 120 requests per minute. This is the short-term rate limit. Creating jobs, polling status and reading result pages all count. A tight polling loop on several jobs at once is the usual way to hit it, so add a small interval and back off when you see a rate-limit response.
  • 100,000 live API calls per month. This is the monthly quota. A single daily export with light polling uses a tiny fraction of it, but hourly schedules across many stores, each reading many result pages, add up. Budget your calls per run and multiply by the schedule frequency before you turn it on.

Designing a polite schedule is mostly common sense. Prefer webhooks over polling so you spend requests only on creating jobs and reading results, not on idle status checks. When you must poll, use a fixed interval of a few seconds rather than a tight loop. Stagger jobs for multiple stores instead of firing them all at the same minute. These habits keep you comfortably under both limits and make your automation a good neighbour to the API.
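The budgeting step is simple arithmetic, sketched here in Python. The two limits come from this page; the number of status polls and result pages per run are figures you would measure for your own stores, and the helper names are hypothetical.

```python
RATE_LIMIT_PER_MINUTE = 120
MONTHLY_QUOTA = 100_000


def calls_per_run(status_polls, result_pages):
    """One job creation, plus status checks, plus result pages read."""
    return 1 + status_polls + result_pages


def monthly_calls(stores, runs_per_day, status_polls, result_pages, days=30):
    """Estimate the API calls a schedule makes in a month."""
    per_run = calls_per_run(status_polls, result_pages)
    return stores * runs_per_day * days * per_run


def fits_quota(stores, runs_per_day, status_polls, result_pages):
    """True if the schedule stays within the monthly quota."""
    total = monthly_calls(stores, runs_per_day, status_polls, result_pages)
    return total <= MONTHLY_QUOTA
```

One store exported daily with 10 status polls and 20 result pages comes to 31 calls per run, or 930 a month. Fifty stores exported hourly, even with webhooks and no polling, come to 21 x 24 x 50 x 30 = 756,000 calls, far over the quota, so that schedule needs fewer runs or fewer stores.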

Ready to wire it up? Grab a key on the developers page, or run a manual export first from the scraper to see the data shape before you automate it.

Frequently asked questions

Do I need webhooks to automate exports, or is polling enough?

Polling is enough and works anywhere, including a cron job with no public endpoint. Webhooks are cleaner because they remove the idle polling loop: you launch the job and the API calls you back when it finishes. Use webhooks when you can receive inbound HTTP, and polling otherwise.

What does a "partial" job status mean?

It means the export completed but some pages or products could not be read, often because of a transient error on the source store. The results you can read are still valid and usable; you can re-run the job later to fill the gaps.

How often can I schedule an export?

As often as your quota allows. The live tier permits 120 requests per minute and 100,000 API calls per month. A single daily export uses very little of that; hourly runs across many stores can add up, so budget the calls per run against the monthly quota before scaling up the frequency.

How do I read the results of a finished job?

Call GET /api/v1/jobs/{id}/results once the status is succeeded or partial. Results are paginated, so follow the page parameter until you have read every page of the catalog.

Can I keep a Google Merchant feed in sync automatically?

Yes. Schedule a recurring export with cron, map the exported fields onto the Merchant Center schema, and republish the feed after each run. Using the CSV export as the intermediate file makes the mapping straightforward.

Automate your first export today

Get an API key, launch an async job, and let webhooks keep your catalog in sync.
