How I automate 30+ daily reports with Node.js cron pipelines

April 28, 20264 min read

Node.jsAutomationEngineering

The manual process was eating our mornings

Every morning at Renewalytics, the ops team would download Excel reports from 33+ solar, wind, and BESS plants, copy numbers into a master sheet, run the math, and put together deviation reports. Three to four hours, every day, before anyone could do real work. Errors slipped in constantly. Reports went out late.

I replaced that with a system that does it in 12 minutes, no human in the loop.

What it actually processes

Each plant sends a Daily Generation Report (DGR). These are multi-sheet Excel files with generation data, weather readings, grid availability, inverter-level performance, and maintenance logs. The catch: every vendor's format is different. We see 6+ variants. Some use merged cells. Some have dynamic row counts. Some split data across sheets with cross-references.

This isn't a "parse CSV" problem. It's an "interpret vendor-specific Excel chaos" problem, which is a different beast.

Architecture

Cron Trigger (5:30 AM daily)
  → File Collector (SFTP/email/API per vendor)
  → Format Detector (identify vendor template)
  → Parser Pipeline (vendor-specific extractors)
  → Validation Layer (range checks, completeness)
  → Aggregation Engine (plant → portfolio rollups)
  → Output Generator (reports, dashboards, alerts)

Each stage is an async worker. If a parser fails for one plant, the rest continue. Failed jobs go to a dead-letter queue for manual review.

The parser architecture

The key challenge: every vendor's Excel format is different. Instead of writing one giant parser, I built a plugin system:

interface VendorParser {
  detect(workbook: Workbook): boolean;
  extract(workbook: Workbook): PlantDGR;
}
 
const parsers: VendorParser[] = [
  new SuzlonParser(),
  new GamesaParser(),
  new VestasParser(),
  new EnvisionParser(),
  new InoxParser(),
  new GenericParser(), // fallback
];
 
async function parseReport(buffer: Buffer): Promise<PlantDGR> {
  const workbook = readExcel(buffer);
 
  for (const parser of parsers) {
    if (parser.detect(workbook)) {
      return parser.extract(workbook);
    }
  }
 
  throw new Error("No parser matched this format");
}

Each parser knows how to detect its vendor's template (by checking specific cell values, sheet names, or header patterns) and extract structured data from it.

Handling Excel edge cases

Excel parsing in production is brutal. Here are real issues we've handled:

Merged cells, xlsx library returns undefined for cells covered by a merge; you need to find the merge range and read the top-left cell
Date formats, Excel stores dates as serial numbers; some vendors use custom date strings
Dynamic row ranges, data rows can vary from 10 to 500+; detect the end by looking for totals rows or empty sequences
Hidden sheets, some vendors put raw data in hidden sheets
Formula cells, some cells contain formulas that reference external files; you only get the cached value

The cron orchestrator

Jobs run on a schedule using a simple cron system built on node-cron:

// 5:30 AM, collect files from all sources
cron.schedule("30 5 * * *", async () => {
  const files = await collectFromAllSources();
  for (const file of files) {
    await queue.add("parse-dgr", { fileId: file.id });
  }
});
 
// 6:00 AM, generate portfolio rollups
cron.schedule("0 6 * * *", async () => {
  await generatePortfolioReport(new Date());
});
 
// 6:30 AM, send alerts for missing/failed reports
cron.schedule("30 6 * * *", async () => {
  await checkCompleteness(new Date());
});

Monitoring the automation

An automated system that fails silently is worse than a manual process. We monitor:

Completeness, did all 33+ plants report today?
Timeliness, were reports collected before the processing window?
Data quality, are values within expected ranges?
Pipeline health, are any workers backed up or failing?

Alerts go to a Slack channel and the ops dashboard. If a plant's report is missing at 6:30 AM, the relevant operator gets pinged automatically.

Results

Processing time: 3-4 hours manual → 12 minutes automated
Error rate: ~5% manual → under 0.1% automated
Coverage: 33+ plants, 6+ vendor formats
Uptime: 99.7% over 12 months

What I'd do differently next time

Start from the output, not the input. Define what the final reports need to look like before you write a single parser. I built the first parsers, then realized the rollup format was different, and had to redo extraction logic. Save yourself that round trip.

Version your parsers. Vendors update their Excel templates without telling anyone. Keep the old parsers around and route by template version, not just vendor.

Reconcile against humans for the first month. Run automated and manual in parallel for the first 30 days and diff the outputs. You'll find edge cases nobody documented. Until you do that, nobody trusts the pipeline.

If your business is still running on manual report processing, copy-paste workflows, or fragile spreadsheets, this is exactly what I build. Drop me a line.

Share this postPost on X

Enjoy this post?

Subscribe to get notified when I write something new.

Subscribe via email

PreviousHow I structure large Next.js projects NextBuilding realtime dashboards with Next.js and WebSockets