Every morning at Renewalytics, the ops team would download Excel reports from 33+ solar, wind, and BESS plants, copy numbers into a master sheet, run the math, and put together deviation reports. Three to four hours, every day, before anyone could do real work. Errors slipped in constantly. Reports went out late.
I replaced that with a system that does it in 12 minutes, no human in the loop.
Each plant sends a Daily Generation Report (DGR). These are multi-sheet Excel files with generation data, weather readings, grid availability, inverter-level performance, and maintenance logs. The catch: every vendor's format is different. We see 6+ variants. Some use merged cells. Some have dynamic row counts. Some split data across sheets with cross-references.
This isn't a "parse CSV" problem. It's an "interpret vendor-specific Excel chaos" problem, which is a different beast.
Cron Trigger (5:30 AM daily)
→ File Collector (SFTP/email/API per vendor)
→ Format Detector (identify vendor template)
→ Parser Pipeline (vendor-specific extractors)
→ Validation Layer (range checks, completeness)
→ Aggregation Engine (plant → portfolio rollups)
→ Output Generator (reports, dashboards, alerts)Each stage is an async worker. If a parser fails for one plant, the rest continue. Failed jobs go to a dead-letter queue for manual review.
The key challenge: every vendor's Excel format is different. Instead of writing one giant parser, I built a plugin system:
interface VendorParser {
detect(workbook: Workbook): boolean;
extract(workbook: Workbook): PlantDGR;
}
const parsers: VendorParser[] = [
new SuzlonParser(),
new GamesaParser(),
new VestasParser(),
new EnvisionParser(),
new InoxParser(),
new GenericParser(), // fallback
];
async function parseReport(buffer: Buffer): Promise<PlantDGR> {
const workbook = readExcel(buffer);
for (const parser of parsers) {
if (parser.detect(workbook)) {
return parser.extract(workbook);
}
}
throw new Error("No parser matched this format");
}Each parser knows how to detect its vendor's template (by checking specific cell values, sheet names, or header patterns) and extract structured data from it.
Excel parsing in production is brutal. Here are real issues we've handled:
xlsx library returns undefined for cells covered by a merge; you need to find the merge range and read the top-left cellJobs run on a schedule using a simple cron system built on node-cron:
// 5:30 AM, collect files from all sources
cron.schedule("30 5 * * *", async () => {
const files = await collectFromAllSources();
for (const file of files) {
await queue.add("parse-dgr", { fileId: file.id });
}
});
// 6:00 AM, generate portfolio rollups
cron.schedule("0 6 * * *", async () => {
await generatePortfolioReport(new Date());
});
// 6:30 AM, send alerts for missing/failed reports
cron.schedule("30 6 * * *", async () => {
await checkCompleteness(new Date());
});An automated system that fails silently is worse than a manual process. We monitor:
Alerts go to a Slack channel and the ops dashboard. If a plant's report is missing at 6:30 AM, the relevant operator gets pinged automatically.
Start from the output, not the input. Define what the final reports need to look like before you write a single parser. I built the first parsers, then realized the rollup format was different, and had to redo extraction logic. Save yourself that round trip.
Version your parsers. Vendors update their Excel templates without telling anyone. Keep the old parsers around and route by template version, not just vendor.
Reconcile against humans for the first month. Run automated and manual in parallel for the first 30 days and diff the outputs. You'll find edge cases nobody documented. Until you do that, nobody trusts the pipeline.
If your business is still running on manual report processing, copy-paste workflows, or fragile spreadsheets, this is exactly what I build. Drop me a line.