Streaming an ExcelJS file to S3 in Node
Just recently I needed to find some serious performance gains when uploading an Excel file to Amazon S3. We were using the xlsx library to write a file and then upload it to S3, and we needed to handle a file with approximately 140,000 records. That approach was very inefficient, which led me to start researching other ways of doing it.
I found the ExcelJS library and decided to give it a shot. Unfortunately, I didn't find many solid examples of how to stream freshly created Excel files to S3, so after writing my own, I wanted to share it here along with some things to keep in mind. The full code example is at the bottom of the article.
This cut our processing and upload time in half, which was a huge gain for us! We were able to create and upload a 140,000-row file in about 6 seconds.
Import Libs and Define a Stream
ExcelJS lets you build a workbook through a streaming writer that writes to a stream you provide, so to start, create the stream you're ultimately going to upload to S3 and hand it to the workbook writer.
// Imports
const Stream = require('stream');
const ExcelJs = require('exceljs');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Define your stream
const stream = new Stream.PassThrough();

const workbook = new ExcelJs.stream.xlsx.WorkbookWriter({
  stream: stream,
});
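As an aside, the streaming WorkbookWriter also takes a couple of optional flags alongside the stream. A minimal sketch, going off the ExcelJS docs (both flags default to false, which keeps memory usage down on large files):

// Same constructor, with the optional flags spelled out
const workbook = new ExcelJs.stream.xlsx.WorkbookWriter({
  stream: stream,
  useStyles: false, // set true only if you need cell styling
  useSharedStrings: false, // set true to de-duplicate repeated strings
});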
Create a Worksheet, Define Columns
After creating a workbook, add a worksheet to it and define your columns.
Note: You must define columns for ExcelJS to map an object's properties to the correct columns. If you end up with empty worksheets, this is likely the step you missed. This example maps all object properties to columns.
// Define worksheet
const worksheet = workbook.addWorksheet('My Data Sheet');

// Set column headers
const columns = [];
for (const prop in data[0]) {
  columns.push({
    header: prop,
    key: prop,
  });
}
worksheet.columns = columns;
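For reference, data in these snippets is assumed to be a plain array of flat objects, along these lines (hypothetical sample values):

// Hypothetical sample data; any array of objects with matching properties works
const data = [
  { id: 1, name: 'Ada', score: 99 },
  { id: 2, name: 'Grace', score: 97 },
];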
Add Data, Commit All Changes
When creating a workbook with the stream writer, you need to commit all changes, including each worksheet you create; otherwise, when you upload your file, you'll find it empty. An example with multiple sheets is included in the full code below.
data.forEach((row) => {
  worksheet.addRow(row).commit();
});

// Commit all changes
worksheet.commit();
workbook.commit();
Upload to S3
To upload a stream to S3, you must use the upload method instead of the putObject method: putObject needs a known content length up front, while upload handles streaming bodies. If you wish to await the upload, call .promise() on the result, as upload takes a callback by default.
// Upload to S3
await s3
  .upload({
    Bucket: 'my-bucket', // put your bucket name here
    Key: 'my-data.xlsx', // your file key here
    Body: stream, // the stream defined above
  })
  .promise();
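One optional refinement: with the streaming writer, workbook.commit() itself returns a promise that resolves once the workbook has been fully written to the stream. Instead of calling it fire-and-forget as above, a sketch like this awaits the write and the upload together, so a failed write doesn't go unnoticed (same hypothetical bucket and key as above):

// Run the workbook write and the S3 upload concurrently
await Promise.all([
  workbook.commit(),
  s3
    .upload({
      Bucket: 'my-bucket', // your bucket name
      Key: 'my-data.xlsx', // your file key
      Body: stream,
    })
    .promise(),
]);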
And you’re done!
Bonus: Full TypeScript example (untested).
import * as Stream from 'stream';
import * as ExcelJs from 'exceljs';
import * as AWS from 'aws-sdk';

const s3 = new AWS.S3();

export class ExcelStreamUploader {
  // excelData is just an array of objects. All objects have the same properties.
  async uploadSingleSheetToS3(excelData: Record<string, any>[]) {
    const stream = new Stream.PassThrough();
    const workbook = new ExcelJs.stream.xlsx.WorkbookWriter({
      stream: stream,
    });

    // Define worksheet
    const worksheet = workbook.addWorksheet('My Data Sheet');

    // Set column headers
    worksheet.columns = this.getDataHeaders(excelData[0]);

    excelData.forEach((row) => {
      worksheet.addRow(row).commit();
    });

    // Commit all changes
    worksheet.commit();
    workbook.commit();

    // Upload to S3
    await s3
      .upload({
        Bucket: 'my-bucket', // put your bucket name here
        Key: 'my-data.xlsx', // your file key here
        Body: stream, // the stream defined above
      })
      .promise();
    // Done
  }

  // sheets is an array of objects with two properties: name and data.
  // name is a string, data is an array of objects.
  async uploadMultipleSheetsToS3(sheets: { name: string; data: Record<string, any>[] }[]) {
    const stream = new Stream.PassThrough();
    const workbook = new ExcelJs.stream.xlsx.WorkbookWriter({
      stream: stream,
    });

    sheets.forEach((sheet) => {
      // Define worksheet
      const worksheet = workbook.addWorksheet(sheet.name);

      // Set column headers
      worksheet.columns = this.getDataHeaders(sheet.data[0]);

      sheet.data.forEach((row) => {
        worksheet.addRow(row).commit();
      });

      // Commit all changes
      worksheet.commit();
    });
    workbook.commit();

    // Upload to S3
    await s3
      .upload({
        Bucket: 'my-bucket', // put your bucket name here
        Key: 'my-sheets.xlsx', // your file key here
        Body: stream, // the stream defined above
      })
      .promise();
    // Done
  }

  private getDataHeaders(row: Record<string, any>) {
    const columns: { header: string; key: string }[] = [];
    for (const prop in row) {
      columns.push({
        header: prop,
        key: prop,
      });
    }
    return columns;
  }
}
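And a quick usage sketch, as untested as the class above (hypothetical data, called from an async context):

const uploader = new ExcelStreamUploader();

// Single sheet
await uploader.uploadSingleSheetToS3([
  { id: 1, name: 'Ada' },
  { id: 2, name: 'Grace' },
]);

// Multiple sheets
await uploader.uploadMultipleSheetsToS3([
  { name: 'Sheet One', data: [{ id: 1, name: 'Ada' }] },
  { name: 'Sheet Two', data: [{ id: 2, name: 'Grace' }] },
]);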