Understanding Node.js - Profiling

Introduction

This is the third article in our series on the architecture of Node.js. The previous posts focused on threading and CPU management; this one turns to profiling, and upcoming articles will shift to memory management.

In this post, we’ll explore Node.js's built-in capabilities for profiling and identifying performance bottlenecks. These tools help detect code inefficiencies that may slow down the application and degrade overall performance.

Problem

Please take a look at the following code (it returns the size of a file whose name is passed as a query parameter).

import express from "express";
import fs from "node:fs";

const app = express();
const port = 3000;

app.get("/content", (req, res) => {
  const file = req.query.file as string;
  console.log(`Reading file: ${file}`);
  const content = fs.readFileSync(file);
  res.send({
    message: "File read",
    length: content.length,
  });
});

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`);
});

At first glance, it does not look like something that could cause any performance issues. So let's try to benchmark the endpoint with the ab tool (Apache Bench), using keep-alive connections (-k), 20 concurrent clients (-c) and 300 requests in total (-n). Start the application and run the following in the terminal:

ab -k -c 20 -n 300 "http://localhost:3000/content?file=file.zip" 
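If you want to reproduce the benchmark, the server needs a reasonably large file to read. A minimal sketch for generating one (assuming a roughly 600 MB file named file.zip, written in 1 MB chunks of random data; the chunk size and count are just illustrative choices) could look like this:

import fs from "node:fs";
import { randomBytes } from "node:crypto";

// Illustrative helper script: creates a ~600 MB file of random data
// by appending 1 MB chunks, so the whole buffer never has to fit in memory.
const CHUNK_SIZE = 1024 * 1024; // 1 MB
const CHUNK_COUNT = 600;        // ~600 MB in total

fs.writeFileSync("file.zip", Buffer.alloc(0)); // start with an empty file
for (let i = 0; i < CHUNK_COUNT; i++) {
  fs.appendFileSync("file.zip", randomBytes(CHUNK_SIZE));
}
console.log("file.zip generated");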

After a while, you should see the results:

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Finished 300 requests


Server Software:        
Server Hostname:        localhost
Server Port:            3000

Document Path:          /content?file=file.zip
Document Length:        42 bytes

Concurrency Level:      20
Time taken for tests:   26.227 seconds
Complete requests:      300
Failed requests:        0
Keep-Alive requests:    300
Total transferred:      83100 bytes
HTML transferred:       12600 bytes
Requests per second:    11.44 [#/sec] (mean)
Time per request:       1748.483 [ms] (mean)
Time per request:       87.424 [ms] (mean, across all concurrent requests)
Transfer rate:          3.09 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       2
Processing:    82 1688 266.2   1748    1933
Waiting:       82 1688 266.3   1748    1933
Total:         82 1688 266.0   1748    1936

Percentage of the requests served within a certain time (ms)
  50%   1748
  66%   1752
  75%   1759
  80%   1763
  90%   1770
  95%   1775
  98%   1781
  99%   1783
 100%   1936 (longest request)

The key takeaway from the output is that it took around 26 seconds to complete all the requests. The mean time per request is about 1.7 seconds, and more than 50% of the requests took longer than 1.7 seconds. That's not bad (in my case, the zip file was about 600 MB), but let's see if there is something we could improve.

Profiling

We will run the same test, but this time with the application in profiling mode. To do this, modify the scripts section in your package.json file and add the following line:

"start:profiling": "tsc && node --prof dist/index-profiling.js"

Here, index-profiling.js is the script we want to profile, and the --prof flag enables V8 profiling, generating a profiling log file.
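For context, the scripts section could then look roughly like this (the start script and file names are an assumed setup, not taken from the original project):

{
  "scripts": {
    "start": "tsc && node dist/index.js",
    "start:profiling": "tsc && node --prof dist/index-profiling.js"
  }
}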

Start the application and run the benchmark again. Once the test completes, close the application. This will generate a log file named something like isolate-0xnnnnnnnnnnnn-v8.log, which may be hard to read. To convert this file into a more readable format, run:

node --prof-process isolate-0xnnnnnnnnnnnn-v8.log > profiling_result.txt

Open profiling_result.txt in your preferred text editor and scroll to the [Summary] section. You’ll see a breakdown similar to this:

 [Summary]:
   ticks  total  nonlib   name
     56    2.8%    3.0%  JavaScript
   1834   93.3%   96.9%  C++
    819   41.7%   43.3%  GC
     74    3.8%          Shared libraries
      2    0.1%          Unaccounted

As shown, most of the time was spent in the C++ section, with a significant portion in Garbage Collection (GC) as well.

We will focus on the C++ section, which might be related to the Node.js module we are using. Please scroll down to the [Bottom up (heavy) profile] section to see the most resource-intensive operations. There is already something that should catch our attention:

   ticks parent  name
    867   44.1%  t std::__1::__shared_ptr_emplace<node::ThreadsafeCopyOnWrite<std::__1::map...., 
    304   35.1%    Function: ^readSync node:fs:701:18
    304  100.0%      Function: ^tryReadSync node:fs:440:21
    304  100.0%        Function: ^readFileSync node:fs:461:22
    295   97.0%          Function: ^<anonymous> /Users/marcinmalinowski/Projects/Priv/event-loop/dist/index-profiling.js:10:21
    295  100.0%            Function: ^handle /Users/marcinmalinowski/Projects/Priv/event-loop/node_modules/express/lib/router/layer.js:86:49
      9    3.0%          LazyCompile: ~<anonymous> /Users/marcinmalinowski/Projects/Priv/event-loop/dist/index-profiling.js:10:21

As we might have expected, the culprit is readFileSync, the synchronous version of the readFile function, which blocks the event loop and degrades performance. Since the problem lies in our own code, we can fix it ourselves.

Solution

To fix this, we can check whether there is an asynchronous version of the readFile functionality we could use instead. After consulting the documentation, we can change the code to use the callback-based fs.readFile:

app.get("/content", (req, res) => {
  const file = (req.query.file as string) || "";
  console.log(`Reading file: ${file}`);
  // Asynchronous read: the event loop stays free while the file is loaded
  fs.readFile(file, (err, data) => {
    if (err) {
      res.status(500).send({ message: "Failed to read file" });
      return;
    }
    res.send({
      message: "File read",
      size: data.length,
    });
  });
});
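Alternatively (a sketch, assuming you prefer Node's promise-based API), the same handler can be written with fs/promises and async/await:

import { promises as fsPromises } from "node:fs";

app.get("/content", async (req, res) => {
  const file = (req.query.file as string) || "";
  try {
    // Await the non-blocking read; other requests can be served in the meantime
    const content = await fsPromises.readFile(file);
    res.send({ message: "File read", size: content.length });
  } catch {
    res.status(500).send({ message: "Failed to read file" });
  }
});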

We can run the server again and launch the benchmark with the ab tool. After a while, we should get the results:

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Finished 300 requests


Server Software:        
Server Hostname:        localhost
Server Port:            3001

Document Path:          /content?file=file.zip
Document Length:        40 bytes

Concurrency Level:      20
Time taken for tests:   14.850 seconds
Complete requests:      300
Failed requests:        0
Keep-Alive requests:    300
Total transferred:      82500 bytes
HTML transferred:       12000 bytes
Requests per second:    20.20 [#/sec] (mean)
Time per request:       989.978 [ms] (mean)
Time per request:       49.499 [ms] (mean, across all concurrent requests)
Transfer rate:          5.43 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0       2
Processing:   108  980 561.2    816    3385
Waiting:      107  980 561.2    815    3385
Total:        108  980 561.6    816    3387

Percentage of the requests served within a certain time (ms)
  50%    816
  66%    831
  75%    899
  80%    936
  90%   1183
  95%   2921
  98%   2982
  99%   3318
 100%   3387 (longest request)

As you can see, this time we achieved nearly a 50% improvement: the test took about 15 seconds to complete, compared to 26 seconds previously, and the throughput roughly doubled (20.20 vs 11.44 requests per second).

Summary

Although our program was simple, we could easily identify potential issues based on function names. The goal here was to demonstrate how to use built-in Node.js tools to profile CPU usage and optimize the application. I hope you found it interesting! In the next post, we'll focus on memory management, so stay tuned!