Understanding Node.js - Profiling
Introduction
This is the third article in our series on the architecture of Node.js. The previous posts focused on threading and CPU management; our upcoming articles will shift to topics related to memory management.
In this post, we’ll explore Node.js's built-in capabilities for profiling and identifying performance bottlenecks. These tools help detect code inefficiencies that may slow down the application and degrade overall performance.
Problem
Please take a look at the following code: it returns the size of a file whose name is passed in as a query parameter.
import express from "express";
import fs from "node:fs";

const app = express();
const port = 3000;

app.get("/content", (req, res) => {
  const file = req.query.file as string;
  console.log(`Reading file: ${file}`);
  const content = fs.readFileSync(file);
  res.send({
    message: "File read",
    length: content.length,
  });
});

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`);
});
At first glance, it does not look like something that could cause performance issues. So, let us try to benchmark the endpoint with the ab
tool (-k keeps connections alive, -c 20 simulates 20 concurrent clients, -n 300 sends 300 requests in total). Start the application and run the following in the terminal:
ab -k -c 20 -n 300 "http://localhost:3000/content?file=file.zip"
After a while, you should see the results:
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Finished 300 requests
Server Software:
Server Hostname: localhost
Server Port: 3000
Document Path: /content?file=file.zip
Document Length: 42 bytes
Concurrency Level: 20
Time taken for tests: 26.227 seconds
Complete requests: 300
Failed requests: 0
Keep-Alive requests: 300
Total transferred: 83100 bytes
HTML transferred: 12600 bytes
Requests per second: 11.44 [#/sec] (mean)
Time per request: 1748.483 [ms] (mean)
Time per request: 87.424 [ms] (mean, across all concurrent requests)
Transfer rate: 3.09 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 2
Processing: 82 1688 266.2 1748 1933
Waiting: 82 1688 266.3 1748 1933
Total: 82 1688 266.0 1748 1936
Percentage of the requests served within a certain time (ms)
50% 1748
66% 1752
75% 1759
80% 1763
90% 1770
95% 1775
98% 1781
99% 1783
100% 1936 (longest request)
The key takeaways from the output: it took around 26 seconds to complete all the requests, the server handled about 11 requests per second on average, and more than 50% of requests took longer than 1.7 seconds. It's not bad (in my case, the zip file was about 600MB), but let's see if there is something we could improve.
Profiling
We will run the same test, but this time with the application in profiling mode. To do this, modify the scripts section in your package.json
file and add the following line:
"start:profiling": "tsc && node --prof dist/index-profiling.js"
Here, index-profiling.js
is the script we want to profile, and the --prof
flag enables V8 profiling, generating a profiling log file.
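For reference, the resulting scripts section might look like the sketch below (the script names other than start:profiling, and the dist output paths, are illustrative and depend on your tsconfig setup):

```json
{
  "scripts": {
    "build": "tsc",
    "start": "tsc && node dist/index.js",
    "start:profiling": "tsc && node --prof dist/index-profiling.js"
  }
}
```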
Start the application and run the benchmark again. Once the test completes, close the application. This will generate a log file named something like isolate-0xnnnnnnnnnnnn-v8.log
, which may be hard to read. To convert this file into a more readable format, run:
node --prof-process isolate-0xnnnnnnnnnnnn-v8.log > profiling_result.txt
Open profiling_result.txt
in your preferred text editor and scroll to the [Summary]
section. You’ll see a breakdown similar to this:
[Summary]:
ticks total nonlib name
56 2.8% 3.0% JavaScript
1834 93.3% 96.9% C++
819 41.7% 43.3% GC
74 3.8% Shared libraries
2 0.1% Unaccounted
As shown, most of the time was spent in the C++
section, with a significant portion in Garbage Collection (GC)
as well.
We will focus on the C++ section, which might be related to the Node.js module we are using. Please scroll down to the [Bottom up (heavy) profile]
section to see the most resource-intensive operations. There is already something that should catch our attention:
ticks parent name
867 44.1% t std::__1::__shared_ptr_emplace<node::ThreadsafeCopyOnWrite<std::__1::map....,
304 35.1% Function: ^readSync node:fs:701:18
304 100.0% Function: ^tryReadSync node:fs:440:21
304 100.0% Function: ^readFileSync node:fs:461:22
295 97.0% Function: ^<anonymous> /Users/marcinmalinowski/Projects/Priv/event-loop/dist/index-profiling.js:10:21
295 100.0% Function: ^handle /Users/marcinmalinowski/Projects/Priv/event-loop/node_modules/express/lib/router/layer.js:86:49
9 3.0% LazyCompile: ~<anonymous> /Users/marcinmalinowski/Projects/Priv/event-loop/dist/index-profiling.js:10:21
As we might have expected, we used the synchronous version of the readFile
function, which blocks the event loop and degrades performance. Since we found the problem in our code, we can try to fix it.
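To see why a synchronous call is so costly here, consider this small sketch (independent of the Express app above): while synchronous work runs, the event loop cannot fire timers or accept new requests, so every concurrent client has to wait its turn.

```typescript
// Schedule a timer for 100 ms, then block the event loop for ~500 ms
// to simulate a long synchronous call such as readFileSync on a big file.
const scheduled = Date.now();

setTimeout(() => {
  const delay = Date.now() - scheduled;
  // The timer can only fire after the blocking work finishes, so the
  // observed delay is around 500 ms instead of the requested 100 ms.
  console.log(`timer fired after ${delay} ms`);
}, 100);

const start = Date.now();
while (Date.now() - start < 500) {
  // Busy-wait: nothing else on the event loop can run during this loop.
}
console.log("blocking work done");
```

This is essentially what happened in our handler: each readFileSync call on a large file held the loop, serializing all 20 concurrent requests.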
Solution
To fix this, we can check whether there is an asynchronous version of the readFile
functionality we could use. After checking the documentation, we can change the code to use the async version:
app.get("/content", (req, res) => {
  const file = (req.query.file as string) || "";
  console.log(`Reading file: ${file}`);
  fs.readFile(file, (err, data) => {
    if (err) {
      res.status(500).send({ message: "Failed to read file" });
      return;
    }
    res.send({
      message: "File read",
      size: data.length,
    });
  });
});
We can run the server again and launch the benchmark with the ab
tool. After a while, we should get the results:
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Finished 300 requests
Server Software:
Server Hostname: localhost
Server Port: 3001
Document Path: /content?file=file.zip
Document Length: 40 bytes
Concurrency Level: 20
Time taken for tests: 14.850 seconds
Complete requests: 300
Failed requests: 0
Keep-Alive requests: 300
Total transferred: 82500 bytes
HTML transferred: 12000 bytes
Requests per second: 20.20 [#/sec] (mean)
Time per request: 989.978 [ms] (mean)
Time per request: 49.499 [ms] (mean, across all concurrent requests)
Transfer rate: 5.43 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.4 0 2
Processing: 108 980 561.2 816 3385
Waiting: 107 980 561.2 815 3385
Total: 108 980 561.6 816 3387
Percentage of the requests served within a certain time (ms)
50% 816
66% 831
75% 899
80% 936
90% 1183
95% 2921
98% 2982
99% 3318
100% 3387 (longest request)
As you can see, this time we achieved roughly a 45% reduction in total time (the test took about 15 seconds to run, compared to 26 seconds previously), with throughput up from about 11 to 20 requests per second.
Summary
Although our program was simple, we could easily identify potential issues based on function names. The goal here was to demonstrate how to use built-in Node.js tools to profile CPU usage and optimize the application. I hope you found it interesting! In the next post, we'll focus on memory management, so stay tuned!