Every DevOps engineer has a moment where a service is failing and they spend 20 minutes running the wrong tool. The five commands below are the ones most engineers either never learned, learned wrong, or forgot they had. Each one has saved hours of real debugging time — not in theory, but on a 2 a.m. Slack thread with an SRE team staring at a dashboard.

1 ss -tlnp — See what's actually listening and who owns it

The problem: You deploy a service and it doesn’t respond on the expected port. Or a port is already in use and you can’t figure out which process is holding it. Most engineers instinctively type netstat -tulnp — but netstat belongs to the deprecated net-tools package, which many modern Linux distributions no longer install by default, and it gathers its data by parsing /proc, which is slow on busy hosts. ss (socket statistics) queries the kernel directly over netlink, so it is consistently faster and always current.

ss -tlnp

Sample output:

State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:22           0.0.0.0:*          users:(("sshd",pid=987,fd=3))
LISTEN  0       511     0.0.0.0:80           0.0.0.0:*          users:(("nginx",pid=1234,fd=6))
LISTEN  0       128     127.0.0.1:5432       0.0.0.0:*          users:(("postgres",pid=2200,fd=5))

You can immediately see that Postgres is only bound to localhost (port 5432) while nginx is listening on all interfaces (port 80). If your app can’t reach the database from another host, ss -tlnp shows you why in one command — no config file digging needed.
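When you only care about one port, you can pull out just its owner. A small sketch (listener_for_port is my own helper name, not a standard tool) that filters ss output on stdin — it works against output shaped like the sample above:

```shell
# Pull the process name and PID listening on a given TCP port from
# `ss -tlnp` output. Reads the output on stdin; the port is $1.
listener_for_port() {
  grep ":$1 " \
    | sed -n 's/.*(("\([^"]*\)",pid=\([0-9]*\).*/\1 pid=\2/p'
}

# Against the sample output above:
#   ss -tlnp | listener_for_port 80   ->  nginx pid=1234
```

ss also has a built-in filter language for the same job (for example `ss -tlnp '( sport = :80 )'`), which avoids the text parsing entirely.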

2 journalctl -u nginx --since "1 hour ago" --no-pager — Time-scoped service logs

The problem: An error appeared in the last hour, but /var/log/nginx/error.log is 400 MB and tail -f shows you nothing. Most engineers either grep a log file with an imprecise timestamp pattern, or scroll through an endless journalctl stream without knowing they can slice it by time.

journalctl -u nginx --since "1 hour ago" --no-pager

You can combine --since with --until to isolate a 10-minute window around an outage. Add -p err to show only error-level entries. Add -f to follow in real time. The journalctl query engine is significantly faster than grepping a plaintext log file of the same size because the journal is stored in a structured binary format with indexes.

Most engineers still manually grep log files in 2026. Engineers who know journalctl slice logs in seconds.

3 curl -w "@curl-format.txt" -o /dev/null -s https://example.com — HTTP timing breakdown

The problem: A service responds, but slowly. You need to know where the time is going: is it DNS resolution? The TLS handshake? Time to first byte from the backend? Browser DevTools won’t help from a server. A plain curl gives you the response body but no timing detail.

First, create the format file (do this once and keep it in your home directory):

# curl-format.txt
     time_namelookup:  %{time_namelookup}s\n
        time_connect:  %{time_connect}s\n
     time_appconnect:  %{time_appconnect}s\n
    time_pretransfer:  %{time_pretransfer}s\n
       time_redirect:  %{time_redirect}s\n
  time_starttransfer:  %{time_starttransfer}s\n
                     ----------\n
          time_total:  %{time_total}s\n

Then run:

curl -w "@curl-format.txt" -o /dev/null -s https://example.com

Sample output:

     time_namelookup:  0.004s
        time_connect:  0.021s
     time_appconnect:  0.078s
    time_pretransfer:  0.078s
       time_redirect:  0.000s
  time_starttransfer:  0.312s
                     ----------
          time_total:  0.313s

Here, time_appconnect - time_connect = 0.057s: the TLS handshake cost 57 ms. time_starttransfer - time_appconnect = 0.234s: time to first byte from the server was 234 ms. That’s where the latency lives — not in the network, but in the application. You just saved a 30-minute argument about whether the problem is the load balancer or the backend.
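The two subtractions are easy to automate. A minimal awk sketch (timing_deltas is my own name) that computes both deltas, assuming curl was run with -w '%{time_connect} %{time_appconnect} %{time_starttransfer}\n' instead of the format file:

```shell
# Compute TLS-handshake time and time-to-first-byte from three curl
# timings on stdin, ordered: time_connect time_appconnect time_starttransfer
timing_deltas() {
  awk '{ printf "tls_handshake=%.3fs ttfb=%.3fs\n", $2 - $1, $3 - $2 }'
}

# With the sample numbers above:
#   echo "0.021 0.078 0.312" | timing_deltas
#   -> tls_handshake=0.057s ttfb=0.234s
```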

4 openssl s_client -connect host:443 -servername host 2>/dev/null | openssl x509 -noout -dates — Check TLS cert expiry from CLI

The problem: A certificate is about to expire (or already has). You’re SSH’d into a server and can’t open a browser. You want to check the expiry of the cert a remote server is actually serving — not what’s in a file on disk, but what the live TLS handshake returns.

openssl s_client -connect certquests.com:443 -servername certquests.com \
  </dev/null 2>/dev/null | openssl x509 -noout -dates

Sample output:

notBefore=Jan  1 00:00:00 2026 GMT
notAfter=Apr  1 00:00:00 2027 GMT

The -servername flag is critical: it sends the SNI (Server Name Indication) extension, which tells the server which certificate to return when multiple domains share the same IP. Without it you may get the wrong cert entirely on a shared hosting setup. The 2>/dev/null suppresses the noisy handshake output so you only see the cert dates.

You can wrap this in a quick one-liner to check multiple hosts in a loop, or feed it into a monitoring script that fires an alert 30 days before expiry — something surprisingly few teams bother to set up until after the first outage.
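That multi-host loop can be sketched in a few lines. days_until is my own helper name, and the day arithmetic assumes GNU date, which parses the "Apr  1 00:00:00 2027 GMT" format directly:

```shell
# days_until: whole days from now until a GMT date string such as the
# notAfter value printed by openssl. Negative means already expired.
days_until() {
  echo $(( ($(date -d "$1" +%s) - $(date +%s)) / 86400 ))
}

# Check every host passed on the command line (needs network access).
# -enddate prints only the notAfter line; </dev/null stops s_client
# from waiting on stdin.
for host in "$@"; do
  exp=$(openssl s_client -connect "$host:443" -servername "$host" \
      </dev/null 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)
  printf '%s: %s days left\n' "$host" "$(days_until "$exp")"
done
```

Feed the number into your alerting threshold of choice (30 days is a common one) and the script becomes the monitoring check the article describes.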

5 strace -p PID -e trace=network -f — Trace network syscalls of a running process

The problem: An application is running, logs show nothing useful, and it simply cannot reach a remote host. Firewall rules look correct. DNS resolves. But the connection is failing. You need to see exactly what system calls the process is making at the kernel level to understand whether it’s even trying to connect, and to where.

strace -p 4321 -e trace=network -f

Sample output:

[pid 4321] connect(5, {sa_family=AF_INET, sin_port=htons(5432),
    sin_addr=inet_addr("10.0.1.45")}, 16) = -1 ECONNREFUSED (Connection refused)

That single line tells you everything: the process is attempting to connect to 10.0.1.45:5432, and the connection is being actively refused. Either the address is wrong (a stale config) or nothing is listening on port 5432 at that host — and the firewall is almost certainly not the problem, because a firewall silently dropping packets typically shows up as a connect timeout, not an immediate ECONNREFUSED. You get this in under 10 seconds. Without strace, this diagnosis typically takes 20–30 minutes of configuration archaeology.

A word of caution: strace adds overhead to the traced process. Don’t run it on a heavily loaded production service without understanding the impact. Use it for short bursts to capture the syscall you’re looking for, then detach with Ctrl+C.
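If you would rather not attach to a live PID at all, the same diagnosis works by launching a throwaway process under strace. A sketch using bash's /dev/tcp pseudo-device to provoke the same connect() failure shown above — port 1 on localhost is almost never open, so the refusal is predictable:

```shell
# Trace only network syscalls of a fresh process instead of attaching.
# bash's /dev/tcp redirection performs a TCP connect(); with nothing
# listening on 127.0.0.1:1, the trace shows it failing with ECONNREFUSED.
strace -f -e trace=network \
  bash -c 'exec 3<>/dev/tcp/127.0.0.1/1' 2>&1 | grep connect
```

The same launch-under-strace pattern works for your real binary (`strace -f -e trace=network ./app`), which sidesteps the attach-to-production concern entirely during local reproduction.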

Why these commands matter for your certification

If you’re working toward CompTIA Linux+ or RHCSA, these commands appear directly in exam objectives around system management, service troubleshooting, and network configuration. For CCNA, the networking concepts behind ss output (binding, port states, protocol behaviour) and TLS certificate validation are embedded in the network fundamentals domain.

More practically: the ability to diagnose a broken service fast is exactly what separates a junior engineer who opens a ticket from a senior engineer who resolves it before the team even notices. These five commands are the difference.

Key takeaways