502 Bad Gateway in WordPress

A 502 Bad Gateway means the proxy in front of your WordPress site got an invalid response (or no response at all) from the application server, usually because a PHP-FPM worker crashed mid-request. This article explains the exact mechanism, the five real causes, how to tell them apart in the logs, and how to fix each one.

Your WordPress site shows a blank page with the text "502 Bad Gateway", or Chrome reports "This page isn't working: HTTP ERROR 502". The network panel in your browser's devtools shows status code 502. The error appears for visitors and for you, in every browser, on mobile and on desktop. Sometimes it clears on a reload, sometimes it sticks until you intervene.

What a 502 actually means

RFC 9110 §15.6.3 defines 502 as the status the gateway sends when it received an invalid response from an inbound server it accessed while trying to fulfill the request. The key word is invalid. Not slow. Not "no answer eventually". The upstream sent something back, and that something was either truncated, malformed, an empty header block, or the connection was closed mid-sentence. In a WordPress stack the "gateway" is nginx (or Apache with mod_proxy_fcgi) and the upstream is PHP-FPM. The web server passed your request to PHP, PHP started writing a response, and then something went wrong before PHP finished. The web server gave up and translated the broken upstream into a 502 for the visitor.
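For orientation, the handoff looks like this in a typical nginx site config (paths and socket name are illustrative, not necessarily your exact values):

server {
    server_name yoursite.nl;
    root /var/www/yoursite;
    index index.php;

    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        # The handoff: if FPM closes this connection before the response
        # headers are complete, nginx turns the broken reply into a 502.
        fastcgi_pass unix:/run/php/php8.3-fpm.sock;
    }
}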

A 502 is not the same as the other 5xx errors in this category, and the difference matters for your diagnosis:

  • 502 Bad Gateway: the upstream returned an invalid or truncated response (or none at all). Usually the worker crashed or the socket is wrong. This article.
  • 504 Gateway Timeout: the upstream was reachable but took too long to answer. PHP is still alive, just slow.
  • 503 Service Unavailable: the upstream explicitly signaled it cannot accept the request right now (overload, maintenance mode, rate limit).
  • 500 Internal Server Error: the application itself errored cleanly and returned a 500 to the proxy.

In short: a 502 is a corpse, a 504 is silence, a 503 is a "go away", and a 500 is an admission of guilt.

Common causes, ordered by likelihood

1. A PHP-FPM child crashed mid-request

This is the cause behind most 502s on a busy site. PHP-FPM runs a pool of worker processes; one of those workers picked up your request, started running WordPress code, and then died before finishing the response. The death is almost always a segfault, an out-of-memory kill from the Linux kernel, or a fatal PHP error inside an extension that can't be caught from PHP land. nginx sees the connection close before the headers are complete, logs upstream prematurely closed connection while reading response header from upstream, and serves a 502. PHP-FPM has emergency_restart_threshold and emergency_restart_interval precisely for this scenario: if too many children exit with SIGSEGV or SIGBUS in a short window, the master itself restarts.

A regular PHP fatal (the kind that produces "There has been a critical error on this website") usually does not produce a 502, because PHP catches it and returns a clean 500-with-body. A 502 means PHP could not even finish writing the response, which is a stronger signal: native code crashed, the OOM killer fired, or the worker was killed externally.

2. The upstream socket path is wrong

nginx talks to PHP-FPM over either a TCP port (127.0.0.1:9000) or a Unix socket (unix:/run/php/php8.3-fpm.sock). If the path in fastcgi_pass does not match the listen directive in the FPM pool config, nginx cannot connect at all, and every request returns 502. This is the most common cause right after a PHP version upgrade: the old socket was at /run/php/php8.2-fpm.sock, the new pool listens on php8.3-fpm.sock, and nginx still points at the old path.
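The post-upgrade mismatch typically looks like this (file paths assume a Debian-style layout):

; /etc/php/8.3/fpm/pool.d/www.conf — where the new pool actually listens
listen = /run/php/php8.3-fpm.sock

# /etc/nginx/sites-enabled/yoursite — still pointing at the old socket
fastcgi_pass unix:/run/php/php8.2-fpm.sock;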

3. PHP-FPM is not running

The pool is down. Maybe a config error stopped it from booting after a restart. Maybe the OOM killer keeps killing the master itself. Maybe a deploy script disabled it and forgot to start it again. nginx tries to connect, fails immediately, logs connect() failed (111: Connection refused) while connecting to upstream, and returns 502. This is the easiest cause to confirm and the easiest to fix.

4. A reverse proxy in front of nginx cannot reach the origin

If you have an extra proxy tier (a load balancer, a separate cache layer, an ingress controller) and nginx is the origin, then the proxy uses proxy_pass and the same logic applies one layer up. The proxy fails to reach the origin IP, the origin port is filtered by a firewall rule, or the upstream block in the proxy config points to a stale address. From the visitor's side it looks identical to the local FPM case, but the failure is at the network layer between the two tiers.
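Assuming nginx on the proxy tier as well, the relevant piece is an upstream block like this (names are illustrative):

upstream wordpress_origin {
    # nginx resolves this hostname once, at config load; if the origin's
    # IP changes later, this entry goes stale until the proxy reloads.
    server origin.internal.example:8080;
}

server {
    listen 80;
    server_name yoursite.nl;

    location / {
        proxy_pass http://wordpress_origin;
    }
}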

5. Cloudflare (or another edge) cannot reach the origin

If your traffic flows through Cloudflare and the edge can't reach your origin, Cloudflare returns its own 502 page. Cloudflare's documentation distinguishes Cloudflare-branded errors (the edge generated the response itself) from unbranded ones (the origin sent a 502 that Cloudflare passed through). Common edge-side triggers include the origin sending gzip-compressed content with a wrong Content-Length header, an origin that only partially supports HTTP/2, and a Cloudflare Tunnel where cloudflared cannot reach the upstream service. A firewall rule that blocks Cloudflare's IP ranges produces this too.

Diagnose which cause applies

These checks are non-destructive. Run them in order before you change a single config value.

Check 1: read the nginx error log. This is the single most important step. On a typical Linux host the log lives at /var/log/nginx/error.log (or in your hosting control panel under "error logs"). For a 502 the log line itself tells you which cause you have:

upstream prematurely closed connection while reading response header from upstream

That is cause #1: a worker started writing a response and died. The smoking gun.

connect() failed (111: Connection refused) while connecting to upstream

That is cause #3: PHP-FPM is not listening on that socket at all. Either it is down, or its listen path differs from fastcgi_pass.

connect() to unix:/run/php/php8.3-fpm.sock failed (2: No such file or directory)

That is cause #2: the socket path in your nginx config does not match where FPM is actually listening. The 2: No such file or directory part is the giveaway.
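To pull all the matching lines in one pass, something like this works (log path assumes a Debian/Ubuntu layout):

# Show the 20 most recent upstream failures, with timestamps
grep -E 'upstream prematurely closed|connect\(\)' /var/log/nginx/error.log | tail -n 20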

You will know it worked when: you can quote the exact log line for the failing request and you can match its timestamp to a visitor report.

Check 2: look for the kill in dmesg and the FPM error log. If cause #1 looks plausible, cross-check with the kernel and FPM logs. Run dmesg -T | grep -i 'killed process' on the server. If the OOM killer fired, you will see a line naming php-fpm with a memory total. That confirms the worker was killed by the kernel for memory pressure, not by PHP itself. The PHP-FPM error log (/var/log/php8.3-fpm.log on most setups) will also contain WARNING: [pool www] child 12345 exited on signal 11 (SIGSEGV) or similar. If emergency_restart_threshold is enabled and the FPM master itself restarted, you will see failed processes threshold ... reached, initiating reload in the same log.
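Both sides of the cross-check in one place (the FPM log path varies per distro and PHP version):

# Kernel side: did the OOM killer take a PHP process?
dmesg -T | grep -i 'killed process'

# FPM side: signal exits and emergency restarts
grep -E 'exited on signal|initiating reload' /var/log/php8.3-fpm.log | tail -n 20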

You will know it worked when: you can name the exact signal (SIGSEGV, SIGKILL, SIGBUS) or confirm the OOM killer fired with a timestamp matching the 502.

Check 3: confirm PHP-FPM is up and listening on the right path. Run systemctl status php8.3-fpm to see if the service is alive. Then check what it is listening on with ss -lnp | grep php-fpm (TCP) or ls -la /run/php/ (Unix socket). Compare that to the fastcgi_pass line in your nginx site config. Mismatches catch causes #2 and #3 in one shot.
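The whole comparison, assuming Debian-style paths:

systemctl status php8.3-fpm                      # is the service alive?
ss -lnp | grep php-fpm                           # TCP listeners, if any
ls -la /run/php/                                 # Unix sockets, if any
grep -r fastcgi_pass /etc/nginx/sites-enabled/   # what nginx expects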

You will know it worked when: the FPM service shows active (running), its listen path appears in ss or as a real file in /run/php/, and that exact path matches what nginx is configured to use.

Check 4: bypass the edge. If you use Cloudflare or a similar edge proxy, hit the origin IP directly (using curl --resolve yoursite.nl:443:1.2.3.4 https://yoursite.nl/) and see what comes back. If the origin returns a clean 200 and only the edge URL returns 502, the failure is at the edge layer: cause #5. If the origin also returns 502, the failure is at your stack and you are back in causes #1 to #3.
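The two requests side by side (replace 1.2.3.4 with your actual origin IP):

# Through the edge, via normal DNS
curl -sI https://yoursite.nl/ | head -n 1

# Straight at the origin, bypassing the edge
curl -sI --resolve yoursite.nl:443:1.2.3.4 https://yoursite.nl/ | head -n 1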

You will know it worked when: you can state with certainty whether the origin direct request succeeds or fails, and you have the response body of both attempts.

Solutions, per cause

Cause #1 fix: a worker crashed mid-request

The worker died for a reason, and that reason is in either the OOM log or the FPM error log. Two paths:

If the OOM killer fired, the worker was using more memory than the system could give it. The right fix is to find which request is allocating that much (often a large image-processing call, a runaway plugin, or an export), reduce its memory footprint, or move it out of the request lifecycle. The wrong fix is to raise PHP's memory_limit until the system OOMs more aggressively. If the host has too little RAM for the configured pm.max_children, lower pm.max_children so that a full pool fits in memory. See the PHP workers article for the structural calculation.

If the worker segfaulted, the failure is in native code: PHP itself, or more often a PHP extension (imagick, gd, redis, xdebug in production). The FPM error log will name the pool and the PID. From there you want the kernel core dump if you have core_pattern set; otherwise reproduce the request with xdebug off, then disable the remaining extensions one at a time until the segfault stops. Update PHP to the latest patch release in your branch and update the offending extension. Enable emergency_restart_threshold = 10 and emergency_restart_interval = 1m so that a runaway crash loop triggers an automatic FPM master restart instead of leaving the site broken.
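Those directives live in the global FPM config, not the pool file (path assumes a Debian-style layout):

; /etc/php/8.3/fpm/php-fpm.conf
; If 10 children exit abnormally within one minute, restart the master
; instead of serving a wall of 502s while the pool bleeds out.
emergency_restart_threshold = 10
emergency_restart_interval = 1m
; How long children get to react to signals from the master; usually
; set alongside the emergency restart directives.
process_control_timeout = 10s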

Verification: the same URL that returned 502 now returns a 200 in the access log, the FPM error log no longer contains new exited on signal lines, and dmesg shows no further OOM kills for php-fpm.

Cause #2 fix: wrong socket path

Open your FPM pool config (typically /etc/php/8.3/fpm/pool.d/www.conf) and read the listen line. Open your nginx site config and read the fastcgi_pass line. Make them match exactly. After a PHP version upgrade, both files often need to move from php8.2-fpm.sock to php8.3-fpm.sock. Reload nginx with nginx -t && systemctl reload nginx and restart FPM with systemctl restart php8.3-fpm.

Verification: curl -I https://yoursite.nl/ returns a 200 from the origin, and the nginx error log no longer contains No such file or directory lines for the socket path.

Cause #3 fix: PHP-FPM is down

Start it with systemctl start php8.3-fpm and check the result with systemctl status php8.3-fpm. If it refuses to start, the status output will name the failing config file. The most common boot failures are a duplicate listen directive across pools, a user value that does not exist, or an invalid pm value. Fix the config and try again. If FPM starts but immediately dies, check journalctl -u php8.3-fpm -n 100 for the exit reason and dmesg for an OOM kill on the master process.

Verification: systemctl status php8.3-fpm shows active (running), the uptime is older than your most recent 502, and the site responds with a 200.

Cause #4 fix: a proxy tier cannot reach the origin

Check the upstream block in the proxy config. The IP or hostname must resolve, the port must be open from the proxy host, and any firewall rule between the two must allow traffic from the proxy's source IP. From the proxy host, run curl -v http://<origin-ip>:<port>/ to confirm reachability. If the connection hangs, the firewall is blocking. If it refuses, the origin nginx is not listening on that interface. If the upstream block uses a DNS name that recently changed IPs, restart the proxy so that it re-resolves, or configure DNS-based health checks.

Verification: curl -v from the proxy host to the origin succeeds, and the visitor URL returns a 200 through the proxy.

Cause #5 fix: Cloudflare cannot reach the origin

Start with the Cloudflare-specific causes: check your origin's Content-Length header against the actual gzipped body if you do origin compression, disable HTTP/2 to the origin if your backend speaks it badly, and verify your origin firewall allows Cloudflare's IP ranges. For Cloudflare Tunnel setups, check cloudflared logs for "unable to reach the origin service" and confirm the tunnel target points at a service that is actually running. If only some paths return 502 through the edge while others succeed, it is almost always a per-path origin issue (a long endpoint that briefly crashes a worker, an upload path that exceeds Cloudflare's request body limit) rather than an edge configuration problem.
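To check the Content-Length claim against the bytes the origin actually sends, a rough sketch (run it against the origin directly, not through the edge):

# Ask for gzip, save headers and body separately
curl -s -H 'Accept-Encoding: gzip' -D headers.txt -o body.gz https://yoursite.nl/

grep -i '^content-length' headers.txt    # what the origin declares
wc -c < body.gz                          # what it actually sent

# If the two numbers differ, the edge sees a truncated or overlong
# body and can turn the response into a 502.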

Verification: the same URL that 502'd through the edge now returns a 200 through the edge, and the origin access log shows a matching 200 with no anomalies.

When to escalate

If the steps above do not pinpoint the cause within 30 minutes, hand the incident off to your host or developer. Have these ready, because the first thing they will ask for is exactly this list:

  • The exact URL or admin action that triggers the 502.
  • The time of the error, with timezone, and whether it is reproducible or only sporadic.
  • Your hosting tier and stack (shared, VPS, managed, container; nginx or Apache; PHP version).
  • The matching line from the nginx error log (cause #1 vs #2 vs #3 vs #4).
  • The output of dmesg -T | grep -i killed from around the incident.
  • The matching WARNING: [pool www] child ... exited on signal line from the PHP-FPM error log if one exists.
  • Whether pm.max_children times the average worker RSS exceeds available RAM.
  • Whether the site is behind Cloudflare or another edge proxy, and whether the 502 also happens when you bypass it.
  • The list of plugins active on the site, especially any that load native extensions or any that were updated in the last 48 hours.

Send those in the first message. It saves a full round trip and routes the ticket straight to the right engineer.

How to prevent it from coming back

A persistent 502 is almost always memory pressure or a buggy native extension. Three things keep the error rare on a healthy site:

  • Size pm.max_children to fit in RAM at full load. Take the average worker RSS during peak (not idle), multiply by pm.max_children, add a 25% headroom buffer, and the result must fit inside the host RAM minus the database, the OS, and the cache. If it does not, you will get OOM kills, and OOM kills become 502s. Lower pm.max_children until the math works; a worked sketch follows this list.
  • Enable emergency_restart_threshold. Set it to 10 over 1m. If a single buggy extension starts crashing workers in a loop, the master restarts instead of serving a wall of 502s. This is a fail-safe, not an excuse to ignore the underlying crash.
  • Stay current on PHP and extensions. Most native crashes I've seen on managed WordPress hosting are old versions of imagick or gd against a newer libc, or xdebug accidentally left enabled in production. Update the patch release of PHP regularly, and never run xdebug on a production pool.
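For the sizing bullet above, a rough way to get the peak numbers (the process match includes the FPM master, which skews the average slightly):

# Average resident memory per php-fpm process, in MB, right now
ps --no-headers -o rss -p "$(pgrep -d, -f php-fpm)" \
  | awk '{sum+=$1; n++} END {printf "%.0f MB avg\n", sum/n/1024}'

# Worked example with illustrative numbers:
#   80 MB avg  ×  pm.max_children = 20  ×  1.25 headroom  ≈  2000 MB
# That 2000 MB must fit in host RAM after the database, OS, and cache.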

If a single request to your site can complete cleanly and the FPM worker that handled it returns to the pool intact, then a 502 should be impossible during normal operation. Anything else is the system telling you that a worker is dying somewhere, and the 502 is just where that death surfaces.

Want this to stop being your problem?

If outages or errors keep repeating, the fix is often consistency: updates, backups and monitoring that don't get skipped.

See WordPress maintenance
