1. Production Problem — A Transaction Counter That Silently Drops Updates
A payment reconciliation service tracked daily transaction counts per merchant. Each incoming payment incremented a shared counter. The logic looked correct — a simple counter++ inside the request handler. Unit tests passed. Integration tests passed. The service ran without errors for weeks.
During a quarterly audit, the reconciliation numbers did not match. The service had recorded fewer transactions than the payment gateway reported. The gap was small — roughly 0.3% — but consistent across every merchant.
Thread dumps showed no deadlocks. Logs showed no exceptions. The counter variable itself was not corrupted. The problem only appeared under sustained concurrent load, where multiple threads processed payments simultaneously.
The root cause was a compound operation. The counter++ expression reads the current value, adds one, and writes the result back. These three steps do not execute atomically. When two threads read the same value before either writes, one increment overwrites the other. The lost update is silent. No exception, no log entry, no observable failure — just a number that is slightly wrong.
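The interleaving can be replayed deterministically on a single thread. The sketch below is illustrative only: the class name LostUpdateReplay and its variables are hypothetical, and the two "threads" are simulated by ordering the steps by hand rather than by real scheduling.

```java
// Java 8+: single-threaded replay of the interleaving that loses an update
class LostUpdateReplay {
    static long counter = 0;

    /** Replays "both read, then both write" and returns the final counter value. */
    static long replay() {
        counter = 0;
        long readByA = counter;   // Thread A reads 0
        long readByB = counter;   // Thread B reads 0 before A has written
        counter = readByA + 1;    // Thread A writes 1
        counter = readByB + 1;    // Thread B also writes 1, overwriting A's update
        return counter;
    }

    public static void main(String[] args) {
        // Two increments were attempted, but only one survives
        System.out.println("Two increments, final value: " + replay()); // prints 1
    }
}
```

In a real service the same ordering arises by chance, which is why the loss rate depends on load rather than on any particular input.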
Most concurrency bugs do not appear as errors. They appear as data that is quietly, consistently incorrect — and they only surface under the load conditions that development environments never replicate.
2. Internal Working — How the JVM Executes Threads and Manages Memory Visibility
Every Java platform thread maps to an operating system thread (virtual threads, finalized in Java 21, are scheduled by the JVM instead). For platform threads, the JVM delegates scheduling to the OS kernel, which decides when each thread runs, for how long, and on which CPU core. The application has no control over this scheduling. Two threads executing the same code path may interleave their operations in any order the scheduler chooses.
The Java Language Specification, Chapter 17 defines the Java Memory Model (JMM). The JMM governs how threads observe each other’s writes to shared memory. Without explicit synchronization, a write by one thread has no guarantee of becoming visible to another thread.
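This happens because neither the hardware nor the JIT compiler is required to make a write visible immediately. Each core maintains local caches and store buffers, and the compiler may keep a value in a register or hoist a read out of a loop. When a thread writes a value, nothing forces the update to reach other threads at any particular time. Another thread running on a different core can keep working with a stale value.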
Happens-Before and Visibility Guarantees
The JMM introduces the concept of happens-before relationships. A happens-before relationship guarantees that memory writes by one thread become visible to reads by another. Several actions create happens-before edges: releasing a monitor lock happens-before a subsequent acquisition of that lock, writing a volatile variable happens-before a subsequent read of that variable, and Thread.start() happens-before any action in the started thread.
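As a minimal sketch of the volatile edge, the example below publishes a plain field through a volatile flag. The class VolatilePublication and its method names are hypothetical and chosen only for illustration.

```java
// Java 8+: a volatile write publishes the plain write that precedes it
class VolatilePublication {
    private int payload = 0;                 // plain field, no synchronization of its own
    private volatile boolean ready = false;  // volatile flag carries the happens-before edge

    void publish(int value) {
        payload = value;  // 1. plain write
        ready = true;     // 2. volatile write: the plain write above is ordered before it
    }

    int awaitPayload() {
        while (!ready) {  // volatile read: once it sees true, the payload write is visible too
            Thread.yield();
        }
        return payload;
    }

    /** Starts a reader thread, publishes 42, and returns what the reader observed. */
    static int demo() {
        VolatilePublication box = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();        // Thread.start() happens-before the reader's first action
        box.publish(42);
        try {
            reader.join();     // join() makes the reader's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Reader saw: " + demo());
    }
}
```

Only the flag is volatile. The payload field is safe to read because the read of ready that returns true is ordered after the write to payload.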
Without a happens-before relationship, the JVM and the hardware are free to reorder and cache operations in ways that break assumptions about shared state. A common example is a shutdown flag:
// Java 8+: visibility failure without volatile
public class WorkerService {
    // Without volatile, the worker thread may never see this change
    private boolean running = true;
    public void stop() {
        running = false; // No happens-before edge: the worker may never observe this write
    }
    public void run() {
        while (running) {
            // The JIT may hoist the read of 'running' out of the loop, so it stays true
            processNextItem();
        }
    }
}
The stop() method sets the flag to false, but the worker thread may loop indefinitely because it never sees the updated value. Declaring running as volatile fixes this: a volatile write happens-before every subsequent read of the same variable, so the worker is guaranteed to observe the update. A common mental model is that volatile writes flush to main memory and volatile reads bypass the cache, although the JMM specifies only the visibility guarantee, not the mechanism.
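A corrected version might look like the following sketch. The class name VolatileWorkerService, the processed counter, and the stopsWithin helper are hypothetical additions used only to make the example runnable.

```java
// Java 8+: the same worker with a volatile shutdown flag
import java.util.concurrent.atomic.AtomicLong;

class VolatileWorkerService {
    private volatile boolean running = true;              // volatile: stop() becomes visible to the worker
    private final AtomicLong processed = new AtomicLong();

    void stop() {
        running = false;
    }

    void run() {
        while (running) {       // every iteration performs a fresh volatile read
            processNextItem();
        }
    }

    private void processNextItem() {                       // hypothetical stand-in for real work
        processed.incrementAndGet();
    }

    /** Starts a worker, requests a stop, and reports whether it exited within the timeout. */
    static boolean stopsWithin(long timeoutMillis) {
        VolatileWorkerService service = new VolatileWorkerService();
        Thread worker = new Thread(service::run);
        worker.setDaemon(true);   // do not keep the JVM alive if the demo fails
        worker.start();
        service.stop();
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Worker stopped: " + stopsWithin(2_000));
    }
}
```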
The JMM does not guarantee that threads see each other’s changes. It guarantees that they see each other’s changes only when the program explicitly establishes visibility through synchronization, volatile access, or other happens-before mechanisms.
3. Code Example — A Shared Counter That Fails Under Concurrent Access
// Java 8+: demonstrates a race condition in concurrent counter updates
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class TransactionCounter {
    private static final Logger log = LoggerFactory.getLogger(TransactionCounter.class);
    private long unsafeCount = 0;
    private final AtomicLong safeCount = new AtomicLong(0);
    public void incrementUnsafe() {
        // Three separate operations (read, add, write), not atomic
        unsafeCount++;
    }
    public void incrementSafe() {
        // Atomic read-modify-write performed by the hardware
        safeCount.incrementAndGet();
    }
    public static void main(String[] args) throws InterruptedException {
        TransactionCounter counter = new TransactionCounter();
        int threads = 8;
        int incrementsPerThread = 100_000;
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.submit(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementUnsafe();
                    counter.incrementSafe();
                }
            });
        }
        executor.shutdown();
        // Check the result so that partial counts are never reported as final
        if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
            log.warn("Workers did not finish within 30 seconds; counts below are partial");
        }
        long expected = (long) threads * incrementsPerThread;
        log.info("Expected: {}", expected);
        log.info("Unsafe count: {} (lost: {})", counter.unsafeCount, expected - counter.unsafeCount);
        log.info("Safe count: {} (lost: {})", counter.safeCount.get(), expected - counter.safeCount.get());
    }
}
The expected total is 800,000. The AtomicLong counter reaches it every time. The long counter consistently falls short — typically by thousands of increments, varying with each run.
The root cause is not a logic error. The code correctly expresses the intent. The problem is that unsafeCount++ compiles to a sequence of bytecode instructions rather than one indivisible step: getfield reads the value, ladd adds one, and putfield writes the result back. Between the read and the write, another thread can read the same value. Both threads write back the same incremented value, and one update disappears.
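AtomicLong avoids this by performing the read-modify-write as an atomic hardware operation. In OpenJDK 8, AtomicLong.incrementAndGet() delegates to Unsafe.getAndAddLong, which is written as a loop around Unsafe.compareAndSwapLong (Compare-And-Swap, CAS): read the value, compute the new one, and write it only if the value has not changed in the meantime. If another thread modified the value between the read and the swap, the CAS fails and the loop retries. On platforms such as x86, HotSpot typically replaces this loop with a single atomic fetch-and-add instruction.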
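The retry behaviour can be written out explicitly with the public compareAndSet method. This is a sketch of the technique, not the JDK's internal code; the class CasCounter and its runDemo helper are hypothetical.

```java
// Java 8+: an explicit compare-and-set retry loop
import java.util.concurrent.atomic.AtomicLong;

class CasCounter {
    private final AtomicLong value = new AtomicLong();

    /** Increments by reading, computing, and conditionally writing until the write succeeds. */
    long increment() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;   // no other thread changed the value in between
            }
            // Another thread won the race; loop and retry with the fresh value
        }
    }

    long get() {
        return value.get();
    }

    /** Runs the given number of threads, each incrementing perThread times. */
    static long runDemo(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("Final count: " + runDemo(8, 10_000)); // prints 80000
    }
}
```

A failed compareAndSet costs a retry but never blocks the thread, which is the property that separates this approach from locking.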
4. What Can Go Wrong — Failure Modes in Concurrent Systems
Lost Updates and Silent Data Drift
The lost update demonstrated above is the failure mode most teams encounter first. Any read-modify-write sequence on shared state without synchronization drops updates under concurrent access. In production, counters drift, balances end up slightly wrong, and metrics undercount. The data looks plausible. That plausibility is what makes the bug survive for weeks or months before anyone notices.
Visibility Failures From CPU Caching
A different category of failure involves no data corruption at all — just stale reads. One thread writes a flag or a configuration value. Another thread never sees the update because the write stays in the writer’s CPU cache. The reading thread continues operating on old data indefinitely. In production, this creates symptoms that look unrelated to concurrency: a shutdown flag that a worker thread ignores, a feature toggle that activates on some instances but not others, or a health check that reports stale status.
Deadlocks and Thread Pool Exhaustion
When the system uses locks, a new failure pattern becomes possible. Two threads each hold a lock the other needs. Neither can proceed.
// Java 8+: classic deadlock from inconsistent lock ordering
private final Object lockA = new Object();
private final Object lockB = new Object();
// Thread 1 acquires lockA, then waits for lockB
public void transferOut() {
    synchronized (lockA) {
        synchronized (lockB) { executeTransfer(); }
    }
}
// Thread 2 acquires lockB, then waits for lockA: deadlock
public void transferIn() {
    synchronized (lockB) {
        synchronized (lockA) { executeTransfer(); }
    }
}
The application does not crash — the threads simply stop making progress. Thread pools gradually fill with blocked threads. Throughput drops to zero for affected code paths while unrelated operations continue normally. Without thread dump analysis using jstack or JDK Flight Recorder, the cause remains invisible. The fix is to acquire locks in a consistent global order — both methods lock lockA first, then lockB.
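A minimal sketch of the consistent-ordering fix follows. The class OrderedTransfers, the two balance fields, and the runDemo helper are hypothetical; they stand in for whatever executeTransfer() does in the real service.

```java
// Java 8+: both directions take lockA first, then lockB
class OrderedTransfers {
    private final Object lockA = new Object();
    private final Object lockB = new Object();
    private long balanceA = 1_000;
    private long balanceB = 1_000;

    void transferOut(long amount) {       // moves money from A to B
        synchronized (lockA) {
            synchronized (lockB) {
                balanceA -= amount;
                balanceB += amount;
            }
        }
    }

    void transferIn(long amount) {        // moves money from B to A, same lock order
        synchronized (lockA) {
            synchronized (lockB) {
                balanceB -= amount;
                balanceA += amount;
            }
        }
    }

    long total() {
        synchronized (lockA) {
            synchronized (lockB) {
                return balanceA + balanceB;
            }
        }
    }

    /** Runs opposing transfers on two threads and returns the combined balance. */
    static long runDemo(int iterations) {
        OrderedTransfers transfers = new OrderedTransfers();
        Thread out = new Thread(() -> {
            for (int i = 0; i < iterations; i++) transfers.transferOut(1);
        });
        Thread in = new Thread(() -> {
            for (int i = 0; i < iterations; i++) transfers.transferIn(1);
        });
        out.start();
        in.start();
        try {
            out.join();
            in.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return transfers.total();
    }

    public static void main(String[] args) {
        System.out.println("Combined balance: " + runDemo(100_000)); // prints 2000
    }
}
```

Because no thread can hold lockB while waiting for lockA, the circular wait that defines a deadlock cannot form.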
Thread Starvation From Lock Contention
Even without deadlocks, locks create problems when hold times grow. A thread that holds a lock during a database call or a network request keeps every other thread waiting for the entire I/O duration. The waiting threads accumulate in a blocked state. The system has available CPU capacity, but work cannot proceed because threads cannot acquire the resource they need. In production, this appears as latency spikes that correlate with lock contention rather than CPU or I/O saturation.
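One way to keep hold times short is to run the slow call with no lock held and take the lock only for the state change. The sketch below assumes a hypothetical RateCache whose slowSource supplier stands in for a database or network call.

```java
// Java 8+: the lock protects only the state mutation, not the slow call
import java.util.function.Supplier;

class RateCache {
    private final Object lock = new Object();
    private final Supplier<Double> slowSource;  // hypothetical stand-in for a database or network call
    private double rate;

    RateCache(Supplier<Double> slowSource, double initialRate) {
        this.slowSource = slowSource;
        this.rate = initialRate;
    }

    void refresh() {
        double fresh = slowSource.get();  // slow call runs with no lock held
        synchronized (lock) {             // lock is held only for the assignment
            rate = fresh;
        }
    }

    double current() {
        synchronized (lock) {
            return rate;
        }
    }

    /** Refreshes once from a fixed source and returns the cached value. */
    static double demo() {
        RateCache cache = new RateCache(() -> 1.25, 1.0);
        cache.refresh();
        return cache.current();
    }

    public static void main(String[] args) {
        System.out.println("Cached rate: " + demo()); // prints 1.25
    }
}
```

The trade-off is that two concurrent refreshes may both call the source, and the last one to finish wins. That is acceptable for a cache but not for an operation that must be atomic end to end.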
Concurrency failures share a common trait: the system does not report them. No exception, no error log, no alert. The symptoms are indirect — incorrect data, stalled threads, inconsistent latency — and they surface only under load patterns that development environments never produce.
5. Performance and Scalability — How Synchronization Behaves Under Load
Synchronization has a cost. Every synchronized block requires acquiring and releasing a monitor lock. Under low contention (one or two threads), the JVM optimises this aggressively with lightweight (thin) locks. Older JVMs also used biased locking, which was deprecated and disabled by default in JDK 15 (JEP 374). The overhead is negligible.
Under high contention, the cost changes dramatically. When multiple threads compete for the same lock, the JVM escalates to a heavyweight monitor backed by an OS mutex. Threads that fail to acquire the lock enter a blocked state and must be rescheduled by the OS when the lock becomes available. Each context switch adds microseconds of latency.
Measuring this requires profiling under realistic concurrency. JDK Flight Recorder captures JavaMonitorWait and JavaMonitorEnter events, showing which locks threads contend on, for how long, and how many threads compete. async-profiler with the lock event type reveals contention hotspots that CPU profiles miss entirely.
Alternatives to synchronized
ReentrantLock from java.util.concurrent.locks provides more control than synchronized. It supports tryLock() with a timeout, which allows a thread to abandon the acquisition rather than blocking indefinitely.
// Java 8+: bounded lock acquisition prevents indefinite blocking
private final ReentrantLock poolLock = new ReentrantLock();
public Connection acquire() throws TimeoutException {
    try {
        // Wait at most 500ms: fail fast rather than stall the request
        if (!poolLock.tryLock(500, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("Connection pool lock timeout");
        }
        try {
            return pool.remove();
        } finally {
            poolLock.unlock(); // Always release in finally so the lock cannot leak
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new TimeoutException("Interrupted while acquiring pool lock");
    }
}
It also supports fair ordering, where the longest-waiting thread acquires the lock next. Fair locks prevent starvation and reduce the variance in acquisition times, but they typically lower overall throughput, often substantially.
Atomic variables (AtomicLong, AtomicReference) avoid locking entirely by using CAS. Under low to moderate contention, CAS outperforms locking because it never blocks a thread. Under extreme contention — hundreds of threads hitting the same atomic variable — CAS throughput degrades because most attempts fail and retry. LongAdder addresses this by striping updates across multiple cells, reducing contention at the cost of slightly more expensive reads.
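A short sketch of LongAdder in the counter role follows. The class StripedCounter and its runDemo helper are hypothetical.

```java
// Java 8+: LongAdder spreads increments across internal cells
import java.util.concurrent.atomic.LongAdder;

class StripedCounter {
    /** Runs the given number of threads, each incrementing perThread times. */
    static long runDemo(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    adder.increment();   // contended updates land in different cells
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return adder.sum();  // adds up all cells; exact here because the writers have finished
    }

    public static void main(String[] args) {
        System.out.println("Final count: " + runDemo(8, 10_000)); // prints 80000
    }
}
```

While writers are still running, sum() is not an atomic snapshot. That makes LongAdder a good fit for metrics and statistics and a poor fit for values that drive control flow, such as sequence numbers.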
Synchronization that works at 10 concurrent threads may fail at 1,000. The performance characteristics of every concurrency primitive change under load, and the only way to know the behaviour is to measure it under production-like conditions.
6. Trade-offs — Choosing the Right Concurrency Mechanism
Every concurrency mechanism trades one property for another. The choice depends less on which tool is “best” and more on what the system can afford to give up.
synchronized is the simplest option and the right starting point for most critical sections. The JVM handles lock release automatically, even when exceptions occur. The trade-off is control — synchronized cannot be interrupted, cannot time out, and does not support fair ordering. When a thread enters a synchronized block and the lock is held, it waits indefinitely. For short critical sections under low contention, that trade-off is invisible. Under high contention or long hold times, it becomes the bottleneck.
ReentrantLock exists for the cases where synchronized cannot adapt. Timeout-based acquisition, interruptibility, and fairness give the application control over lock behaviour. That control comes with responsibility — the developer must call unlock() explicitly, always inside a finally block. A missed unlock() causes a permanent lock leak that silently stalls every thread that subsequently tries to acquire that lock.
Atomic variables occupy a different part of the spectrum entirely. They avoid locking by using hardware-level CAS instructions, which means no thread ever blocks. For single-variable operations like counters and flags, atomics outperform locking under low to moderate contention. They cannot, however, coordinate updates to multiple variables. A counter that must increment a total and update a per-category breakdown simultaneously needs a lock — atomics cannot make that operation atomic.
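The multi-variable case can be sketched with a single lock that guards both the total and the breakdown. The class CategorizedCounter, its category names, and the runDemo helper are hypothetical.

```java
// Java 8+: one lock keeps a total and its per-category breakdown in step
import java.util.HashMap;
import java.util.Map;

class CategorizedCounter {
    private final Object lock = new Object();
    private long total;
    private final Map<String, Long> perCategory = new HashMap<>();

    void record(String category) {
        synchronized (lock) {   // both updates happen together or not at all, as seen by readers
            total++;
            perCategory.merge(category, 1L, Long::sum);
        }
    }

    long total() {
        synchronized (lock) {
            return total;
        }
    }

    /** True when the total equals the sum of the per-category counts. */
    boolean isConsistent() {
        synchronized (lock) {
            long sum = 0;
            for (long count : perCategory.values()) {
                sum += count;
            }
            return sum == total;
        }
    }

    /** Records perThread events from each of the given number of threads. */
    static CategorizedCounter runDemo(int threads, int perThread) {
        CategorizedCounter counter = new CategorizedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.record(j % 2 == 0 ? "card" : "transfer");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        CategorizedCounter counter = runDemo(4, 5_000);
        System.out.println("Total: " + counter.total() + ", consistent: " + counter.isConsistent());
    }
}
```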
The option that outperforms all of these is avoiding shared mutable state entirely. Stateless handlers, immutable objects, and thread-local storage remove the need for synchronization by removing the condition that creates contention. This approach has zero synchronization cost because there is nothing to synchronize.
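As a sketch of that idea, each task below counts into its own local variable and the results are merged once at the end. This is task confinement, a close relative of thread-local storage; the class ConfinedCounting and its countAll method are hypothetical.

```java
// Java 8+: each task owns its counter, so nothing is shared while counting
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConfinedCounting {
    /** Runs the given number of tasks, each counting perTask items privately, and sums the results. */
    static long countAll(int tasks, int perTask) {
        ExecutorService executor = Executors.newFixedThreadPool(tasks);
        try {
            List<Callable<Long>> work = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                work.add(() -> {
                    long local = 0;              // confined to this task; no other thread can see it
                    for (int j = 0; j < perTask; j++) {
                        local++;
                    }
                    return local;
                });
            }
            long total = 0;
            for (Future<Long> result : executor.invokeAll(work)) {
                total += result.get();           // Future.get() hands the result across threads safely
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("Total: " + countAll(8, 10_000)); // prints 80000
    }
}
```

The only cross-thread handoff is the final result of each task, so the hot loop runs without locks, volatile fields, or atomics.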
7. When NOT to Use Shared Mutable State
The most reliable concurrency strategy is designing it away.
Most request handlers in modern Java services are already stateless. A Spring Boot controller that reads from a database, computes a response, and returns it has no shared state between threads. The servlet container manages the thread pool, and the handler needs no synchronization. The concurrency problem does not exist because there is nothing to share.
Data objects that move between threads do not need to be mutable either. A record in Java 16+ is shallowly immutable: its fields are final and cannot be reassigned after construction. As long as the components are themselves immutable, as String, long, and Instant are, passing records between threads eliminates visibility and atomicity concerns entirely. There is no stale read when nothing changes.
// Java 16+ — immutable data transfer between threads requires no synchronization
public record PaymentEvent(String merchantId, long amount, Instant timestamp) {}
// Any thread can read any field — no locks, no volatile, no risk
// To "update", create a new instance — the original remains unchanged
PaymentEvent updated = new PaymentEvent(event.merchantId(), newAmount, Instant.now());
Systems built around message queues and event streams achieve the same isolation at the architecture level. Each consumer processes its own copy of the data. Kafka assigns each partition to exactly one consumer within a consumer group, so no two consumers in the group process the same partition at the same time. The concurrency boundary moves from the code to the infrastructure.
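The same ownership rule can be imitated inside one JVM by routing each key to a single-threaded executor. The sketch below is an in-process analogy, not Kafka client code; the class PartitionedCounts, its methods, and the merchant IDs are hypothetical.

```java
// Java 8+: each key is owned by exactly one worker thread, so its map needs no locking
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PartitionedCounts {
    private final List<ExecutorService> workers = new ArrayList<>();
    private final List<Map<String, Long>> counts = new ArrayList<>();

    PartitionedCounts(int partitions) {
        for (int i = 0; i < partitions; i++) {
            workers.add(Executors.newSingleThreadExecutor());
            counts.add(new HashMap<>());  // plain HashMap: only its own worker thread touches it
        }
    }

    /** Routes the event to the partition that owns this merchant; safe to call from any thread. */
    void record(String merchantId) {
        int partition = Math.floorMod(merchantId.hashCode(), workers.size());
        Map<String, Long> owned = counts.get(partition);
        workers.get(partition).execute(() -> owned.merge(merchantId, 1L, Long::sum));
    }

    /** Snapshots every partition on its owning thread, merges the results, and stops the workers. */
    Map<String, Long> collect() {
        Map<String, Long> merged = new HashMap<>();
        try {
            for (int p = 0; p < workers.size(); p++) {
                Map<String, Long> owned = counts.get(p);
                Callable<Map<String, Long>> snapshot = () -> new HashMap<>(owned);
                // Queued behind earlier events; Future.get() hands the copy over safely
                merged.putAll(workers.get(p).submit(snapshot).get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            for (ExecutorService worker : workers) {
                worker.shutdown();
            }
        }
        return merged;
    }

    /** Sends events for three merchants from several producer threads, then collects the counts. */
    static Map<String, Long> runDemo(int producers, int eventsPerProducer) {
        PartitionedCounts partitioned = new PartitionedCounts(4);
        Thread[] threads = new Thread[producers];
        for (int i = 0; i < producers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < eventsPerProducer; j++) {
                    partitioned.record("merchant-" + (j % 3));
                }
            });
            threads[i].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return partitioned.collect();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(3, 3_000));
    }
}
```

The executor queue is still a synchronization point, but the counting state itself is never shared, so a slow update for one merchant cannot corrupt another merchant's count.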
Introducing shared mutable state into a system that does not require it creates a maintenance burden that grows with scale. Every lock, every atomic variable, and every synchronized block becomes a future debugging surface. When the system slows down under load, each synchronization point becomes a suspect.
The safest concurrency code is the code that never needs to synchronize. Before adding a lock, examine whether the design can eliminate the sharing.
8. Real-World Use Case — Connection Pool Contention Under Peak Traffic
An API gateway used a shared connection pool to manage outbound HTTP connections to downstream services. The pool tracked available connections using a synchronized list. Each request handler acquired a connection, made the downstream call, and returned the connection to the pool.
Under moderate traffic, the system performed well. During peak hours, latency increased significantly. CPU usage remained low. Thread dumps showed most request-handling threads blocked on the synchronized list — waiting to acquire or return a connection.
The connection pool itself was not exhausted. The bottleneck was the lock protecting the pool’s internal bookkeeping. Every acquire and every release required exclusive access to the list. Under hundreds of concurrent requests, threads spent more time waiting for the lock than waiting for downstream responses.
The resolution replaced the synchronized list with a ConcurrentLinkedQueue for connection tracking and an AtomicInteger for the pool size counter.
// Java 8+: lock-free pool eliminates the single-lock bottleneck
private final ConcurrentLinkedQueue<Connection> available = new ConcurrentLinkedQueue<>();
private final AtomicInteger activeCount = new AtomicInteger(0);
public Connection acquire() {
    Connection conn = available.poll(); // Non-blocking, no lock required
    if (conn == null) {
        conn = createNewConnection(); // Fallback when pool is empty
    }
    // Count every checked-out connection, pooled or new, so release() stays balanced
    activeCount.incrementAndGet();
    return conn;
}
public void release(Connection conn) {
    activeCount.decrementAndGet();
    available.offer(conn); // Non-blocking return
}
This eliminated the single-lock bottleneck. Threads could acquire and return connections concurrently without blocking each other. Latency under peak traffic dropped to near-baseline levels.
9. Production Interview Questions
Lost Updates in Concurrent Counters
What happens if multiple threads increment a shared long counter without synchronization?
The root cause is a non-atomic compound operation. The counter++ expression compiles to separate read, add, and write steps. Internally, two threads can read the same value before either writes. Both threads write back the same incremented value, and one update disappears. In production, this causes counters, balances, and metrics to drift from their correct values by a small but consistent margin. The fix is to use AtomicLong.incrementAndGet(), which performs the entire read-modify-write as one atomic hardware operation (a CAS loop or a single fetch-and-add instruction, depending on the platform), or to protect the update with a synchronized block.
Visibility Failures Across Threads
How does the system behave when one thread writes a boolean flag and another thread reads it without synchronization?
The root cause is a missing happens-before relationship. The writing thread updates the flag, but the value may remain in the CPU cache of the core executing that thread. Internally, the reading thread on a different core accesses its own cache and sees the old value. In production, this causes worker threads to ignore shutdown signals, configuration updates to apply inconsistently, or status checks to return stale results. The fix is to declare the flag as volatile, which forces every write to flush to main memory and every read to bypass the cache, establishing the required visibility guarantee.
Deadlock Between Service Locks
What happens if two threads acquire locks on shared resources in opposite order?
The root cause is circular lock dependency. Thread A holds Lock 1 and waits for Lock 2. Thread B holds Lock 2 and waits for Lock 1. Internally, both threads enter a permanent blocked state because neither can release its held lock without first acquiring the other. In production, this freezes the affected code paths. Thread pools gradually fill with blocked threads, and throughput for those operations drops to zero while unrelated operations continue normally. The fix is to enforce a global lock ordering — all code paths acquire locks in the same sequence — or to use ReentrantLock.tryLock() with a timeout to detect and recover from potential deadlocks.
Lock Contention Under High Throughput
What issues arise when a critical section protected by synchronized handles thousands of concurrent requests?
The root cause is lock escalation under contention, which HotSpot calls lock inflation. At low concurrency, the JVM uses lightweight locks with minimal overhead (JVMs before JDK 15 also used biased locking by default). Internally, when many threads compete for the same monitor, the JVM escalates to a heavyweight OS-level mutex. Each thread that fails to acquire the lock enters a blocked state, requiring an OS context switch to resume. In production, this creates latency spikes that correlate with concurrency rather than CPU or I/O load. Thread dumps show large numbers of threads in BLOCKED state on the same monitor. The fix depends on the contention profile: replacing synchronized with ReentrantLock and tryLock() for bounded waiting, switching to atomic variables for single-value operations, or redesigning to reduce the critical section scope.
Thread Starvation From Long-Held Locks
How does the system behave when a synchronized block contains a blocking I/O call?
The root cause is lock hold time exceeding the operation’s tolerance. The thread holding the lock blocks on a database query or network call, keeping the lock occupied for the entire I/O duration. Internally, every other thread needing that lock enters a blocked state and remains there until the I/O completes and the lock releases. In production, this appears as cascading latency — one slow downstream call stalls every thread that shares the lock, even when the system has idle CPU capacity. The fix is to move the I/O operation outside the synchronized block so the lock only protects the state mutation, not the external call. If the full operation must remain atomic, ReentrantLock with a timeout prevents indefinite blocking.
10. Summary
Concurrency bugs do not announce themselves. A lost update produces a counter that is almost correct. A visibility failure produces a flag that is usually current. A deadlock produces threads that silently stop progressing. Every failure mode is quiet, intermittent, and load-dependent.
The Java Memory Model defines the rules, but those rules are permissive. Without explicit happens-before relationships, threads operate on private views of memory that may never converge. Every shared mutable variable that lacks synchronization, a volatile declaration, or an atomic wrapper is a potential source of silent data corruption.
The most important concurrency principle is not knowing how to synchronize. It is knowing when shared mutable state can be eliminated entirely. Synchronization controls the damage of sharing. Not sharing prevents the damage from existing.
From Real Experience
On a high-throughput reconciliation platform, a daily batch job compared transaction counts between the payment gateway and the internal ledger. The numbers never matched exactly. The gap was small — a few hundred transactions out of millions — and varied between runs. The team attributed it to timing differences between systems.
After three months, an audit flagged the drift as a pattern. The gap correlated with daily peak traffic hours. The investigation focused on the counter that tracked processed transactions. The counter was a plain long field, incremented inside a request handler that ran across a pool of 16 threads.
The diagnostic step that confirmed the issue was replacing the long with an AtomicLong in a staging environment and replaying one day of production traffic. The AtomicLong count matched the gateway count exactly. The original long counter showed the same ~0.3% drift.
The fix took one line of code. Finding the root cause took three months, because the symptoms were subtle and the counter appeared to work correctly during every test that ran at low concurrency. The development environment used two threads. Production used sixteen.
The broader lesson: concurrency correctness cannot be verified by testing alone. Tests run under controlled conditions with predictable scheduling. Production runs under contention with arbitrary interleaving. The only reliable concurrency guarantee is the one enforced by the memory model, through synchronization, volatile access, or atomic operations. Anything else is a bet that the scheduler will be kind, and at scale, the scheduler is never kind.




