JPA/Hibernate: Pragmatic Data Access That Scales
If your dev team is like mine, Java is the opinionated back-end language. We recently refactored our teams' data access for consistency, standardizing on an ORM. Below are some lessons learned.
JPA/Hibernate through Spring gives you productive data access. It maps objects to tables, tracks changes, and writes SQL. This saves time. It also hides costs that show up in production.
Engineers need a clear posture: use ORM for domain consistency, and use SQL when the query is the product. Don’t guess. Make fetch plans explicit, measure the SQL, and encode constraints into code and reviews.
The value (and the cost) of ORM
ORM gives you some legitimately useful things. Unit of Work tracks changes across your transaction. The identity map keeps entities unique. You get declarative mapping for relations and cascading. JPQL and entity graphs mean you’re not locked into one database vendor.
But the costs aren’t obvious until they hurt you:
- The I/O patterns are invisible in your code. You think you’re calling a getter, but you just fired three SQL queries.
- Large persistence contexts eat memory. We had one service that would OOM under load because of this.
- Flush timing, mapping config, and database execution plans interact in weird ways.
- Fetch strategy and caching both claim to be abstractions, but they leak constantly.
The tradeoffs nobody tells you about
Fetch strategy: the defaults will bite you
In JPA, ManyToOne and OneToOne are EAGER by default. OneToMany and ManyToMany are LAZY. This seems reasonable until you realize what it means. EAGER on a big collection? You just loaded the entire table. LAZY without thinking it through? Get ready for a hundred extra queries.
Here’s the classic mistake with eager collections:
@Entity
class Customer {
    @OneToMany(mappedBy = "customer", fetch = FetchType.EAGER) // costly in prod
    private List<Order> orders;
}
You query one Customer to display their name. Hibernate loads every single Order they ever made. Use LAZY on collections and be explicit about what you actually need via projections or fetch graphs.
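A safer version of the same hypothetical Customer mapping keeps the collection lazy and reads the display data through a projection, so the Order rows are never touched (a sketch; the entity and field names mirror the example above):

```java
@Entity
class Customer {
    @Id @GeneratedValue
    private Long id;

    private String name;

    // LAZY: orders load only if the collection is actually traversed
    @OneToMany(mappedBy = "customer", fetch = FetchType.LAZY)
    private List<Order> orders = new ArrayList<>();
}

// Read path: select only the column you display; no Order rows are loaded
String name = em.createQuery(
        "select c.name from Customer c where c.id = :id", String.class)
    .setParameter("id", customerId)
    .getSingleResult();
```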
The N+1 problem
This one is famous for a reason. Loop over a collection without planning your fetches, and you get N+1 queries.
// N+1: one query for orders, then one per order for items
List<Order> orders = em.createQuery("select o from Order o", Order.class).getResultList();
for (Order o : orders) {
    int count = o.getItems().size(); // LAZY loads items per order
}
What actually works:
- Fetch join one collection at a time, and only when you know the result set is small and you’re deduplicating.
- DTO projections for anything read-heavy.
- Batch size hints can group secondary queries, which helps.
Don’t try to fetch join multiple collections. You’ll get a cartesian explosion, and some JPA providers just refuse to do it. Be selective, not greedy.
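The first two fixes can be sketched like this (illustrative only; `Order`/`OrderItem` follow the examples above, and `@BatchSize` is a Hibernate-specific annotation):

```java
// Fix 1: a single fetch join loads orders and their items in one query.
// "distinct" deduplicates the parent rows produced by the join.
List<Order> orders = em.createQuery(
        "select distinct o from Order o join fetch o.items", Order.class)
    .getResultList();

// Fix 2: keep the association lazy, but batch the secondary queries.
// With size = 25, Hibernate loads items for up to 25 orders per query
// (an IN list) instead of firing one query per order.
@Entity
class Order {
    @OneToMany(mappedBy = "order")
    @org.hibernate.annotations.BatchSize(size = 25)
    private List<OrderItem> items;
}
```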
LazyInitializationException and detached graphs
Accessing a LAZY association after the transaction closes fails.
Order order = orderService.findById(id); // returns detached entity
order.getItems().size(); // throws LazyInitializationException
Don’t use “Open Session in View.” Seriously, just don’t. Instead, load a DTO that has exactly what you need while you’re still in the transaction. Or use an entity graph to specify up front which attributes you need. Keep your transactions tight to the service layer that’s actually building the response.
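An entity graph makes the fetch shape explicit at load time, so nothing lazy is left dangling when the transaction ends (a sketch; the graph name and entities are hypothetical):

```java
// Declare a reusable, named fetch shape on the entity
@Entity
@NamedEntityGraph(name = "Order.withItems",
        attributeNodes = @NamedAttributeNode("items"))
class Order { /* id, items, ... */ }

// Inside the transaction: ask for that shape up front
EntityGraph<?> graph = em.getEntityGraph("Order.withItems");
Order order = em.find(Order.class, id,
        Map.of("jakarta.persistence.fetchgraph", graph));

// Safe after the transaction closes: items were fetched eagerly for this call
order.getItems().size();
```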
Cascades and equality
CascadeType.ALL sounds convenient, but it’s dangerous. I’ve seen it delete entire object graphs by accident. Only cascade within an aggregate boundary, where you actually control the lifecycle.
And please, define equals/hashCode on business keys that don’t change, not on database IDs. If equality depends on a generated ID, an entity added to a set before persistence won’t be found once the ID is assigned, and Hibernate’s change tracking gets confused.
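A minimal sketch of identity based on an immutable business key (here a hypothetical `orderNumber`), so equality is stable before and after the database assigns an ID:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Order {
    private Long id;                  // generated; null until persisted
    private final String orderNumber; // immutable business key

    Order(String orderNumber) {
        this.orderNumber = orderNumber;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Order other)) return false;
        // Identity by business key, never by the generated id
        return orderNumber.equals(other.orderNumber);
    }

    @Override
    public int hashCode() {
        return Objects.hash(orderNumber);
    }
}
```

Because the hash never changes, an entity added to a `HashSet` before persistence is still found after `id` is populated.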
Flush behavior and bulk updates
Flush timing matters more than you’d think. If Hibernate flushes before every query (which it can do), your read-heavy code gets slow for no good reason. Keep transactions short and only flush explicitly when you need to.
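JPA’s default flush mode (`AUTO`) may flush pending changes before queries; for a read-heavy stretch you can defer flushing to commit (a sketch using the standard `FlushModeType` API):

```java
// AUTO (the default) can flush before queries that pending changes
// might affect; COMMIT defers flushing until the transaction ends.
em.setFlushMode(FlushModeType.COMMIT);

// ... read-heavy work, no implicit flushes before each query ...

em.flush(); // flush explicitly when writes actually need to be visible
```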
Bulk updates are weird because they skip the persistence context entirely:
int updated = em.createQuery(
        "update Order o set o.status = :s where o.status = :old")
    .setParameter("s", Status.SHIPPED)
    .setParameter("old", Status.PAID)
    .executeUpdate();
// Persistence context is stale; clear or refresh if you continue
em.clear();
After any bulk operation, clear the persistence context or do the bulk work in isolation. Otherwise your cached entities are stale.
When to just write SQL
If the database work is the actual feature, write SQL directly. Don’t try to bend JPA to do it.
Native queries make sense for:
- Analytics (window functions, CTEs, rollups)
- Database-specific features (Postgres arrays, JSONB, PostGIS)
- Locking strategies JPA can’t express
- Read paths where you need precise projections and indexes
- Pagination over gnarly joins
- Maintenance jobs like backfills
Example: read-only projection with a window function.
public interface OrderTotal {
    Long getOrderId();
    BigDecimal getCustomerTotal();
}

@Query(value = """
        select o.id as orderId,
               sum(i.price) over (partition by o.customer_id) as customerTotal
        from orders o
        join order_items i on i.order_id = o.id
        where o.created_at >= :since
        """, nativeQuery = true)
List<OrderTotal> findTotalsSince(@Param("since") Instant since);
For native reads, return DTOs or interfaces, never managed entities. Keep your writes in the domain model where they belong, and keep reads simple.
Patterns that actually help
Map your aggregates, not your entire schema. Unidirectional relations are easier to reason about. Bi-directional webs get out of control fast.
Be explicit about fetch plans. Use JPQL with fetch joins for one collection or a few to-one relations. Entity graphs work when you want reusable, named fetch shapes. DTO projections are best for read models.
// DTO projection via JPQL, avoids entity graph explosion
List<OrderView> views = em.createQuery(
        "select new com.acme.api.OrderView(o.id, c.name, sum(i.price)) " +
        "from Order o join o.customer c join o.items i " +
        "where o.createdAt >= :since " +
        "group by o.id, c.name", OrderView.class)
    .setParameter("since", since)
    .getResultList();
Enable JDBC batching for inserts and updates. Paginate anything big. Don’t load an entire association when a targeted query would answer the question.
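JDBC batching is off by default in Hibernate; these are the standard property names, with illustrative values:

```properties
# Group inserts/updates into JDBC batches of up to 50 statements
hibernate.jdbc.batch_size=50
# Order statements by entity so batches aren't broken by interleaving
hibernate.order_inserts=true
hibernate.order_updates=true
```

Note that `GenerationType.IDENTITY` disables insert batching, because Hibernate must execute each insert to obtain the generated key.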
Keep transactions short. Mark read-only transactions so Hibernate skips flush overhead. Use optimistic locking when you have invariants to protect.
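Optimistic locking is a single standard annotation; read-only transactions are a Spring attribute (both sketched below on hypothetical types):

```java
@Entity
class Order {
    @Id @GeneratedValue
    private Long id;

    // Hibernate increments this on every update; a concurrent stale
    // write fails with OptimisticLockException instead of silently
    // overwriting the other transaction's changes.
    @Version
    private long version;
}

@Service
class OrderQueryService {
    // readOnly lets Hibernate skip dirty checking and flushing
    @Transactional(readOnly = true)
    public List<OrderView> findRecent(Instant since) { /* ... */ }
}
```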
The second-level cache is fine for reference data (countries, product categories). Don’t cache high-volume, frequently changing entities. You’ll spend more time invalidating cache than you save.
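For reference data, second-level caching is two annotations (a sketch; `@Cache` and `CacheConcurrencyStrategy` are Hibernate-specific, and a cache provider must be configured separately):

```java
// Read-mostly reference data: small, stable, safe to cache across sessions
@Entity
@Cacheable
@org.hibernate.annotations.Cache(
        usage = org.hibernate.annotations.CacheConcurrencyStrategy.READ_ONLY)
class Country {
    @Id
    private String isoCode;
    private String name;
}
```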
Log the actual SQL in dev and staging. In production, sample it. Look at execution plans for your slowest queries. Watch connection pool saturation, row counts, rows read per request. You can’t fix what you don’t measure.
How to decide
Use ORM when you’re modeling a domain with real behavior and invariants. When writes are the main thing and Unit of Work helps. When queries stay simple and stay within aggregate boundaries.
Write SQL when you need database features ORM can’t handle, when you need predictable query plans and tight payloads, or when you’re building reports, feeds, or analytics.
Most real systems do both. Keep your domain writes in ORM where you can enforce invariants. Build read models with SQL and projections. Wrap the SQL behind repositories with tests.
What to do next
Pick three endpoints this week. Trace the SQL they generate. Count how many rows get read. Sketch out the fetch plan you actually want, then implement it. Fix one N+1 with a projection or an entity graph. If you find a query where the database work is the feature, rewrite it as native SQL.
Write down your team’s rules. Put them in code reviews. Data access isn’t something you solve once. It’s something you keep making better.