N+1 Query problem Spring JPA
The N+1 query problem occurs when executing a query triggers additional queries to fetch related data from the database. This issue can cause significant performance degradation, especially in time-sensitive applications, as the number of records and the complexity of table relationships increase.
In this article, we will explore the N+1 query problem in detail and discuss effective solutions to address it.
Problem
The N+1 query problem occurs when an application executes a single query to retrieve a collection of records (N) from the database, and then for each record, additional queries are executed to fetch related data from another table. This results in a total of N+1 queries, where the number of queries increases proportionally with the size of the dataset. The larger the value of N, the greater the performance impact.
This issue is common in ORM (Object-Relational Mapping) frameworks, which serve as an abstraction layer between relational databases and programming languages. While ORMs simplify database interactions by automatically generating queries, they can introduce significant overhead when dealing with complex database schemas and relationships.
Example: Understanding the N+1 Query Problem
Consider the following entity classes representing a University
, its Students
, and their Phones
:
@Entity
@Table(name = "university")
@AllArgsConstructor
public class UniversityEntity {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private int id;
private String name;
@OneToMany(cascade = CascadeType.ALL)
@JoinColumn(name = "uni_id")
private Set<StudentEntity> students;
}
@Entity
@Table(name = "student")
public class StudentEntity {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private int id;
private String name;
@OneToMany(cascade = CascadeType.ALL)
@JoinColumn(name = "student_id")
private Set<PhoneEntity> phones;
}
@Entity
@Table(name = "phone")
public class PhoneEntity {
@Id
String number;
String type;
}
If you query all universities using the findByIdIn
method and set jpa.show-sql=true
, the SQL logs for a dataset with 3 universities, each having 2 students, and one student owning 2 phones, would look like this:
Hibernate: select ue1_0.id,ue1_0.name from uni ue1_0 where ue1_0.id in (?,?,?)
Hibernate: select s1_0.uni_id,s1_0.id,s1_0.name from student s1_0 where s1_0.uni_id=?
Hibernate: select s1_0.uni_id,s1_0.id,s1_0.name from student s1_0 where s1_0.uni_id=?
Hibernate: select s1_0.uni_id,s1_0.id,s1_0.name from student s1_0 where s1_0.uni_id=?
Hibernate: select p1_0.student_id,p1_0.number,p1_0.type from phone p1_0 where p1_0.student_id=?
Hibernate: select p1_0.student_id,p1_0.number,p1_0.type from phone p1_0 where p1_0.student_id=?
Hibernate: select p1_0.student_id,p1_0.number,p1_0.type from phone p1_0 where p1_0.student_id=?
Hibernate: select p1_0.student_id,p1_0.number,p1_0.type from phone p1_0 where p1_0.student_id=?
Hibernate: select p1_0.student_id,p1_0.number,p1_0.type from phone p1_0 where p1_0.student_id=?
Hibernate: select p1_0.student_id,p1_0.number,p1_0.type from phone p1_0 where p1_0.student_id=?
Here’s what happens:
- The first query retrieves all universities by ID.
- For each university, additional queries fetch the associated students.
- For each student, additional queries fetch their phones.
With this dataset, the number of queries multiplies based on the relationships: 1 query for universities + 3 queries for students (resulting in 6 students) + 6 queries for phones = 10 total queries.
This multiplication of queries can lead to significant performance issues, especially in performance-sensitive applications with larger datasets or more complex relationships.
Solution: Using JPA EntityGraph to Solve the N+1 Query Problem
There are multiple ways to address the N+1 query problem, but in my project, I used JPA EntityGraph. EntityGraph allows you to specify which entities should be fetched from the database using SQL joins, ensuring related data is fetched eagerly. This eliminates the need for additional queries and significantly improves performance.
Steps to Implement EntityGraph
- Declare the EntityGraph in the Entity Class
Use the@NamedEntityGraph
annotation to define the graph structure. Specify the relationships to be fetched eagerly using@NamedAttributeNode
and@NamedSubgraph
.
@NamedEntityGraph(
name = "graph.UniversityData",
attributeNodes = @NamedAttributeNode(value = "students", subgraph = "subgraph.studentData"),
subgraphs = {
@NamedSubgraph(
name = "subgraph.studentData",
attributeNodes = @NamedAttributeNode("phones")
)
}
)
public class UniversityEntity {
// Other fields and annotations
}
2. Use the Declared EntityGraph in the Repository Method
Annotate the query method in the repository with @EntityGraph
and specify the name of the defined graph.
@EntityGraph("graph.UniversityData")
List<UniversityEntity> findByIdIn(List<Integer> ids);
Optimized Query Output
By using EntityGraph, the same query described earlier is reduced to a single optimized SQL query
Hibernate: select ue1_0.id, ue1_0.name,
s1_0.uni_id, s1_0.id, s1_0.name,
p1_0.student_id, p1_0.number, p1_0.type
from uni ue1_0
left join student s1_0 on ue1_0.id = s1_0.uni_id
left join phone p1_0 on s1_0.id = p1_0.student_id
where ue1_0.id in (?,?,?)
With this approach, the N+1 queries are consolidated into a single query by eagerly fetching the related datasets using the defined EntityGraph. This drastically reduces the database load and improves performance for applications with complex relationships and large datasets.
Thus, the N+1 query problem is effectively resolved by leveraging EntityGraph to eagerly fetch related datasets in a single, optimised query.