Friday, December 24, 2010

JPA + Hibernate: optimizing queries on a many-to-many relationship

Following up a discussion (in Russian) on a developers' forum on advanced querying against many-to-many mapping in JPA I have compiled all my findings in this post. I intentionally leave all object and database names close to the original maybe with minor changes.
Problem description:
The application in question runs in a JPA environment with Hibernate EntityManager. Entity Profile has two many-to-many relationships with itself indicating - 1) profiles that are the friends of this profile, and 2) the profiles this profile is befriended by:
  @Entity
  @Table
(name = "PROFILE")
 
public class Profile implements Serializable {
   
@Id
    @GeneratedValue
(strategy = GenerationType.AUTO)
   
private Long id;
   
private String name;
   
@ManyToMany
    @JoinTable
(name = "FRIEND", joinColumns = @JoinColumn(
       
name = "from_id"), inverseJoinColumns = @JoinColumn(
       
name = "to_id"))
   
private Set<Profile> friends = new HashSet<Profile>();
   
@ManyToMany(mappedBy = "friends")
   
private Set<Profile> friendedBy = new HashSet<Profile>();
   
// ...
 
}
Despite real-world experience makes us expect friendship to be a symmetric relationship, i.e. Y is a friend of X implies X is a friend of Y, the relationship in this example is non-symmetric, expressing rather a X's sympathy to Y; seems the original system is a kind of social network implementation. From an abstract view it's a unidirectional graph where nodes correspond to profiles and every arc from X to Y indicates Y is a friend of X.
Somewhere in the application for every other profile but X we need a list of two-element tuples where the 1st element is the profile name, and the 2nd one is a boolean indicating whether that profile is a friend of X.
For example we have four profiles A, B, C, D such as A has both B and C as friends, but not D. The expected list is {{"B", true}, {"C", true},{"D", false}}.
At the same time friends of B are C and D, but not A. The expected list for B is {{"A", false}, {"C", true},{"D", true}}. Note that although B is a friend of A, the inverse is not true.
The requirement for such a list is totally unclear to me and likely to be caused by a design flaw but lets imagine some legacy modules cannot live without it.
Solution:
So the question is, given a profile XYZ, what JPA query would create a list of all other profile names combined with a boolean flag set to true for friends of XYZ? Since there are actually two relationships - friends and friendedBy we can use either of them for the query. Lets first build a query based on friends. Here is the suggested JPA Query:
SELECT y.name, CASE WHEN EXISTS ( SELECT x.id FROM Profile x WHERE x.id = :buddyId AND y MEMBER OF x.friends) THEN TRUE ELSE FALSE END FROM Profile y WHERE y.id <> :buddyId
The named parameter buddyId is value of the primary key of XYZ. This JPQL produces the following SQL when using MS SQL Server Express 2005 as the back-end database:
select profile0_.name as col_0_0_, case when exists (select profile1_.id from PROFILE profile1_ where profile1_.id=? and ( profile0_.id in ( select profile3_.id from FRIEND friends2_, PROFILE profile3_ where profile1_.id=friends2_.from_id and friends2_.to_id=profile3_.id ) ) ) then 1 else 0 end as col_1_0_ from PROFILE profile0_ where profile0_.id<>?
As can be seen there are two sub-selects in the statement which is likely to be inefficient. Lets try the inverse relationship friendedBy and see if we can end up with a simpler SQL. The JPQL in this case is less verbose:
SELECT x.name, CASE WHEN :buddyId MEMBER OF x.friendedBy THEN TRUE ELSE FALSE END FROM Profile x WHERE x.id <> :buddyId
And the SQL is too:
select profile0_.name as col_0_0_, case when ? in (select profile2_.id from FRIEND friendedby1_, PROFILE profile2_ where profile0_.id=friendedby1_.to_id and friendedby1_.from_id=profile2_.id) then 1 else 0 end as col_1_0_ from PROFILE profile0_ where profile0_.id<>?
Now it's only one sub-select promising better performance. However, joining by friendedby1_.from_id=profile2_.id in the sub-select seems to be excessive. There is no need to do an extra look-up in the PROFILE table at all. We just want to check if there is at least one row where FRIEND.from_id equals to the query parameter. Ideally the SQL should be something similar to:
SELECT p.name, EXISTS(SELECT * FROM friend f WHERE f.from_id = :buddyId AND f.to_id = p.id) FROM profile AS p WHERE p.id <> :buddyId
In fact, it can be done - by the price of having one more managed class mapped to the association table FRIEND. I think Sympathy is a good name for it as it reflects non-symmetrical nature of the relationship modeled. The purpose of this class is to expose FRIEND rows as entities for use in JPA queries. The class will not allow any modifications and its instances should never be built by the business logic. For every managed entity a primary key must be defined. I don't want to extend the table with extra column for the primary key. The pair of from_id and to_id fields already constitutes a unique combination and can be used as a composite primary key. I will use an embeddable class to represent that. Here is the declaration of the new class with a composite primary key:
@Entity
  @Table
(name = "FRIEND")
 
public class Sympathy implements Serializable {
   
@EmbeddedId
   
private Key key;
   
@ManyToOne
    @JoinColumn
(name = "from_id", insertable = false, updatable = false)
   
private Profile from;
   
@ManyToOne
    @JoinColumn
(name = "to_id", insertable = false, updatable = false)
   
private Profile to;
   
   
public Key getKey() { return key; }
   
   
public Profile getFrom() { return from; }

   
public Profile getTo() { return to; }

   
@Embeddable
   
public static class Key implements Serializable {
     
@Column(name = "from_id")
     
long fromId;
     
@Column(name = "to_id")
     
long toId;
     
     
public Key() {
       
super();
     
}

     
public long getFromId() { return fromId; }

     
public long getToId() { return toId; }

     
@Override
     
public int hashCode() {
       
final int prime = 31;
       
int result = 1;
        result = prime * result +
(int) (fromId ^ (fromId >>> 32));
        result = prime * result +
(int) (toId ^ (toId >>> 32));
       
return result;
     
}

     
@Override
     
public boolean equals(Object obj) {
       
if (this == obj)
         
return true;
       
if (obj == null)
         
return false;
       
if (getClass() != obj.getClass())
         
return false;
        Key other =
(Key) obj;
       
if (fromId != other.fromId)
         
return false;
       
if (toId != other.toId)
         
return false;
       
return true;
     
}
    }
  }
The embedded class Key must override hashCode() and equals() methods. Either side of the relationship is exposed in two ways. Firstly, I can retrieve raw primary key values by .getKey().getFromId() or .getKey.getToId() call. Secondly, I have declared two one-to-many mappings to the Profile table so that if I ever get an instance of the Sympathy class in the business logic the references to both profiles may be resolved using EntityManager built-in features.
Although the application will never modify instances of this class, there is no way to mark a class as read-only in JPA so it remains modifiable from the EntityManager perspective. To help EntityManager decide which of the two mappings it must inspect when checking for modifications on transaction commit I had to add insertable=false, updatable=false instruction to the primary key mappings. Otherwise ambiguity would exist causing a run-time error during Hibernate initialization.
Now back to the query, below is its JPQL using the new class:
SELECT x.name, CASE WHEN EXISTS ( SELECT s.key.fromId FROM Sympathy s WHERE s.key.fromId = :buddyId AND s.key.toId = x.id ) THEN TRUE ELSE FALSE END FROM Profile x WHERE x <> :buddyId
Resulting SQL is:
select profile0_.name as col_0_0_, case when exists (select sympathy1_.from_id from FRIEND sympathy1_ where sympathy1_.from_id=? and sympathy1_.to_id=profile0_.id) then 1 else 0 end as col_1_0_ from PROFILE profile0_ where profile0_.id<>?
and this is very close to the desired.
While testing all the queries above I have been using interactive features of HQL Editor provided by the Hibernate plug-in for Eclipse where I can only set parameters to primitive types. Further simplification of JPQL may be done by using an instance of Profile object as the parameter rather than its primary key. It's likely to be a run-time scenario where the instance can be looked up by a separate query.

No comments:

Post a Comment