The Problem
This is fairly fundamental, but easy enough to forget if you are not used to overriding the equals method in Java. Recite this mantra every morning as you grind your coffee beans: When you override the equals method, override hashCode too.
Equals who? Hash code wha? Let's see the code.
Let's say you have a bag o' marbles. Marbles are pretty simple: they have a color. We instantiate a few and throw them into the bag (implemented here as a Set):
import java.util.HashSet;
import java.util.Set;

public class Marble {
    private String color;

    public Marble(String color) { this.color = color; }

    public String getColor() { return color; }

    public static void main(String[] args) {
        Set<Marble> bagOMarbles = new HashSet<Marble>();
        bagOMarbles.add(new Marble("red"));
        bagOMarbles.add(new Marble("red"));
        bagOMarbles.add(new Marble("blue"));
        bagOMarbles.add(new Marble("blue"));
        System.out.println(bagOMarbles.size()
                + " marbles in the bag.");
    }
}
Compile and run. How many marbles are in the bag? If you don't understand why there are 4, stop here until you've got it. Got it? There are 4 marbles in the bag: 2 red and 2 blue. Marble does not override equals, so it inherits Object's version, which considers an object equal only to itself; every new Marble is therefore distinct, whatever its color.
But let's say we have a magic bag that only allows us to carry one of any given color. No matter how many red marbles you put in this magic bag, there will be no more than one red marble when you reach back into the bag. The same goes for every other color. How do you implement that magic bag? "Easy," you say, "a Set only holds unique objects, so we define equality of marbles to be based on color." Sounds good to me. And so we override the equals method:
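If the 4 still puzzles you, this small sketch makes the inherited behavior visible. It assumes a stripped-down nested copy of Marble (the class name DefaultEqualityDemo is just for illustration) and compares two marbles of the same color using Object's default equals.

```java
// Without an equals override, Marble inherits Object.equals, which compares
// references: two separately constructed red marbles are different objects.
public class DefaultEqualityDemo {
    static class Marble {
        private final String color;
        Marble(String color) { this.color = color; }
        String getColor() { return color; }
    }

    public static void main(String[] args) {
        Marble a = new Marble("red");
        Marble b = new Marble("red");
        System.out.println(a.equals(a)); // true: same object
        System.out.println(a.equals(b)); // false: same color, different objects
        System.out.println(a.getColor().equals(b.getColor())); // true: only the colors match
    }
}
```

The colors are equal, but the marbles are not, which is exactly why the HashSet keeps all four.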
// Add inside the Marble class: two marbles are equal when their colors match.
@Override
public boolean equals(Object obj) {
    if (this == obj) return true;
    if (obj != null && getClass() == obj.getClass()) {
        Marble otherMarble = (Marble) obj;
        return color.equals(otherMarble.getColor());
    }
    return false;
}
Compile and run. How many marbles are in the bag? If you're thinking 2, then recite the mantra, take a sip of coffee and read on. The answer is 4.
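To reproduce this step in one file, here is a sketch of the whole class with the equals override in place and hashCode still inherited. The static helper countDistinct is hypothetical, added only so the result is easy to check; it fills a fresh HashSet with one new Marble per color given.

```java
import java.util.HashSet;
import java.util.Set;

// Marble with equals overridden but hashCode left as Object's default.
public class Marble {
    private final String color;

    public Marble(String color) { this.color = color; }

    public String getColor() { return color; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj != null && getClass() == obj.getClass()) {
            Marble otherMarble = (Marble) obj;
            return color.equals(otherMarble.getColor());
        }
        return false;
    }
    // No hashCode override: each Marble keeps its identity-based hash code.

    // Hypothetical helper: adds one new Marble per color and returns the set size.
    public static int countDistinct(String... colors) {
        Set<Marble> bag = new HashSet<>();
        for (String c : colors) {
            bag.add(new Marble(c));
        }
        return bag.size();
    }

    public static void main(String[] args) {
        System.out.println(new Marble("red").equals(new Marble("red"))); // true
        System.out.println(countDistinct("red", "red", "blue", "blue") + " marbles in the bag.");
    }
}
```

equals now says two red marbles are the same, yet the bag count still comes out as 4 in practice.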
Why is it 4? Why does one red marble not equal another? Why are the two blue marbles not considered to be the same? I've clearly defined them to be equal. The answer is in the mantra: When you override the equals method, override hashCode too.
What is hashCode?
Simply put, hashCode provides a means for organizing your objects into groups, or hash buckets. You will generally not call hashCode on an object directly, but hash-based collections such as HashSet and HashMap use it to avoid unnecessary equality checks. The check goes something like this: Are the hash codes the same? If yes, check the equals method. If no, don't bother checking the equals method, since objects with differing hash codes are assumed not to be equal. That last part is the assumption a HashSet makes when you add an object to it. And therein lies the problem: nothing, not even the compiler, enforces it. The documented contract of Object.hashCode requires that any two objects that are equal according to equals return the same hash code, and it is your job as a programmer to honor that contract. So once again: When you override the equals method, override hashCode too.
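Here is a toy model of that check, written as a sketch rather than a description of the real HashSet. ToyHashSet, its fixed 16 buckets, and its list-per-bucket layout are all hypothetical simplifications; what it is meant to show is the order of the two tests: hash codes first, equals only when they match.

```java
import java.util.ArrayList;
import java.util.List;

// A hypothetical, simplified model of how a hash-based set decides whether an
// element is already present. The real java.util.HashSet is backed by a
// HashMap, resizes as it grows, and may convert long buckets into trees.
public class ToyHashSet<E> {
    private static final int BUCKET_COUNT = 16;
    private final List<List<E>> buckets = new ArrayList<>();

    public ToyHashSet() {
        for (int i = 0; i < BUCKET_COUNT; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Adds e unless an equal element is already present; returns true if added.
    public boolean add(E e) {
        int hash = e.hashCode();
        List<E> bucket = buckets.get(Math.floorMod(hash, BUCKET_COUNT));
        for (E existing : bucket) {
            // Step 1: different hash codes are taken to mean "not equal",
            // so equals is skipped entirely for this element.
            if (existing.hashCode() != hash) {
                continue;
            }
            // Step 2: equals is consulted only when the hash codes match.
            if (existing.equals(e)) {
                return false;
            }
        }
        bucket.add(e);
        return true;
    }

    public int size() {
        int total = 0;
        for (List<E> bucket : buckets) {
            total += bucket.size();
        }
        return total;
    }

    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>();
        System.out.println(set.add("red"));  // true
        System.out.println(set.add("red"));  // false: same hash code, and equals
        System.out.println(set.add("blue")); // true
        System.out.println(set.size());      // 2
    }
}
```

Strings already pair equals with a matching hashCode, so the toy set behaves as expected here; an element type whose equal instances report different hash codes would slip past step 1 every time.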
So what happens inside our magic marble bag? When we throw a marble into the bag, something like this occurs:
- Are there any marbles in this bag with the same hash code as this new marble?
- Nope -- okay, add the marble.
Note: equals is never even checked! Only when the hash codes match will equals be called. If you're not buying it, put a print statement in the equals method to convince yourself. The default implementation of hashCode is based on object identity, not on the object's properties, so distinct objects almost always get different hash codes. As it stands, then, the equals method will in practice never get called when we throw a new marble into the magic marble bag.
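Taking up the print-statement suggestion, this sketch instruments equals with a counter as well as a print. CountingMarble and its static equalsCalls field are hypothetical names for the experiment, and hashCode is deliberately left unoverridden.

```java
import java.util.HashSet;
import java.util.Set;

// Instrumented marble: records how often the HashSet calls equals.
public class CountingMarble {
    static int equalsCalls = 0;
    private final String color;

    CountingMarble(String color) { this.color = color; }

    @Override
    public boolean equals(Object obj) {
        equalsCalls++;
        System.out.println("equals called");
        if (this == obj) return true;
        if (obj != null && getClass() == obj.getClass()) {
            return color.equals(((CountingMarble) obj).color);
        }
        return false;
    }
    // hashCode deliberately NOT overridden: identity-based hash codes.

    public static void main(String[] args) {
        Set<CountingMarble> bag = new HashSet<>();
        bag.add(new CountingMarble("red"));
        bag.add(new CountingMarble("red"));
        bag.add(new CountingMarble("blue"));
        bag.add(new CountingMarble("blue"));
        System.out.println(bag.size() + " marbles, equals called " + equalsCalls + " times");
    }
}
```

Unless two identity hash codes happen to collide, "equals called" never appears and the counter stays at 0.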
Try This
So, then, remember the mantra and add the following method override:
// not a good, but still valid, hash code
@Override
public int hashCode() { return 37; }
This hashes all marbles into the same bucket. Note that this is not a good hash code, but it is a valid one nevertheless, and it is consistent with our definition of equals: any two equal marbles return the same hash code. Any constant integer here would have the same effect. I will save writing better hash codes for a future post; right now, just understand that even this inefficient single-bucket hash code is better than nothing, since the default hash code no longer satisfies the contract once we override equals.
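The contract only runs in one direction: equal objects must share a hash code, but unequal objects may share one too. This sketch spells that out with a hypothetical consistent helper (not part of any library), applied to a nested Marble that uses the constant 37.

```java
// Hypothetical helper: checks the hashCode contract for one pair of objects,
// i.e. if a.equals(b), then a.hashCode() == b.hashCode().
public class ContractCheck {
    static boolean consistent(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    // Marble with the single-bucket hash code.
    static class Marble {
        private final String color;
        Marble(String color) { this.color = color; }
        String getColor() { return color; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj != null && getClass() == obj.getClass()) {
                return color.equals(((Marble) obj).getColor());
            }
            return false;
        }

        @Override
        public int hashCode() { return 37; }
    }

    public static void main(String[] args) {
        Marble red1 = new Marble("red");
        Marble red2 = new Marble("red");
        Marble blue = new Marble("blue");
        System.out.println(consistent(red1, red2)); // true: equal, and both hash to 37
        System.out.println(consistent(red1, blue)); // true: not equal, so sharing 37 is allowed
    }
}
```

Red and blue land in the same bucket, which costs efficiency but breaks no rule; equals still tells them apart.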
So what happens when we add, for example, our second red marble?
- Are there any marbles in here with hash code 37? Yeah - what'd you think?
- Ok, so then are there any marbles here that are red? Yeah - go away - no more room for red marbles.
And after adding 2 red marbles and 2 blue marbles, the Set holds the expected total of 2: 1 red + 1 blue.
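You can watch the "go away" happen through the return value of Set.add, which is false when an equal element is already present. This sketch wraps a Marble with the constant hash code in a hypothetical MagicBag class.

```java
import java.util.HashSet;
import java.util.Set;

public class MagicBag {
    // Marble with color-based equals and the single-bucket hash code.
    static class Marble {
        private final String color;
        Marble(String color) { this.color = color; }
        String getColor() { return color; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj != null && getClass() == obj.getClass()) {
                return color.equals(((Marble) obj).getColor());
            }
            return false;
        }

        @Override
        public int hashCode() { return 37; }
    }

    public static void main(String[] args) {
        Set<Marble> bag = new HashSet<>();
        System.out.println(bag.add(new Marble("red")));  // true
        System.out.println(bag.add(new Marble("red")));  // false: an equal marble is already there
        System.out.println(bag.add(new Marble("blue"))); // true
        System.out.println(bag.add(new Marble("blue"))); // false
        System.out.println(bag.size() + " marbles in the bag."); // 2 marbles in the bag.
    }
}
```

The rejected marble is simply not stored; the marble already in the bag stays where it is.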
For more efficient lookups, be sure to look into better hashCode overrides. But even if efficiency is not your concern, be sure to provide some hash code other than the default, one that is consistent with equals, whenever you implement your own equals.
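As a preview of what "better" can look like, here is one common approach, offered as a sketch and not as the method a future post will necessarily use: derive hashCode from the same field that equals compares. It uses java.util.Objects.hash; returning color.hashCode() directly would work equally well for a single non-null field. The requireNonNull check in the constructor is an added assumption that marbles always have a color.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public final class Marble {
    private final String color;

    public Marble(String color) { this.color = Objects.requireNonNull(color); }

    public String getColor() { return color; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        return color.equals(((Marble) obj).color);
    }

    // Built from the same field equals compares: equal marbles get equal hash
    // codes, while different colors usually land in different buckets.
    @Override
    public int hashCode() { return Objects.hash(color); }

    public static void main(String[] args) {
        Set<Marble> bag = new HashSet<>();
        bag.add(new Marble("red"));
        bag.add(new Marble("red"));
        bag.add(new Marble("blue"));
        bag.add(new Marble("blue"));
        System.out.println(bag.size() + " marbles in the bag."); // 2 marbles in the bag.
    }
}
```

The bag behaves exactly as with the constant 37, but red and blue marbles no longer share a single bucket.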