Why are Weakmaps in Javascript not Enumerable?
Recently a colleague of mine noticed this in the MDN description of WeakMap:
Because of references being weak, WeakMap keys are not enumerable (i.e. there is no method giving you a list of the keys). If they were, the list would depend on the state of garbage collection, introducing non-determinism. If you want to have a list of keys, you should use a Map.
My co-worker wondered what this meant exactly? Where does the non-determinism come from? Doesn’t the state of WeakMap still depend on garbage collection even if you can’t enumerate the keys?
To understand this, perhaps it’s best to examine a hypothetical JavaScript where
the keys of WeakMap
are enumerable. This would let you do some interesting
things:
function waitForGc(obj, callback) {
const map = new WeakMap();
map.set(obj, 1);
obj = undefined;
function test() {
const keys = Array.from(map.keys());
if(keys.length > 0) {
setTimeout(test, 1000);
} else {
callback();
}
}
}
You could call this function, pass it an object and a callback, and (approximately) when that object was garbage collected, this would call your callback to let you know.
The key insight here is that our hypothetical map.keys()
returns something
different depending on the state of garbage-collection. If obj
has been
garbage collected, it returns an empty set, whereas if it has not been,
map.keys()
will return an iterator with one object in it.
Now, you might be thinking “Can’t I do this anyways? Can’t I just call
weakMap.get(obj)
to check to see if obj
has been garbage collected?” The
answer is no; if you had a reference to obj
to pass into weakMap.get()
, then
clearly obj
has not been garbage collected, because you have a (strong)
reference to it. There’s a WeakMap
in JavaScript, but no WeakReference
like
in Java. So the behavior of weakMap.get()
is deterministic, and doesn’t rely
on the state of what has or hasn’t been garbage collected (at least, from your
viewpoint as the programmer).
Note that being able to detect changes in garbage-collection state might allow some interesting side-channel attacks, which was brought up as one of the reasons not to allow it in the original proposal:
A key property of Weak Maps is the inability to enumerate their keys. This is necessary to prevent attackers observing the internal behavior of other systems in the environment which share weakly-mapped objects. Should the number or names of items in the collection be discoverable from the API, even if the values aren’t, WeakMap instances might create a side channel where one was previously not available.
Java WeakReference, SoftReference, and PhantomReference
There’s a sort of similar and related problem in Java regarding garbage collection
of objects referenced by WeakReference
s.
In Java, in addition to WeakMap
we have a WeakReference
- this is like a
pointer to an object that “doesn’t count” when the garbage collector is trying
to figure out which objects are eligible for garbage collection:
User user = new User('Jason');
WeakReference weak = new WeakReference<User>(user);
user = null;
weak.get(); // returns the user, or null if the user has been GCd.
Java also has a SoftReference
, which is just like a WeakReference
, except
the VM will try to keep the object in memory as long as it can, which makes it
useful for building caches (especially, if you’re not careful, caches that fill
all of memory with useless unused objects). Also, SoftReference
has been around
since Java 1.2, but even though it was documented as trying to keep objects
in memory longer, in Java 1.4 and lower it was implemnted exactly the same as
WeakReference
. YMMV depending on the VM you are using.
Finally, Java has a PhantomReference
, which is just like a WeakReference
or
a SoftReference
, except you’re not allowed to .get()
the object that it
references. That’s right - it’s like a pointer you’re not allowed to
dereference. That sounds totally insane and useless, until you learn about
ReferenceQueue
s.
ReferenceQueues
When you create a WeakReference
, you can optionally add it to a ReferenceQueue
by passing such a queue in the constructor. You can then queue.remove()
which
will block until a reference is ready to be garbage collected, and return that
reference to you. (Or, queue.poll()
, which returns a reference if one is ready
or null if none are, if you don’t want to dedicate a whole thread to watching
references.)
This is exactly the mechanism you’d use to implement a WeakMap
in Java; every
time someone adds an object to your map, you’d create a WeakReference
, and
then use the WeakReference
as a key in a regular Map
. But, you need to clean
up that key in your Map
when the object is ready to be GCed (since otherwise
you’d have a Map
with a ton of WeakReference
keys for GCed objects, and
associated values you would never access again, and you’d just slowly eat up
all of memory). To do this, you’d add the WeakReference to a ReferenceQueue,
and then when the object is about to be GCed, Java would effectively notify you
about it and you could remove the WeakReference from your Map.
But, the ReferenceQueue
presents a unique problem in Java; When the object is
ready to be GCed, the ReferenceQueue is going to pass you back that WeakReference.
At that point, you can say “Haha Java! Not today!” and call
array.push(weakReference.get())
to create a new strong reference to the object,
which makes it no longer eligible for GC.
This sucks for Java. If you create a WeakReference
to an object, Java needs to
figure out the object is ready for GC, then before it GCs it, it needs to call
any reference queues, and then it needs to figure out if the object is ready for
GC again. This “two-stage” GC is expensive. But PhantomReference
gets
around this; because you can’t .get()
the object, Java doesn’t need to do the
two-stage GC, so it’s cheaper (although of more limited use).
So, at the end of the day, I suppose the JavaScript folks looked at all that mess in Java and said, “Nope, we’re not doing any of that.”