How are Scala collections able to return the correct collection type from a map operation?

Question

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

Note: This is an FAQ, asked specifically so I can answer it myself, as this issue seems to come up fairly often and I want to put it somewhere it can (hopefully) be easily found via a search.

As prompted by a comment on my answer here

For example:

import scala.collection.immutable.BitSet

"abcde" map {_.toUpper}           // returns a String
"abcde" map {_.toInt}             // returns an IndexedSeq[Int]
BitSet(1,2,3,4) map {2 * _}       // returns a BitSet
BitSet(1,2,3,4) map {_.toString}  // returns a Set[String]

Looking in the scaladoc, all of these use the map operation inherited from TraversableLike, so how come it's always able to return the most specific valid collection? This even works for String, which provides map via an implicit conversion.

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:56:12+0000

Scala collections are clever things...

The internals of the collection library are one of the more advanced topics in the land of Scala. They involve higher-kinded types, inference, variance, implicits, and the CanBuildFrom mechanism, all to make the library incredibly generic, easy to use, and powerful from a user-facing perspective. Understanding it from the point of view of an API designer is not a light-hearted task to be taken on by a beginner.

On the other hand, it's incredibly rare that you'll ever actually need to work with collections at this depth.

So let us begin...

With the release of Scala 2.8, the collection library was completely rewritten to remove duplication. A great many methods were moved to just one place so that ongoing maintenance and the addition of new collection methods would be far easier, but this also makes the hierarchy harder to understand.

Take List, for example. It inherits from (in turn):

LinearSeqOptimized
GenericTraversableTemplate
LinearSeq
Seq
SeqLike
Iterable
IterableLike
Traversable
TraversableLike
TraversableOnce

That's quite a handful! So why this deep hierarchy? Ignoring the XxxLike traits briefly, each tier in that hierarchy adds a little bit of functionality, or provides a more optimised version of inherited functionality (for example, fetching an element by index on a Traversable requires a combination of drop and head operations, grossly inefficient on an indexed sequence). Where possible, all functionality is pushed as far up the hierarchy as it can possibly go, maximising the number of subclasses that can use it and removing duplication.
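Here is a minimal sketch of the "push it up, override where cheaper" idea. It uses a hypothetical toy hierarchy: ToyTraversable, ToyIndexedSeq, ToyLinearSeq, ToyArraySeq and the TailCounter object are all invented for illustration and are not real library types. The generic trait implements indexing through drop and head. The indexed trait overrides it with the same signature.

```scala
// Hypothetical toy traits, not the real scala.collection hierarchy.
object TailCounter { var calls = 0 } // counts tail steps so the cost is visible

trait ToyTraversable[A] {
  def head: A
  def tail: ToyTraversable[A]
  // Generic version, pushed as high up as possible: walk forward n steps.
  def drop(n: Int): ToyTraversable[A] = if (n <= 0) this else tail.drop(n - 1)
  // Indexing expressed through drop and head: costs i tail steps.
  def apply(i: Int): A = drop(i).head
}

trait ToyIndexedSeq[A] extends ToyTraversable[A] {
  def elementAt(i: Int): A
  // Optimised override with the same signature: no walking needed.
  override def apply(i: Int): A = elementAt(i)
}

// Both toys are backed by a Vector plus an offset, purely for brevity.
// This one only exposes head and tail, so indexing has to walk.
final class ToyLinearSeq[A](data: Vector[A], offset: Int = 0) extends ToyTraversable[A] {
  def head: A = data(offset)
  def tail: ToyTraversable[A] = { TailCounter.calls += 1; new ToyLinearSeq(data, offset + 1) }
}

final class ToyArraySeq[A](data: Vector[A], offset: Int = 0) extends ToyIndexedSeq[A] {
  def head: A = data(offset)
  def tail: ToyTraversable[A] = { TailCounter.calls += 1; new ToyArraySeq(data, offset + 1) }
  def elementAt(i: Int): A = data(offset + i)
}
```

Both kinds of sequence return the same element for the same index. Only the number of tail steps differs, and that is the kind of optimisation the deeper tiers of the hierarchy exist to provide.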

map is just one such example. The method is implemented in TraversableLike and is widely inherited. (The XxxLike traits really only exist for library designers, so for most intents and purposes map is considered a method on Traversable; I'll come to that part shortly.) It's possible to define an optimised version in some subclass, but it must still conform to the same signature. Consider the following uses of map (as also mentioned in the question):

"abcde" map {_.toUpper}           // returns a String
"abcde" map {_.toInt}             // returns an IndexedSeq[Int]
BitSet(1,2,3,4) map {2 * _}       // returns a BitSet
BitSet(1,2,3,4) map {_.toString}  // returns a Set[String]

In each case, the output is of the same type as the input wherever possible. When it's not possible, superclasses of the input type are checked until one is found that does offer a valid return type. Getting this right took a lot of work, especially when you consider that String isn't even a collection, it's just implicitly convertible to one.
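The four calls can be checked with type ascriptions, because each val below only compiles if map returns the ascribed type or a subtype of it. This is a sketch that assumes Scala 2.12 or 2.13, and the object name MapResultTypes is only for illustration. Scala 2.13 removed CanBuildFrom from the map signatures and reaches the same result types through overloaded map methods instead. The mechanism described in the rest of this answer therefore applies to 2.8 through 2.12. Depending on the version, the static type of the last result may also be a more specific subtype of Set[String], such as SortedSet[String].

```scala
import scala.collection.immutable.BitSet

// Each ascription is a compile-time check on the result type of map.
object MapResultTypes {
  val upper: String = "abcde".map(_.toUpper)                    // Char => Char: stays a String
  val codes: IndexedSeq[Int] = "abcde".map(_.toInt)             // Char => Int: cannot be a String
  val doubled: BitSet = BitSet(1, 2, 3, 4).map(2 * _)           // Int => Int: stays a BitSet
  val strings: Set[String] = BitSet(1, 2, 3, 4).map(_.toString) // Int => String: falls back to a Set
}
```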

So how is it done?

One half of the puzzle is the XxxLike traits (I did say I'd get to them...). Their main function is to take a Repr type param (short for "Representation") so that they know the true subclass actually being operated on. For example, TraversableLike is the same as Traversable, but abstracted over the Repr type param. This param is then used by the second half of the puzzle: the CanBuildFrom type class. It captures the source collection type, the target element type and the target collection type used by collection-transforming operations.
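Here is a minimal sketch of what the Repr parameter buys, using a hypothetical toy trait and class (ToyTraversableLike, ToyBag, fromElements and total are invented names, loosely modelled on TraversableLike). The trait implements filter and tail once, but both return Repr, so callers get the concrete subclass back.

```scala
// Hypothetical toy trait, loosely modelled on TraversableLike (not the real API).
// Repr records the concrete collection type that methods should return.
trait ToyTraversableLike[A, Repr] {
  def elements: List[A]
  // Each concrete collection says how to rebuild itself from a List.
  protected def fromElements(xs: List[A]): Repr

  // Written once, yet returns the concrete subclass thanks to Repr.
  def filter(p: A => Boolean): Repr = fromElements(elements.filter(p))
  def tail: Repr = fromElements(elements.drop(1))
}

final class ToyBag(val elements: List[Int]) extends ToyTraversableLike[Int, ToyBag] {
  protected def fromElements(xs: List[Int]): ToyBag = new ToyBag(xs)
  // A method only ToyBag has; calling it on a result shows the static type is ToyBag.
  def total: Int = elements.sum
}
```

Without the Repr parameter, filter could only be declared to return the toy trait itself, and calling total on its result would not compile. The real library builds the result through a Builder from newBuilder rather than a method like fromElements. That detail is simplified here.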

It's easier to explain with an example!

BitSet defines an implicit instance of CanBuildFrom like this:

implicit def canBuildFrom: CanBuildFrom[BitSet, Int, BitSet] = bitsetCanBuildFrom

When compiling BitSet(1,2,3,4) map {2 * _}, the compiler will attempt an implicit lookup of CanBuildFrom[BitSet, Int, T].

This is the clever part... The implicit in BitSet matches the first two type parameters, and it is more specific than any more general fallback, so it is the one selected. The first parameter is Repr, as captured by the XxxLike trait. The second is the element type of the result, which is the return type of the function passed to map. Here that is Int, because 2 * _ maps an Int to an Int. In Scala 2.8 to 2.12 the signature is map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That. The result type (T above, That in the signature) is therefore inferred from the third type parameter of the CanBuildFrom instance that was implicitly located, which is BitSet in this case.

So the first two type parameters to CanBuildFrom are inputs, to be used for implicit lookup, and the third parameter is an output, to be used for inference.
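Here is a minimal sketch of that input/output split, using hypothetical toy types: ToyBuilder, ToyCanBuildFrom, ToyTraversableLike, ToySeq and the fromList helper are invented for illustration. They are only loosely shaped like the real Builder and CanBuildFrom; for example, the real CanBuildFrom also has an apply overload that takes the source collection. map is written once, with the same shape as the signature above. Its result type That is never stated; it is read off the To parameter of whichever implicit is found.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical toy stand-ins, loosely shaped like Builder and CanBuildFrom.
trait ToyBuilder[-Elem, +To] {
  def +=(elem: Elem): Unit
  def result(): To
}

// From and Elem are inputs to the implicit search; To is the output
// that the result type of map is inferred from.
trait ToyCanBuildFrom[-From, -Elem, +To] {
  def apply(): ToyBuilder[Elem, To]
}

object ToyCanBuildFrom {
  // Hypothetical helper: buffer the elements in a List, then convert once.
  def fromList[From, A, To](finish: List[A] => To): ToyCanBuildFrom[From, A, To] =
    new ToyCanBuildFrom[From, A, To] {
      def apply(): ToyBuilder[A, To] = new ToyBuilder[A, To] {
        private val buf = ListBuffer.empty[A]
        def +=(elem: A): Unit = { buf += elem; () }
        def result(): To = finish(buf.toList)
      }
    }
}

trait ToyTraversableLike[+A, +Repr] {
  def elements: List[A]
  // Written once; That is whatever the located implicit's To is.
  def map[B, That](f: A => B)(implicit bf: ToyCanBuildFrom[Repr, B, That]): That = {
    val builder = bf()
    elements.foreach(a => builder += f(a))
    builder.result()
  }
}

final class ToySeq[A](val elements: List[A]) extends ToyTraversableLike[A, ToySeq[A]]

object ToySeq {
  def apply[A](xs: A*): ToySeq[A] = new ToySeq(xs.toList)

  // Matches any ToySeq source and any new element type A; produces a ToySeq[A].
  implicit def canBuildFrom[A]: ToyCanBuildFrom[ToySeq[_], A, ToySeq[A]] =
    ToyCanBuildFrom.fromList[ToySeq[_], A, ToySeq[A]](xs => new ToySeq(xs))
}
```

For ToySeq(1, 2, 3).map(_.toString), the search is for ToyCanBuildFrom[ToySeq[Int], String, That]. ToySeq[Int] and String are fixed by the receiver and the function. That is then solved as ToySeq[String] from the instance in ToySeq's companion.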

CanBuildFrom in BitSet therefore matches the two types BitSet and Int, so the lookup will succeed, and the inferred return type will also be BitSet.

When compiling BitSet(1,2,3,4) map {_.toString}, the compiler will attempt an implicit lookup of CanBuildFrom[BitSet, String, T]. This fails for the implicit in BitSet, so the compiler next considers its superclass, Set, whose companion object contains the implicit:

implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Set[A]] = setCanBuildFrom[A]

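This matches because Coll is a type alias for Set[_], declared in the generic companion class that Set's companion object extends. CanBuildFrom is contravariant in its first type parameter (it is declared as CanBuildFrom[-From, -Elem, +To]), so an instance whose source is any Set is also accepted where the source is a BitSet. The A will match anything, as canBuildFrom is parameterised with the type A. In this case it's inferred to be String... thus yielding a return type of Set[String].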
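Here is a toy model of the BitSet and Set lookup just described. Everything in it is hypothetical: ToyBuilder, ToyCanBuildFrom, ToyTraversableLike, ToySet, ToyBitSet and the fromList helper are not library types. The instance in ToySet's companion plays the role of Set's canBuildFrom. Because ToyCanBuildFrom is contravariant in From, that instance is also accepted for a ToyBitSet source.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical toy framework, not the real scala.collection API.
trait ToyBuilder[-Elem, +To] { def +=(elem: Elem): Unit; def result(): To }
trait ToyCanBuildFrom[-From, -Elem, +To] { def apply(): ToyBuilder[Elem, To] }

object ToyCanBuildFrom {
  // Hypothetical helper: buffer elements in a List, then convert once at the end.
  def fromList[From, A, To](finish: List[A] => To): ToyCanBuildFrom[From, A, To] =
    new ToyCanBuildFrom[From, A, To] {
      def apply(): ToyBuilder[A, To] = new ToyBuilder[A, To] {
        private val buf = ListBuffer.empty[A]
        def +=(elem: A): Unit = { buf += elem; () }
        def result(): To = finish(buf.toList)
      }
    }
}

trait ToyTraversableLike[+A, +Repr] {
  def elements: List[A]
  def map[B, That](f: A => B)(implicit bf: ToyCanBuildFrom[Repr, B, That]): That = {
    val builder = bf()
    elements.foreach(a => builder += f(a))
    builder.result()
  }
}

class ToySet[A](val elements: List[A]) extends ToyTraversableLike[A, ToySet[A]]

object ToySet {
  // Role of Set's canBuildFrom[A]: any ToySet source (From is contravariant), any element type.
  implicit def setCanBuildFrom[A]: ToyCanBuildFrom[ToySet[_], A, ToySet[A]] =
    ToyCanBuildFrom.fromList[ToySet[_], A, ToySet[A]](xs => new ToySet(xs.distinct))
}

// Re-supplies ToyBitSet as Repr, so map on it searches for ToyCanBuildFrom[ToyBitSet, B, That].
final class ToyBitSet(elems: List[Int]) extends ToySet[Int](elems) with ToyTraversableLike[Int, ToyBitSet]

object ToyBitSet {
  // Role of BitSet's canBuildFrom: only usable when the new elements are Ints.
  implicit val bitSetCanBuildFrom: ToyCanBuildFrom[ToyBitSet, Int, ToyBitSet] =
    ToyCanBuildFrom.fromList[ToyBitSet, Int, ToyBitSet](xs => new ToyBitSet(xs.distinct.sorted))
}
```

When mapping Int to Int, both instances qualify, and the compiler picks the more specific, monomorphic one from ToyBitSet. Mapping to String rules that one out, because its Elem is Int. That leaves only the generic ToySet instance, with A inferred as String.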

So to correctly implement a collection type, you not only need to provide a correct implicit of type CanBuildFrom, but you also need to ensure that the concrete type of that collection is supplied as the Repr param to the correct parent traits (for example, this would be MapLike in the case of subclassing Map).
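The Repr requirement shows up clearly with two hypothetical ToySet subclasses that each provide their own implicit, where only one of them re-supplies itself as Repr. All names here (the toy framework, GoodBitSet and ForgetfulBitSet) are invented for illustration.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical toy framework, not the real scala.collection API.
trait ToyBuilder[-Elem, +To] { def +=(elem: Elem): Unit; def result(): To }
trait ToyCanBuildFrom[-From, -Elem, +To] { def apply(): ToyBuilder[Elem, To] }

object ToyCanBuildFrom {
  def fromList[From, A, To](finish: List[A] => To): ToyCanBuildFrom[From, A, To] =
    new ToyCanBuildFrom[From, A, To] {
      def apply(): ToyBuilder[A, To] = new ToyBuilder[A, To] {
        private val buf = ListBuffer.empty[A]
        def +=(elem: A): Unit = { buf += elem; () }
        def result(): To = finish(buf.toList)
      }
    }
}

trait ToyTraversableLike[+A, +Repr] {
  def elements: List[A]
  def map[B, That](f: A => B)(implicit bf: ToyCanBuildFrom[Repr, B, That]): That = {
    val builder = bf()
    elements.foreach(a => builder += f(a))
    builder.result()
  }
}

class ToySet[A](val elements: List[A]) extends ToyTraversableLike[A, ToySet[A]]
object ToySet {
  implicit def setCanBuildFrom[A]: ToyCanBuildFrom[ToySet[_], A, ToySet[A]] =
    ToyCanBuildFrom.fromList[ToySet[_], A, ToySet[A]](xs => new ToySet(xs.distinct))
}

// Correct: re-supplies its own type as Repr, so map asks for ToyCanBuildFrom[GoodBitSet, B, That].
final class GoodBitSet(elems: List[Int]) extends ToySet[Int](elems) with ToyTraversableLike[Int, GoodBitSet]
object GoodBitSet {
  implicit val cbf: ToyCanBuildFrom[GoodBitSet, Int, GoodBitSet] =
    ToyCanBuildFrom.fromList[GoodBitSet, Int, GoodBitSet](xs => new GoodBitSet(xs.distinct.sorted))
}

// Forgetful: Repr is still the inherited ToySet[Int], so map asks for
// ToyCanBuildFrom[ToySet[Int], B, That], and this companion is never consulted.
final class ForgetfulBitSet(elems: List[Int]) extends ToySet[Int](elems)
object ForgetfulBitSet {
  implicit val cbf: ToyCanBuildFrom[ForgetfulBitSet, Int, ForgetfulBitSet] =
    ToyCanBuildFrom.fromList[ForgetfulBitSet, Int, ForgetfulBitSet](xs => new ForgetfulBitSet(xs.distinct.sorted))
}
```

The forgetful class compiles without complaint. Its map simply degrades to the parent's result type, which makes this mistake easy to miss.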

String is a little more complicated, as it provides map through an implicit conversion. The implicit conversion is to StringOps, which subclasses StringLike[String], which ultimately derives from TraversableLike[Char, String], with String as the Repr type param.

There's also a CanBuildFrom[String, Char, String] in scope, so the compiler knows that when the elements of a String are mapped to Chars, the return type should also be a String. From this point onwards, the same mechanism is used.
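Here is a sketch of the String case with the same kind of toy machinery. All names are hypothetical: ToyStringOps, ToyPredef, ToyLowPriority, stringCbf, fallbackStringCbf and toyMap are invented. The method is called toyMap only so it cannot clash with the real map that is already available on String. An implicit conversion wraps the String in an ops class whose Repr is String. Two implicits then decide the result: a Char-to-String instance, and a lower-priority fallback that collects any other element type into a Vector.

```scala
import scala.language.implicitConversions

// Hypothetical toy stand-ins, not the real Builder / CanBuildFrom / StringOps.
trait ToyBuilder[-Elem, +To] { def +=(elem: Elem): Unit; def result(): To }
trait ToyCanBuildFrom[-From, -Elem, +To] { def apply(): ToyBuilder[Elem, To] }

trait ToyTraversableLike[+A, +Repr] {
  def elements: List[A]
  // Named toyMap so it cannot clash with the real map already available on String.
  def toyMap[B, That](f: A => B)(implicit bf: ToyCanBuildFrom[Repr, B, That]): That = {
    val builder = bf()
    elements.foreach(a => builder += f(a))
    builder.result()
  }
}

// String is not a ToyTraversableLike itself; this wrapper is, with Repr = String.
final class ToyStringOps(s: String) extends ToyTraversableLike[Char, String] {
  def elements: List[Char] = s.toList
}

trait ToyLowPriority {
  // Fallback for any element type: collect into a Vector.
  implicit def fallbackStringCbf[A]: ToyCanBuildFrom[String, A, Vector[A]] =
    new ToyCanBuildFrom[String, A, Vector[A]] {
      def apply(): ToyBuilder[A, Vector[A]] = new ToyBuilder[A, Vector[A]] {
        private val buf = Vector.newBuilder[A]
        def +=(elem: A): Unit = { buf += elem; () }
        def result(): Vector[A] = buf.result()
      }
    }
}

object ToyPredef extends ToyLowPriority {
  // The implicit conversion that makes toyMap available on String.
  implicit def toyStringOps(s: String): ToyStringOps = new ToyStringOps(s)

  // Chars mapped to Chars rebuild a String; preferred over the inherited fallback.
  implicit val stringCbf: ToyCanBuildFrom[String, Char, String] =
    new ToyCanBuildFrom[String, Char, String] {
      def apply(): ToyBuilder[Char, String] = new ToyBuilder[Char, String] {
        private val sb = new StringBuilder
        def +=(elem: Char): Unit = { sb += elem; () }
        def result(): String = sb.result()
      }
    }
}
```

Placing the fallback in a parent trait gives the Char-to-String instance priority when both match. An implicit defined in an object that derives from another class or trait wins over one defined in that parent. Mapping Char to Int leaves only the fallback, so the result is a Vector[Int].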
