Generics: Terminology
Parameterized language constructs
Generally speaking, when we talk about either templates or a generics system, we are talking about generalizing some language construct by adding a parameter to it. Language constructs here primarily would include functions and types, but we may want to support parameterizing other language constructs like interfaces.
This parameter broadens the scope of the language construct on an axis defined by that parameter. For example, it could define a family of functions instead of a single one.
Generic versus template parameters
When we distinguish between generics and templates in Carbon, it is on a parameter-by-parameter basis. A single function can take a mix of regular, generic, and template parameters.
- Regular parameters, or "dynamic parameters", are designated using the "<name>: <type>" syntax (or "<value>").
- Generic parameters are designated using :! between the name and the type (so it is "<name>:! <type>").
- Template parameters are designated using "template <name>:! <type>".
The syntax for generic and template parameters was decided in questions-for-leads issue #565.
Expected difference between generics and templates:
| Generics | Templates |
| --- | --- |
| bounded parametric polymorphism | compile-time duck typing and ad-hoc polymorphism |
| constrained genericity | optional constraints |
| name lookup resolved for definitions in isolation ("early") | some name lookup may require information from calls (name lookup may be "late") |
| sound to typecheck definitions in isolation ("early") | complete type checking may require information from calls (may be "late") |
| supports separate type checking; may also support separate compilation, for example when implemented using dynamic witness tables | separate compilation only to the extent that C++ supports it |
| allowed but not required to be implemented using dynamic dispatch | does not support implementation by way of dynamic dispatch, just static by way of instantiation |
| monomorphization is an optional optimization that cannot render the program invalid | monomorphization is mandatory and can fail, resulting in the program being invalid |
Polymorphism
Generics and templates provide different forms of polymorphism than object-oriented programming with inheritance. Inheritance uses subtype polymorphism, where different descendants, or "subtypes", of a base class can provide different implementations of a method, subject to some compatibility restrictions on the signature.
Parametric polymorphism
Parametric polymorphism (Wikipedia) is when a function or a data type can be written generically so that it can handle values identically without depending on their type. Bounded parametric polymorphism is where the allowed types are restricted to satisfy some constraints. Within the set of allowed types, different types are treated uniformly.
Compile-time duck typing
Duck typing (Wikipedia) is when the legal types for arguments are determined implicitly by the usage of the values of those types in the body of the function. Compile-time duck typing is when the usages in the body of the function are checked at compile-time, along all code paths. Contrast this with ordinary duck typing in a dynamic language such as Python where type errors are only diagnosed at runtime when a usage is reached dynamically.
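As a rough illustration of compile-time duck typing, here is a small C++ sketch. The names Duck, Robot, Rock, and MakeNoise are hypothetical and chosen only for this example; C++ templates stand in for the behavior described above.

```cpp
#include <string>

// Hypothetical types for illustration; they share no base class.
struct Duck {
  std::string Speak() const { return "Quack"; }
};
struct Robot {
  std::string Speak() const { return "Beep"; }
};
struct Rock {};  // No Speak() member.

// No constraint is declared on T. The legal types are implied by the body:
// any T for which `thing.Speak() + "!"` yields something convertible to
// std::string.
template <typename T>
std::string MakeNoise(const T& thing) {
  return thing.Speak() + "!";
}

// MakeNoise(Rock{}) would be rejected at compile time, when the template is
// instantiated for Rock, not at run time.
```

Both Duck and Robot are accepted because they happen to provide a suitable Speak member, not because they declare any relationship to MakeNoise.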
Ad-hoc polymorphism
Ad-hoc polymorphism (Wikipedia), also known as "overloading", is when a single function name has multiple implementations for handling different argument types. There is no enforcement of any consistency between the implementations. For example, the return type of each overload can be arbitrary, rather than being the result of some consistent rule being applied to the argument types.
Templates work with ad-hoc polymorphism in two ways:
- A function with template parameters can be specialized in C++ as a form of ad-hoc polymorphism.
- A function with template parameters can call overloaded functions since it will only resolve that call after the types are known.
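The second point can be sketched in C++ as follows. The functions Describe and CallDescribe are hypothetical; the sketch only assumes ordinary C++ overload resolution inside a template.

```cpp
#include <string>
#include <type_traits>

// Two overloads of one name with unrelated return types: nothing forces
// them to follow a consistent rule.
int Describe(int x) { return x * 2; }
std::string Describe(const std::string& s) { return "text:" + s; }

// The call to Describe depends on T, so the overload (and therefore the
// return type) is only chosen once T is known.
template <typename T>
auto CallDescribe(const T& value) {
  return Describe(value);
}

static_assert(std::is_same<decltype(CallDescribe(1)), int>::value,
              "int argument selects the int overload");
static_assert(
    std::is_same<decltype(CallDescribe(std::string())), std::string>::value,
    "string argument selects the string overload");
```

The return type of CallDescribe cannot be stated from its definition alone, which is the property that makes this kind of call a problem for definition-checked generics.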
In Carbon, we expect there to be a compile error if overloading of some name prevents a generic function from being typechecked from its definition alone. For example, let's say we have some overloaded function called F that has two overloads:
fn F[template T:! Type](x: T*) -> T;
fn F(x: Int) -> Bool;
A generic function G can call F with a type like T* that cannot possibly select the F(Int) overload, and so it can consistently determine the return type of F. But G can't call F with an argument that could match either overload.
Note: It is undecided what to do in the situation where F is overloaded, but the signatures are consistent and so callers could still typecheck calls to F. This still poses problems for the dynamic strategy for compiling generics, in a similar way to impl specialization.
Constrained genericity
We will allow some way of specifying constraints as part of a function (or type or other parameterized language construct). These constraints are a limit on what callers are allowed to pass in. The distinction between constrained and unconstrained genericity is whether the body of the function is limited to just those operations that are guaranteed by the constraints.
With templates using unconstrained genericity, you may perform any operation in the body of the function, and they will be checked against the specific types used in calls. You can still have constraints, but they are optional in that they could be removed and the function would still have the same capabilities. Constraints only affect the caller, which will use them to resolve overloaded calls to the template and provide clearer error messages.
With generics using constrained genericity, the function body can be checked against the signature at the time of definition. Note that it is still perfectly permissible to have no constraints on a type; that just means that you can only perform operations that work for all types (such as manipulating pointers to values of that type) in the body of the function.
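The "optional constraints" side of this distinction can be sketched with a C++17 template. AddToDefault is a hypothetical function, and std::enable_if_t is used here only as one possible way to express a caller-facing constraint.

```cpp
#include <type_traits>

// The constraint only filters callers: types that are not default
// constructible are rejected at the call site.
template <typename T,
          typename = std::enable_if_t<std::is_default_constructible_v<T>>>
T AddToDefault(const T& value) {
  T result{};             // Guaranteed by the constraint.
  return result + value;  // Not guaranteed by the constraint; this is only
                          // checked for each concrete T at instantiation.
}
```

Removing the constraint would not change what the body is allowed to do, which is what makes it optional. Under constrained genericity, the use of + would instead be rejected when the definition is checked, because the constraint does not promise it.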
Dependent names
A name is said to be dependent if it depends on some generic or template parameter. Note: this matches the use of the term "dependent" in C++, not as in dependent types.
Definition checking
Definition checking is the process of semantically checking the definition of parameterized code for correctness independently of any particular arguments. It includes type checking and other semantic checks. It is possible, even with templates, to check semantics of expressions that are not dependent on any template parameter in the definition. Adding constraints to template parameters and/or switching them to be generic allows the compiler to increase how much of the definition can be checked. Any remaining checks are delayed until instantiation, which can fail.
Complete definition checking
Complete definition checking is when the definition can be fully semantically checked, including type checking. It is an especially useful property because it enables separate semantic checking of the definition, a prerequisite to separate compilation. It also enables implementation strategies that don’t instantiate the implementation (for example, type erasure or dynamic-dispatch witness tables).
Early versus late type checking
Early type checking is where expressions and statements are type checked when the definition of the function body is compiled, as part of definition checking. This occurs for regular and generic values.
Late type checking is where expressions and statements may only be fully typechecked once calling information is known. Late type checking delays complete definition checking. This occurs for template-dependent values.
Deduced parameter
A deduced parameter is listed in the optional [ ] section right after the function name in a function signature:
fn <name>[<deduced parameters>](<explicit parameters>) -> <return type>
Deduced arguments are determined as a result of pattern matching the explicit argument values (usually the types of those values) to the explicit parameters. Note that function signatures can typically be rewritten to avoid using deduced parameters:
fn F[template T:! Type](value: T);
// is equivalent to:
fn F(value: (template T:! Type));
See more here.
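C++ template argument deduction works along similar lines and may help as a point of comparison. In this sketch, CountEqual is a hypothetical function whose type parameter is never written by the caller.

```cpp
#include <cstddef>
#include <vector>

// T is deduced by pattern matching: the argument type std::vector<int> is
// matched against the parameter pattern std::vector<T>, giving T = int.
template <typename T>
std::size_t CountEqual(const std::vector<T>& items, const T& target) {
  std::size_t count = 0;
  for (const T& item : items) {
    if (item == target) ++count;
  }
  return count;
}
```

A call such as CountEqual(v, 2) with v of type std::vector<int> supplies only the explicit arguments; the compiler fills in T from their types.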
Interface
An interface is an API constraint used in a function signature to provide encapsulation. Encapsulation here means that callers of the function only need to know about the interface requirements to call the function, not anything about the implementation of the function body, and the compiler can check the function body without knowing anything more about the caller. Callers of the function provide a value that has an implementation of the API and the body of the function may then use that API (and nothing else).
Structural interfaces
A "structural" interface is one where we say a type satisfies the interface as long as it has members with a specific list of names, and for each name it must have some type or signature. A type can satisfy a structural interface without ever naming that interface, just by virtue of having members with the right form.
Nominal interfaces
A "nominal" interface is one where we say a type can only satisfy an interface if there is some explicit statement saying so, for example by defining an impl. This allows "satisfies the interface" to have additional semantic meaning beyond what is directly checkable by the compiler. Examples include knowing whether the Draw function means "render an image to the screen" or "take a card from the top of a deck of cards", or knowing that a + operator is commutative (and not, say, string concatenation).
We use the "structural" versus "nominal" terminology as a generalization of the same terms being used in a subtyping context.
Named constraints
Named constraints are "structural" in the sense that they match a type based on meeting some criteria rather than an explicit statement in the type's definition. The criteria for a named constraint, however, are less focused on the type's API and instead might include a set of nominal interfaces that the type must implement and constraints on the associated entities and interface type parameters.
Associated entity
An associated entity is a requirement in an interface that a type's implementation of the interface must satisfy by having a matching member. A requirement that the type define a value for a member constant is called an associated constant, and similarly an associated function or associated type.
Different types can satisfy an interface with different definitions for a given member. These definitions are associated with what type is implementing the interface. An impl defines what is associated with the type for that interface.
Rust uses the term "associated item" instead of associated entity.
Impls: Implementations of interfaces
An impl is an implementation of an interface for a specific type. It is the place where the function bodies are defined and where values for associated types and other associated entities are given. Impls are needed for nominal interfaces; structural interfaces and named constraints define conformance implicitly instead of by requiring an impl to be defined. It can still make sense to implement a named constraint as a way to implement all of the interfaces it requires.
Internal impl
A type that implements an interface internally has all the named members of the interface as named members of the type. This means that the members of the interface are available by way of both simple member access and qualified member access expressions.
External impl
In contrast, a type that implements an interface externally does not include the named members of the interface in the type. The members of the interface are still implemented by the type, though, and so may be accessed using qualified member access expressions for those members.
Member access
There are two different kinds of member access: simple and compound. See the member access design document for the details. The application to generics combines compound member access with qualified names, which we call a qualified member access expression.
Simple member access
Simple member access has the form object.member, where member is a word naming a member of object. This form may be used to access members of interfaces implemented internally by the type of object.
If String implements Printable internally, then s1.Print() calls the Print method of Printable using simple member access. In this case, the name Print is used without qualifying it with the name of the interface it is a member of since it is recognized as a member of the type itself as well.
Qualified member access expression
Compound member access has the form object.(expression), where expression is resolved in the containing scope. A compound member access where the member expression is a simple member access expression, as in a.(context.b), is called a qualified member access expression. The member expression context.b may be the qualified member name of an interface member, which consists of the name of the interface, possibly qualified with a package or namespace name, a dot (.), and the name of the member.
For example, if the Comparable interface has a Less member method, then the qualified name of that member is Comparable.Less. So if String implements Comparable, and s1 and s2 are variables of type String, then the Less method may be called using the qualified member name by writing the qualified member access expression s1.(Comparable.Less)(s2).
This form may be used to access any member of an interface implemented for a type, whether it is implemented internally or externally.
Compatible types
Two types are compatible if they have the same notional set of values and represent those values in the same way, even if they expose different APIs. The representation of a type describes how the values of that type are represented as a sequence of bits in memory. The set of values of a type includes properties that the compiler can't directly see, such as invariants that the type maintains.
We can't just say two types are compatible based on structural reasons. Instead, we have specific constructs that create compatible types from existing types in ways that encourage preserving the programmer's intended semantics and invariants, such as implementing the API of the new type by calling (public) methods of the original API, instead of accessing any private implementation details.
Subtyping and casting
Subtyping and casting both describe changing the type of a value to a compatible type.
Subtyping is a relationship between two types where you can safely operate on a value of one type using a variable of another. For example, using C++'s object-oriented features, you can operate on a value of a derived class using a pointer to the base class. In most cases, you can pass a more specific type to a function that can handle a more general type. Return types work the opposite way: a function can return a more specific type to a caller prepared to handle a more general type. This determines how function signatures can change from base class to derived class; see covariance and contravariance in Wikipedia.
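The C++ case mentioned above can be sketched as follows. Shape, Square, and CountSides are hypothetical names used only for this illustration.

```cpp
struct Shape {
  virtual ~Shape() = default;
  virtual int Sides() const { return 0; }
  virtual Shape& Self() { return *this; }
};

struct Square : Shape {
  int Sides() const override { return 4; }
  // Covariant return type: the override may return the more specific
  // Square& where the base class promised only Shape&.
  Square& Self() override { return *this; }
  int SideLength() const { return 2; }
};

// Accepts the general type; callers may pass the more specific Square.
int CountSides(const Shape& shape) { return shape.Sides(); }
```

Passing a Square to CountSides uses subtyping on the argument, while Square::Self shows the opposite direction for return types.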
In a generics context, we are specifically interested in the subtyping relationships between type-of-types. In particular, a type-of-type encompasses a set of type constraints, and you can convert a type from a more-restrictive type-of-type to another type-of-type whose constraints are implied by the first. C++ concepts terminology uses the term "subsumes" to talk about this partial ordering of constraints, but we avoid that term since it is at odds with the use of the term in object-oriented subtyping terminology.
Note that subtyping is a bit like coercion, except we want to make it clear that the data representation of the value is not changing, just its type as reflected in the API available to manipulate the value.
Casting is indicated explicitly by way of some syntax in the source code. You might use a cast to switch between type adaptations, or to be explicit where an implicit conversion would otherwise occur. For now, we are saying "x as y" is the provisional syntax in Carbon for casting the value x to the type y. Note that outside of generics, the term "casting" includes any explicit type change, including those that change the data representation.
In contexts where an expression of one type is provided and a different type is required, an implicit conversion is performed if it is considered safe to do so. Such an implicit conversion, if permitted, always has the same meaning as an explicit cast.
Coherence
A generics system has the implementation coherence property, or simply coherence, if there is a single answer to the question "what is the implementation of this interface for this type, if any?" independent of context, such as the libraries imported into a given file.
This is typically enforced by making sure the definition of the implementation must be imported if you import both the interface and the type. This may be done by requiring the implementation to be in the same library as the interface or type. This is called an orphan rule, meaning we don't allow an implementation that is not with either of its parents (parent type or parent interface).
Note that in addition to an orphan rule that implementations are visible when queried, coherence also requires a rule for resolving what happens if there are multiple non-orphan implementations. In Rust, this is called the overlap rule or overlap check. This could be just producing an error in that situation, or picking one using some specialization rule.
Adapting a type
A type can be adapted by creating a new type that is compatible with an existing type, but has a different API. In particular, the new type might implement different interfaces or provide different implementations of the same interfaces.
Unlike extending a type (as with C++ class inheritance), you are not allowed to add new data fields onto the end of the representation -- you may only change the API. This means that it is safe to cast a value between those two types without any dynamic checks or danger of object slicing.
This is called "newtype" in Rust, and is used for capturing additional information in types to improve type safety by moving some checking to compile time (1, 2, 3) and as a workaround for Rust's orphan rules for coherence.
Type erasure
"Type erasure" is where a type's API is replaced by a subset. Everything outside of the preserved subset is said to have been "erased". This can happen in a variety of contexts including both generics and runtime polymorphism. For generics, type erasure restricts a type to just the API required by the constraints on a generic function.
An example of type erasure in runtime polymorphism in C++ is casting from a pointer of a derived type to a pointer to an abstract base type. Only the API of the base type is available on the result, even though the implementation of those methods still comes from the derived type.
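A minimal C++ sketch of that runtime-polymorphism case, using hypothetical Printable and Point types:

```cpp
#include <string>

struct Printable {
  virtual ~Printable() = default;
  virtual std::string Print() const = 0;
};

struct Point : Printable {
  int x;
  int y;
  Point(int x_in, int y_in) : x(x_in), y(y_in) {}
  std::string Print() const override {
    return "(" + std::to_string(x) + ", " + std::to_string(y) + ")";
  }
  int Sum() const { return x + y; }  // Not part of Printable.
};

// Only the Printable API survives behind the base pointer. Writing
// p->Sum() here would not compile, although Print still runs Point's code.
std::string Show(const Printable* p) { return p->Print(); }
```

Here Sum is the "erased" part of Point's API once the value is viewed through a Printable pointer.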
The term "type erasure" can also refer to the specific strategy used by Java to implement generics, which includes erasing the identity of type parameters. This is not the meaning of "type erasure" used in Carbon.
Archetype
An archetype is a placeholder type used in place of a generic type parameter when type checking a function. This allows type checking when the specific type to be used is not known at type checking time. The archetype satisfies just its constraint and no more, so it acts as the most general type satisfying the interface. In this way the archetype is the supertype of all types satisfying the interface.
In addition to satisfying all the requirements of its constraint, the archetype also has the member names of its constraint. Effectively it is considered to implement the constraint internally.
Extending an interface
An interface can be extended by defining an interface that includes the full API of another interface, plus some additional API. Types implementing the extended interface should automatically be considered to have implemented the narrower interface.
Witness tables
Witness tables are an implementation strategy where values passed to a generic parameter are compiled into a table of required functionality. That table is then filled in for a given passed-in type with references to the implementation on the original type. The generic is implemented using calls into entries in the witness table, which turn into calls to the original type. This doesn't necessarily imply a runtime indirection: it may be a purely compile-time separation of concerns. However, it insists on a full abstraction boundary between the generic user of a type and the concrete implementation.
A simple way to imagine a witness table is as a struct of function pointers, one per method in the interface. However, in practice, it's more complex because it must model things like associated types and interfaces.
Witness tables are called "dictionary passing" in Haskell. Outside of generics, a vtable is a witness table that witnesses that a class is a descendant of an abstract base class, and is passed as part of the object instead of separately.
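The "struct of function pointers" picture can be sketched in C++ as below. This is a simplified illustration, not how any particular compiler lays out witness tables; NamedWitness, Cat, Dog, and Greet are hypothetical, and the interface is assumed to have a single method.

```cpp
#include <string>

// Hypothetical witness table for an interface with one method, Name.
struct NamedWitness {
  std::string (*name)(const void* self);
};

struct Cat {};
struct Dog {
  std::string call_name;
};

// Implementations on the original types, adapted to the table's signature.
std::string CatName(const void*) { return "cat"; }
std::string DogName(const void* self) {
  return static_cast<const Dog*>(self)->call_name;
}

const NamedWitness kCatWitness = {&CatName};
const NamedWitness kDogWitness = {&DogName};

// The generic code is compiled once. It only reaches the concrete type
// through entries in the witness table.
std::string Greet(const void* value, const NamedWitness& witness) {
  return "hello, " + witness.name(value);
}
```

Greet never names Cat or Dog, which reflects the abstraction boundary between the generic user of a type and the concrete implementation.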
Dynamic-dispatch witness table
For dynamic-dispatch witness tables, actual function pointers are formed and used as a dynamic, runtime indirection. As a result, the generic code will not be duplicated for different witness tables.
Static-dispatch witness table
For static-dispatch witness tables, the implementation is required to collapse the table indirections at compile time. As a result, the generic code will be duplicated for different witness tables.
Static-dispatch may be implemented as a performance optimization for dynamic-dispatch that increases generated code size. The final compiled output may not retain the witness table.
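As a loose C++ analogy for static dispatch, a traits class can play the role of the witness, with every entry resolved at compile time. The names below are hypothetical, and this is only one way to approximate the idea.

```cpp
#include <string>

struct Cat {};
struct Dog {
  std::string call_name;
};

// Hypothetical compile-time "witness": one specialization per implementing
// type, with no function pointers stored anywhere.
template <typename T>
struct NamedWitness;

template <>
struct NamedWitness<Cat> {
  static std::string Name(const Cat&) { return "cat"; }
};

template <>
struct NamedWitness<Dog> {
  static std::string Name(const Dog& dog) { return dog.call_name; }
};

// One copy of Greet is generated per T, and each call into the witness is
// a direct call that the compiler can inline.
template <typename T>
std::string Greet(const T& value) {
  return "hello, " + NamedWitness<T>::Name(value);
}
```

Compared with a table of function pointers, this trades larger generated code for the removal of the runtime indirection.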
Instantiation
Instantiation is the implementation strategy for templates in both C++ and Carbon. Instantiation explicitly creates a copy of the template code and replaces the template components with the concrete type and its implementation operations. It allows duck typing and lazy binding. Instantiation implies template code will be duplicated.
Unlike static-dispatch witness tables and monomorphization (as in Rust), this is done before type checking completes. Only when the template is used with a concrete type is the template fully type checked, and it type checks against the actual concrete type after substituting it into the template. This means that different instantiations may interpret the same construct in different ways, and that templates can include constructs that are not valid for some possible instantiations. However, it also means that some errors in the template implementation may not produce errors until the instantiation occurs, and other errors may only happen for some instantiations.
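A short C++ sketch of these properties, using a hypothetical Twice template:

```cpp
#include <string>

// The expression `value + value` is only given a meaning once T is known.
template <typename T>
T Twice(const T& value) {
  return value + value;
}

// Twice<int> interprets + as integer addition.
// Twice<std::string> interprets the same + as concatenation.
// Twice applied to a type with no + operator is an error, but the error is
// only reported when that instantiation is requested.
```

The template definition itself compiles even though it is invalid for some possible type arguments.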
Specialization
Template specialization
Specialization in C++ is essentially overloading in the context of a template. The template is overloaded to have a different definition for some subset of the possible template argument values. For example, the C++ type std::vector<T> might have a specialization std::vector<T*> that is implemented in terms of std::vector<void*> to reduce code size. In C++, even the interface of a templated type can be changed in a specialization, as happens for std::vector<bool>.
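To make the "interface can change" point concrete, here is a small C++ sketch with a hypothetical Describer template. It is not modeled on any standard library type.

```cpp
#include <string>

// Primary template: Describe returns std::string.
template <typename T>
struct Describer {
  static std::string Describe(const T&) { return "value"; }
};

// Full specialization for bool. The interface differs from the primary
// template: Describe returns const char*, and an extra member is added.
template <>
struct Describer<bool> {
  static const char* Describe(bool b) { return b ? "true" : "false"; }
  static constexpr bool kIsSpecialized = true;
};
```

Code written against Describer<T> cannot assume kIsSpecialized exists or that Describe returns std::string without knowing T.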
Generic specialization
Specialization of generics, or types used by generics, is restricted to changing the implementation without affecting the interface. This restriction is needed to preserve the ability to perform type checking of generic definitions that reference a type that can be specialized, without statically knowing which specialization will be used.
While there is nothing fundamentally incompatible about specialization with generics, even when implemented using witness tables, the result may be surprising because the selection of the specialized generic happens outside of the witness-table-based indirection between the generic code and the concrete implementation. Provided all selection relies exclusively on interfaces, this still satisfies the fundamental constraints of generics.
Conditional conformance
Conditional conformance is when you have a parameterized type that has one API that it always supports, but satisfies additional interfaces under some conditions on the type argument. For example: Array(T) might implement Comparable if T itself implements Comparable, using lexicographical order.
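A C++ approximation of the Array example is sketched below. The Array struct is hypothetical, and the SFINAE default template argument is just one way to make the comparison operator exist only when T is itself comparable.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical parameterized type; storing items is always supported.
template <typename T>
struct Array {
  std::vector<T> items;
};

// operator< participates in overload resolution only if `T < T` is a valid
// expression, so Array<T> is comparable only when T is.
template <typename T,
          typename = decltype(std::declval<const T&>() <
                              std::declval<const T&>())>
bool operator<(const Array<T>& lhs, const Array<T>& rhs) {
  return std::lexicographical_compare(lhs.items.begin(), lhs.items.end(),
                                      rhs.items.begin(), rhs.items.end());
}
```

An Array of a type without operator< can still be created and filled; it simply does not offer the comparison.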
Interface type parameters and associated types
Imagine an interface defining a container. Different containers will contain different types of values, and the container API will have to refer to that "element type" when defining the signature of methods like "insert" or "find". If that element type is a parameter (input) to the interface type, we say it is an interface type parameter; if it is an output, we say it is an associated type. An associated type is a kind of associated entity.
Interface type parameter example:
interface StackTP(ElementType:! Type) {
  fn Push[addr me: Self*](value: ElementType);
  fn Pop[addr me: Self*]() -> ElementType;
}
Associated type example:
interface StackAT {
  let ElementType:! Type;
  fn Push[addr me: Self*](value: ElementType);
  fn Pop[addr me: Self*]() -> ElementType;
}
Associated types are particularly called for when the implementation controls the type, not the caller. For example, the iterator type for a container is specific to the container and not something you would expect a user of the interface to specify.
interface Iterator { ... }
interface Container {
  // This does not make sense as a parameter to the container interface,
  // since this type is determined from the container type.
  let IteratorType:! Iterator;
  ...
  fn Insert[addr me: Self*](position: IteratorType, value: ElementType);
}
class ListIterator(ElementType:! Type) {
  ...
  impl as Iterator;
}
class List(ElementType:! Type) {
  // Iterator type is determined by the container type.
  impl as Container where .IteratorType = ListIterator(ElementType) {
    fn Insert[addr me: Self*](position: IteratorType, value: ElementType) {
      ...
    }
  }
}
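For readers more familiar with C++, nested member types of standard containers behave much like associated types: they are outputs of the container type rather than inputs chosen by the caller. FirstOrDefault below is a hypothetical helper written only to show this.

```cpp
#include <vector>

template <typename Container>
typename Container::value_type FirstOrDefault(const Container& container) {
  // Both the element type and the iterator type are determined by
  // Container; the caller never specifies them.
  using Element = typename Container::value_type;
  typename Container::const_iterator it = container.begin();
  if (it == container.end()) return Element();
  return *it;
}
```

Calling FirstOrDefault with a std::vector<int> fixes both nested types without the caller naming either one.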
If you have an interface with type parameters, a type can have multiple impls for different combinations of type parameters. As a result, type parameters may not be deduced in a function call. However, if the interface parameters are specified, a type can only have a single implementation of the given interface. This unique implementation choice determines the values of associated types.
For example, we might have an interface that says how to perform addition with another type:
interface Addable(T:! Type) {
  let ResultType:! Type;
  fn Add[me: Self](rhs: T) -> ResultType;
}
An i32 value might support addition with i32, u16, and f64 values.
impl i32 as Addable(i32) where .ResultType = i32 { ... }
impl i32 as Addable(u16) where .ResultType = i32 { ... }
impl i32 as Addable(f64) where .ResultType = f64 { ... }
To write a generic function requiring a parameter to be Addable, there needs to be some way to determine the type to add to:
// ✅ This is allowed, since the value of `T` is determined by the
// `y` parameter.
fn DoAdd[T:! Type, U:! Addable(T)](x: U, y: T) -> U.ResultType {
  return x.Add(y);
}
// ❌ This is forbidden, can't uniquely determine `T`.
fn CompileError[T:! Type, U:! Addable(T)](x: U) -> T;
Once the interface parameter can be determined, that determines the values for associated types, such as ResultType in the example. As always, calls with types for which no implementation exists will be rejected at the call site:
// ❌ This is forbidden, no implementation of `Addable(Orange)`
// for `Apple`.
DoAdd(apple, orange);
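The same relationship between an interface parameter and an associated type can be approximated in C++ with a traits template. This is only an analogy under assumed names: the Addable trait and DoAdd function below are hypothetical C++ stand-ins, with the template parameters as inputs and the nested ResultType as the output.

```cpp
#include <type_traits>

// Declared but not defined: a (Lhs, Rhs) pair has no implementation unless
// a specialization is provided.
template <typename Lhs, typename Rhs>
struct Addable;

template <>
struct Addable<int, int> {
  using ResultType = int;
  static ResultType Add(int lhs, int rhs) { return lhs + rhs; }
};

template <>
struct Addable<int, double> {
  using ResultType = double;
  static ResultType Add(int lhs, double rhs) { return lhs + rhs; }
};

// Both U and T are deduced from the arguments. Once they are known, the
// single matching specialization determines ResultType.
template <typename U, typename T>
typename Addable<U, T>::ResultType DoAdd(U x, T y) {
  return Addable<U, T>::Add(x, y);
}

static_assert(std::is_same<decltype(DoAdd(1, 2.5)), double>::value,
              "ResultType follows from the (Lhs, Rhs) pair");
```

A call with a pair of types that has no specialization fails to compile at the call site, which loosely mirrors the rejected DoAdd(apple, orange) call.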
Type constraints
Type constraints restrict which types are legal for template or generic parameters or associated types. They help define the semantics under which the parameterized code should be called, and they prevent incorrect calls.
In general there are a number of different type relationships we would like to express, for example:
- This function accepts two containers. The container types may be different, but the element types need to match.
- For this container interface we have associated types for iterators and elements. The iterator type's element type needs to match the container's element type.
- An interface may define an associated type that needs to be constrained to implement some interfaces.
- This type must be compatible with another type. You might use this to define alternate implementations of a single interface, such as sorting order, for a single type.
Note that type constraints can be a restriction on one type parameter or associated type, or can define a relationship between multiple types.
Type-of-type
A type-of-type is the type used when declaring some type parameter. Foremost, it determines which types are legal arguments for that type parameter; these restrictions are also known as type constraints. For template parameters, that is all a type-of-type does. For generic parameters, it also determines the API that is available in the body of the function.