How many times have you worked with inheritance hierarchy, which was almost impossible to change? Which made you feel like you are not the one in control, but just plugging bits and pieces into this big colossus? If you make living out of programming for some while, chances are you have experienced this more than once. So how do we prevent this?
We will use typical e-shop domain in examples, which you are most probably familiar with.
Where lies the problem?
Let’s model a product. Product has a name and can be priced in two ways. Either in some currency (euros in our case) or relatively to some other product (e.g. bigger cola costs always 50% more than smaller cola). First attempt to model this with inheritance would probably look like this:
So far so good with inheritance, but now new requirement comes in. Product can be either shipped within a specific city or picked up on specific address. Continuing with original solution, we would end up with 4 final classes:
You can see that it is not ideal. Especially if there would be another requirement in similar manner, resulting in total of 8 classes, then another to 16 and so on. Or what if we want to add another delivery method (extend current requirement)?
IS vs HAS
In order to solve this problem, we need to rethink how we model our data. So far, our model states that for example Cola is shippable, fix-priced product. Instead of that, we can state that Cola has a fixed price and has shipping. Let’s do that.
Fairly simple. No inheritance, we composed the product out of 2 things. How does
Delivery look like?
Now if we want to add another type of price, we just add another class deriving
Price, instead of creating 2 times the classes we have, but with keeping the data model clean and flexible, which makes impossible states irrepresentable and nicely models our domain.
But we used inheritance to do the composition?!
Yes, but we used it as a mechanic for distinguishing several data structures, not to model our whole product. Also, we kept the nesting to only single level and the base classes have nothing in them. Inheritance gets messy with multiple levels, virtual methods and sharing data. How to avoid this misery?
Before that, let’s also look how code working with the
Price would look like. Let’ say we want to calculate final price in dollars. Ideal solution within OOP would be to introduce abstract function on base class and use polymorphism.
But since we need to call http client in that function to get the exchange rate, this solution is very poor. We don’t want to hardcode http client call in the entity. Also, there can be hundreds of functions that behave differently for each price type and we don't want to pollute classes with all of these methods. Moreover, we might be working with this class from another assembly, which means we cannot extend the class in this way. We would rather calculate the price in some service.
But since we could not use polymorphism, what if we add another price type? We can easily forget to extend this code to work with that.
Algebraic data types
If you look closely, we modeled everything with simple “AND” and “OR” relationships. For example
Product is made of
Price is made of
RelativePrice is made of
Adjustment. You see the pattern? Welcome to the world of algebraic data types.
There are two of them corresponding to the “AND” and “OR” relationships. They serve as types to compose any other type. The first one is called product, mostly known as tuple. It connects any other types together. So, for example,
RelativePrice is product of
Product is a product of
The one representing “OR” is called coproduct, also known as sum type or discriminated union. This one is the lesser known among programmers, but the real powerhouse. It takes some other types and creates one type, which can be only one of them.
Price from the example is either
FixedPrice. It cannot be both but must be at least one. C# does not support sum types natively (yet), but for example F# has them since day one. To use them in C#, you can simulate them with inheritance as we did in the example and be careful, to keep the hierarchy in good shape, or use some library. Using our library FuncSharp,
Price would look like this (checkout the docs and code for more detailed example).
The real power of algebraic data types is they force the composition, forcing us to behave.
Now let’s change our
PricingService to use our new
Each coproduct has a
Match function on it, which takes a function for each case. It simulates concept called pattern matching. The important thing is, that the number of function is equal to the number of types. Meaning that if we now add a new
Price type and extend it from Coproduct2 to Coproduct3, this code will not compile. Compiler will force us to check all the places and make them work also with the newly added
As to everything else in programming, there are some downsides.
What if you want to write function that takes
Product with relative price as an argument? That’s not possible, but we lost that changing our relationship from is to has. Sometimes it can make sense to pick one of these and say that
FixedPricedProduct and share the code within them with some common data structure (or even via inheritance). Be cautious though and use it only when necessary.
Another problem with discriminated unions is their exhaustiveness. If you extend the union by one more type, your whole codebase will not compile, and compiler will force you to solve this new case on every usage. That’s a great feature and one of the main reasons to use them, it can be problematic when working across assemblies and exposing them in public API of your libraries. There is no way to migrate to the new version incrementally, since you must solve all the cases in order to compile your solution with the extended coproduct. This problem is known as Expression problem and discusses ways how to add new type or functionality in non-breaking way.
What next ?
Some resources you might find useful:
For more engineering insights shared by Mews tech team: