How does MPS identify languages, solutions, models, concepts, nodes, or even properties? In this blog post, we take a look at the details.

In the MPS talk series on YouTube during the first week of February, Tom showed a project template for MPS built with Yeoman. Yeoman is a templating tool for software projects: it creates a new project from a template, and during project creation it can execute certain actions, like replacing macros in the template. After the talk, there was a discussion on Twitter about the limitations of this approach:

In the MPS Talks yesterday Tom Beadman showed how to use Yeoman to generate the template. That looked quite nice. This could also be the place where you can make the IDs unique.
— Markus Voelter (@markusvoelter) February 3, 2021

Although I pointed out its limitations, I don’t think the approach is wrong. I like it because it’s simple. The limitations of such a templating process are largely due to how MPS identifies languages, models, and concepts internally. I don’t think the way MPS handles identity internally is bad either, because it provides a lot of flexibility. It’s the combination of the two that can leave you in an undesired situation where languages, models, or nodes end up with duplicate identities. There is almost no documentation on how MPS identifies “things” internally. Since I have implemented custom model persistence and project templating for MPS in the past, I decided to write down what I know so that others can get a better understanding of the topic.

One disclaimer: the architecture of MPS would, at least in theory, support implementations other than the one I’m going to explain here. At the time of writing, there is a single implementation, and this post is based on it. This blog post assumes intermediate MPS knowledge: you know what a “concept declaration” and a “model” are, and you have worked with the SNode class and node<> types. You will not need to know most of the things in this post to work successfully with MPS. But if you are interested in a technical deep dive, then this is the right post for you. Let’s put our propeller heads on and take a look at the details.

Modules, Models, and Nodes

Let’s first look at how models in MPS are stored. Below you can see the main building blocks of MPS persistence. In this post, we will focus on Modules, Models, and Nodes. Understanding this aspect of MPS persistence is useful when you want to write a custom model persistence, or a model root that brings external data into MPS. I won’t cover how to write your own persistence, though; there are other resources for that, like the MPS sample for XML persistence.

MPS loads modules, models, and nodes into a Repository. At the time of writing, MPS has a single global repository that is shared by all open projects. It contains all the content of the project(s) and all their dependencies. As a consequence, two projects that are open at the same time cannot use the same dependency in two different versions. There are ongoing efforts in MPS to isolate projects more strongly, which would allow a repository per project, but this isn’t available yet.

Module

The module is the top-level building block for everything in MPS. MPS includes three kinds of modules: Solution, Language, and DevKit. For this part of the blog post, we will exclude the Language and DevKit kinds and focus on Solutions; more on languages in the “Languages, Concepts, Roles, and Properties” section. The main purpose of a Solution is to provide configuration: for model persistence, the Model Root, as well as various other kinds of configuration, such as compiling the generated Java code. Internally, these configurations are called facets, and language engineers can contribute their own facets if required. While the Model Root plays an important role in model persistence, it’s not directly involved in the identity of a model, so we will ignore it in this post.

Each module is identified by a module id, which needs to be globally unique. If MPS observes several modules with the same id, it loads the one it finds first and skips the others. The module id is stored in the module file: for solutions this is the .msd file, and for languages it’s the .mpl file. The module id is not to be confused with the model id. Models have separate ids and are identified independently of their module, so it is possible to move a model to a different module without changing its identity. However, moving a model might break code generation or compilation, because the module is used to configure aspects like class loading and compilation dependencies.
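Because a project template copies these descriptor files verbatim, duplicate module ids are easy to introduce. The following Python sketch scans a directory tree for .msd and .mpl files and reports every module id that occurs more than once. It assumes that the module id is stored in the uuid attribute of the descriptor’s root element, so verify this against your own files before relying on it. The function name find_duplicate_module_ids is made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

# Assumption: the module id is the "uuid" attribute on the root element of
# solution (.msd) and language (.mpl) descriptor files.
DESCRIPTOR_SUFFIXES = {".msd", ".mpl"}


def find_duplicate_module_ids(root: Path) -> dict[str, list[Path]]:
    """Return each module id that appears in more than one descriptor below root."""
    seen: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.suffix not in DESCRIPTOR_SUFFIXES or not path.is_file():
            continue
        module_id = ET.parse(path).getroot().get("uuid")
        if module_id is not None:
            seen[module_id].append(path)
    # Keep only ids that were found in at least two descriptor files.
    return {mid: paths for mid, paths in seen.items() if len(paths) > 1}
```

Call it as find_duplicate_module_ids(Path("path/to/project")). An empty result means no module id is used twice within that tree. The sketch only looks at files on disk, so it cannot catch a clash with a module that reaches the repository from a plugin or from another open project.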

Model

A model is identified by a model id. If a model id is globally unique, the module that contains the model isn’t used to calculate the model’s identity. If a model id is not globally unique, it is bound to its module. The default (regular) model ids created by MPS are globally unique and backed by a UUID. For scenarios like custom persistence, where the model id isn’t chosen by MPS, MPS comes with a variety of built-in model id types. The two most interesting ones are ForeignSModelId and IntegerSModelId. The name “foreign” is a bit misleading, since this type of id is simply a string. The foreign id is useful when you already have a unique string identifying the model, e.g. a URL. The integer id, by contrast, isn’t globally unique and is useful when it’s acceptable for you to combine module and model identity.
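To make the difference between globally unique and module-bound model ids concrete, here is a small Python model. The classes are hypothetical stand-ins, not the MPS classes ForeignSModelId or IntegerSModelId, and model_identity is an invented helper. It only shows which parts end up in the key under which a model has to be unique.

```python
import uuid
from dataclasses import dataclass
from typing import Union

# Hypothetical stand-ins for the three model id kinds; not the MPS classes.


@dataclass(frozen=True)
class RegularModelId:
    value: uuid.UUID  # UUID-backed and globally unique (the MPS default)


@dataclass(frozen=True)
class ForeignModelId:
    value: str  # any unique string, e.g. a URL; treated as globally unique


@dataclass(frozen=True)
class IntegerModelId:
    value: int  # only unique within the owning module


ModelId = Union[RegularModelId, ForeignModelId, IntegerModelId]


def model_identity(module_id: uuid.UUID, model_id: ModelId) -> tuple:
    """Key under which a model must be unique in the repository."""
    if isinstance(model_id, IntegerModelId):
        # Not globally unique, so the owning module becomes part of the identity.
        return (module_id, model_id)
    # Globally unique: the module is irrelevant, so the model can move freely.
    return (model_id,)
```

With a regular or foreign id, the key ignores the module, which is why such a model can move to another module without changing its identity. With an integer id, the same number in two modules yields two different models.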

Node

Nodes are also identified by ids, but in contrast to model and module ids, node ids are never considered globally unique. A node id only has to be unique within its model, which is why a Java long is enough to store the regular node id. Similar to the model id, MPS comes with a built-in foreign node id, which is based on a string. The node id alone isn’t enough to identify a node globally: the complete identity of a node consists of the model id and the node id. Conveniently, MPS ships with the SNodePointer class, which does exactly this. A node pointer can be safely serialized to a string and parsed back, which makes it useful whenever you need to exchange node identity. For instance, the Copy Node URL action uses a node pointer to encode the node into the URL.
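The pointer idea can be sketched as a pair of model id and node id that round-trips through a string. This is a hypothetical Python model, not the SNodePointer API. The exact “model id/node id” text format is an assumption made for this sketch, since MPS has its own serialization.

```python
from dataclasses import dataclass

JAVA_LONG_MIN, JAVA_LONG_MAX = -(2**63), 2**63 - 1


@dataclass(frozen=True)
class NodePointer:
    """Hypothetical node pointer: model id plus node id."""

    model_id: str  # globally unique model id, kept as text here
    node_id: int   # regular node ids are Java longs, unique only within the model

    def __post_init__(self) -> None:
        if not JAVA_LONG_MIN <= self.node_id <= JAVA_LONG_MAX:
            raise ValueError("regular node ids must fit into a Java long")

    def serialize(self) -> str:
        # Assumed format for this sketch only: "<model id>/<node id>".
        return f"{self.model_id}/{self.node_id}"

    @staticmethod
    def parse(text: str) -> "NodePointer":
        # rsplit, because the model id itself may contain "/" (e.g. a URL).
        model_id, node_id = text.rsplit("/", 1)
        return NodePointer(model_id, int(node_id))
```

Two pointers with the same node id but different model ids are different, which mirrors the rule that a node id is only meaningful together with its model id.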

While it might seem odd that nodes need the model id as part of their identity, it allows for some optimizations in MPS. When nodes are copied between models, they keep their original node ids as long as no conflict occurs. When more than one node is copied at a time, MPS can therefore easily keep the references between them intact. Although it doesn’t use the same class, a reference between nodes is technically an SNodePointer: it stores the same information, namely the model id and node id of the reference target. This allows MPS to update the model part of a reference without changing the node id part. As a result, self-contained groups of nodes can be copied while preserving the references between them.
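The copy behavior can be sketched like this in Python. Node, Ref, and copy_nodes are hypothetical names. The text above only says that ids are kept when no conflict occurs, so the sketch simply refuses to copy when a node id is already taken in the target model instead of guessing how MPS resolves the conflict.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ref:
    """A reference stores the model id and node id of its target."""

    model_id: str
    node_id: int


@dataclass
class Node:
    node_id: int
    references: dict[str, Ref] = field(default_factory=dict)


def copy_nodes(nodes: list[Node], source_model: str, target_model: str,
               taken_ids: set[int]) -> list[Node]:
    """Copy a group of nodes, keeping node ids and re-pointing internal references."""
    copied_ids = {n.node_id for n in nodes}
    if copied_ids & taken_ids:
        # Conflict handling (fresh ids plus remapping) is out of scope here.
        raise ValueError("node id conflict in target model")
    result = []
    for n in nodes:
        refs = {}
        for role, ref in n.references.items():
            if ref.model_id == source_model and ref.node_id in copied_ids:
                # The target was copied too: only the model part changes.
                refs[role] = Ref(target_model, ref.node_id)
            else:
                # The target lies outside the group: keep the reference as is.
                refs[role] = ref
        result.append(Node(n.node_id, refs))
    return result
```

Only references whose target is part of the copied group are re-pointed, and only their model part changes. A reference into a library model stays exactly as it was.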

Languages, Concepts, Roles, and Properties

Now that we know the basics of persistence, we can take a closer look at the meta-meta-model of MPS, the meta-model that is used to create your languages. We will only focus on the structure aspect of a language definition and exclude everything else. Since MPS is bootstrapped, meaning MPS is used to create MPS, every language in MPS is defined using modules, models, and nodes. It can be a little mind-bending at times, but I hope this part isn’t too complicated. MPS does not reuse model or node ids to identify parts of the language definition. Instead, it uses ids that are stored as properties on the nodes of a language definition, which decouples the identity of a meta-model element from its persistence.

Languages

Languages are not primarily identified by their name but by an id. This allows for multiple languages with the same name in a repository. While this would be confusing for a user in the UI, MPS can handle this case technically without issues.

The language id in MPS is simply the module id of the language module.

Concepts

A concept definition in MPS is a root node like any other node. While it has a node id, that node id isn’t part of its identity. Instead, each concept has a concept id, which is a property on the concept definition. You can see, and change, the id in the inspector of a concept definition.

The concept id needs to be unique within a language but not globally. At runtime, the identity of a concept is implemented by SConceptId. Concepts are identified by the language, via the language id, and the concept id. It’s perfectly fine to have the same concept id in two different languages. This once again simplifies copy and paste as well as refactoring: concepts can be moved between languages without changing their ids. When a refactoring moves concepts between languages, MPS can take advantage of this and only needs to update the language id part in existing instances.
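A concept’s identity can be modeled as a pair of language id and concept id. The Python sketch below is illustrative only: ConceptId is a stand-in loosely named after SConceptId, not its API, and ConceptDeclaration and move_concept are hypothetical. It shows that the declaration’s node id and name play no part in the identity, and that moving a concept only swaps the language part.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConceptId:
    """Hypothetical stand-in for a concept identity."""

    language_id: uuid.UUID  # the language's module id
    concept_id: int         # unique only within the language


@dataclass(frozen=True)
class ConceptDeclaration:
    node_id: int     # persistence identity of the declaration node
    name: str
    concept_id: int  # the id property shown in the inspector

    def identity(self, language_id: uuid.UUID) -> ConceptId:
        # Node id and name are deliberately not part of the identity.
        return ConceptId(language_id, self.concept_id)


def move_concept(cid: ConceptId, new_language_id: uuid.UUID) -> ConceptId:
    """Moving a concept to another language only swaps the language part."""
    return replace(cid, language_id=new_language_id)
```

Two declarations with different node ids and names but the same concept id in the same language end up with the same identity. This is exactly the duplicate situation discussed further below.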

Roles and Properties

Roles and properties are identified by, you guessed it, ids. A property on a concept is used to store a value within an instance of that concept. Roles describe relationships to other concept instances. MPS supports two kinds of roles: references and containment. References are pointers to other nodes and don’t imply a lifecycle relationship between the two nodes. Containment describes a parent-child relationship in which the children share the lifecycle of the parent. Like concepts, role and property definitions have an id property. Role and property ids are required to be unique within their concept, so two properties or roles in different concepts of the same language may have the same id. To uniquely identify a role or property, you first identify the concept by its id and then use the role or property id. Since concepts are identified by the language id and the concept id, the full identity of a role or property technically consists of the language id, the concept id, and the role/property id.
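Extending concept identity to properties and roles gives a three-part identity. These Python classes are hypothetical illustrations rather than MPS types, and FeatureId stands for either a property or a role.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptId:
    language_id: uuid.UUID  # module id of the language
    concept_id: int         # unique within the language


@dataclass(frozen=True)
class FeatureId:
    """Hypothetical identity of a property or role."""

    concept: ConceptId  # implicitly carries the language id
    feature_id: int     # unique only within the concept

    def parts(self) -> tuple[uuid.UUID, int, int]:
        # The full identity: language id, concept id, property/role id.
        return (self.concept.language_id, self.concept.concept_id, self.feature_id)
```

The same feature id in two concepts of one language produces two distinct identities, which matches the uniqueness rule above.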

What happens when an id isn’t unique?

Lots of weird things can happen if an id isn’t unique in the repository, and what happens depends on the type of id that is duplicated. For modules and models, the effect is usually straightforward: if MPS encounters a module or model with an id it has already seen, it will not load it. While this sounds easy to diagnose, it isn’t always. The order in which MPS loads the content of the repository determines which of the duplicates is encountered first. In my experience, the loading logic is stable, so when you open the project twice, the same module or model is ignored. The problem is that the command line build uses different logic, dictated by the generated ANT file. That logic can differ from the one the IDE employs and therefore lead to different behavior in the command line build than in the IDE. Luckily, recent MPS versions warn you heavily when a module or model id is duplicated. Your build logs will be full of warnings, which makes diagnosing this problem easy.
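The order dependence can be reproduced with a tiny first-wins loader. In this Python sketch, load and ModuleDescriptor are hypothetical. The two input lists merely stand for two different discovery orders, such as the IDE’s and the one in a generated ANT build. They do not reflect how either of them actually orders modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleDescriptor:
    module_id: str
    location: str


def load(descriptors: list[ModuleDescriptor]) -> tuple[dict[str, ModuleDescriptor], list[str]]:
    """First module seen with an id wins; later duplicates are skipped with a warning."""
    loaded: dict[str, ModuleDescriptor] = {}
    warnings: list[str] = []
    for d in descriptors:
        if d.module_id in loaded:
            first = loaded[d.module_id].location
            warnings.append(f"skipping {d.location}: id {d.module_id} already loaded from {first}")
            continue
        loaded[d.module_id] = d
    return loaded, warnings
```

Feeding the same two modules in opposite orders yields a different winner each time, while both runs still produce a warning. This is why the warnings in the build log matter even when the IDE appears to work.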

When concept, role, or property ids are duplicated, things get much harder to diagnose. Ideally, you open MPS and see a model checking error in your language definition that tells you there are duplicate ids, and you even get an intention to fix them. If, for some reason, a language gets shipped with duplicate ids, the effects at runtime are hard to diagnose. Code that interacts with concept instances might not “see” property values or roles. This can be particularly hard to track down in generators. We spent quite some time diagnosing such a problem during a recent migration. That particular problem wasn’t caused by duplicate ids in the language definition; instead, generated code wasn’t calculating the id correctly. While they are harder to diagnose, these types of errors are very rare and require active “abuse” by the language engineer.
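One way to picture why code stops “seeing” values is to assume that a node stores its property values keyed by property id rather than by name. The Python sketch below is a deliberately simplified, hypothetical model, not how MPS stores nodes internally. Under that assumption, once two property declarations share an id, writing one overwrites the other.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PropertyDecl:
    name: str
    property_id: int  # supposed to be unique within its concept


@dataclass
class NodeInstance:
    # Simplified, hypothetical storage: values are keyed by property id, not by name.
    values: dict[int, str] = field(default_factory=dict)

    def set_value(self, prop: PropertyDecl, value: str) -> None:
        self.values[prop.property_id] = value

    def get_value(self, prop: PropertyDecl) -> Optional[str]:
        return self.values.get(prop.property_id)
```

If name and alias share property id 1, reading name after writing both returns the value written through alias, and only one value is left on the node.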

For feedback on the topic, feel free to reach out to me. You can find me on Twitter @dumdidum or write a mail to kolja@hey.com.

How JetBrains Meta Programming System (MPS) Identifies Things.