Last time we discussed the idea behind variables in programming languages. We described variables as an abstraction joining the identifier and the value, with an additional layer of an addressable location in between.
We also discussed static- and dynamic-type systems that logically bind the chosen symbolic name with the actual value and allow for type validation during compilation and runtime, respectively.
One of the most important details about a variable is the location where it stores its value. That’s why today, I would like to expand the topic and consider different categories of variables regarding the time and place of their memory allocation.
We start from the top: the most distinguished variable of all is the static one. We are not referring to static fields of classes in object-oriented programming, which may be treated very differently depending on the language; instead, we refer to global variables. Those are meant to be accessible across the whole program, from the beginning to the end of its execution. Another fine example of static memory allocation is any variable declared as static, which allows for sharing state between consecutive calls of a function.
The defining trait of static variables is that their memory is laid out during the compilation phase, so the address is fixed before the program starts to run. This constant location is the main reason why they are easily accessible from any place in the program.
On the flip side, static variables may be dangerous precisely because they are so easy to access: any part of the program can change the shared state unintentionally. In addition, the fixed address behind a given symbolic name makes the variable less flexible and, for instance, unsuitable for recursive use.
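To make the two flavours of static storage concrete, here is a minimal sketch in C. The names request_count and next_ticket are made up for illustration; the point is only that both variables keep one fixed location for the whole run, and that the static local keeps its value between calls.

```c
/* Global variable: static storage, visible to the whole program. */
int request_count = 0;

/* Hypothetical helper that hands out increasing ticket numbers. */
int next_ticket(void) {
    /* Static local: initialized once, keeps its value between calls. */
    static int last_ticket = 0;

    last_ticket += 1;
    request_count += 1;   /* shared state, changeable from anywhere */
    return last_ticket;
}
```

Calling next_ticket() twice in a row returns 1 and then 2, and request_count ends up at 2, since neither variable is re-created when the function is entered. Any other function in the program could change request_count as well, which is exactly the risk described above.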
The most popular kind of variable is the local variable, which is visible only inside a specified block of code such as a function or a procedure. The memory for that kind of variable is dynamically allocated on the stack during runtime, when the execution reaches its declaration. At the end of the block, the memory is automatically freed, and the variable disappears from the stack.
Stack variables are widespread in languages that allow for separated blocks of code. For instance, a stack variable may be a function parameter or simply a variable defined inside the body of the function.
The precious thing about local variables is that they are straightforward: allocation and deallocation happen automatically, at the block’s beginning and end, respectively. The programmer does not have to do anything about it. They are also flexible, because every call gets its own fresh copy; thus, we can safely use them across different functions and in recursive code.
The obvious downside is the necessary initialization, which takes time and happens during every execution. Stack variables are also not suitable for large data structures, but the solution to this is in the next paragraph.
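The difference in flexibility shows up clearly in recursion. The sketch below, in C and with hypothetical function names, adds the numbers from n down to 1 twice: once keeping the intermediate value in a local (stack) variable, and once keeping it in a static one.

```c
/* Every call gets its own `current` in its own stack frame. */
int sum_to_local(int n) {
    if (n <= 0) {
        return 0;
    }
    int current = n;
    int rest = sum_to_local(n - 1);
    return current + rest;
}

/* All calls share a single `current`, so inner calls overwrite it. */
int sum_to_static(int n) {
    if (n <= 0) {
        return 0;
    }
    static int current;
    current = n;
    int rest = sum_to_static(n - 1);
    /* By now `current` holds the value written by the deepest call. */
    return current + rest;
}
```

sum_to_local(3) returns 6 as expected, while sum_to_static(3) returns 3: by the time each call resumes, the deepest call has already overwritten the shared variable with 1.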
The final type of location for a variable is the heap memory. A heap is a region of memory set aside for the program and kept separate from the stack. Programming languages usually keep primitive values like a number, character, and string (depending on the language) on the stack because they have a constant, predictable size and are considered lightweight. On the other hand, complex data values like an object, vector, and list are kept in the heap space due to their dynamic nature and potentially massive size.
When we pass a variable to a function or assign it to another variable, a primitive is treated as a value, so we simply duplicate the content into the other variable. Conversely, a complex variable that keeps its data in the heap space is treated as a reference to the addressed location, so we copy only the address itself. In this case, we do not duplicate the content because that could be a time-consuming operation.
This distinction plays an essential role in many languages. The programmer has to know when two different symbolic names merely hold the same value, so that each can be modified independently in its own stack location, and when two identifiers share a reference and thus point to, and modify, the exact same data in the heap location.
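A small sketch may help to see both behaviours side by side. It is written in C, which has no built-in reference types, so a pointer stands in for the reference here; the function names are hypothetical.

```c
#include <stdlib.h>

/* Copying a primitive duplicates the value itself. */
int copy_value_demo(void) {
    int a = 10;
    int b = a;    /* b has its own stack slot holding a copy of 10 */
    b = 99;       /* changes b only */
    return a;     /* a is untouched */
}

/* Copying a pointer duplicates only the address. */
int copy_reference_demo(void) {
    int *first = malloc(sizeof *first);
    if (first == NULL) {
        return -1;            /* allocation failed */
    }
    *first = 10;

    int *second = first;      /* same heap address, no data copied */
    *second = 99;             /* visible through `first` as well */

    int seen = *first;
    free(first);              /* one heap block, so exactly one free */
    return seen;
}
```

copy_value_demo() returns 10 because the two variables are independent, whereas copy_reference_demo() returns 99 (assuming the allocation succeeds) because both names point to the same heap location.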
Furthermore, we may distinguish the explicit and implicit allocation of the heap space. The explicit one is when we call the allocation function or trigger the constructor manually. In return, we get the address location pointing to the heap. We can observe this in languages like C/C++ and Java when using
When the memory is no longer needed, we should deallocate the heap location. In this task, we also distinguish between the explicit and implicit approaches. The explicit one is the reverse of the allocation, so the programmer manually frees the space using the proper syntax. Conversely, the implicit way does not require the programmer’s attention at all. It is usually implemented as a mechanism called the garbage collector. This approach to handling the variable’s space requires a particular strategy to determine whether a given location is really no longer in use and may be safely freed.
The apparent trade-off between implicit and explicit heap memory management is a choice between flexible code with safe, automatic memory control and highly efficient, direct memory administration.
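As a rough illustration of the explicit approach, the following C sketch pairs an allocation with its matching deallocation using the standard malloc and free functions. The helper names make_sequence and sum_sequence are hypothetical.

```c
#include <stdlib.h>

/* Explicit allocation: returns a heap block holding 0, 1, ..., count - 1.
   The caller owns the block and must release it with free(). */
int *make_sequence(size_t count) {
    int *items = malloc(count * sizeof *items);
    if (items == NULL) {
        return NULL;          /* allocation failed */
    }
    for (size_t i = 0; i < count; i++) {
        items[i] = (int)i;
    }
    return items;             /* the block outlives this stack frame */
}

/* Uses the block and then frees it explicitly. */
int sum_sequence(size_t count) {
    int *items = make_sequence(count);
    if (items == NULL) {
        return -1;
    }
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += items[i];
    }
    free(items);              /* explicit deallocation */
    return total;
}
```

With a successful allocation, sum_sequence(5) returns 10. Forgetting the call to free would leak the block, and using items after it would be an error; these are the kinds of mistakes a garbage collector is meant to rule out.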
A variable is a useful abstraction for operating on state. To make it work, it needs memory in which to keep its data at hand.
We distinguish different spaces for variable memory based on the allocation time, visibility scope, and data size. All those aspects are essential to understand the different types of memory and make the most out of them in any language we choose.
Next time, we will discuss the final factor regarding variables in programming languages.