Let’s talk about variable scope and extent

August 18, 2021

Let’s quickly recap what we already know about variables. In the previous posts, we discussed the idea of a variable and its types, which allow for the static and dynamic validation of various data operations. Later, we examined stack and heap memory allocation, which explains how the data are stored. We therefore understand the abstraction, data types, and memory allocation. Still, we need one last piece.

Today, I would like to introduce the closing chapter in our discussion regarding variables. We need to understand the space of visibility, where we can access our variable, and the period of existence, which is the timespan in which we can expect our variable to reference valid data.

Static scope

We start with the visibility of the variable name, which is described as a variable’s scope. Whenever we write a symbolic name, which acts as an identifier of the actual value, the program has to resolve the address and give us the stored data. The task may be easy when we consider a single function with a local variable defined inside, but how about accessing a global variable or, even better, a local variable of another function? Every programming language faces this problem; luckily, we have two well-known solutions.

The static scope, sometimes called the lexical scope, is the idea that whenever the program looks up a name used in our code, it searches the scopes that are closest in the written text. Thus, when our identifier is not present in the function currently being executed, the program goes up to the parent function, the one whose body contains the definition of our current function. When this also proves futile, it goes up to the grandparent function, which defined the parent one. The limit is the global scope, whose variables are visible everywhere in the program.

Looking from the variable’s perspective, we can fully grasp the idea behind the scope. The static resolution limits the visibility of a local name to a particular block of code; the name is not recognized outside of that space. If any function or block of code is defined inside, the scope extends to it too. Still, when the same symbolic name is used to declare a variable inside the inner block, the new variable shadows the outer one and takes precedence over it. That makes the inner variable the first choice under this name for the inner block and for any further blocks defined deeper inside.
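To make the lookup order and the shadowing concrete, here is a minimal sketch in Python, which is one of the statically-scoped languages. The function names (outer, inner, shadowing_inner, sibling) are made up for this illustration only.

```python
x = "global"


def outer():
    x = "outer"  # shadows the global x inside outer and everything nested in it

    def inner():
        # inner has no x of its own, so the lookup moves to the enclosing function
        return x

    def shadowing_inner():
        x = "inner"  # shadows outer's x, but only within this function
        return x

    return inner(), shadowing_inner(), x


def sibling():
    # defined at module level, so its enclosing scope is the global one
    return x


print(outer())    # ('outer', 'inner', 'outer')
print(sibling())  # global
```

Note that sibling never sees the x from outer, no matter who calls it, because only the place where a function is written decides which names it can reach.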

The static scope is widespread across the programming world. Languages like C/C++, Java, Python, JavaScript, and many more are built around this concept. It is convenient to be able to spot the accessible variables by simply looking at the implementation. It is also the approach that lends itself best to static code validation, because name resolution does not depend on how the program runs.
However, there is one significant disadvantage to be aware of. It is the potential overuse of global variables, or more generally, any variables defined outside the local block. Inflating a global state connected to various local functions may result in horrible spaghetti code and hard-to-track updates.

Dynamic scope

Another approach to variables’ visibility is the dynamic scope. In contrast to the lexical one, the dynamic resolution looks for a variable name based on the execution context. This means that when implementing a specific function, you can certainly use all its local variables, but you can also use any variables defined in functions that are still executing, in other words, in all blocks of code present on the call stack.
When you come from a statically-scoped language, that may sound like a crazy idea because the set of variables you can access in your present block may change based on which function has called your current one. Still, this is precisely the advantage of the dynamic scope. The very idea that we can change a given function’s behavior based on its invoking history explains the dynamism in this scope’s definition.

The execution chain defines the precedence of names: the program starts in the current scope and then walks back through the most recently called functions that are still executing. The resolution potentially ends in the global scope, the main program that starts the whole execution.
The precedence naturally allows a local variable to shadow the previous one by using the same symbolic name. Nevertheless, in this approach, we only consider the closeness in the execution timeline with a complete disregard for the lexical proximity.

The dynamic scope is less common in programming languages. Some, like Perl, allow choosing between the static and dynamic scope; others, like Emacs Lisp, have historically used it by default. The most famous example that every programmer should recognize is Bash, a command language present in most Linux distributions.
As stated before, the main reason to use the dynamic scope is to make the executed logic more flexible and thus more dynamic. Nevertheless, this approach gives up some of the guarantees of static code validation and introduces problems with unintentional data changes. That is because the current execution may interfere with almost any state of the previously invoked functions that are still executing and waiting for the current one to finish.

This seems like an excellent place for a little example explaining the difference between the static and dynamic scope.

int x = 1
function one() {
    int x = 2
    two()
}
function two() {
    print(x)
}
one()

When we treat this pseudocode in a statically-scoped manner, we see the global x defined and function one called. Function one initializes the local variable and calls function two, which prints variable x with value 1. That is because function two looks for x in the static scope above itself, and that happens to be the global scope.
When we treat this dynamically instead, we also see the global x and function one call. Then, the execution chain remembers function one with its local variable x. Later, function one calls function two. When the print happens, the program looks back into the execution chain. It resolves x as a variable defined in function one because it is closer in terms of the execution stack timeline, so the printed value is 2.
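The pseudocode above can be tried out in Python. The sketch below is only an illustration under some assumptions: Python itself is statically scoped, so the first half runs as-is, while the dynamic half is emulated with an explicit stack of bindings. The helpers frame and lookup, and the list bindings, are hypothetical names invented for this example and not part of any library.

```python
from contextlib import contextmanager

# --- Static scope: this is how Python itself resolves names ---
x = 1


def one_static():
    x = 2  # local to one_static; invisible to two_static
    return two_static()


def two_static():
    return x  # resolved lexically: the global x


# --- Dynamic scope: emulated with an explicit stack of bindings ---
bindings = [{"x": 1}]  # the bottom frame plays the role of the global scope


@contextmanager
def frame(**names):
    """Push a frame of bindings for the duration of a call (hypothetical helper)."""
    bindings.append(names)
    try:
        yield
    finally:
        bindings.pop()


def lookup(name):
    """Search from the most recent frame back towards the global one."""
    for frame_vars in reversed(bindings):
        if name in frame_vars:
            return frame_vars[name]
    raise NameError(name)


def one_dynamic():
    with frame(x=2):
        return two_dynamic()


def two_dynamic():
    with frame():  # two declares no variables of its own
        return lookup("x")


print(one_static())   # 1
print(one_dynamic())  # 2
```

The emulation makes the cost of the dynamic approach visible: two_dynamic cannot know which x it will get without knowing who called it.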

Extent

The last thing to mention about variables is the extent. While the scope describes the visibility of an identifier, the extent describes the time when an identifier binding is valid and points to the proper memory location. In other words, the extent is the period when a variable is alive because when there is no memory for keeping the value, a variable is basically dead.
Most of the time, the scope and extent correspond very closely to each other because when we enter some block of code and initialize a local variable, we instantly receive the proper location for its data. Then, when the block ends, the variable falls out of scope, and the memory is freed. This behavior is coherent and straightforward.

However, there are times when we are out of the variable’s scope but still in its extent. That means the memory is allocated, but the variable is not accessible. When the situation is temporary, like when we call a different function and the context changes, it is fine because, in the end, we can reach the value again. But sometimes, we lose the identifier forever, and if the memory remains allocated, we experience a memory leak.
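A small Python sketch can show a value that is still alive while its name is out of scope. It uses a closure, which is only one possible illustration, and the names make_counter, increment, and count are invented for this example. This is not a leak, because the value stays reachable through the returned function, but it shows that the extent can outlast the block that declared the name.

```python
def make_counter():
    count = 0  # local name: in scope only inside make_counter and increment

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
# make_counter has returned, so the name `count` is out of scope here,
# yet its storage is still alive because `increment` refers to it.
print(counter())  # 1
print(counter())  # 2

try:
    count
except NameError:
    print("count is not visible at module level")
```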
On the other hand, we may access a symbolic name that has no valid memory assigned. Then, we are talking about wild pointers, when a variable is not yet initialized, or dangling pointers, when the location has already been freed. Either way, accessing a variable whose address is invalid is hazardous and may corrupt unrelated data.
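Python does not expose raw pointers, so it cannot reproduce a real wild or dangling pointer. The closest safe analogue is a name that is in scope but has no binding, in which case the interpreter raises an error instead of reading invalid memory. The sketch below, with made-up function names, shows both the "not yet initialized" and the "already released" case.

```python
def read_before_assignment():
    try:
        return value  # `value` is a local name here, but nothing is bound to it yet
    except UnboundLocalError:
        return "unbound"
    value = 42  # never runs, but this assignment is what makes `value` local


def read_after_delete():
    value = 42
    del value  # the name stays local, but its binding is gone
    try:
        return value
    except UnboundLocalError:
        return "unbound"


print(read_before_assignment())  # unbound
print(read_after_delete())       # unbound
```

In a language with manual memory management, such as C, the same two mistakes would not be caught for us, which is exactly why they are dangerous there.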

Conclusion

Variables have different properties regarding their visibility and lifetime. Concerning the symbolic name, the scope describes the accessibility within the program. In the static approach, we resolve the identifier using the lexical context, and in the dynamic approach, we resolve it using the execution history.
The extent of the variable corresponds to the binding between an identifier and a memory location. It describes the time when a proper address references the place to store data. That is a vital part of the variable which makes it alive.
Both the scope and the extent tend to be very closely related.

Summarizing all the information about variables, we described the concept of linking a symbolic name and its memory location. Then, we compared static- and dynamic-type systems. Later, we discussed various memory allocations within the stack and heap space. Finally, we talked about a symbolic name scope and a memory binding extent. All those elements are vital details regarding the variable abstraction, which is crucial for most programming languages. Knowing them helps with a better understanding of the languages themselves.