domingo, 18 de outubro de 2015

Explaining VBA Arrays (and its issues)

First of all, we'd like to make clear that this is not a blog to learn Excel and VBA basics. Our purpose is to give more tools to make Excel and VBA simple and easy. 
We are saying this because now we will talk about a basic concept of the VBA language, the Array, but the reason for that is to explain the need of a whole module, the ARRAY_MODULE, to deal with arrays instead of simply using them without any treatment.

Array definition: a VBA data type which contains a group of variables of the same data type, which can be addressed by indexes.

Here is an example of an array declaration:

Dim myArray(2 to 12) as string

This line creates a uni-dimensional array with 10 strings (index goes from 2 to 12). So the following lines of code would work:

myArray(2) = "john"
myArray(12) = "peter"
myArray(6) = "0"

On the other hand, the following lines  of code would NOT work:

myArray(1) =  "mary"
myArray(13) = "yan"

The reason why this would not work is because the index is smaller than the lower bound (in the first line) and greater than the upper bound (in the second line).

This happened because we defined the upper bound and the lower bound when we declared the array. The "(2 to 12)" meant that. Another way to declare this array would be like this:

Dim myArray(11) as string

This would give us an array also with 12 strings, but with the index going from 0 to 11 (0 is the default lower bound - this means VBA uses it when it is not explicitly declared by the programmer).
Once you have an array, you can check its boundaries using the Ubound and the Lbound functions, which return the upper bound and the lower bound of an arrray dimension respectively. The code below would print 0 and 11 to the immediate window:

debug.print Lbound(myArray, 1)
debug.print Ubound(myArray, 1)

Notice the "1" parameter in the ubound and the lbound functions. This is the dimension of the array we are asking for the boundaries. Since we are using a 1-dimension array, we can only ask for the boundaries of dimension number 1. This means that the following line of code would give us an error:

debug.print lbound(myArray, 2)

This causes an error because we are trying to access the second dimension of a 1 dimension array. To declare a  2 dimensions array, we would have to do something like this:
Dim myArray2(0 to 99, 0 to 9) as string

This would give as an array with 100 rows and 10 columns. A 3 dimensions array could be defined as such:
Dim myArray3 (0 to 9, 0 to 9, 0 to 9) as String

Or, equivalently, 
Dim myArray3(9,9,9) as String

This would give us a 10x10x10 = 1000 elements array. (0 to 9 accounts for 10 elements, 10 for each dimension of the 3 gives us 10x10x10 = 1000 elements)
(Please notice that we are using string arrays, but any other data type could be used.)
The following code could be used:

myArray3(5,5,5) = "test"
myArray3(0,0,0) = "something"

So what is the big deal with that?

Let's suppose we used a function that returned an array and we attributed its return to an array named, say, "misteryArray' , like in the line below:

mysteryArray = someFunctionThatReturnsAnArray()

and we don't know how many dimensions it has (the misteryArray would had to be declared as a variant - there are many blogs that explain what a variant data type is, so I don't think it is necessary to explain it here). If we tried, say

msgbox ubound(misteryArray, 2)

and the array is a 1 dimension array, we would get an error message (an exception).
A person who has programmed in VBA knows this is a recurrent and annoying issue because it is very common to get arrays as outputs of functions so we don't really know how many dimension they have. So when we try to get the boundaries of the array (and, consequently, its size), we can ask for the boundary of a dimension that does not exist and get an error - and this is terrible.

Another major issue is the lower boundary. By default, the lower boundary of an array is zero, but this is not always the case. If we, for example, get the values of a range and put them on an array using the "value" method, like in the code line below, the lower boundary is, by default, 1.

misteryArray= Range("A1:C10").value
debug.print lbound(misteryArray, 1)

The code above would print "1" to the immediate window. This means that our array would have 10 rows, indexed from 1 to 10 and 3 columns indexed from 1 to 3.

The problem with that is that we are never really sure whether a N elements array dimension is indexed from 1 to N or from 0 to N-1 or from k to N+k-1.

Those issues are some of the things that makes VBA programming more difficult than most of the other languages.

To address those issues, we have created the ARRAY_MODULE, which will standardize all the arrays as 2 dimension arrays with each dimension being indexed from 0 to N-1. So every array will become a N x M matrix that can have its boundaries asked with no risk of getting an error. 

We have chosen 2 dimensions as default because the Excel cell ranges are organized as 2 dimensions matrices - so this will give us a straightforward mapping from ranges to arrays and from arrays to ranges. 
The reason why we chose zero as the lower boundary (instead of 1) is because most of the other programming languages work this way (this will create some issues when dealing with ranges though - ranges are indexed from 1 to N).

We will introduce the ARRAY_MODULE in another post. It is a very interesting module (and with a lot of room for improvement), with many useful functions that clear the road for other powerful modules.

Nenhum comentário:

Postar um comentário