domingo, 18 de outubro de 2015

Why you should use arrays a lot

Although we will talk a lot about VBA and programming, this blog, as the name indicates, is about Excel - of course we intend to publish articles about VBA in other platforms such as Outlook (we have a great code for Outlook that you MUST see!) - and that what this post is about (we will have posts talking exclusively about Excel - with no VBA involved - but this is not the case yet).

Let's suppose that, in the worksheet named "Sheet1", you have the ranges A1 to E100000 filled with random numbers (this means you would have 500,000 cells filled with numbers).

Now lets suppose you would like to sum all the values and print the result in a message box. The code below would accomplish that.

option explicit

public sub MAIN_Sum_And_Message()
   
    dim myRange as Object
    dim i, j, m, n as Long
    dim value as Double

    value =0
    set myRange = thisworkbook.Worksheets("Sheet1").Range("A1:E100000")

    m = myRange.Rows.Count
    n = myRange.Columns.Count

    for i = 1 to m
        for j = 1 to n
            value = value + cDbl(myRange(i,j).value2) 
        next j
    next i
    
    msgbox cStr(value)

    exit sub
end sub

This code, although it uses very good best practices like using the range full address and converting the values to double before adding them, would be very slow. The reason for that is because we are accessing the cells, one by one, to get their values - this means that for each cell value, we are going after the range object and reading its value.

Now let's try another approach.

option explicit

public sub MAIN_Sum_And_Message2()
   
    dim myArray as Variant
    dim i, j, m, n as Long
    dim value as Double

    value = 0
    myArray = Thisworkbook.Worksheets("Sheet1").Range("A1:E100000").Value

    m = ubound(myArray, 1)
    n = ubound(myArray, 2)

    for i = 1 to m
        for j = 1 to n
            value = value + cDbl(myArray(i,j)) 
        next j
    next i
    
    msgbox cStr(value)

    exit sub
end sub 

The code above would be ridiculously faster than the previous one. The reason for that is that the ".Value" property of the range object is the most efficient way to read data from a range and, once the data is read and put in an array, the data is loaded in the computer memory and it becomes available in the fastest way possible. 

Now this may all sound too extreme. Why would we bother with a few seconds more of execution? Well, this is one of the things that makes all the difference when we have operations in large sets of data. It may be the difference between a program be feasible to be executed instead of taking as long as hours to be executed.

So the bottom line here is to load the range values to arrays everytime you have more than a few lines to work on. This will spare you execution time and even make your code more elegant. Another reason for that is because when we introduce the ARRAY_MODULE (which simplifies the use of arrays), you will see that most of the functions receive arrays as parameters, so you better get used with arrays - we will use them all the time.

One more thing: if you think this is the only thing you can do to make your code run faster, your are wrong. We'll talk about performance best practices in another post and introduce the PERFORMANCE_MODULE module, that abstracts all the performance improvement best practices in a single function.

Nenhum comentário:

Postar um comentário