DanIDL

Contact

Daniel Bramich
dan.bramich "AT" hotmail.co.uk

IDL Speed Tips

The following table shows the results of speed comparisons between different ways of implementing the same operation, and it is intended as a guide when optimising IDL code. Results for scalar quantities are shown in YELLOW and results for array quantities are shown in RED. The third column gives the value assigned to the scalar quantities or array elements in each test, and the final column gives the factor by which the faster expression outperformed the slower one. Where a row lists two slower expressions, two factors are given in the same order. Note that the tests were performed on a computer with 4 CPUs, so results may differ on other computers for IDL operations that use the IDL thread pool.

Further general optimisation hints are listed after the table.

Variable Type	No. Elements	Value	Slower Expression(s)	Faster Expression	Factor

INTEGER SCALAR	1	1	x = a LE 1	x = a LT 2	1.020
INTEGER VECTOR	1000	1	x = a LE 1	x = a LT 2	1.015
INTEGER VECTOR	1000000	1	x = a LE 1	x = a LT 2	1.012

FLOAT SCALAR	1	1.0	x = a^2.0	x = a^2	1.40
FLOAT SCALAR	1	1.0	x = a^3.0	x = a^3	1.40
FLOAT SCALAR	1	1.0	x = a^4.0	x = a^4	1.43
FLOAT SCALAR	1	1.0	x = a^5.0	x = a^5	1.40
FLOAT VECTOR	1000	1.0	x = a^2.0	x = a^2	7.34
FLOAT VECTOR	1000	1.0	x = a^3.0	x = a^3	8.43
FLOAT VECTOR	1000	1.0	x = a^4.0	x = a^4	6.17
FLOAT VECTOR	1000	1.0	x = a^5.0	x = a^5	6.84

FLOAT SCALAR	1	1.0	x = a^2	x = a*a	1.27
FLOAT SCALAR	1	1.0	x = a*a*a	x = a^3	1.15
FLOAT SCALAR	1	1.0	x = a*a*a*a	x = a^4	1.75
FLOAT SCALAR	1	1.0	x = a*a*a*a*a	x = a^5	2.16
FLOAT VECTOR	1000	1.0	x = a^2	x = a*a	4.73
FLOAT VECTOR	1000	1.0	x = a^3	x = a*a*a	2.23
FLOAT VECTOR	1000	1.0	x = a^4	x = a*a*a*a	2.24
FLOAT VECTOR	1000	1.0	x = a^5	x = a*a*a*a*a	1.57
FLOAT VECTOR	1000000	1.0	x = a^2	x = a*a	1.85
FLOAT VECTOR	1000000	1.0	x = a^3	x = a*a*a	1.34
FLOAT VECTOR	1000000	1.0	x = a^4	x = a*a*a*a	1.51
FLOAT VECTOR	1000000	1.0	x = a^5	x = a*a*a*a*a	1.21

FLOAT SCALAR	1	1.0	x = a^0.5	x = sqrt(a)	1.94
FLOAT VECTOR	1000	1.0	x = a^0.5	x = sqrt(a)	35.33
FLOAT VECTOR	1000000	1.0	x = a^0.5	x = sqrt(a)	16.76

FLOAT SCALAR	1	1.0	x = a^(-1)	x = 1.0/a	1.87
FLOAT VECTOR	1000	1.0	x = a^(-1)	x = 1.0/a	1.80
FLOAT VECTOR	1000000	1.0	x = a^(-1)	x = 1.0/a	1.30

FLOAT SCALAR	1	1.0	(1) x = a^(-2)  (2) x = 1.0/(a*a)	x = 1.0/(a^2)	(1) 1.11  (2) 1.00
FLOAT SCALAR	1	1.0	(1) x = a^(-3)  (2) x = 1.0/(a*a*a)	x = 1.0/(a^3)	(1) 1.12  (2) 1.25
FLOAT SCALAR	1	1.0	(1) x = a^(-4)  (2) x = 1.0/(a*a*a*a)	x = 1.0/(a^4)	(1) 1.13  (2) 1.73
FLOAT VECTOR	1000	1.0	(1) x = a^(-2)  (2) x = 1.0/(a^2)	x = 1.0/(a*a)	(1) 1.48  (2) 2.17
FLOAT VECTOR	1000	1.0	(1) x = a^(-3)  (2) x = 1.0/(a^3)	x = 1.0/(a*a*a)	(1) 1.09  (2) 1.55
FLOAT VECTOR	1000	1.0	(1) x = a^(-4)  (2) x = 1.0/(a^4)	x = 1.0/(a*a*a*a)	(1) 1.32  (2) 1.71
FLOAT VECTOR	1000000	1.0	(1) x = a^(-2)  (2) x = 1.0/(a^2)	x = 1.0/(a*a)	(1) 1.17  (2) 1.32
FLOAT VECTOR	1000000	1.0	(1) x = a^(-3)  (2) x = 1.0/(a^3)	x = 1.0/(a*a*a)	(1) 1.00  (2) 1.31
FLOAT VECTOR	1000000	1.0	(1) x = a^(-4)  (2) x = 1.0/(a^4)	x = 1.0/(a*a*a*a)	(1) 1.13  (2) 1.32

FLOAT SCALAR	1	1.0	x = a/2.0	x = 0.5*a	1.00
FLOAT VECTOR	1000	1.0	x = a/2.0	x = 0.5*a	2.05
FLOAT VECTOR	1000000	1.0	x = a/2.0	x = 0.5*a	1.33
FLOAT SCALAR	1	1.0	x = -1.0*a	x = -a	1.53
FLOAT VECTOR	1000	1.0	x = -1.0*a	x = -a	1.93
FLOAT VECTOR	1000000	1.0	x = -1.0*a	x = -a	1.47

FLOAT VECTOR	1000	1.0	a = a + 1.0	a = temporary(a) + 1.0	1.06
FLOAT VECTOR	1000000	1.0	a = a + 1.0	a = temporary(a) + 1.0	4.01
FLOAT VECTOR	1000	1.0	x = a	x = temporary(a)	2.00
FLOAT VECTOR	1000000	1.0	x = a	x = temporary(a)	1200.0
FLOAT VECTOR	1000	1.0	x = a + 1.0	x = temporary(a) + 1.0	1.00
FLOAT VECTOR	1000000	1.0	x = a + 1.0	x = temporary(a) + 1.0	2.89

FLOAT SCALAR	1	1.0	x = min([a1,a2])	x = a1 < a2	5.23
FLOAT SCALAR	1	1.0	x = max([a1,a2])	x = a1 > a2	6.65

DOUBLE VECTOR	1000	1.0	x = total(vec1*vec2, /DOUBLE)	x = calculate_dot_product_vector(vec1, vec2, status, /NO_PAR_CHECK)	2.14
DOUBLE VECTOR	1000000	1.0	x = total(vec1*vec2, /DOUBLE)	x = calculate_dot_product_vector(vec1, vec2, status, /NO_PAR_CHECK)	2.05

FLOAT VECTOR	1000	0.0	x = fltarr(1000L)	x = fltarr(1000L, /NOZERO)	1.06
FLOAT VECTOR	1000000	0.0	x = fltarr(1000000L)	x = fltarr(1000000L, /NOZERO)	2.12
FLOAT VECTOR	1000	1.0	x = fltarr(1000L, /NOZERO) & x[*] = 1.0	x = replicate(1.0, 1000L)	2.22
FLOAT VECTOR	1000000	1.0	x = fltarr(1000000L, /NOZERO) & x[*] = 1.0	x = replicate(1.0, 1000000L)	3.57
FLOAT VECTOR	1000	1.0	x[*] = 0.0	replicate_inplace, x, 0.0	2.39
FLOAT VECTOR	1000000	1.0	x[*] = 0.0	replicate_inplace, x, 0.0	10.10

FLOAT VECTOR	1000	0.0	a[*] = 1.0	a[0] = replicate(1.0, 1000L)	1.88
FLOAT VECTOR	1000000	0.0	a[*] = 1.0	a[0] = replicate(1.0, 1000000L)	2.40
FLOAT ARRAY	(1000,1000)	0.0	a[*,*] = 1.0	a[0,0] = replicate(1.0, 1000L, 1000L)	1.45
FLOAT ARRAY	(1000,1000)	0.0	a[0,0] = replicate(1.0, 1L, 1000L)	a[0,*] = 1.0	3.70
FLOAT ARRAY	(1000,1000)	0.0	a[*,0] = 1.0	a[0,0] = replicate(1.0, 1000L)	2.28
FLOAT ARRAY	(1000,1000)	0.0	a[0,0] = reform(findgen(1000L), 1L, 1000L)	a[0,*] = findgen(1000L)	2.91
FLOAT ARRAY	(1000,1000)	0.0	a[*,0] = findgen(1000L)	a[0,0] = findgen(1000L)	2.46
FLOAT ARRAY	(10,1000,1000)	0.0	a[0,0,0] = reform(findgen(1000L,1000L), 1L, 1000L, 1000L)	a[0,*,*] = findgen(1000L,1000L)	3.35

FLOAT VECTOR	1000	1.0	subs = where(a EQ 1.0, n)	n = long(total(a, /DOUBLE))	1.71
FLOAT VECTOR	1000000	1.0	subs = where(a EQ 1.0, n)	n = long(total(a, /DOUBLE))	2.87
FLOAT VECTOR	1000	1.0	n = long(total(1.0 - a, /DOUBLE))	subs = where(a NE 1.0, n)	1.32
FLOAT VECTOR	1000000	1.0	n = long(total(1.0 - a, /DOUBLE))	subs = where(a NE 1.0, n)	1.89
FLOAT VECTOR	1000	1.0	n = long(total(1.0 - a, /DOUBLE))	subs = where(a EQ 0.0, n)	1.32
FLOAT VECTOR	1000000	1.0	n = long(total(1.0 - a, /DOUBLE))	subs = where(a EQ 0.0, n)	1.89

FLOAT SCALAR	1	1.0	x = exp(complex(0.0,a))	x = complex(cos(a),sin(a))	1.03
FLOAT VECTOR	1000	1.0	x = exp(complex(0.0,a))	x = complex(cos(a),sin(a))	3.39
FLOAT VECTOR	1000000	1.0	x = exp(complex(0.0,a))	x = complex(cos(a),sin(a))	3.34
FLOAT SCALAR	1	1.0	(1) x = a*exp(complex(0.0,a))  (2) x = complex(a*cos(a),a*sin(a))	x = a*complex(cos(a),sin(a))	(1) 1.01  (2) 1.16
FLOAT VECTOR	1000	1.0	(1) x = a*exp(complex(0.0,a))  (2) x = a*complex(cos(a),sin(a))	x = complex(a*cos(a),a*sin(a))	(1) 3.28  (2) 1.03
FLOAT VECTOR	1000000	1.0	(1) x = a*exp(complex(0.0,a))  (2) x = a*complex(cos(a),sin(a))	x = complex(a*cos(a),a*sin(a))	(1) 3.21  (2) 1.06
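
Factors like those in the table can be estimated on any machine with a simple timing loop. The following is only a minimal sketch and not the benchmark that produced the table: the procedure name "time_sqrt_example", the repetition count and the use of "systime(/SECONDS)" as the timer are all assumptions made for illustration.

```idl
; Hypothetical example: time two equivalent expressions and print the ratio.
pro time_sqrt_example
  compile_opt idl2

  n = 1000000L            ; number of array elements
  nrep = 50L              ; repetitions, to reduce timer noise (assumed value)
  a = replicate(1.0, n)   ; test array with every element set to 1.0

  ; Time the form listed as slower in the table.
  t0 = systime(/SECONDS)
  for i = 0L, nrep - 1L do x = a^0.5
  t_slow = systime(/SECONDS) - t0

  ; Time the form listed as faster in the table.
  t0 = systime(/SECONDS)
  for i = 0L, nrep - 1L do x = sqrt(a)
  t_fast = systime(/SECONDS) - t0

  print, 'a^0.5   (s): ', t_slow
  print, 'sqrt(a) (s): ', t_fast
  ; The ">" operator clamps the divisor to avoid a division by zero.
  print, 'factor     : ', t_slow / (t_fast > 1d-9)
end
```

The measured factor will vary with the IDL version, the hardware and the number of threads available, so it should not be expected to match the table exactly.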

The following general optimisation hints may also be useful:

Needless to say, one should use array operations instead of "for" loops where possible, since IDL is optimised for such calculations.
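
As an illustration of this hint, the sketch below computes the same result once with a "for" loop and once with a single array expression. The procedure name and the expression being evaluated are made up for the example.

```idl
; Hypothetical example: element-by-element loop versus a single array expression.
pro loop_vs_array_example
  compile_opt idl2

  n = 1000000L
  a = findgen(n)

  ; Loop version: one interpreted statement is executed per element.
  x_loop = fltarr(n, /NOZERO)
  for i = 0L, n - 1L do x_loop[i] = 2.0*a[i] + 1.0

  ; Array version: the whole vector is processed by one expression.
  x_arr = 2.0*a + 1.0

  ; Both versions hold the same values, so this prints 1.
  print, array_equal(x_loop, x_arr)
end
```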

Avoid unnecessary variable type conversions by using variables of the same type where possible. This way IDL does not have to do the conversion itself.
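
A small sketch of how a mismatched constant triggers a conversion of a whole array is given below. The procedure name is hypothetical, and "size" with the "/TNAME" keyword is used only to report the type of each result.

```idl
; Hypothetical example: a DOUBLE constant forces a FLOAT array to be converted.
pro type_promotion_example
  compile_opt idl2

  a = fltarr(1000000L)

  x = a*2.0d   ; "a" is converted to DOUBLE before the multiplication
  y = a*2.0    ; both operands are FLOAT, so no conversion is needed

  print, size(x, /TNAME)   ; DOUBLE
  print, size(y, /TNAME)   ; FLOAT
end
```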

Ensure that operations on scalars are performed BEFORE the result is applied to an array, otherwise IDL will perform more array operations than is strictly necessary. For example:
SLOWER (2000000 additions): a = fltarr(1000,1000) & b = 5.0 & c = 3.1 & x = (a + b) + c
FASTER (1000001 additions): a = fltarr(1000,1000) & b = 5.0 & c = 3.1 & x = a + (b + c)

If an array is being operated on and the result overwrites the original array, then use the "temporary" function in the expression on the right-hand side. This avoids a copy of the original array being made before the operation is performed, resulting in faster execution and lower memory use. Also use the "temporary" function on the right-hand side of an expression for a variable that will not be used again in the rest of the code, which again gives faster execution and immediately frees memory for the rest of the program. See the above table for some examples. Note that the "temporary" function should not be used on scalar quantities.
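
The two uses of "temporary" described above can be sketched as follows. The procedure name is hypothetical. The point to notice is that the variable passed to "temporary" is left undefined afterwards, so it must not be read again.

```idl
; Hypothetical example: the two uses of temporary() described above.
pro temporary_example
  compile_opt idl2

  a = fltarr(1000000L)

  ; In-place update: the memory of "a" is handed to the expression,
  ; so no second copy of the array is made.
  a = temporary(a) + 1.0

  ; Hand-over to a new variable: "a" becomes undefined afterwards.
  x = temporary(a)

  print, n_elements(x)   ; 1000000
  print, n_elements(a)   ; 0, because "a" is now undefined
end
```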

When initialising an array with IDL functions like "intarr", "fltarr", etc., use the "/NOZERO" keyword where possible to leave the array elements uninitialised. Otherwise every element is set to zero, which is unnecessary if all of the elements are going to be overwritten anyway. See the above table for an example.
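
The following sketch shows a case where "/NOZERO" is safe because every element is written before it is read. The procedure name, the array sizes and the fill values are assumptions chosen for illustration.

```idl
; Hypothetical example: /NOZERO is safe when every element is overwritten.
pro nozero_example
  compile_opt idl2

  nx = 1000L
  ny = 1000L

  ; The contents are arbitrary until they are filled in below.
  img = fltarr(nx, ny, /NOZERO)

  ; Fill the array one row at a time, so no element is left unset.
  for j = 0L, ny - 1L do img[0, j] = findgen(nx) + j

  print, img[0, 0]                 ; 0.0
  print, img[nx - 1L, ny - 1L]     ; 1998.0 (999 + 999)
end
```

An array that is used as an accumulator (for example with "+=") still needs to start from zero, so "/NOZERO" should not be used in that case.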

When inserting a sub-array into a larger array with the same number of dimensions, it is only necessary to specify the lowest subscript in each dimension at which the sub-array is to be inserted, rather than the full subscript range in each dimension. This saves IDL from having to generate the subscripts for the relevant portion of the larger array before carrying out the insertion. For example:
SLOWER: imdata[10:20,0:300] = imcutout
FASTER: imdata[10,0] = imcutout
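
The equivalence of the two forms can be checked with a short sketch such as the one below. The procedure name and the array sizes are assumptions. The cutout has 11 x 301 elements so that it matches the range [10:20,0:300].

```idl
; Hypothetical example: both insertion forms write the same elements.
pro insert_subarray_example
  compile_opt idl2

  imdata1 = fltarr(100, 400)
  imdata2 = fltarr(100, 400)
  imcutout = replicate(1.0, 11, 301)

  imdata1[10:20, 0:300] = imcutout   ; full subscript ranges
  imdata2[10, 0] = imcutout          ; lowest subscript in each dimension only

  print, array_equal(imdata1, imdata2)   ; 1
  print, total(imdata2)                  ; 3311.00 (11*301 elements set to 1.0)
end
```

With the shorter form, the extent of the insertion is taken from the dimensions of the sub-array, and IDL reports an out-of-range subscript error if the sub-array would extend beyond the edge of the larger array.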

When extracting a sub-array from a larger array, use the IDL "reform" function to give the sub-array the required dimensions. This avoids having to set up an array of the required dimensions beforehand and then insert the sub-array into it. For example:
SLOWER: vec = dblarr(n, /NOZERO) & vec[*] = arr[0,*]
FASTER: vec = reform(arr[0,*], n)
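
The effect of "reform" on the dimensions of an extracted slice can be seen in the following sketch, in which the procedure name and the small array size are chosen purely for illustration.

```idl
; Hypothetical example: reform() turns a 1 x n slice into an n-element vector.
pro reform_example
  compile_opt idl2

  n = 5L
  arr = dindgen(3, n)          ; element [i,j] holds the value i + 3*j

  slice = arr[0, *]            ; dimensions are [1, n]
  vec = reform(arr[0, *], n)   ; dimensions are [n]

  print, size(slice, /DIMENSIONS)   ; 1 5
  print, size(vec, /DIMENSIONS)     ; 5
  print, vec                        ; 0 3 6 9 12
end
```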

When comparing a scalar with an array (e.g. using the greater than comparison, etc.), one should be aware that if the scalar and array are of different number types, then IDL will convert both to the more precise of the two number types. If the scalar has the more precise number type, then the type conversion takes place on the array, which can introduce a large processing overhead. This overhead can be avoided by converting (or writing) the scalar to the number type of the array. For example:
For an array "arr" of type BYTE and a scalar "val" of type INTEGER:
SLOWER: subs = where(arr EQ val, nsubs)
FASTER: subs = where(arr EQ byte(val), nsubs)
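
One caveat with this approach is that "byte" wraps values that lie outside the range 0 to 255, so the converted scalar may match elements that the original scalar would not. The sketch below adds a range check before converting. The procedure name and the test values are hypothetical.

```idl
; Hypothetical example: convert the scalar to BYTE only when it fits.
pro compare_types_example
  compile_opt idl2

  arr = bytarr(1000000L)
  arr[10] = 7B
  val = 7S   ; 16-bit INTEGER scalar

  if (val ge 0) && (val le 255) then begin
    ; No conversion of "arr" is needed for this comparison.
    subs = where(arr eq byte(val), nsubs)
  endif else begin
    ; No BYTE element can equal a value outside 0..255.
    subs = -1L
    nsubs = 0L
  endelse

  print, nsubs         ; 1
  print, byte(300S)    ; 44, since the value wraps modulo 256
end
```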

Array concatenation such as:
arr = [arr, extra]
is very slow in IDL. It is faster to allocate an array of the final size and insert the original array along with the new information. However, this is not always possible, especially if the size of the new information is not known in advance.
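
When the final size is known in advance, the preallocation approach can be sketched as below. The procedure name, the chunk size and the number of chunks are assumptions made for the example.

```idl
; Hypothetical example: growing an array by concatenation versus preallocating it.
pro concat_example
  compile_opt idl2

  nchunk = 1000L
  chunk_size = 100L

  ; Slower: the array is reallocated and copied on every concatenation.
  arr_slow = findgen(chunk_size)
  for i = 1L, nchunk - 1L do $
    arr_slow = [arr_slow, findgen(chunk_size) + i*chunk_size]

  ; Faster: allocate once, then insert each chunk at its starting subscript.
  arr_fast = fltarr(nchunk*chunk_size, /NOZERO)
  for i = 0L, nchunk - 1L do $
    arr_fast[i*chunk_size] = findgen(chunk_size) + i*chunk_size

  print, array_equal(arr_slow, arr_fast)   ; 1
end
```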

The IDL "array_equal" function is a fast way to compare data for equality in situations where the indices of the elements that differ are not of interest. This operation is much faster than using "total(a NE b)", because it stops the comparison as soon as the first inequality is found, no intermediate array is created, and at most one pass is made through the data. For best speed, ensure that the operands are of the same data type.
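
A short sketch of the two approaches is given below, with a hypothetical procedure name and test data. The arrays differ in their first element, which is the case where stopping early helps the most.

```idl
; Hypothetical example: two ways of testing whether two arrays are identical.
pro array_equal_example
  compile_opt idl2

  a = findgen(1000000L)
  b = a
  b[0] = -1.0   ; the arrays differ in the first element

  ; Slower: builds a full intermediate array and then sums it.
  same_slow = total(a ne b) eq 0

  ; Faster: can stop at the first difference.
  same_fast = array_equal(a, b)

  print, same_slow, same_fast   ; 0 0
  print, array_equal(a, a)      ; 1
end
```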