Defining ufuncs
using vectorize
You have been able to define your own NumPy ufuncs
for quite some time, but it's a little involved.
You can read through the documentation, the example they post there is a ufunc to perform
It looks like this:
static void double_logit(char **args, npy_intp *dimensions,
npy_intp* steps, void* data)
{
npy_intp i;
npy_intp n = dimensions[0];
char *in = args[0], *out = args[1];
npy_intp in_step = steps[0], out_step = steps[1];
double tmp;
for (i = 0; i < n; i++) {
/*BEGIN main ufunc computation*/
tmp = *(double *)in;
tmp /= 1-tmp;
*((double *)out) = log(tmp);
/*END main ufunc computation*/
in += in_step;
out += out_step;
}
}
And note, that's just for a double
. If you want floats
, long doubles
, etc... you have to write all of those, too. And then create a setup.py
file to install it. And I left out a bunch of boilerplate stuff to set up the import hooks, etc...
Say "thank you" to the NumPy devs
We can use Numba to define ufuncs without all of the pain.
import numpy
import math
Let's define a function that operates on two inputs
def trig(a, b):
return math.sin(a**2) * math.exp(b)
trig(1, 1)
2.2873552871788423
Seems reasonable. However, the math
library only works on scalars. If we try to pass in arrays, we'll get an error.
a = numpy.ones((5,5))
b = numpy.ones((5,5))
trig(a, b)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-6bc62cd8d328> in <module>()
----> 1 trig(a, b)
<ipython-input-2-27083e35e9e9> in trig(a, b)
1 def trig(a, b):
----> 2 return math.sin(a**2) * math.exp(b)
TypeError: only length-1 arrays can be converted to Python scalars
from numba import vectorize
vec_trig = vectorize()(trig)
vec_trig(a, b)
array([[ 2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529],
[ 2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529],
[ 2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529],
[ 2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529],
[ 2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529]])
And just like that, the scalar function trig
is now a NumPy ufunc
called vec_trig
Note that this is a "Dynamic UFunc" with no signature given.
How does it compare to just using NumPy? Let's check
def numpy_trig(a, b):
return numpy.sin(a**2) * numpy.exp(b)
a = numpy.random.random((1000, 1000))
b = numpy.random.random((1000, 1000))
%timeit vec_trig(a, b)
10 loops, best of 3: 29.4 ms per loop
%timeit numpy_trig(a, b)
10 loops, best of 3: 32.5 ms per loop
What happens if we do specify a signature? Is there a speed boost?
vec_trig = vectorize('float64(float64, float64)')(trig)
%timeit vec_trig(a, b)
10 loops, best of 3: 29.5 ms per loop
No, not really. But(!), if we have a signature, then we can add the target kwarg
.
vec_trig = vectorize('float64(float64, float64)', target='parallel')(trig)
%timeit vec_trig(a, b)
100 loops, best of 3: 6.24 ms per loop
Automatic multicore operations!
Note: target='parallel'
is not always the best option. There is overhead in setting up the threading, so if the individual scalar operations that make up a ufunc
are simple you'll probably get better performance in serial. If the individual operations are more expensive (like trig!) then parallel is (usually) a good option.
Passing multiple signatures
If you use multiple signatures, they have to be listed in order of most specific -> least specific
@vectorize(['int32(int32, int32)',
'int64(int64, int64)',
'float32(float32, float32)',
'float64(float64, float64)'])
def trig(a, b):
return math.sin(a**2) * math.exp(b)
trig(1, 1)
2
trig(1., 1.)
2.2873552871788423
trig.ntypes
4
Exercise: Clipping an array
Yes, NumPy has a clip
ufunc already, but let's pretend it doesn't.
Create a Numba vectorized ufunc that takes a vector a
, a lower limit amin
and an upper limit amax
. It should return the vector a
with all values clipped such that :
# %load snippets/clip.py
a = numpy.random.random((5000))
amin = .2
amax = .6
%timeit vec_truncate_serial(a, amin, amax)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-24-1221b70b3424> in <module>()
----> 1 get_ipython().magic('timeit vec_truncate_serial(a, amin, amax)')
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
2156 magic_name, _, magic_arg_s = arg_s.partition(' ')
2157 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158 return self.run_line_magic(magic_name, magic_arg_s)
2159
2160 #-------------------------------------------------------------------------
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
2077 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2078 with self.builtin_trap:
-> 2079 result = fn(*args,**kwargs)
2080 return result
2081
<decorator-gen-58> in timeit(self, line, cell)
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell)
1042 number = 1
1043 for _ in range(1, 10):
-> 1044 time_number = timer.timeit(number)
1045 worst_tuning = max(worst_tuning, time_number / number)
1046 if time_number >= 0.2:
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, number)
137 gc.disable()
138 try:
--> 139 timing = self.inner(it, self.timer)
140 finally:
141 if gcold:
<magic-timeit> in inner(_it, _timer)
NameError: name 'vec_truncate_serial' is not defined
%timeit vec_truncate_par(a, amin, amax)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-25-a1fc14977f98> in <module>()
----> 1 get_ipython().magic('timeit vec_truncate_par(a, amin, amax)')
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
2156 magic_name, _, magic_arg_s = arg_s.partition(' ')
2157 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158 return self.run_line_magic(magic_name, magic_arg_s)
2159
2160 #-------------------------------------------------------------------------
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
2077 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2078 with self.builtin_trap:
-> 2079 result = fn(*args,**kwargs)
2080 return result
2081
<decorator-gen-58> in timeit(self, line, cell)
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell)
1042 number = 1
1043 for _ in range(1, 10):
-> 1044 time_number = timer.timeit(number)
1045 worst_tuning = max(worst_tuning, time_number / number)
1046 if time_number >= 0.2:
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, number)
137 gc.disable()
138 try:
--> 139 timing = self.inner(it, self.timer)
140 finally:
141 if gcold:
<magic-timeit> in inner(_it, _timer)
NameError: name 'vec_truncate_par' is not defined
%timeit numpy.clip(a, amin, amax)
100000 loops, best of 3: 4.78 µs per loop
a = numpy.random.random((100000))
%timeit vec_truncate_serial(a, amin, amax)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-28-1221b70b3424> in <module>()
----> 1 get_ipython().magic('timeit vec_truncate_serial(a, amin, amax)')
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
2156 magic_name, _, magic_arg_s = arg_s.partition(' ')
2157 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158 return self.run_line_magic(magic_name, magic_arg_s)
2159
2160 #-------------------------------------------------------------------------
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
2077 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2078 with self.builtin_trap:
-> 2079 result = fn(*args,**kwargs)
2080 return result
2081
<decorator-gen-58> in timeit(self, line, cell)
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell)
1042 number = 1
1043 for _ in range(1, 10):
-> 1044 time_number = timer.timeit(number)
1045 worst_tuning = max(worst_tuning, time_number / number)
1046 if time_number >= 0.2:
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, number)
137 gc.disable()
138 try:
--> 139 timing = self.inner(it, self.timer)
140 finally:
141 if gcold:
<magic-timeit> in inner(_it, _timer)
NameError: name 'vec_truncate_serial' is not defined
%timeit vec_truncate_par(a, amin, amax)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-29-a1fc14977f98> in <module>()
----> 1 get_ipython().magic('timeit vec_truncate_par(a, amin, amax)')
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
2156 magic_name, _, magic_arg_s = arg_s.partition(' ')
2157 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158 return self.run_line_magic(magic_name, magic_arg_s)
2159
2160 #-------------------------------------------------------------------------
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
2077 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2078 with self.builtin_trap:
-> 2079 result = fn(*args,**kwargs)
2080 return result
2081
<decorator-gen-58> in timeit(self, line, cell)
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell)
1042 number = 1
1043 for _ in range(1, 10):
-> 1044 time_number = timer.timeit(number)
1045 worst_tuning = max(worst_tuning, time_number / number)
1046 if time_number >= 0.2:
/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, number)
137 gc.disable()
138 try:
--> 139 timing = self.inner(it, self.timer)
140 finally:
141 if gcold:
<magic-timeit> in inner(_it, _timer)
NameError: name 'vec_truncate_par' is not defined
%timeit numpy.clip(a, amin, amax)
1000 loops, best of 3: 212 µs per loop
Exercise: Create logit
ufunc
Recall from above that this is a ufunc which performs this operation:
# %load snippets/logit.py
logit(a)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-32-43208ff16126> in <module>()
----> 1 logit(a)
NameError: name 'logit' is not defined
Performance of vectorize
vs. regular array-wide operations
@vectorize
def discriminant(a, b, c):
return b**2 - 4 * a * c
a = numpy.arange(10000)
b = numpy.arange(10000)
c = numpy.arange(10000)
%timeit discriminant(a, b, c)
The slowest run took 3182.90 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.7 µs per loop
%timeit b**2 - 4 * a * c
The slowest run took 5.46 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.4 µs per loop
What's going on?
- Each array operation creates a temporary copy
- Each of these arrays are loaded into and out of cache a whole bunch
del a, b, c