Defining ufuncs using vectorize

You have been able to define your own NumPy ufuncs for quite some time, but it's a little involved.

You can read through the documentation, the example they post there is a ufunc to perform

It looks like this:

static void double_logit(char **args, npy_intp *dimensions,
                            npy_intp* steps, void* data)
{
    npy_intp i;
    npy_intp n = dimensions[0];
    char *in = args[0], *out = args[1];
    npy_intp in_step = steps[0], out_step = steps[1];

    double tmp;

    for (i = 0; i < n; i++) {
        /*BEGIN main ufunc computation*/
        tmp = *(double *)in;
        tmp /= 1-tmp;
        *((double *)out) = log(tmp);
        /*END main ufunc computation*/

        in += in_step;
        out += out_step;
    }
}

And note, that's just for a double. If you want floats, long doubles, etc... you have to write all of those, too. And then create a setup.py file to install it. And I left out a bunch of boilerplate stuff to set up the import hooks, etc...

Say "thank you" to the NumPy devs

We can use Numba to define ufuncs without all of the pain.

import numpy
import math

Let's define a function that operates on two inputs

def trig(a, b):
    return math.sin(a**2) * math.exp(b)
trig(1, 1)
2.2873552871788423

Seems reasonable. However, the math library only works on scalars. If we try to pass in arrays, we'll get an error.

a = numpy.ones((5,5))
b = numpy.ones((5,5))
trig(a, b)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-5-6bc62cd8d328> in <module>()
----> 1 trig(a, b)


<ipython-input-2-27083e35e9e9> in trig(a, b)
      1 def trig(a, b):
----> 2     return math.sin(a**2) * math.exp(b)


TypeError: only length-1 arrays can be converted to Python scalars
from numba import vectorize
vec_trig = vectorize()(trig)
vec_trig(a, b)
array([[ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529],
       [ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529],
       [ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529],
       [ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529],
       [ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529]])

And just like that, the scalar function trig is now a NumPy ufunc called vec_trig

Note that this is a "Dynamic UFunc" with no signature given.

How does it compare to just using NumPy? Let's check

def numpy_trig(a, b):
    return numpy.sin(a**2) * numpy.exp(b)
a = numpy.random.random((1000, 1000))
b = numpy.random.random((1000, 1000))
%timeit vec_trig(a, b)
10 loops, best of 3: 29.4 ms per loop
%timeit numpy_trig(a, b)
10 loops, best of 3: 32.5 ms per loop

What happens if we do specify a signature? Is there a speed boost?

vec_trig = vectorize('float64(float64, float64)')(trig)
%timeit vec_trig(a, b)
10 loops, best of 3: 29.5 ms per loop

No, not really. But(!), if we have a signature, then we can add the target kwarg.

vec_trig = vectorize('float64(float64, float64)', target='parallel')(trig)
%timeit vec_trig(a, b)
100 loops, best of 3: 6.24 ms per loop

Automatic multicore operations!

Note: target='parallel' is not always the best option. There is overhead in setting up the threading, so if the individual scalar operations that make up a ufunc are simple you'll probably get better performance in serial. If the individual operations are more expensive (like trig!) then parallel is (usually) a good option.

Passing multiple signatures

If you use multiple signatures, they have to be listed in order of most specific -> least specific

@vectorize(['int32(int32, int32)',
            'int64(int64, int64)',
            'float32(float32, float32)',
            'float64(float64, float64)'])
def trig(a, b):
    return math.sin(a**2) * math.exp(b)
trig(1, 1)
2
trig(1., 1.)
2.2873552871788423
trig.ntypes
4

Exercise: Clipping an array

Yes, NumPy has a clip ufunc already, but let's pretend it doesn't.

Create a Numba vectorized ufunc that takes a vector a, a lower limit amin and an upper limit amax. It should return the vector a with all values clipped such that :

# %load snippets/clip.py
a = numpy.random.random((5000))
amin = .2
amax = .6
%timeit vec_truncate_serial(a, amin, amax)
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-24-1221b70b3424> in <module>()
----> 1 get_ipython().magic('timeit vec_truncate_serial(a, amin, amax)')


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
   2156         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2157         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158         return self.run_line_magic(magic_name, magic_arg_s)
   2159 
   2160     #-------------------------------------------------------------------------


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
   2077                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2078             with self.builtin_trap:
-> 2079                 result = fn(*args,**kwargs)
   2080             return result
   2081


<decorator-gen-58> in timeit(self, line, cell)


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189 
    190         if callable(arg):


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell)
   1042             number = 1
   1043             for _ in range(1, 10):
-> 1044                 time_number = timer.timeit(number)
   1045                 worst_tuning = max(worst_tuning, time_number / number)
   1046                 if time_number >= 0.2:


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, number)
    137         gc.disable()
    138         try:
--> 139             timing = self.inner(it, self.timer)
    140         finally:
    141             if gcold:


<magic-timeit> in inner(_it, _timer)


NameError: name 'vec_truncate_serial' is not defined
%timeit vec_truncate_par(a, amin, amax)
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-25-a1fc14977f98> in <module>()
----> 1 get_ipython().magic('timeit vec_truncate_par(a, amin, amax)')


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
   2156         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2157         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158         return self.run_line_magic(magic_name, magic_arg_s)
   2159 
   2160     #-------------------------------------------------------------------------


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
   2077                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2078             with self.builtin_trap:
-> 2079                 result = fn(*args,**kwargs)
   2080             return result
   2081


<decorator-gen-58> in timeit(self, line, cell)


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189 
    190         if callable(arg):


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell)
   1042             number = 1
   1043             for _ in range(1, 10):
-> 1044                 time_number = timer.timeit(number)
   1045                 worst_tuning = max(worst_tuning, time_number / number)
   1046                 if time_number >= 0.2:


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, number)
    137         gc.disable()
    138         try:
--> 139             timing = self.inner(it, self.timer)
    140         finally:
    141             if gcold:


<magic-timeit> in inner(_it, _timer)


NameError: name 'vec_truncate_par' is not defined
%timeit numpy.clip(a, amin, amax)
100000 loops, best of 3: 4.78 µs per loop
a = numpy.random.random((100000))
%timeit vec_truncate_serial(a, amin, amax)
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-28-1221b70b3424> in <module>()
----> 1 get_ipython().magic('timeit vec_truncate_serial(a, amin, amax)')


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
   2156         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2157         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158         return self.run_line_magic(magic_name, magic_arg_s)
   2159 
   2160     #-------------------------------------------------------------------------


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
   2077                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2078             with self.builtin_trap:
-> 2079                 result = fn(*args,**kwargs)
   2080             return result
   2081


<decorator-gen-58> in timeit(self, line, cell)


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189 
    190         if callable(arg):


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell)
   1042             number = 1
   1043             for _ in range(1, 10):
-> 1044                 time_number = timer.timeit(number)
   1045                 worst_tuning = max(worst_tuning, time_number / number)
   1046                 if time_number >= 0.2:


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, number)
    137         gc.disable()
    138         try:
--> 139             timing = self.inner(it, self.timer)
    140         finally:
    141             if gcold:


<magic-timeit> in inner(_it, _timer)


NameError: name 'vec_truncate_serial' is not defined
%timeit vec_truncate_par(a, amin, amax)
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-29-a1fc14977f98> in <module>()
----> 1 get_ipython().magic('timeit vec_truncate_par(a, amin, amax)')


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
   2156         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2157         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2158         return self.run_line_magic(magic_name, magic_arg_s)
   2159 
   2160     #-------------------------------------------------------------------------


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
   2077                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2078             with self.builtin_trap:
-> 2079                 result = fn(*args,**kwargs)
   2080             return result
   2081


<decorator-gen-58> in timeit(self, line, cell)


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189 
    190         if callable(arg):


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, line, cell)
   1042             number = 1
   1043             for _ in range(1, 10):
-> 1044                 time_number = timer.timeit(number)
   1045                 worst_tuning = max(worst_tuning, time_number / number)
   1046                 if time_number >= 0.2:


/home/gil/anaconda/lib/python3.5/site-packages/IPython/core/magics/execution.py in timeit(self, number)
    137         gc.disable()
    138         try:
--> 139             timing = self.inner(it, self.timer)
    140         finally:
    141             if gcold:


<magic-timeit> in inner(_it, _timer)


NameError: name 'vec_truncate_par' is not defined
%timeit numpy.clip(a, amin, amax)
1000 loops, best of 3: 212 µs per loop

Exercise: Create logit ufunc

Recall from above that this is a ufunc which performs this operation:

# %load snippets/logit.py
logit(a)
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-32-43208ff16126> in <module>()
----> 1 logit(a)


NameError: name 'logit' is not defined

Performance of vectorize vs. regular array-wide operations

@vectorize
def discriminant(a, b, c):
    return b**2 - 4 * a * c
a = numpy.arange(10000)
b = numpy.arange(10000)
c = numpy.arange(10000)
%timeit discriminant(a, b, c)
The slowest run took 3182.90 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.7 µs per loop
%timeit b**2 - 4 * a * c
The slowest run took 5.46 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.4 µs per loop

What's going on?

  • Each array operation creates a temporary copy
  • Each of these arrays are loaded into and out of cache a whole bunch
del a, b, c