In [1]:
%matplotlib inline #this is a formatting statement for jupyter notebooks 

Transitioning from using IDL to using Python for your science

This is an introduction to moving your science from an IDL environment to Python. If you havne't done so yet, please download and install the the latest versions of anaconda (www.continuum.io/downloads) and sunpy (docs.sunpy.org/en/stable/guide/installation/).

Let's make sure that your instillation of anaconda is up to date. From the terminal, type:

conda update conda
conda update anaconda

Now, lets launch an ipython (interactive python) session

ipython

Getting started with Python

The first thing to note is that by itself, Python is just a platform. You can think of Python as an engine - it is very powerful but unless you connect it to a transmission, axle, and wheels, it isn't very effective at getting you down the road.

Fortunatly in most cases you don't need to reinvent the wheel and you can just find some that someone else created that will fit your needs.

The Anaconda package comes with literally thousands of tools to help you work with data.

Let's explore some basic tools. NumPy is the fundamental package for scientific computing with Python. Let's load it in to our current session.

In [2]:
import numpy as np

Notice the syntax we are using: import "the library" as "what we want to call it." Now whenever we want to call a tool in numpy, we can use "np."

Next we will want to plot our data, so lets grab some plotting tools. We will use a package called matplotlib. Matplotlib is a 2D plotting library, which produces all kinds figures. We don't need everything that matplotlib can do, so let's import just the tool that produces line plots. This tool is called pyplot.

In [3]:
import matplotlib.pyplot as plt

Again notice the syntax we are using: import "library.tool" as "what we want to call it."

Okay, we have the tools 'np' and 'plt' defined. Loading in the tools and libraries you want to use is something you have to do every time you write a program or load a python session.

Now lets put our tools to use:

In [5]:
dt = 0.1
Fs = 1/dt
t=np.arange(0,Fs,dt)

We can dynamically define variables (i.e. we don't have to specify the type) and use conventional assignments to make new variables.

We also use a tool within np to create t: np.arange. Np.arange is similar to IDL's FINDGEN( ). It creates an evenly spaced array with the syntax of output=np.arange(start,stop,step).

So what does t look like? We just type the variable or use a print statement:

In [6]:
t
Out[6]:
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ,
        1.1,  1.2,  1.3,  1.4,  1.5,  1.6,  1.7,  1.8,  1.9,  2. ,  2.1,
        2.2,  2.3,  2.4,  2.5,  2.6,  2.7,  2.8,  2.9,  3. ,  3.1,  3.2,
        3.3,  3.4,  3.5,  3.6,  3.7,  3.8,  3.9,  4. ,  4.1,  4.2,  4.3,
        4.4,  4.5,  4.6,  4.7,  4.8,  4.9,  5. ,  5.1,  5.2,  5.3,  5.4,
        5.5,  5.6,  5.7,  5.8,  5.9,  6. ,  6.1,  6.2,  6.3,  6.4,  6.5,
        6.6,  6.7,  6.8,  6.9,  7. ,  7.1,  7.2,  7.3,  7.4,  7.5,  7.6,
        7.7,  7.8,  7.9,  8. ,  8.1,  8.2,  8.3,  8.4,  8.5,  8.6,  8.7,
        8.8,  8.9,  9. ,  9.1,  9.2,  9.3,  9.4,  9.5,  9.6,  9.7,  9.8,
        9.9])
In [7]:
print(t)
[ 0.   0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1.   1.1  1.2  1.3  1.4
  1.5  1.6  1.7  1.8  1.9  2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9
  3.   3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9  4.   4.1  4.2  4.3  4.4
  4.5  4.6  4.7  4.8  4.9  5.   5.1  5.2  5.3  5.4  5.5  5.6  5.7  5.8  5.9
  6.   6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9  7.   7.1  7.2  7.3  7.4
  7.5  7.6  7.7  7.8  7.9  8.   8.1  8.2  8.3  8.4  8.5  8.6  8.7  8.8  8.9
  9.   9.1  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9]

But what are the dimensions of t? How many elements are in it?

In IDL we would have to call a function to return this information, such as MAX( ). In Python, since it is an object oriented programming language, the variable t is actually more than just a simple array. It is self-aware! So let's ask t about itself.

In [8]:
t.min
Out[8]:
<function min>

Um... what just happened here? Python just said "yep, that is a valid function of t" but didn't actually return the value. To get the value out, we need to use a different syntax:

In [9]:
t.min()
Out[9]:
0.0

Also we can ask:

In [10]:
t.max()
Out[10]:
9.9000000000000004
In [11]:
t.mean()
Out[11]:
4.9500000000000002
In [12]:
t.var()
Out[12]:
8.3325000000000014

These are all "t.something( )" - "something( )" is a function of the object "t". To find out how many elements are the array

In [13]:
t.size
Out[13]:
100

To see the shape of the array:

In [14]:
t.shape
Out[14]:
(100,)

If you want to see all of the functions already available in t, type "t." and then 'tab'. This will show you dynamically all of the functions that exist. Go ahead, try it!

You'll note that .size and .shape don't have brackets after them, but .min and .var do. This is because the size and the shape of the array are attributes of the array. The reason .min (and other methods) have brackets is that these functions have different options that you may want to use.

Let's create a 2d random array

In [15]:
rand_nn = np.random.randn(t.size, t.size)

We can find the minimum values along one dimension of the array...

In [34]:
rand_nn.min(axis=0)
Out[34]:
array([-2.89518052, -2.38851405, -2.65438088, -2.42616205, -2.98637711,
       -2.61917924, -3.04400683, -1.99551291, -2.54481928, -2.75898872,
       -2.00846906, -2.82244842, -1.55895129, -3.62905027, -2.15917153,
       -2.27459189, -2.18018518, -3.02437466, -2.42973237, -2.77289929,
       -2.61672753, -2.00925514, -2.61912512, -2.58307325, -2.45159215,
       -2.53986963, -2.56444877, -1.82938401, -2.29653528, -2.55571806,
       -2.94060788, -2.46487206, -2.05620166, -2.37845073, -2.43397154,
       -2.14122798, -3.25831335, -2.2645945 , -2.41222808, -2.84049392,
       -4.08426662, -2.80068056, -3.2978516 , -2.49876356, -2.60339569,
       -2.02333597, -2.80719912, -2.09827113, -2.52161858, -2.48798412,
       -2.11527522, -2.04119045, -2.11583545, -2.58030957, -2.03842926,
       -2.63240217, -2.38451646, -2.13098785, -2.81934984, -2.13577515,
       -2.48779106, -2.38749245, -2.46409163, -3.27733676, -2.60068506,
       -2.67887403, -2.04875584, -2.54092582, -2.37890464, -1.89240065,
       -2.19670845, -2.55542198, -2.5086707 , -2.12344642, -2.06953826,
       -2.00131686, -2.17518508, -2.82904241, -2.58758977, -2.43779972,
       -2.51828533, -2.18251377, -2.35536258, -2.49192285, -2.78535396,
       -2.20373031, -2.50341687, -2.18704082, -2.2692945 , -1.94611425,
       -2.24584646, -2.44274415, -2.44158504, -2.35887649, -2.79038533,
       -2.11925456, -3.33111796, -1.94091676, -2.15708445, -2.88000045])

or the other one...

In [35]:
rand_nn.min(axis=1)
Out[35]:
array([-2.06035846, -2.42968466, -2.14733083, -2.41222808, -2.94060788,
       -2.09774047, -3.27733676, -2.46409163, -2.81169946, -2.53165595,
       -3.33111796, -2.00255171, -2.53416261, -2.00131686, -1.86039766,
       -1.79499492, -2.6139758 , -2.80068056, -2.77289929, -2.45927262,
       -2.43397154, -2.02778759, -2.56444877, -2.4963192 , -2.42973237,
       -2.05636977, -2.61672753, -1.80562577, -2.80719912, -2.82904241,
       -2.50341687, -1.87812105, -2.74139925, -2.44158504, -3.02437466,
       -3.62905027, -2.63240217, -2.43325322, -2.42289984, -2.61917924,
       -2.61912512, -2.79038533, -2.2645945 , -2.54092582, -2.88000045,
       -3.2978516 , -1.88906553, -2.78535396, -2.38709958, -2.35887649,
       -2.58758977, -2.29667604, -2.38851405, -2.96679138, -2.38125347,
       -2.60386995, -2.45159215, -2.55542198, -2.68562536, -1.90493965,
       -2.11583545, -2.18248954, -2.60339569, -2.29653528, -2.74144036,
       -2.82244842, -2.13098785, -2.38451646, -3.04400683, -1.93908917,
       -2.89518052, -2.67887403, -2.77061606, -2.15917153, -2.43857984,
       -2.05672415, -2.31948536, -2.32891487, -2.43779972, -2.54481928,
       -2.98637711, -2.37890464, -2.34942342, -2.58030957, -2.33437098,
       -2.30671029, -2.55571806, -2.60193192, -2.24663844, -3.25831335,
       -2.65438088, -2.18251377, -2.40651613, -2.50890559, -2.30407762,
       -2.29585467, -2.1856842 , -2.84049392, -4.08426662, -2.41878524])

Or the whole array.

In [36]:
rand_nn.min(), rand_nn.min(axis=None)
Out[36]:
(-4.0842666160402041, -4.0842666160402041)

The min( ) function with the axis option finds the minimum value along different dimensions of the array rand_nn.

We used "axis=None". There is a special value in Python called None, which indicates no value. In this case, this means that axis has no assigned value. The axis function then resorts to a default behavior, which is to give you the minimum value over the entire array.

Now let's create an array of random numbers with the same number of elements as t. We will use the Numpy tool 'random', but there are lots of ways to generate random numbers. In IDL we might use RANDOMN( ) or RANDOMU( ) depending on the distribution we want to sample from. Numpy.random had many ways of calculating random numbers too.

Try typing "np.random." and then 'tab'.

Let's sample from a normal distribution:

In [37]:
rand_n = np.random.randn(t.size)

Notice that we can use the t.size attribute of 't' without having to define another variable or call a separate function.

Let's plot our variables. To do this we will use the matplotlib's plot function.

In [38]:
plt.plot(t,rand_n)
plt.show()

Matplotlib generates a plot and then in a separate command you have to display it to the screen. Yeah, it's a pain but you get use to it.

Let's now generate some more complex data:

In [39]:
r = np.exp(-t/0.05)
r_dat = np.convolve(rand_n, r)*dt
r_dat.size
Out[39]:
199

The convolution returned an array that is twice the size of our other working data sets. We should trim that down. The syntax for segmenting an array uses square brackets:

In [22]:
r_dat=r_dat[:t.size]

This means we want r_dat from the beginning element to t.size (which is 100). Let's create one more data set to work with using some more of NumPy's functions (sin and pi).

In [23]:
ss = 0.1*np.sin(2*np.pi*t) + r_dat

What does all of this generated data look like? Why don't we make five different plots to highlight some different plotting capabilities.

First, we will define a plotting space, but instead of showing the plot this time, we are going to plot some more values. First, we will plot the magnitude spectrum of the generated data with a linear scale and a dB (logarithmic) scale. This can all be done within matplotlib. Next, We can easily show the wrapped and unwrapped phase spectrum.

The 'subplot' command means that we want a 3 by 2 grid of plots and we want to operate on the first sextant.

In [24]:
plt.subplot(3, 2, 1)
plt.plot(t, ss)

plt.subplot(3, 2, 3)
plt.magnitude_spectrum(ss, Fs=Fs)

plt.subplot(3, 2, 4)
plt.magnitude_spectrum(ss, Fs=Fs, scale='dB')

plt.subplot(3, 2, 5)
plt.angle_spectrum(ss, Fs=Fs)

plt.subplot(3, 2, 6)
plt.phase_spectrum(ss, Fs=Fs)
plt.show()

Using matplotlib, we can showcase our generated data set in several different ways. Of course this is just an example and there are many more types of plots that matplotlib can generate.

Basic Programming Techniques in Python

To showcase some other basic programming techniques using the skills we just learned, lets analytically calculate $\pi$.

The easiest way of doing this is using the Monty Carlo Method through finding random points inside a circle inscribed in a square.

In [25]:
import numpy as np #always good to be explicit about our libraries used

Next, we will initialize our counting variables

In [26]:
total = 0
inside = 0

Now we will set up a for-loop to calculate our points. Loops in Python are a bit different than in IDL. First, Notice that in Python loops are tab delimited. Unlike IDL there is no need for 'begin' and 'end' statements since the tabbing indicates where the loop starts and ends.

In [27]:
for ii in range(10000):
    
    # Generate two coordinates in the interval 0-1:
    x_coord = np.random.uniform()
    y_coord = np.random.uniform()
    
    # Calculate the distance to the origin
    r = np.sqrt(x_coord**2 + y_coord**2)
    
    # Count this if it is inside the circle.
    if r< 1:
        inside += 1
    
    total += 1

You will see somethig else too: the variable we are iterating over, 'ii', is nowhere to be found in the loop. This is because the iteration of 'ii' over the list that the range( ) function produces is implicit. We can make the interation explicit using a slightly different syntax.

In [28]:
print('Pi=',4.0*inside/(1.0*total))
('Pi=', 3.1336)

In this case, 'iteration' now contains the iteration number (0,1,...,9999) while ii contains what ever is in the list at that iteration. This could be any data type. To drive this point home, let's show a trivial example:

In [33]:
for value in 'Jack':
    print(value)
print 
for iteration, value in enumerate('Jack'):
    print(iteration, value)
J
a
c
k

(0, 'J')
(1, 'a')
(2, 'c')
(3, 'k')

A few other things to notice: we are using np.random again, as well as np.sqrt. To define an exponent python uses a double astrix. We also encounter an if statement. For conditionals, python drops the 'then' from the statement and uses tabbing to denote the begining and ending.

Writing A Function

Lets take this same python script and make it into a simple function we can iterate over.

In [30]:
import numpy as np #always good to be explicit about our libraries used
In [31]:
def pitest():
    # Create the coordinate
    xtemp = np.random.uniform()
    ytemp = np.random.uniform()
    
    # Calculate the distance to the origin
    r = np.sqrt(xtemp**2 + ytemp**2)

    # Count this if it is inside the circle.
    if r< 1:
        return True
  
    return False

The Python syntax for functions use the same tabbing to start and end the function that loops and conditional statemetns do. This particular function pitest( ) does not have any variable input, but if it did, it would go in the parenthesis. Now let's put the new function in a loop and see how it works:

In [32]:
count = 0
inside = 0

for i in range(10000):
    is_inside = pitest()
    
    inside += is_inside
    count += 1.
    
print('Pi=',((inside/count)*4.))
('Pi=', 3.1228)

Python has boolean data types, True and False. These are also understood to have the integer values 1 and 0 respectively.

Running Your Own Function

So now I have writen a script, how do I run it from the command line? Let's say that the function we wrote, 'pitest,' is saved with the name 'Pifunction.py'. First, change to the directory where your program is located.

cd /Users/mkirk/code/python/

Then import the name of your script.

import Pifunction as Pifunc

And then run your function like we have all along:

Pifunc.pitest()
In [ ]:
 
In [ ]: