Friday 29 January 2016

Basics of Numpy & Pandas


NumPy and pandas are very popular packages for data analysis. This post provides a quick guide to the basics of both. We assume you are familiar with vector operations.

NumPy is mostly written in C, and pandas is built on top of NumPy and also uses C for its core implementation, so both are fast compared to the general-purpose data structures that come with Python.

A NumPy array is similar to a Python list, but all of its elements have the same data type and it is much faster.
We can also treat a NumPy array as a vector.

Pandas uses a Series instead of a list or NumPy array, but we can think of a Series as a NumPy array with some advanced functionality.

Example

NumPy uses arrays, whereas pandas uses Series
In [2]:
import numpy as np

Arrays are similar to Python lists, but all elements must be of the same data type, and arrays are faster than lists.

In [13]:
num = np.array([3,4,2,5,7,23,56,23,7,23,89,43,676,43])
num
Out[13]:
array([  3,   4,   2,   5,   7,  23,  56,  23,   7,  23,  89,  43, 676,  43])
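The "same data type" rule can be seen directly by mixing types when an array is created. The following is a small sketch in Python 3 syntax (unlike the Python 2 cells in this post); the variable names are only illustrative.

```python
import numpy as np

# Mixing ints and a float: NumPy picks one dtype that can hold every element.
mixed = np.array([3, 4, 2.5])
print(mixed.dtype)        # float64

# A Python list keeps each element's own type instead.
as_list = [3, 4, 2.5]
print([type(x).__name__ for x in as_list])   # ['int', 'int', 'float']

# An explicit dtype can be requested when the array is created.
small = np.array([3, 4, 2], dtype=np.int64)
print(small.dtype)        # int64
```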

Let's look at some of the functionality

In [17]:
print "Mean :",num.mean()
print "sum :",num.sum()
print "max :",num.max()
print "std :",num.std()
Mean : 71.7142857143
sum : 1004
max : 676
std : 169.340919269
In [18]:
#slicing
num[:5]
Out[18]:
array([3, 4, 2, 5, 7])
In [19]:
# find the index of an element, for example the max
print "index of max :",num.argmax()
index of max : 12
In [21]:
print "data Type of array :",num.dtype
data Type of array : int32

Vector Operation

In [22]:
a=np.array([5,6,15])
b=np.array([5,4,-5])
In [26]:
# Addition
print "{} + {} = {}".format(a,b,a+b) 
[ 5  6 15] + [ 5  4 -5] = [10 10 10]
In [27]:
print "{} * {} = {}".format(a,b,a*b) 
[ 5  6 15] * [ 5  4 -5] = [ 25  24 -75]
In [28]:
print "{} / {} = {}".format(a,b,a/b) 
[ 5  6 15] / [ 5  4 -5] = [ 1  1 -3]
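Note that the integer result above comes from Python 2, where `/` on integer arrays performs floor division. Under Python 3, `/` is true division and `//` gives the floored result. A short sketch in Python 3 syntax, reusing the same two vectors:

```python
import numpy as np

a = np.array([5, 6, 15])
b = np.array([5, 4, -5])

true_div = a / b     # true division in Python 3: float results
floor_div = a // b   # floor division: integer results, as in the Python 2 output

print(true_div.tolist())    # [1.0, 1.5, -3.0]
print(floor_div.tolist())   # [1, 1, -3]
```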
In [34]:
# If the sizes do not match, an error occurs
b=np.array([5,4,-5,5])
print "{} + {} = {}".format(a,b,a+b) 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-ca4423c15efb> in <module>()
      1 # If the sizes do not match, an error occurs
      2 b=np.array([5,4,-5,5])
----> 3 print "{} + {} = {}".format(a,b,a+b)

ValueError: operands could not be broadcast together with shapes (3,) (4,) 
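If the vectors might differ in length, the mismatch can be handled rather than left to crash the program. The sketch below (Python 3 syntax) shows two possible approaches. Comparing shapes for equality is a simplification that suits same-length 1-D vectors; NumPy's broadcasting rules also accept some unequal shapes, such as a length-1 array.

```python
import numpy as np

a = np.array([5, 6, 15])
b = np.array([5, 4, -5, 5])

# Option 1: compare shapes before adding.
if a.shape == b.shape:
    result = a + b
else:
    result = None
    print("shape mismatch:", a.shape, "vs", b.shape)

# Option 2: catch the ValueError that NumPy raises.
try:
    a + b
    caught = False
except ValueError as err:
    caught = True
    print("ValueError:", err)
```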

Vector [+-*/] Scalar

In [30]:
print "{} + {} = {}".format(a,3,a+3) 
[ 5  6 15] + 3 = [ 8  9 18]
In [31]:
print "{} * {} = {}".format(a,3,a*3) 
[ 5  6 15] * 3 = [15 18 45]
In [32]:
print "{} / {} = {}".format(a,3,a/3) 
[ 5  6 15] / 3 = [1 2 5]

Vector & boolean vector

In [36]:
num=np.array([5,6,15,65,32,656,23,435,2,45,21])
bl=np.array([False,True,True,False,True,False,True,False,True,True,False])
In [37]:
num[6]
Out[37]:
23

What will num[bl] return?

It returns an array of the values of num at the positions where the corresponding element of bl is True.

In [40]:
num[bl]
Out[40]:
array([ 6, 15, 32, 23,  2, 45])

Find all elements greater than 100 in num

In [41]:
num[num>100]
Out[41]:
array([656, 435])

All elements less than 50?
In [42]:
num[num<50]
Out[42]:
array([ 5,  6, 15, 32, 23,  2, 45, 21])
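Boolean masks can also be combined. As a small sketch in Python 3 syntax: `&` is element-wise AND and `|` is element-wise OR, and each comparison needs its own parentheses because these operators bind more tightly than `<` and `>`.

```python
import numpy as np

num = np.array([5, 6, 15, 65, 32, 656, 23, 435, 2, 45, 21])

# Elements strictly between 20 and 100.
between = num[(num > 20) & (num < 100)]
# Elements below 5 or above 400.
extremes = num[(num < 5) | (num > 400)]
# A mask can be summed to count how many elements match.
count_over_100 = int((num > 100).sum())

print(between.tolist())    # [65, 32, 23, 45, 21]
print(extremes.tolist())   # [656, 435, 2]
print(count_over_100)      # 2
```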

In-place operations in NumPy (difference between += and +)

In [45]:
a=np.array([5,6,15])
b=a
a += 2
print b
print "this happens because a and b both point to the same array, and += is an in-place operation, so b sees the change"
[ 7  8 17]
this happens because a and b both point to the same array, and += is an in-place operation, so b sees the change
In [47]:
a=np.array([5,6,15])
b=a
a = a + 2
print b
[ 5  6 15]
This happens because a + 2 creates a new array and a is then rebound to it, so b still points to the original array and remains unaffected.
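One way to check which case applies is Python's `is` operator, which tests whether two names refer to the same object. A minimal sketch in Python 3 syntax; the two flag names are only illustrative:

```python
import numpy as np

a = np.array([5, 6, 15])
b = a
a += 2                          # modifies the existing array in place
same_after_inplace = a is b     # True: still one array object

a = np.array([5, 6, 15])
b = a
a = a + 2                       # builds a new array and rebinds the name a
same_after_rebind = a is b      # False: a now names a different array

print(same_after_inplace, same_after_rebind)   # True False
```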
In [49]:
a=np.array([5,6,15])
b=a[:3]
b[0]=1000
print a,"Reason: a slice is a view on the same data, so this behaves like +="
[1000    6   15] Reason: a slice is a view on the same data, so this behaves like +=
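When an independent array is wanted, `.copy()` avoids this sharing, and `np.shares_memory` can confirm whether two arrays overlap. A short sketch in Python 3 syntax; the names `view` and `independent` are only illustrative:

```python
import numpy as np

a = np.array([5, 6, 15])

view = a[:3]                 # basic slicing returns a view on a's data
independent = a[:3].copy()   # .copy() allocates separate data

print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False

independent[0] = 1000        # does not touch a
print(a.tolist())            # [5, 6, 15]
```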

Pandas Series


The basics are the same as for a NumPy array, but a pandas Series also offers a lot of additional functionality.

In [51]:
import pandas as pd
In [53]:
num = pd.Series([3,4,2,5,7,23,56,23,7,23,89,43,676,43])
num
Out[53]:
0       3
1       4
2       2
3       5
4       7
5      23
6      56
7      23
8       7
9      23
10     89
11     43
12    676
13     43
dtype: int64
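The left column in the output above is the Series index. One piece of the extra functionality is that the index can hold labels instead of positions, and arithmetic between two Series aligns on those labels. A small sketch in Python 3 syntax; the labels and values are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical labels, used only for this example.
s = pd.Series([3, 4, 2], index=["a", "b", "c"])
print(s["b"])        # 4

# Same labels in a different order: addition matches by label, not position.
t = pd.Series([10, 20, 30], index=["c", "b", "a"])
total = s + t

print(total["a"])    # 33  (3 + 30)
print(total["c"])    # 12  (2 + 10)
```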

See all the basic statistics at once using the describe() function
In [54]:
num.describe()
Out[54]:
count     14.000000
mean      71.714286
std      175.733377
min        2.000000
25%        5.500000
50%       23.000000
75%       43.000000
max      676.000000
dtype: float64
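The std reported here (175.73) differs from the NumPy result earlier (169.34) because the defaults differ: NumPy's `std` divides by N (`ddof=0`), while pandas divides by N - 1 (`ddof=1`). A short sketch in Python 3 syntax showing that the two agree once `ddof` is set explicitly:

```python
import numpy as np
import pandas as pd

values = [3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43]
arr = np.array(values)
ser = pd.Series(values)

print(arr.std())          # population std (ddof=0), about 169.34
print(ser.std())          # sample std (ddof=1), about 175.73
print(arr.std(ddof=1))    # matches the pandas default
print(ser.std(ddof=0))    # matches the NumPy default
```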

Learn More : Pandas Basics
