NumPy and pandas are very popular packages for data analysis. This post is a quick guide to the basics of both. We assume you are familiar with vector operations.
NumPy is mostly written in C, and pandas is built on top of NumPy and also relies on C for its core implementation, so both are fast compared with the general-purpose data structures that ship with Python.
A NumPy array is similar to a Python list, but all of its elements share the same data type, and it is much faster.
We can also treat a NumPy array as a vector.
pandas uses a Series instead of a list or NumPy array, but we can think of a Series as a NumPy array with some advanced functionality.
Example
NumPy uses arrays, whereas pandas uses Series.
In [2]:
import numpy as np
Arrays are similar to Python lists, but all elements must be of the same data type, and arrays are faster than lists
In [13]:
num = np.array([3,4,2,5,7,23,56,23,7,23,89,43,676,43])
num
Out[13]:
Let's see some of the functionality
In [17]:
print("Mean :", num.mean())
print("sum :", num.sum())
print("max :", num.max())
print("std :", num.std())
In [18]:
#slicing
num[:5]
Out[18]:
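Slicing follows the same rules as Python lists. As a small sketch (assuming Python 3 and the same `num` array as above), here are a few other common slice forms:

```python
import numpy as np

num = np.array([3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43])

first_five = num[:5]     # elements at positions 0..4
last_three = num[-3:]    # negative indices count from the end
every_other = num[::2]   # a step of 2 skips every second element

print(first_five)
print(last_three)
print(every_other)
```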
In [19]:
# Find the index of an element, for example the maximum
print("index of max :", num.argmax())
In [21]:
print("data type of array :", num.dtype)
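To see what "same data type" means in practice, here is a minimal sketch (assuming Python 3; the variable names are only for illustration). When the inputs are mixed, NumPy picks one common type for the whole array:

```python
import numpy as np

ints = np.array([3, 4, 2])       # all integers -> an integer dtype
mixed = np.array([3, 4.5, 2])    # one float makes every element a float
texts = np.array([3, "4", 2])    # one string makes every element a string

print(ints.dtype, mixed.dtype, texts.dtype)
print(mixed)
```

The exact integer dtype can depend on your platform and NumPy version, so the sketch only prints it.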
Vector Operations
In [22]:
a=np.array([5,6,15])
b=np.array([5,4,-5])
In [26]:
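# Addition (element by element)
print("{} + {} = {}".format(a, b, a + b))
In [27]:
# Multiplication (element by element)
print("{} * {} = {}".format(a, b, a * b))
In [28]:
# Division (element by element)
print("{} / {} = {}".format(a, b, a / b))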
In [34]:
# If the sizes do not match, an error (ValueError) occurs
b = np.array([5, 4, -5, 5])
print("{} + {} = {}".format(a, b, a + b))
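If you want to see the error without stopping the notebook, one option is to catch it. This is only a sketch (assuming Python 3), using the same `a` and `b` as the cell above:

```python
import numpy as np

a = np.array([5, 6, 15])
b = np.array([5, 4, -5, 5])

try:
    total = a + b          # shapes (3,) and (4,) cannot be combined
except ValueError as err:
    total = None
    print("Cannot add shapes", a.shape, "and", b.shape, "->", err)
```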
Vector [+-*/] Scalar
In [30]:
print("{} + {} = {}".format(a, 3, a + 3))
In [31]:
print("{} * {} = {}".format(a, 3, a * 3))
In [32]:
print("{} / {} = {}".format(a, 3, a / 3))
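One thing to watch with division: on Python 2, `/` between integer arrays does floor division, while on Python 3 it does true division and returns floats. A short sketch, assuming Python 3, that shows both behaviours explicitly:

```python
import numpy as np

a = np.array([5, 6, 15])

true_div = a / 3     # true division: the result is a float array
floor_div = a // 3   # floor division: the result stays an integer array

print(true_div)
print(floor_div)
```

Using `//` when you want whole numbers makes the intent clear on either Python version.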
Vector & Boolean Vector
In [36]:
num=np.array([5,6,15,65,32,656,23,435,2,45,21])
bl=np.array([False,True,True,False,True,False,True,False,True,True,False])
In [37]:
num[6]
Out[37]:
num[bl]: what will it return?
It returns an array of the values in num at the positions where the corresponding element in bl is True.
In [40]:
num[bl]
Out[40]:
Find all elements greater than 100 in num
In [41]:
num[num>100]
Out[41]:
All elements less than 50?
In [42]:
num[num<50]
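Conditions like these can also be combined. Here is a minimal sketch (assuming Python 3 and the same `num` array; the variable names are just for illustration). Note that NumPy needs `&`, `|` and `~` with parentheses around each condition, not the Python keywords `and`, `or` and `not`:

```python
import numpy as np

num = np.array([5, 6, 15, 65, 32, 656, 23, 435, 2, 45, 21])

between = num[(num > 50) & (num < 500)]   # AND: both conditions hold
extremes = num[(num < 5) | (num > 500)]   # OR: either condition holds
not_small = num[~(num < 50)]              # NOT: invert the mask

print(between)
print(extremes)
print(not_small)
```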
Out[42]:
In-place operations in NumPy (difference between += and +)
In [45]:
a=np.array([5,6,15])
b=a
a += 2
print(b)
print("This happens because a and b both point to the same array and += is an in-place operation, so the change is visible through b as well")
In [47]:
a=np.array([5,6,15])
b=a
a = a + 2
print(b)
This happens because a and b initially point to the same array, but the + operation creates a new array and a is then rebound to it, so b remains unaffected.
In [49]:
a=np.array([5,6,15])
b=a[:3]
b[0]=1000
print(a, "The reason is similar to +=: the slice b is a view of the same data as a")
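If you need an independent array, one way is to ask for a copy explicitly. The sketch below (assuming Python 3; `alias`, `view` and `independent` are illustrative names) uses `np.shares_memory` to check which arrays share data:

```python
import numpy as np

a = np.array([5, 6, 15])
alias = a                 # the same object under a second name
view = a[:3]              # a slice: new array object, same underlying data
independent = a.copy()    # a real copy with its own data

independent[0] = 1000     # does not change a
view[0] = -1              # changes a as well

print(a)
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
```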
Pandas Series
The basics are the same as for a NumPy array, but a pandas Series also offers a lot of extra functionality.
In [51]:
import pandas as pd
In [53]:
num = pd.Series([3,4,2,5,7,23,56,23,7,23,89,43,676,43])
num
Out[53]:
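One of those extras is the index: every value in a Series has a label. A small sketch, assuming Python 3; the labels `"a"`, `"b"`, `"c"` and the name `scores` are made up purely for illustration:

```python
import pandas as pd

# Hypothetical labels, chosen only for this example
scores = pd.Series([3, 4, 2], index=["a", "b", "c"])

print(scores["b"])          # look up by label
print(scores.iloc[1])       # look up by position
print(scores[scores > 2])   # boolean masks work as in NumPy and keep the labels
```

When you do not pass an index, as in the cell above, pandas simply numbers the values starting from 0.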
See all the basic summary statistics at once using the describe() function
In [54]:
num.describe()
Out[54]:
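One detail to be aware of when comparing these numbers with the NumPy results earlier: the standard deviation will not match exactly. By default NumPy's `std()` divides by n (`ddof=0`), while pandas divides by n - 1 (`ddof=1`). A short sketch, assuming Python 3 and the same values as above:

```python
import numpy as np
import pandas as pd

values = [3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43]
arr = np.array(values)
ser = pd.Series(values)

print(arr.std())         # NumPy default: ddof=0
print(ser.std())         # pandas default: ddof=1, slightly larger
print(arr.std(ddof=1))   # passing ddof=1 makes NumPy agree with pandas
```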
Learn more: Pandas Basics