Lesson3 - Pandas Series Structure

Posted by Omid on Sat, 05 Feb 2022 18:27:52 +0100

1 What is the Series structure?

Series structure, also known as Series sequence, is one of the common data structures used by Pandas. It is a structure similar to one-dimensional arrays, consisting of a set of data values and a set of labels, in which there is a one-to-one correspondence between labels and data values.

Series can hold any data type, such as integer, string, floating point number, Python object, etc. Its label defaults to integer and increments from 0. The structure diagram of the Series is as follows:

    

Labels make it easier to see at a glance where each data value sits in the index.
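As a small illustration of the one-to-one correspondence between labels and data values, the sketch below builds a Series and reads the two parts separately. The values and label names are made up for this example.

```python
import pandas as pd

# Hypothetical data: three values with three matching labels
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

labels = list(s.index)   # the label part of the Series
values = list(s.values)  # the data part of the Series

# Each label is paired with exactly one value
for label, value in zip(labels, values):
    print(label, value)
```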

2 Series object

2.1 Creating Series objects

Pandas uses the Series() function to create a Series object through which the appropriate methods and properties can be invoked for data processing purposes:

import pandas as pd
s=pd.Series( data, index, dtype, copy)

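The parameter descriptions are as follows:

  • data: the input data, such as a list, ndarray, dict, or scalar value.
  • index: the index labels for the values; if omitted, the labels default to 0, 1, ..., n-1.
  • dtype: the data type of the values; if omitted, it is inferred from the data.
  • copy: whether to copy the input data.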

You can also use arrays, dictionaries, scalar values, or Python objects to create Series objects. The following shows different ways to create Series objects:

2.1.1 Create an empty Series object

An empty Series object can be created as follows:

import pandas as pd
#Create a Series with no data; dtype is given explicitly because the default dtype for empty data differs between pandas versions
s = pd.Series(dtype='float64')
print(s)

The output is as follows:

Series([], dtype: float64)

2.1.2 Create a Series object from an ndarray

ndarray is the array type in NumPy. When data is an ndarray, the index passed in must have the same length as the array. If no index is passed, the index is generated by default with range(n), where n is the length of the array, as follows:

[0, 1, 2, ..., len(array)-1]

Create a Series object using the default index:

import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print (s)

The output is as follows:

0   a
1   b
2   c
3   d
dtype: object

In the example above, no index is passed, so the index is allocated from 0 by default, with an index range of 0 to len(data)-1, or 0 to 3. This setting is called Implicit Indexing.

You can also define the index labels yourself, which is called explicit indexing:

import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
#Custom index labels (that is, an explicit index)
s = pd.Series(data,index=[100,101,102,103])
print(s)

Output results:

100  a
101  b
102  c
103  d
dtype: object
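The requirement that the index has the same length as the ndarray can be checked with a short sketch. Here a three-label index is deliberately passed for a four-element array, which pandas rejects with a ValueError; the variable name mismatch_raised is only for illustration.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])

try:
    # Three labels for four values: the lengths do not match
    pd.Series(data, index=[100, 101, 102])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```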

2.1.3 Create a Series object from a dict

You can use a dict as input data. If no index is passed, the index is built from the keys of the dictionary. If an index is passed, the values are looked up in the dictionary by index label, so each label should correspond to a dictionary key.
The following two examples demonstrate these two scenarios.
Example 1, when no index is passed:

import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)

Output results:

a 0.0
b 1.0
c 2.0
dtype: float64

Example 2, when an index is passed:

import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

Output results:

b 1.0
c 2.0
d NaN
a 0.0
dtype: float64

When a passed index label cannot be found in the dictionary, the corresponding value is filled with NaN (Not a Number).
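One side effect of NaN filling is worth seeing in code: NaN is a floating-point value, so integer data becomes float64 once a label is missing. The sketch below uses made-up integer data to show this.

```python
import pandas as pd

# Hypothetical integer data
data = {'a': 1, 'b': 2}

s_keys = pd.Series(data)                           # index taken from the dict keys
s_extra = pd.Series(data, index=['a', 'b', 'c'])   # 'c' is not a key in data

print(s_keys.dtype)    # an integer dtype
print(s_extra.dtype)   # float64, because NaN is a float
print(s_extra)
```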

2.1.4 Create a Series object from a scalar

If the data is a scalar value, an index must be provided, as shown in the following example:

import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

The output is as follows:

0  5
1  5
2  5
3  5
dtype: int64

The scalar value is repeated once for each index label.

3 Access Series data

The sections above covered several ways to create Series objects. So how do we access the elements of a Series? There are two ways: access by location (position) index and access by index label.

3.1 Location Index Access

This works the same way as for an ndarray or a list: elements are accessed by their position. Positions start at 0, so the first element is at position 0, the second at position 1, and so on. When the index labels are not integers, as in the example below, recent pandas versions deprecate using s[0] for positional access, so .iloc is used to make the positional lookup explicit. Let's look at a simple example:

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s.iloc[0])  #Position subscript
print(s['a']) #Label subscript

Output results:

1
1

A slice of the Series can be taken by position as follows:

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s[:3])

Output results:

a  1
b  2
c  3
dtype: int64

If you want to get the last three elements, you can also use the following:

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s[-3:])

Output results:

c  3
d  4
e  5
dtype: int64
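Positional access can also be written with the .iloc accessor, and label access with .loc. This matters most when the labels are themselves integers, because a bare integer could then be read either way. The sketch below shows both accessors; the integer-labelled Series t is a made-up example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first = s.iloc[0]          # single element by position
last_three = s.iloc[-3:]   # slice by position
picked = s.iloc[[0, 2]]    # several positions at once

# Hypothetical Series whose labels are integers that do not start at 0
t = pd.Series(['a', 'b', 'c'], index=[100, 101, 102])
by_position = t.iloc[0]    # position 0
by_label = t.loc[100]      # label 100

print(first, by_position, by_label)
print(last_three)
print(picked)
```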

3.2 Index Label Access

Series is similar to a fixed-size dict: index labels act as keys and the element values of the Series act as values, so element values can be accessed or modified through their index labels.
Example 1, accessing a single element value using an index label:

import pandas as pd
s = pd.Series([6,7,8,9,10],index = ['a','b','c','d','e'])
print(s['a'])

Output results:

6

Example 2, using index labels to access multiple element values:

import pandas as pd
s = pd.Series([6,7,8,9,10],index = ['a','b','c','d','e'])
print(s[['a','c','d']])

Output results:

a    6
c    8
d    9
dtype: int64
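Index labels can be used to modify values as well as read them. The following is a minimal sketch, reusing the same Series; the replacement values are arbitrary.

```python
import pandas as pd

s = pd.Series([6, 7, 8, 9, 10], index=['a', 'b', 'c', 'd', 'e'])

s['a'] = 60             # overwrite one value by label
s.loc[['c', 'd']] = 0   # overwrite several values by label

print(s)
```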

Example 3, if a label that is not in the index is used, an exception is raised:

import pandas as pd
s = pd.Series([6,7,8,9,10],index = ['a','b','c','d','e'])
#The index does not contain the label 'f'
print(s['f'])

Output results:

......
KeyError: 'f'
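If a missing label should not stop the program, one option is to test for the label first or to use Series.get(), which returns a default value instead of raising. The sketch below shows both; the default value -1 is an arbitrary choice for illustration.

```python
import pandas as pd

s = pd.Series([6, 7, 8, 9, 10], index=['a', 'b', 'c', 'd', 'e'])

has_f = 'f' in s         # membership test checks the index labels
value = s.get('f', -1)   # returns the default instead of raising KeyError

try:
    s['f']
    key_error = False
except KeyError:
    key_error = True
    print("label 'f' not found")

print(has_f, value)
```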

4 Series Common Properties

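Below we describe the common properties and methods of Series. The common properties of Series objects are as follows:

  • axes: returns a list containing the row index labels.
  • dtype: returns the data type of the values.
  • empty: returns True if the Series is empty.
  • ndim: returns the number of dimensions, which is always 1.
  • size: returns the number of elements.
  • values: returns the data as an ndarray.
  • index: returns the index of the Series.

Now create a Series object and show how to use the properties listed above: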

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(5))
print(s)

Output results:

0    0.898097
1    0.730210
2    2.307401
3   -1.723065
4    0.346728
dtype: float64

The row index label for the example above is [0,1,2,3,4].

4.1 axes

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(5))
print ("The axes are:")
print(s.axes)

Output results:

The axes are:
[RangeIndex(start=0, stop=5, step=1)]

4.2 dtype

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(5))
print ("The dtype is:")
print(s.dtype)

Output results:

The dtype is:
float64

4.3 empty

Returns a Boolean value that determines whether the data object is empty. Examples are as follows:

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(5))
print("Is it an empty object?")
print (s.empty)

Output results:

Is it an empty object?
False
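For contrast, the sketch below shows a case where empty is True. It also shows that a Series holding only NaN still counts as non-empty, since empty looks at the number of elements rather than at their content. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

no_rows = pd.Series(dtype='float64')   # zero elements
only_nan = pd.Series([np.nan])         # one element, which is missing

print(no_rows.empty)    # True
print(only_nan.empty)   # False
```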

4.4 ndim

View the dimensions of the series. By definition, Series is a one-dimensional data structure, so it always returns 1.

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(5))
print (s)
print (s.ndim)

Output results:

0    0.311485
1    1.748860
2   -0.022721
3   -0.129223
4   -0.489824
dtype: float64
1

4.5 size

Returns the size (length) of the Series object.

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(3))
print (s)
#Number of elements in the Series
print(s.size)

Output results:

0   -1.866261
1   -0.636726
2    0.586037
dtype: float64
3
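The ndim and size properties relate to len() and shape in a simple way for a one-dimensional structure. The sketch below uses fixed values instead of random ones so that the results are reproducible.

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5])

print(s.ndim)    # number of dimensions
print(s.size)    # number of elements
print(len(s))    # same as size for a Series
print(s.shape)   # a one-element tuple holding the length
```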

4.6 values

Returns data from a Series object as an array.

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(6))
print(s)
print("Data in the Series:")
print(s.values)

Output results:

0   -0.502100
1    0.696194
2   -0.982063
3    0.416430
4   -1.384514
5    0.444303
dtype: float64
Data in the Series:
[-0.50210028  0.69619407 -0.98206327  0.41642976 -1.38451433  0.44430257]
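Besides the values property, pandas also provides the to_numpy() method for getting the data as a NumPy array. The sketch below uses fixed values and shows that the result can be used with ordinary NumPy operations.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

arr = s.to_numpy()   # the data as a NumPy array, without the index
doubled = arr * 2    # ordinary NumPy arithmetic

print(type(arr))
print(doubled)
```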

4.7 index

This property is used to view the index of a Series. Examples are as follows:

import pandas as pd
#Explicit index
s=pd.Series([1,2,5,8],index=['a','b','c','d'])
print("Explicit index:")
print(s.index)
#Implicit index
s1=pd.Series([1,2,5,8])
print("Implicit index:")
print(s1.index)

Output results:

Explicit index:
Index(['a', 'b', 'c', 'd'], dtype='object')
Implicit index:
RangeIndex(start=0, stop=4, step=1)
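The index can also be converted to a plain list or replaced as a whole. The following sketch assumes the new labels have the same length as the Series; the replacement labels are made up.

```python
import pandas as pd

s = pd.Series([1, 2, 5, 8], index=['a', 'b', 'c', 'd'])

labels = s.index.tolist()        # index labels as a Python list
s.index = ['w', 'x', 'y', 'z']   # replace all labels at once

print(labels)
print(s)
```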

5 Series Common Methods

5.1 head() & tail(): view data

If you want to view a portion of the Series data, you can use the head() or tail() method. head() returns the first n rows of data and returns the first 5 rows by default. Examples are as follows:

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(5))
print ("The original series is:")
print (s)
#Return the first three rows of data
print (s.head(3))

Output results:

The original series is:
0    1.249679
1    0.636487
2   -0.987621
3    0.999613
4    1.607751
dtype: float64
0    1.249679
1    0.636487
2   -0.987621
dtype: float64

tail() returns the last n rows of data, defaulting to the last 5 rows. Examples are as follows:

import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(4))
#Original Series
print(s)
#Return the last two rows of data
print (s.tail(2))

Output results:

0    0.053340
1    2.165836
2   -0.719175
3   -0.035178
dtype: float64
2   -0.719175
3   -0.035178
dtype: float64
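The default of 5 rows can be seen with a Series that has more than 5 elements. The sketch below uses the integers 0 to 7 as sample data and also shows that asking for more rows than exist simply returns the whole Series.

```python
import pandas as pd

s = pd.Series(range(8))

first_default = s.head()   # first 5 rows
last_default = s.tail()    # last 5 rows
everything = s.head(10)    # n larger than the length returns all rows

print(first_default)
print(last_default)
print(len(everything))
```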

5.2 isnull() & notnull(): detect missing values

isnull() and notnull() are used to detect missing values in a Series. A missing value, as the name implies, is a value that does not exist, has been lost, or is otherwise absent.

  • isnull(): Returns True if the value does not exist or is missing, and False otherwise.
  • notnull(): Returns False if the value does not exist or is missing, and True otherwise.

This is easy to understand: in real data analysis work, data collection is often a cumbersome process, and force majeure or human factors will inevitably cause some data to be lost. We can then handle the missing values with appropriate methods, such as filling them with the mean or otherwise completing the data. The two methods above help us detect whether missing values are present. Examples are as follows:

import pandas as pd
#None represents missing data
s=pd.Series([1,2,5,None])
print(pd.isnull(s))  #True where the value is missing
print(pd.notnull(s)) #False where the value is missing

Output results:

0    False
1    False
2    False
3     True
dtype: bool
0     True
1     True
2     True
3    False
dtype: bool
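Once missing values have been detected, they can be counted and handled. The following is a minimal sketch of two common options mentioned above, filling with the mean and dropping the missing entries, using the same sample data. It is one possible approach, not the only one.

```python
import pandas as pd

s = pd.Series([1, 2, 5, None])

n_missing = int(s.isnull().sum())   # count of missing values
filled = s.fillna(s.mean())         # mean() skips NaN, so this is the mean of 1, 2, 5
dropped = s.dropna()                # remove the missing entries instead

print(n_missing)
print(filled)
print(dropped)
```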