A data frame in Python is a tabular data object provided by the Pandas module. Its values are stored much like a spreadsheet, with each row representing an example (sample) and each column representing a variable. Once a Pandas data frame is created, its labels and elements can be shown using several attributes and indexers provided by Pandas.
The attribute ‘columns’ shows the column (variable) labels of the data frame; the output is an Index object.
#Import Pandas and Numpy module
import pandas as pd
import numpy as np
#create a data frame of 4 examples, 3 variables
df1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                   index=['r1', 'r2', 'r3', 'r4'],
                   columns=['mico', 'mia', 'miaomiao'])
df1
#output
    mico  mia  miaomiao
r1     0    1         2
r2     3    4         5
r3     6    7         8
r4     9   10        11
#show the column labels of data frame
df1.columns
#output
Index(['mico', 'mia', 'miaomiao'], dtype='object')
Similarly, the attribute ‘index’ shows the row labels of the data frame; the result is also an Index object.
#show row labels of data frame
df1.index
#Output
Index(['r1', 'r2', 'r3', 'r4'], dtype='object')
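The type of these labels can be checked directly, and the labels can be converted when a plain Python list is needed. The following is a minimal sketch that rebuilds the same df1; the names col_labels and row_labels are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

# same data frame as above
df1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                   index=['r1', 'r2', 'r3', 'r4'],
                   columns=['mico', 'mia', 'miaomiao'])

# check what kind of object each attribute returns
print(type(df1.columns))
print(type(df1.index))

# convert the labels to plain Python lists
col_labels = df1.columns.tolist()
row_labels = df1.index.tolist()
print(col_labels)  # ['mico', 'mia', 'miaomiao']
print(row_labels)  # ['r1', 'r2', 'r3', 'r4']
```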
The attribute ‘values’ returns the data of the data frame, without the row and column labels, as a NumPy array.
#show value part of the data frame
df1.values
#Output , is a Numpy array
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
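Recent versions of Pandas also offer the to_numpy() method, which the Pandas documentation recommends over ‘values’ for getting the underlying array. A small sketch, assuming a Pandas version that includes to_numpy() (0.24 or later); the name arr is only illustrative.

```python
import numpy as np
import pandas as pd

# same data frame as above
df1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                   index=['r1', 'r2', 'r3', 'r4'],
                   columns=['mico', 'mia', 'miaomiao'])

# get the data as a NumPy array, without row or column labels
arr = df1.to_numpy()
print(type(arr))
print(arr.shape)   # (4, 3)

# elements are addressed by position: row 1, column 2
print(arr[1, 2])   # 5
```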
To select columns of the data frame, put a column name, or a list of column names, inside square brackets after the data frame. A single column can alternatively be accessed as an attribute, by writing a dot and the column name after the data frame.
#select a single column of data frame
df1['mico']
#Output is a Series
r1    0
r2    3
r3    6
r4    9
Name: mico, dtype: int32
#select multiple columns of a data frame
df1[['mico','miaomiao']]
#output , is a data frame too
    mico  miaomiao
r1     0         2
r2     3         5
r3     6         8
r4     9        11
#alternative way to select a single column
df1.mico
#output, is a Series
r1    0
r2    3
r3    6
r4    9
Name: mico, dtype: int32
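The dot form only works when the column name is a valid Python identifier and does not clash with an existing data frame attribute or method, so the bracket form is the safer default. The sketch below uses a hypothetical data frame df2, with column names invented here to show both problems.

```python
import pandas as pd

# hypothetical data frame: 'count' clashes with the DataFrame.count method,
# and 'my col' contains a space, so it is not a valid Python identifier
df2 = pd.DataFrame({'count': [1, 2, 3], 'my col': [4, 5, 6]})

# dot access finds the method, not the column
print(callable(df2.count))   # True

# bracket access always returns the column as a Series
count_col = df2['count']
space_col = df2['my col']
print(count_col.tolist())    # [1, 2, 3]
print(space_col.tolist())    # [4, 5, 6]
```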
To select rows, use loc with row labels inside square brackets. Alternatively, use iloc to select rows by their integer positions, which start at 0.
#select example with label 'r1'
df1.loc['r1']
#output is a Series whose index is the column labels of the data frame
mico        0
mia         1
miaomiao    2
Name: r1, dtype: int32
#select multiple rows of data frame
df1.loc[['r1','r3']]
#output is a data frame
    mico  mia  miaomiao
r1     0    1         2
r3     6    7         8
#use iloc to select rows by index sequence number
#select a single row
df1.iloc[0]
#Output , is a Series
mico        0
mia         1
miaomiao    2
Name: r1, dtype: int32
#select multiple rows
df1.iloc[0:3]
#Output, is a data frame
    mico  mia  miaomiao
r1     0    1         2
r2     3    4         5
r3     6    7         8
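One detail worth noting when slicing: iloc follows the usual Python rule and stops before the end position, while a label slice with loc includes the end label. A minimal sketch on the same df1 (the variable names are only illustrative) also shows that loc accepts a row label and a column label together to pick out a single value.

```python
import numpy as np
import pandas as pd

# same data frame as above
df1 = pd.DataFrame(np.arange(12).reshape((4, 3)),
                   index=['r1', 'r2', 'r3', 'r4'],
                   columns=['mico', 'mia', 'miaomiao'])

# position slice: positions 0, 1, 2 (position 3 is excluded)
by_position = df1.iloc[0:3]
print(by_position.index.tolist())  # ['r1', 'r2', 'r3']

# label slice: both end points are included
by_label = df1.loc['r1':'r3']
print(by_label.index.tolist())     # ['r1', 'r2', 'r3']

# a row label and a column label together select one element
cell = df1.loc['r2', 'mia']
print(cell)                        # 4
```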