Data frames are among the most widely used data structures in R programming. Unlike a vector, matrix, or array, whose elements must all share the same data mode, a data frame can store elements of different modes or types in one object. For example, a data frame of family information can hold numeric (e.g. age, income), character (e.g. name), and logical (e.g. working or not working) columns. A data frame in R behaves somewhat like a spreadsheet in Microsoft Excel: each row represents one observation or subject, and each column represents one variable or attribute.
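The contrast between a vector and a data frame can be seen in a short sketch. The family values and the column names below (name, age, income, works) are made up for illustration; stringsAsFactors = FALSE is passed explicitly so that the character column stays character in older versions of R as well.

```r
# A vector holds a single mode, so mixed values are coerced to character
mixed <- c(35, "Ann", TRUE)
class(mixed)   # "character"

# A data frame keeps a separate type for each column
# (hypothetical family data, for illustration only)
family <- data.frame(
  name   = c("Ann", "Bob"),
  age    = c(35, 8),
  income = c(52000, 0),
  works  = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Report the class of every column
sapply(family, class)
```

Here sapply() applies class() to each column in turn, which is a quick way to check what a data frame actually contains.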
1. Creating a data frame using the data.frame() function
A data frame can be created with the data.frame() function once each column of the data frame exists as a vector in the current R session. The following code creates a personal health data frame with data.frame() after the vector for each column has been created.
# Creating a data frame using data.frame()
#two numeric vectors
> ID <- c(3, 4, 5, 6)
> age <- c(15, 14, 18, 12)
#a character vector, nominal
> blood <- c("Type5", "Type6", "Type5", "Type6")
#a character vector, ordinal
> status <- c("Poor", "Improved", "Excellent", "Poor")
> pdata <- data.frame(ID, age, blood, status)
> pdata
  ID age blood    status
1  3  15 Type5      Poor
2  4  14 Type6  Improved
3  5  18 Type5 Excellent
4  6  12 Type6      Poor
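The status column is described as ordinal, but a plain character vector does not record any ordering. One possible way to store that ordering is an ordered factor, sketched below. The level order Poor < Improved < Excellent is an assumption made for this example, and the name status_ord is introduced only for illustration.

```r
ID     <- c(3, 4, 5, 6)
age    <- c(15, 14, 18, 12)
blood  <- c("Type5", "Type6", "Type5", "Type6")
status <- c("Poor", "Improved", "Excellent", "Poor")

# Assumed ordering of the levels: Poor < Improved < Excellent
status_ord <- factor(status,
                     levels  = c("Poor", "Improved", "Excellent"),
                     ordered = TRUE)

pdata <- data.frame(ID, age, blood, status = status_ord,
                    stringsAsFactors = FALSE)

# Ordered factors support comparisons against a level name
pdata$status >= "Improved"   # FALSE TRUE TRUE FALSE
```

With an ordered factor, comparisons such as >= follow the declared level order instead of alphabetical order.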
2. Specifying elements of a data frame
Elements of a data frame can be specified in several ways. You can use row and column subscripts to identify elements, just as with a matrix. You can also index by column name to return parts of a data frame. The $ symbol selects a single column by name.
# Identifying elements of a data frame
#first and second variable of data frame
> pdata[1:2]
  ID age
1  3  15
2  4  14
3  5  18
4  6  12
#variable age of data frame
> pdata$age
[1] 15 14 18 12
#variable of "blood" and "status" of data frame
> pdata[c("blood", "status")]
  blood    status
1 Type5      Poor
2 Type6  Improved
3 Type5 Excellent
4 Type6      Poor
#show information of data frame
> str(pdata)
'data.frame': 4 obs. of 4 variables:
$ ID : num 3 4 5 6
$ age : num 15 14 18 12
$ blood : chr "Type5" "Type6" "Type5" "Type6"
$ status: chr "Poor" "Improved" "Excellent" "Poor"
#select 2nd variable of all observations
> pdata[,2]
[1] 15 14 18 12
#select all variables of 1st observation
> pdata[1, ]
  ID age blood status
1  3  15 Type5   Poor
#using $ symbol to index a particular column
> pdata$blood
[1] "Type5" "Type6" "Type5" "Type6"
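The same indexing rules extend to a few other common cases. The sketch below rebuilds pdata so it runs on its own, then selects rows with a logical condition, extracts a column with [[ ]], and uses drop = FALSE to keep a single column as a data frame. The names older and age_df are introduced only for this example.

```r
pdata <- data.frame(
  ID     = c(3, 4, 5, 6),
  age    = c(15, 14, 18, 12),
  blood  = c("Type5", "Type6", "Type5", "Type6"),
  status = c("Poor", "Improved", "Excellent", "Poor"),
  stringsAsFactors = FALSE
)

# Rows where age is greater than 13, keeping only the ID and age columns
older <- pdata[pdata$age > 13, c("ID", "age")]

# [[ ]] returns the column as a plain vector, like $
pdata[["age"]]

# pdata[, "age"] would return a vector; drop = FALSE keeps a data frame
age_df <- pdata[, "age", drop = FALSE]
```

A logical vector in the row position keeps the rows where the condition is TRUE, so older holds the three observations with ages 15, 14, and 18.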
You can also watch the corresponding video on using data.frame() on our YouTube channel.