R Programming

How to create factor variables in R programming


In R, categorical variables, both nominal and ordinal, are called factor variables. For example, gender (male/female) is nominal, while survey results (excellent, good, normal, bad) are ordinal. Categorical variables are useful because many data analysis operations work on values grouped by category, such as contingency tables between two categorical variables for independence analysis and hypothesis tests of homogeneity of variances, to name a few. Such variables therefore usually need to be set to factor type in R before the analysis can proceed.
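As a rough illustration of the contingency-table use case mentioned above, the sketch below cross-tabulates two factors with table(). The vectors gender and rating are hypothetical data invented for this example.

```r
# Hypothetical survey data, invented for illustration only
gender <- factor(c("male", "female", "female", "male", "female"))
rating <- factor(c("good", "bad", "good", "excellent", "good"))

# Cross-tabulate the two factors into a contingency table
tab <- table(gender, rating)
print(tab)
```

Each cell of tab counts the observations that share one combination of levels, which is the usual starting point for an independence analysis.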

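In R, the function factor() creates a factor from a vector. The basic form of the function is

factor(var, ordered)

where var is a vector and ordered is set to TRUE if the values in the vector are ordered. Unless the levels argument is supplied, the levels are the sorted unique values of the vector, which means alphabetical order for character data.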
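To make the basic form concrete, here is a minimal sketch that builds an unordered and an ordered factor from the same vector and inspects them. The vector size and the names f_unordered and f_ordered are assumptions made up for this illustration.

```r
# Hypothetical vector, used only to illustrate factor()
size <- c("small", "large", "medium", "small")

f_unordered <- factor(size)                  # levels sorted alphabetically
f_ordered   <- factor(size, ordered = TRUE)  # same levels, now treated as ranked

levels(f_unordered)      # the distinct categories
as.integer(f_unordered)  # the integer code stored for each element
is.ordered(f_unordered)  # FALSE
is.ordered(f_ordered)    # TRUE
```

A factor stores each value as an integer code that points into its vector of levels, so the two factors here hold the same codes and differ only in whether the levels are treated as ranked.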

The following example creates a data frame in which the variable ‘blood’ is an unordered factor and the variable ‘status’ is an ordered factor. Because no levels are given, the levels of ‘status’ are ordered alphabetically (Excellent < Improved < Poor).

# create variable vectors
ID <- c(3, 4, 5, 6)
age <- c(15, 14, 18, 12)
blood <- c("Type5", "Type6", "Type5", "Type6")
status <- c("Poor", "Improved", "Excellent", "Poor")

# set vector ‘blood’ as an unordered factor
blood <- factor(blood)

# set vector ‘status’ as an ordered factor
status <- factor(status, ordered = TRUE)

# create a data frame
pdata <- data.frame(ID, age, blood, status)

# show the structure of the data frame
str(pdata)
'data.frame': 4 obs. of  4 variables:
 $ ID    : num  3 4 5 6
 $ age   : num  15 14 18 12
 $ blood : Factor w/ 2 levels "Type5","Type6": 1 2 1 2
 $ status: Ord.factor w/ 3 levels "Excellent"<"Improved"<..: 3 2 1 3 

# get summary information of the data frame
summary(pdata)
       ID            age          blood         status 
 Min.   :3.00   Min.   :12.00   Type5:2   Excellent:1  
 1st Qu.:3.75   1st Qu.:13.50   Type6:2   Improved :1  
 Median :4.50   Median :14.50             Poor     :2  
 Mean   :4.50   Mean   :14.75                          
 3rd Qu.:5.25   3rd Qu.:15.75                          
 Max.   :6.00   Max.   :18.00                          
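In the output above, the levels of ‘status’ follow alphabetical order, which is probably not the ranking a reader would expect. One possible way to impose a meaningful ranking is to pass the levels explicitly. The sketch below assumes the intended order is Poor < Improved < Excellent, and the name status_ranked is introduced only for this illustration.

```r
status <- c("Poor", "Improved", "Excellent", "Poor")

# Assumed ranking: Poor < Improved < Excellent
status_ranked <- factor(status,
                        levels = c("Poor", "Improved", "Excellent"),
                        ordered = TRUE)

levels(status_ranked)        # levels in the order given above
status_ranked < "Excellent"  # element-wise comparison against a level
max(status_ranked)           # highest-ranked value present
```

With the levels given explicitly, comparisons and functions such as max() follow the stated ranking instead of alphabetical order.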


wilsonzhang746
