Hello everyone, and thanks for coming back to the next tutorial in the Data Preprocessing step of our Machine Learning tutorials.
Just to refresh your memory, in the last tutorial, i.e. Part 1 of Data Preprocessing, we saw how to download the dataset and import the libraries we need. In this tutorial, we are going to see how to import the downloaded data in both Python and R.
As you can see from the above infographic, we are looking at the Data Import section of Data Preprocessing in Machine Learning.
Let us start then.
- IMPORTING DATASET IN PYTHON
Before importing the dataset that we downloaded in the last tutorial into the Spyder IDE, we need to make sure we set the working directory for the Spyder IDE.
You can do this by clicking on the File Explorer option, which is available in the top-right window of the Spyder IDE. Save the Python file you are working on in the folder where you saved your CSV file. Once you do that, the corresponding folder will be set as the working directory for the Spyder IDE session.
The following screenshot might be able to help you out with doing this.
As you can see from the above screenshot, “C:\Users\User\Desktop\blog\3 ML Data Preprocessing” is the working directory for this tutorial. This directory contains our input file along with the Python file(s).
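If you would rather check or set the working directory from code instead of the File Explorer pane, Python's standard os module can do it. The following is only a minimal sketch: the temporary folder stands in for your own project folder and is an assumption made so the example runs anywhere.

```python
import os
import tempfile

# Remember where the session started so we can return there afterwards.
original_dir = os.getcwd()

# A throwaway folder stands in for your own project folder (assumption).
with tempfile.TemporaryDirectory() as project_dir:
    os.chdir(project_dir)                        # set the working directory
    current_dir = os.path.realpath(os.getcwd())  # read it back to confirm
    expected_dir = os.path.realpath(project_dir)
    print("Working directory:", current_dir)
    os.chdir(original_dir)                       # leave before the folder is deleted
```

In your own session, you would pass the path of the folder that holds your CSV file to os.chdir() instead of the temporary folder.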
Once you do this, you can import the downloaded file in Python with the help of the pandas library. We already imported the pandas library in the last tutorial, so we can now use it to load the file into the Spyder IDE session.
You can use the following command to import this dataset into Python.
datasets = pd.read_csv('')
From the above command, we can say that the data stored in the CSV file will be imported into the datasets variable. Here, pd is an alias for the pandas library. The following screenshot shows the execution of the above command.
Now, once you do that, you will be able to see the datasets variable in the Variable Explorer window on the right side of the Spyder IDE, as shown in the above screenshot.
If you double-click on the datasets variable, a new pop-up window will appear and will show you the data stored in the datasets variable.
For this example, the Salary data will be shown in float (decimal) format. You can click on the Format button and change the format to see the difference. I have changed the format from the float type to the integer type.
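If you are not working in Spyder, you can inspect the loaded data from code as well. The sketch below is only an illustration: the file name sample.csv, the column names Country and Age, and all the values are hypothetical and are not the tutorial's dataset. It also shows one common reason a column such as Salary appears as a float: pandas reads a numeric column as float64 when it contains decimals or missing entries.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; replace it with your own CSV file.
csv_text = (
    "Country,Age,Salary,Purchased\n"
    "A,30,50000,No\n"
    "B,40,60000.5,Yes\n"
    "C,35,,No\n"
)

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "sample.csv")  # hypothetical file name
    with open(csv_path, "w") as handle:
        handle.write(csv_text)

    # Load the CSV file into a DataFrame, as in the tutorial's command.
    datasets = pd.read_csv(csv_path)

print(datasets.head())   # first rows, similar to double-clicking the variable
print(datasets.dtypes)   # column types; Salary is read as float64 here
```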
Now, the next step is to create the Matrix of Features and the Dependent variable.
You must know that Python uses zero-based indexing, i.e. indexing starts at 0 in Python. Therefore, we start the indexing from 0 and go up to the penultimate column, which is the last feature.
The following command is used for performing this operation.
X = datasets.iloc[:, :-1].values
To give an idea about the above command, the first colon (:) indicates that all the rows of the data should be selected, and the second colon (:) followed by -1 indicates that all the columns except the last one should be selected.
The .values attribute returns the values stored in those rows and columns as an array, and the result is stored in the X variable.
This imports the Feature vector into X. Now is the time to import the dependent variable i.e. Output Variable in Y.
For doing this, we use the following command.
Y = datasets.iloc[:, 3].values
The explanation of the above command is similar to the last one. All the rows of the column at index 3 (the fourth and last column, Purchased) are included in the Y variable.
The following screenshot shows the execution of the above commands.
As you can see, both the X and Y variables were created successfully from the datasets variable. X has 10 rows and 3 columns, whereas Y has 10 rows and 1 column.
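To make the split concrete, here is a small self-contained sketch. It assumes the slices are applied through the pandas iloc indexer, and it uses a made-up table of 10 rows and 4 columns. The column names Country and Age and all the values are hypothetical; only Salary and Purchased are named in this tutorial.

```python
import pandas as pd

# Hypothetical 10-row table; the values are made up for illustration.
datasets = pd.DataFrame({
    "Country": ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
    "Age": [25, 32, 41, 29, 36, 44, 51, 38, 27, 33],
    "Salary": [48000.0, 52000.0, 61000.0, 50000.0, 58000.0,
               67000.0, 72000.0, 60000.0, 49000.0, 55000.0],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

X = datasets.iloc[:, :-1].values   # all rows, every column except the last
Y = datasets.iloc[:, 3].values     # all rows, column at position 3 (the fourth)

print(X.shape)   # rows and columns of the feature matrix
print(Y.shape)   # Y is one-dimensional, so only its length is reported
```

Here X.shape is (10, 3), while Y.shape is (10,) because Y is a one-dimensional array of 10 values. Writing -1 instead of 3 would select the last column regardless of how many columns the table has.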
This completes the Data Import process in Python.
Now, let us look at the same process in R.
- IMPORTING DATASET IN R
Believe me, doing the same Data Import process in R is much easier than in Python.
The first thing to do is to set the working directory. For that, we use the setwd() function.
Please use the following command to set the working directory.
setwd("C:\\Users\\User\\Desktop\\blog\\3 ML Data Preprocessing")
The execution of the above command looks as follows.
You should change the path, because it will be different on your system. Once the above command is executed, you can run the following command to import the CSV file into a variable called dataset.
Please notice that we use a CSV-reading function (in base R this is typically read.csv()) to import the CSV file into the dataset variable. To confirm the import, you need to view the dataset variable. For this, we use the View() function.
You can use the following command to view the dataset variable.
View(dataset)
The output of the above three commands looks as follows.
This completes the Data Import process in R.
We can conclude this tutorial here. I hope this helps you understand the basic concepts of importing data for Machine Learning.