Data Science Series (Part 3): Introduction to the Orange Tool
This post covers the basics of the Orange tool and shows how to get started with some of its core functionality.
What is Orange Tool?
Orange is a free and open-source toolkit for data visualization, machine learning, and data mining. It comes with a visual programming front end for exploratory qualitative data analysis and interactive data visualization.
Orange's components are called widgets. They range from simple data presentation, subset selection, and preprocessing to the empirical evaluation of learning algorithms and predictive modeling.
Orange workflows are made up of components that read, process, and visualize data. Widgets communicate by sending data over a communication channel: one widget’s output is used as the input for another, and the chain of connected widgets forms a workflow.
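To make the idea concrete, here is a toy Python model of how data moves through a workflow. It is only a sketch: the Widget class and its connect/receive methods are names I made up for this illustration, and they are not Orange's actual implementation or API.

```python
class Widget:
    """Toy stand-in for a widget: takes input, processes it, passes the result on."""

    def __init__(self, name, process=None):
        self.name = name
        self.process = process or (lambda data: data)  # default: pass data through
        self.links = []       # widgets connected to this widget's output
        self.received = None  # last input seen, kept so it can be inspected

    def connect(self, other):
        """Link this widget's output to another widget's input."""
        self.links.append(other)

    def receive(self, data):
        """Accept input, process it, and send the output along every link."""
        self.received = data
        output = self.process(data)
        for widget in self.links:
            widget.receive(output)


# Same shape as the workflow built later in this post: File feeds four widgets.
file_widget = Widget("File")
viewers = [Widget(name) for name in
           ("Data Info", "Data Table", "Distributions", "Scatter Plot")]
for viewer in viewers:
    file_widget.connect(viewer)

# Two rows from the Iris dataset stand in for the loaded file.
rows = [
    {"petal length": 1.4, "petal width": 0.2, "iris": "Iris-setosa"},
    {"petal length": 4.7, "petal width": 1.4, "iris": "Iris-versicolor"},
]
file_widget.receive(rows)

for viewer in viewers:
    print(viewer.name, "received", len(viewer.received), "rows")
```

Each viewer ends up holding the same rows that the File widget sent out, which is what drawing a link between two widgets on the canvas does.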
Users can use Orange as a Python library for data manipulation, while visual programming is implemented through an interface in which workflows are formed by linking predefined or user-designed widgets.
Simple Workflow Example
Let’s explore the Orange tool with a simple workflow example.
Here I used the built-in Iris dataset provided by Orange. In this workflow, data from the dataset is sent to a Data Table, to Distributions to plot a distribution, and to a Scatter Plot.
To create this workflow, we load the dataset using the File widget and then create flows for File-Data Info, File-Data Table, File-Distributions and File-Scatter Plot.
To load the data into the canvas, select the File widget from the left pane and place it on the canvas. Double-click the File widget and select the iris.tab file.
To get information about the data loaded in the File widget, create a flow from the File widget to the Data Info widget, which shows the name, description, row count, column count, features, and target of the dataset.
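Since Orange also works as a Python library, roughly the same summary can be pulled out in a script. This is a minimal sketch that assumes the orange3 package is installed. The helper names summarize and main are my own, while Orange.data.Table and the domain attributes used below come from Orange's data model.

```python
def summarize(table):
    """Collect the kind of facts the Data Info widget reports for a table."""
    domain = table.domain
    target = domain.class_var  # None when the data has no single target
    return {
        "name": table.name,
        "rows": len(table),
        "columns": len(domain.variables) + len(domain.metas),
        "features": [var.name for var in domain.attributes],
        "target": target.name if target is not None else None,
    }


def main():
    # Assumption: Orange is installed, e.g. with `pip install orange3`.
    import Orange

    iris = Orange.data.Table("iris")  # the bundled Iris dataset
    for key, value in summarize(iris).items():
        print(f"{key}: {value}")
```

Calling main() loads the bundled Iris data and prints one line per entry. The import sits inside main() so that summarize can be reused on any table you have already loaded.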
Now select the Data Table widget from the left panel and drag it to the canvas, then draw a flow between the File and Data Table widgets. Open the Data Table and you can see the data of your dataset in tabular form. In the image below, the highlighted data is the target variable.
Use the Distributions widget to get a graphical representation of the dataset values. Here I plotted the distribution for various features of the dataset.
You can observe that for a feature like sepal width the target categories overlap heavily, but when you switch to petal length the data separates much more cleanly into the three categories.
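The same observation can be checked numerically by comparing the range of a feature within each class. The sketch below uses only nine rows copied from the Iris dataset (three per class), so treat it as an illustration, not a full analysis. The helper names class_ranges and ranges_overlap are my own. On the full 150 rows, versicolor and virginica do overlap slightly in petal length, while setosa stays completely separate.

```python
from collections import defaultdict

# Nine rows from the Iris dataset: (sepal width, petal length, class).
SAMPLE = [
    (3.5, 1.4, "Iris-setosa"),
    (3.0, 1.4, "Iris-setosa"),
    (3.2, 1.3, "Iris-setosa"),
    (3.2, 4.7, "Iris-versicolor"),
    (3.2, 4.5, "Iris-versicolor"),
    (3.1, 4.9, "Iris-versicolor"),
    (3.3, 6.0, "Iris-virginica"),
    (2.7, 5.1, "Iris-virginica"),
    (3.0, 5.9, "Iris-virginica"),
]


def class_ranges(rows, column):
    """Map each class label to the (min, max) of one numeric column."""
    values = defaultdict(list)
    for row in rows:
        values[row[-1]].append(row[column])
    return {label: (min(v), max(v)) for label, v in values.items()}


def ranges_overlap(ranges):
    """Return True if any two class ranges share at least one value."""
    spans = sorted(ranges.values())
    # After sorting by lower bound, checking neighbours is enough.
    return any(prev_hi >= lo for (_, prev_hi), (lo, _) in zip(spans, spans[1:]))


for name, column in (("sepal width", 0), ("petal length", 1)):
    ranges = class_ranges(SAMPLE, column)
    print(name, ranges, "overlap:", ranges_overlap(ranges))
```

On this sample the sepal width ranges overlap and the petal length ranges do not, which matches what the Distributions widget shows visually.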
We can also use the Scatter Plot widget to plot different feature pairs. In the image below, the Scatter Plot is drawn for the feature pair of petal length and petal width.
Here we used the Iris dataset provided by Orange, but you can also load your own data into Orange, from a local file or an external source.
Load External Data
To load your data in Orange, select the File widget. From there you can either pick one of the datasets provided by Orange or browse to a dataset file on your local machine.
If you want to load external data, select the URL option in the File widget and paste the link to the external dataset.
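The two options of the File widget have a rough counterpart when Orange is used from a script. This is a sketch under assumptions: it needs the orange3 package, is_url and load_table are helper names I made up, and the address in the comment is a placeholder, not a real dataset. Table.from_file and Table.from_url are the Orange loaders it relies on.

```python
from urllib.parse import urlparse


def is_url(source):
    """True when source looks like an http(s) address and not a local path."""
    return urlparse(source).scheme in ("http", "https")


def load_table(source):
    """Load a dataset from a local file or a URL, like the File widget's two options."""
    # Assumption: Orange is installed, e.g. with `pip install orange3`.
    import Orange

    if is_url(source):
        return Orange.data.Table.from_url(source)
    return Orange.data.Table.from_file(source)


# Hypothetical usage:
#   local = load_table("iris.tab")
#   remote = load_table("https://example.com/my-data.csv")  # placeholder URL
```

Keeping the URL check in its own small function makes it easy to see which loader a given source will go to before anything is downloaded.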
That’s it for the introduction to the Orange tool. We will explore the tool in more detail in the next part of the Data Science series. You can explore more about the Orange tool here.