Data Preprocessing with Orange Tool

This blog is about data preprocessing with the Orange tool: we explore the Orange library in Python and use its functions to perform preprocessing tasks such as Discretization, Continuization, Normalization, and Randomization.

In the Orange canvas, drag the Python Script widget from the left panel onto the canvas and double-click it to open its editor.

Discretization

Data discretization converts the values of continuous attributes into a finite set of intervals while losing as little information as possible. In this example we use iris, a dataset bundled with Orange, which classifies iris flowers by the measurements of their sepals and petals. Discretization is performed with Orange's Discretize preprocessor.
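
To make the idea concrete without requiring Orange, here is a minimal plain-Python sketch of equal-width discretization, one common way of choosing the intervals: the attribute's range is split into intervals of equal width and each value is replaced by the index of its interval. The helper name `equal_width_bins` and the sample values are illustrative assumptions; this is not Orange's implementation, and Orange's Discretize supports several methods (for example equal-frequency binning).

```python
def equal_width_bins(values, n_bins=3):
    """Split the range of `values` into n_bins equal-width intervals.

    Hypothetical helper: returns the interior cut points and, for each
    value, the index (0 .. n_bins - 1) of the interval it falls into.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Interior cut points separating neighbouring intervals
    cuts = [lo + width * i for i in range(1, n_bins)]
    # A value's interval index is the number of cut points at or below it
    bins = [sum(x >= c for c in cuts) for x in values]
    return cuts, bins


# Illustrative continuous values (not taken from a real dataset)
values = [1.0, 2.5, 4.0, 5.5, 7.0]
cuts, bins = equal_width_bins(values, n_bins=3)
print(cuts)  # [3.0, 5.0]
print(bins)  # [0, 0, 1, 2, 2]
```

Each continuous value is now one of three discrete labels; in this sketch a value equal to a cut point goes into the upper interval.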

Continuization

Continuization works in the opposite direction. Given a data table, Orange's Continuize preprocessor returns a new table in which the discrete attributes are replaced with continuous ones or removed:

  • binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables, depending on the zero_based argument.
  • multinomial variables are treated according to the multinomial_treatment argument.
  • discrete attributes with only one possible value are removed.
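
A minimal plain-Python sketch of those three rules for a single discrete column, assuming the column is given as a list of strings. The helper `continuize_column` and the sample columns are hypothetical, and the multinomial case here always uses one 0/1 indicator per value, which is only one of the treatments Orange's Continuize offers; this is not Orange's implementation.

```python
def continuize_column(name, values, zero_based=True):
    """Replace one discrete column with continuous columns.

    Hypothetical helper following the three rules listed above. The set of
    possible values is taken from the observed data. Returns a dict that
    maps new column names to lists of floats.
    """
    categories = sorted(set(values))
    if len(categories) == 1:
        # Only one possible value: the column carries no information, drop it
        return {}
    if len(categories) == 2:
        # Binary variable: one indicator for the second value,
        # coded 0.0/1.0 (zero_based) or -1.0/1.0
        low = 0.0 if zero_based else -1.0
        positive = categories[1]
        return {f"{name}={positive}": [1.0 if v == positive else low for v in values]}
    # Multinomial variable (assumed treatment): one 0/1 indicator per value
    return {f"{name}={c}": [1.0 if v == c else 0.0 for v in values] for c in categories}


binary = continuize_column("sex", ["male", "female", "female", "male"])
print(binary)  # {'sex=male': [1.0, 0.0, 0.0, 1.0]}

signed = continuize_column("sex", ["male", "female"], zero_based=False)
print(signed)  # {'sex=male': [1.0, -1.0]}

multi = continuize_column("color", ["red", "green", "blue"])
print(multi)
# {'color=blue': [0.0, 0.0, 1.0], 'color=green': [0.0, 1.0, 0.0], 'color=red': [1.0, 0.0, 0.0]}

print(continuize_column("constant", ["a", "a", "a"]))  # {}
```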

Normalization

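Normalization scales the values of an attribute so that they fall in a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0. It is generally needed when attributes are on different scales; otherwise an equally important attribute on a smaller scale can have its effect diluted by attributes whose values span a larger range. We use Orange's Normalize preprocessor to perform normalization. Note that by default Normalize standardizes each continuous attribute (norm_type=Normalize.NormalizeBySD, giving mean 0 and standard deviation 1); scaling into 0.0 to 1.0 or -1.0 to 1.0 is selected with norm_type=Normalize.NormalizeBySpan, where the zero_based argument chooses between the two ranges.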
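
A minimal plain-Python sketch of the arithmetic behind two common kinds of normalization, using two illustrative attributes whose scales differ by orders of magnitude. The helpers `normalize_by_span` and `standardize` are hypothetical names: the first maps each value linearly onto 0.0 to 1.0 or -1.0 to 1.0 via (x - min) / (max - min), the second subtracts the mean and divides by the standard deviation. This is not Orange's implementation.

```python
import statistics


def normalize_by_span(values, zero_based=True):
    """Scale values linearly into [0, 1] (zero_based) or [-1, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant attribute: no spread to scale (a choice made for this sketch)
        return [0.0] * len(values)
    scaled = [(x - lo) / span for x in values]  # min -> 0.0, max -> 1.0
    if zero_based:
        return scaled
    return [2 * s - 1 for s in scaled]  # stretch [0, 1] onto [-1, 1]


def standardize(values):
    """Shift to mean 0 and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(x - mean) / sd for x in values]


# Two illustrative attributes on very different scales
age = [20, 30, 40, 60]
income = [20000, 35000, 50000, 100000]

print(normalize_by_span(age))                    # [0.0, 0.25, 0.5, 1.0]
print(normalize_by_span(age, zero_based=False))  # [-1.0, -0.5, 0.0, 1.0]
print(normalize_by_span(income))                 # [0.0, 0.1875, 0.375, 1.0]
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After span normalization both attributes lie in the same 0.0 to 1.0 range, so income no longer outweighs age in, for example, a Euclidean distance just because of its units.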

Randomization

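With randomization, given a data table, the preprocessor returns a new table in which the data is shuffled. We use Orange's Randomize preprocessor for this; by default it shuffles only the class values, and its rand_type argument selects whether the classes, the attributes or the meta attributes are shuffled.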
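
A minimal plain-Python sketch of class randomization, assuming each row is a tuple whose last item is the class label. The helper `randomize_classes` and the sample rows are illustrative; only the labels are shuffled while the feature values stay in place, which breaks any real link between features and class. Passing a seed makes the shuffle reproducible. This is not Orange's implementation.

```python
import random


def randomize_classes(rows, seed=None):
    """Return new rows in which the class labels (last item) are shuffled.

    Hypothetical helper: feature values keep their positions and the
    input list is left unmodified.
    """
    rng = random.Random(seed)  # private generator, reproducible when seeded
    labels = [row[-1] for row in rows]
    rng.shuffle(labels)
    return [row[:-1] + (label,) for row, label in zip(rows, labels)]


# Illustrative rows: (sepal length, sepal width, class)
data = [
    (5.1, 3.5, "setosa"),
    (7.0, 3.2, "versicolor"),
    (6.3, 3.3, "virginica"),
    (4.9, 3.0, "setosa"),
]
shuffled = randomize_classes(data, seed=1)
for row in shuffled:
    print(row)
```

A model trained on such shuffled data has no real signal to learn, which makes it a useful baseline when checking whether a model's accuracy on the original data is meaningful.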

That is all for this blog: we used various preprocessing functions from the Orange library to preprocess data.

PRASHIL VAISHNANI

student
