Open Source Tools For Data Science

Categories of Data Science Tools

  • Data Management: the process of persisting and retrieving data.
  • Data Integration & Transformation: often known as “ETL” (Extract, Transform, and Load), the process of extracting data from remote data management systems, transforming it, and loading it into a local system.
  • Data Visualization: part of the initial data exploration process as well as part of the final deliverable.
  • Model Building: the step in which machine learning and deep learning models are created.
  • Model Deployment: the model created during model building is made available to third-party applications.
  • Model Monitoring & Assessment: quality checks on deployed models that ensure they continue to perform well.

Tools for Data Management

Widely available open source tools for data management include:

  • Relational databases such as MySQL and PostgreSQL.
  • NoSQL databases such as MongoDB, Apache CouchDB, and others.
  • File-based systems such as the Hadoop File System, or cloud file systems like Ceph.
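To make "persisting and retrieving data" concrete, here is a minimal Python sketch. It uses the standard-library `sqlite3` module with an in-memory SQLite database as a stand-in for a MySQL or PostgreSQL server (SQLite is a substitution for illustration, not one of the tools listed above), and the `scores` table and `persist_and_retrieve` helper are hypothetical.

```python
import sqlite3


def persist_and_retrieve(rows):
    """Hypothetical helper: persist (name, score) rows, then read them back by score."""
    # In-memory SQLite database standing in for a MySQL/PostgreSQL server
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE scores (name TEXT PRIMARY KEY, score REAL NOT NULL)")
        # Persisting: parameterized inserts avoid quoting bugs and SQL injection
        conn.executemany("INSERT INTO scores (name, score) VALUES (?, ?)", rows)
        conn.commit()
        # Retrieving: a plain SQL query returns a list of tuples
        return conn.execute(
            "SELECT name, score FROM scores ORDER BY score DESC"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(persist_and_retrieve([("alice", 0.91), ("bob", 0.78), ("carol", 0.85)]))
```

Drivers for PostgreSQL or MySQL that follow Python's DB-API expose the same connect/execute/fetch pattern, although the placeholder syntax can differ (for example `%s` instead of `?`).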

Tools for Data Integration & Transformation

This phase is essentially about refining and cleaning data, the kind of work typically done when building a data warehouse. The most widely used open source tools are:

  • Apache Airflow
  • Kubeflow
  • Apache Kafka, which originated at LinkedIn
  • Apache NiFi, which provides a convenient visual editor
  • Apache Spark SQL
  • Node-RED, which consumes so few resources that it can run on small devices such as a Raspberry Pi.
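The three ETL stages can be illustrated with a small Python sketch that uses only the standard library. Tools such as Apache Airflow, Apache NiFi, or Apache Spark SQL schedule and scale this kind of pipeline; here the "remote system" is simulated by a hypothetical CSV string, and the `weather` table and the `extract`/`transform`/`load`/`run_etl` functions are illustrative names, not any tool's API.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a remote system: messy casing, padding, a missing value
RAW_CSV = """city,temp_c
 Berlin ,21.5
paris,19
LONDON,
Madrid , 30.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: trim and title-case names, cast temperatures, drop incomplete rows."""
    cleaned = []
    for rec in records:
        city = (rec.get("city") or "").strip().title()
        temp = (rec.get("temp_c") or "").strip()
        if not city or not temp:
            continue  # discard rows with missing fields
        cleaned.append((city, float(temp)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS weather (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)
    conn.commit()


def run_etl(text):
    """Run extract -> transform -> load into an in-memory database and return its contents."""
    conn = sqlite3.connect(":memory:")
    try:
        load(transform(extract(text)), conn)
        return conn.execute("SELECT city, temp_c FROM weather ORDER BY city").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_etl(RAW_CSV))
```

Keeping each stage a separate function mirrors how orchestration tools model pipelines as distinct tasks, so each step can be retried or tested on its own.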

Tools for Data Visualization

Open source tools for data visualization include:

  • Hue, which is used to create visualizations from SQL queries.
  • Kibana, a data exploration and visualization web application.
  • Apache Superset, which is also a data exploration and visualization web application.
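The core idea behind building a chart from a SQL query, as Hue does, can be sketched in a few lines: run an aggregate query, then map each result row to a visual mark. This minimal Python sketch renders text bars instead of a real chart; the `sales` table and the `query_to_bar_chart` helper are hypothetical and do not reflect how Hue, Kibana, or Superset are implemented.

```python
import sqlite3


def query_to_bar_chart(conn, sql, width=20):
    """Hypothetical helper: render (label, value) query rows as a text bar chart."""
    rows = conn.execute(sql).fetchall()
    if not rows:
        return ""
    max_value = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        # Scale each bar relative to the largest value in the result set
        bar_len = round(width * value / max_value) if max_value > 0 else 0
        lines.append(f"{label:<6} {'#' * bar_len} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical sales data used only for illustration
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 120), ("south", 80), ("north", 30), ("east", 50)],
    )
    print(query_to_bar_chart(
        conn,
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC",
    ))
```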

Tools for Model Deployment

This is an extremely important step. Once you have built a machine learning model that can predict key aspects of the future, it should be exposed as an API so that other applications and users can access it. Some tools are as follows:

  • Apache PredictionIO
  • Seldon, an interesting framework that supports TensorFlow, Apache Spark, R, and scikit-learn.
  • You can deploy to embedded devices such as a Raspberry Pi or a smartphone using TensorFlow Lite.
  • You can even deploy to a web application using TensorFlow.js.
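Below is a minimal sketch of turning a model into an API using only Python's standard library. The `predict` function is a hypothetical stand-in for a trained model (a fixed linear formula), and the `/predict` route and the `{"features": [...]}` JSON shape are assumptions for illustration. PredictionIO and Seldon each define their own serving interfaces, which this does not reproduce.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "trained" model: a linear function with fixed weights
WEIGHTS = [0.4, 0.6]
BIAS = 1.0


def predict(features):
    """Score one feature vector with the hypothetical linear model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


class PredictHandler(BaseHTTPRequestHandler):
    """Serves POST /predict with a JSON body like {"features": [2, 3]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body, status = {"prediction": predict(payload["features"])}, 200
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON, a missing key, or a wrong shape becomes a 400 error
            body, status = {"error": str(exc)}, 400
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the API on `127.0.0.1:8000`. A request such as `curl -X POST http://127.0.0.1:8000/predict -d '{"features": [2, 3]}'` should then return a JSON object with a `prediction` field. Python's `http.server` is not recommended for production use, which is one reason dedicated serving tools exist.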

Tools for Monitoring & Assessment

This is another crucial step: you should keep track of a deployed model's prediction performance on new data so that outdated models can be identified and replaced.

Some of the most widely used tools are:

  • ModelDB, used to manage ML models.
  • Prometheus
  • AI Fairness 360 Open Source Toolkit
  • Adversarial Robustness 360 Toolbox
  • AI Explainability 360
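Tracking prediction performance on new data, as described above, can be sketched as a rolling accuracy check. This minimal Python sketch assumes ground-truth labels eventually arrive for each prediction; the `AccuracyMonitor` class and its window and threshold defaults are hypothetical and do not correspond to the API of any tool listed above.

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical helper: rolling accuracy over the latest labelled predictions."""

    def __init__(self, window=100, threshold=0.8):
        # deque(maxlen=...) silently discards the oldest outcome once full
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether a prediction matched the ground truth that arrived later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Fraction of correct predictions in the window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def is_outdated(self):
        """True once the window is full and its accuracy falls below the threshold."""
        acc = self.accuracy()
        full = len(self.outcomes) == self.outcomes.maxlen
        return acc is not None and full and acc < self.threshold
```

Waiting for a full window avoids raising an alert from a handful of early samples; the window size and threshold here are arbitrary illustrative values.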

Now let's talk about the development environment most widely used by data scientists: Jupyter.

Jupyter first emerged as a Python programming tool and now supports more than 100 programming languages through “kernels”. A key benefit of a Jupyter notebook is its ability to unify documentation, code, and code output in a single document.

So these are some of the most widely used open source tools for data science, grouped by category.
