By Tech Ninja, OpenSource, Analytics & Cloud enthusiast.

A partial list of top engineering technologies, image created by KDnuggets.

Complete curated list of emerging technologies in Data Engineering

  • Abacus AI, enterprise AI with AutoML, similar space to DataRobot.
  • Algorithmia, enterprise MLOps.
  • Amundsen, an open-sourced data discovery and metadata engine.
  • Anodot, monitors all your data in real time for lightning-fast detection of incidents.
  • Apache Arrow, essential for its non-JVM, in-memory columnar format and vectorized processing.
  • Apache Calcite, framework for building SQL databases and data management systems without owning data. Hive, Flink, and others use Calcite.
  • Apache Hop, facilitates all aspects of data and metadata orchestration.
  • Apache Iceberg is an open table format for massive analytic datasets.
  • Apache Pinot, real-time distributed OLAP datastore. It is growing impressively and sits in a similar space to Druid, though the two are not identical.
  • Apache Superset, open source BI with many connectors available.
  • Beam, implement batch and streaming data processing jobs that run on any execution engine.
  • Cnvrg, enterprise MLOps.
  • Confluent, Apache Kafka and its surrounding ecosystem.
  • Dagster, a data orchestrator for machine learning; it is heavily code-based and in a similar space to Airflow, but emphasizes state flow.
  • Dask, data science purely in Python.
  • DataRobot, solid ML platform with a strong focus on enterprise MLOps.
  • Databricks, with its new SQL Analytics and lakehouse paper; expect more amazing OSS.
  • DataFrame Whale is a straightforward data discovery tool.
  • Dataiku, enterprise AI/MLOps platform.
  • Delta Lake, ACID on Apache Spark.
  • DVC, open-source version control system for ML projects, in demand for MLOps.
  • Feast, open-source feature store, now with Tecton.
  • Fiddler, enterprise explainable AI.
  • Fivetran, data integration pipeline.
  • Getdbt is hitting the sweet spot of Apache Spark by bringing a simplified SQL-based pipeline.
  • Great Expectations, a data science testing framework that is already amazing!
  • Hopsworks, open-sourced MLOps feature store.
  • Hudi brings transactions, record-level updates/deletes, and change streams to data lakes.
  • Koalas, Pandas on Apache Spark.
  • The Kubeflow project is dedicated to making machine learning workflows on Kubernetes simple, portable, and scalable.
  • lakeFS enables you to manage your data lake the way you manage your code. Run parallel pipelines for experimentation and CI/CD for your data.
  • maiot-ZenML, open-sourced MLOps Framework,…

Continue reading: https://www.kdnuggets.com/2021/09/data-engineering-technologies-2021.html

Source: www.kdnuggets.com