Data as a Service Platform
- https://www.dremio.com/
- Open source (Community Edition / Enterprise Edition)
- License : Apache 2.0
- Main Language : Java
- Reference documents
- Document Site
- Open source components used
Apache Arrow (in-memory analytics)
Apache Drill (SQL execution engine)
Apache Parquet (columnar storage format optimized for analytics)
Apache Calcite (SQL parser)
React (front-end JavaScript framework)
Apache ZooKeeper (distributed key-value store)
RocksDB (embedded key-value store)
Ansible (CI/CD automation framework)
Git (source control, GitHub)
Gerrit (code collaboration tool, code review)
Other information
- Headquarters Regions: San Francisco Bay Area, Silicon Valley, West Coast
- Founded Date: Jun 9, 2015
- Founders: Jacques Nadeau, Tomer Shiran
- Funding Status: Early Stage Venture
- Number of Employees: 11-50
Dremio attributes its performance gains to four areas of innovation.
Apache Arrow Execution
From 1 to 1000+ nodes, architected for cloud deployments: elastic compute, runs on object stores.
Data Reflections™
Accelerate data and queries automatically, up to 1000x faster, with the full power of relational algebra.
Native Push-Downs
Optimized query semantics for each data source – Amazon S3, ADLS, RDBMS, NoSQL, HDFS, and more.
Vertically Integrated Query Engine
Cost-based query planner automatically generates query plans to make optimal use of Data Reflections™ and push downs.
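To make the idea of a cost-based planner concrete, here is a minimal sketch in Python. It is not Dremio's planner or API: the `CandidatePlan` class, the plan names, and the cost numbers are all hypothetical, chosen only to show how a planner might compare a full scan, a push-down, and a reflection and pick the cheapest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePlan:
    """Hypothetical plan description; real planners track far more detail."""
    name: str
    rows_scanned: int   # estimated number of rows the plan must read
    cost_per_row: int   # estimated relative cost of reading one row

    @property
    def cost(self) -> int:
        # Simplest possible cost model: rows read times cost per row.
        return self.rows_scanned * self.cost_per_row


def choose_plan(plans):
    """Return the candidate with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan.cost)


# Assumed estimates for one query; the numbers are invented for illustration.
plans = [
    CandidatePlan("full scan of source table", 10_000_000, 10),
    CandidatePlan("filter pushed down to source", 200_000, 10),
    CandidatePlan("aggregation reflection", 5_000, 1),
]

best = choose_plan(plans)
print(best.name, best.cost)
```

Under these assumed estimates the reflection wins because it reads far fewer rows at a lower per-row cost. With different estimates the same function would pick the push-down or the full scan instead.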
◎ Key Features
- Data Acceleration. Using columnar, compressed Apache Arrow for efficient in-memory analytical processing, and Apache Parquet for persistence of source data that is optimized for one or more query workloads through partitioning, sorting, aggregations, projections, and distributions.
- Data Catalog. A searchable index of your data source metadata, as well as virtual datasets created by Dremio users.
- Integrated Data Curation. Through a powerful and intuitive GUI, easy for business users, yet sufficiently powerful for your data engineers, and fully integrated into Dremio.
- Push-Downs On Any Data Source. Including optimized push downs and parallel connectivity to relational databases, non-relational systems like MongoDB, Elasticsearch, as well as S3 and HDFS.
- Cross-Data Source Joins. Execute high-performance joins across multiple disparate systems and technologies, between relational and NoSQL, S3, HDFS, and more.
- Data Lineage. Full visibility into your data lineage, from your data sources, through transformations, joining with other data sources, and sharing with other users.
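The push-down and cross-source join features above can be sketched in plain Python. This is only an illustration under stated assumptions, not Dremio code. The two in-memory lists stand in for a relational table and a NoSQL collection, and `scan` and `hash_join` are hypothetical helpers. The point is that filters run "at the source" so fewer rows travel to the engine, which then joins the reduced inputs.

```python
# Hypothetical stand-ins for two different systems.
orders = [  # e.g. a table in a relational database
    {"order_id": 1, "customer_id": "c1", "amount": 120},
    {"order_id": 2, "customer_id": "c2", "amount": 40},
    {"order_id": 3, "customer_id": "c1", "amount": 75},
]
customers = [  # e.g. a collection in a document store
    {"_id": "c1", "name": "Ada", "country": "KR"},
    {"_id": "c2", "name": "Lin", "country": "US"},
]


def scan(source, predicate=None):
    """Simulate a source scan. A pushed-down predicate is evaluated at the
    source, so only matching rows are handed to the engine."""
    return [row for row in source if predicate is None or predicate(row)]


def hash_join(left, right, left_key, right_key):
    """Join inside the engine: build a hash index on the right input,
    then probe it with each row of the left input."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[right_key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[left_key], [])
    ]


# Each filter is pushed to its own source before the join runs.
big_orders = scan(orders, lambda o: o["amount"] >= 50)
kr_customers = scan(customers, lambda c: c["country"] == "KR")

joined = hash_join(big_orders, kr_customers, "customer_id", "_id")
result = [(row["order_id"], row["name"], row["amount"]) for row in joined]
print(result)
```

Without push-down, every row from both sources would be transferred first and filtered afterwards. The join result is the same, but more data moves.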
Terminology & Concepts
- Data Reflections™
- Physically optimized representations of source data that both offload operational systems, and optimize one or more analytical workloads. Reflections are transparent to end users, and automatically substituted by Dremio’s query planner. Reflections have a configurable TTL SLA, so you can trade off freshness and query latency.
- Data Catalog
- An index of source metadata, including the names of tables, views, columns, fields, collections, indexes and more. Users can easily issue Google-style searches to find datasets for a given job. Data Catalog includes all metadata from virtual datasets as well.
- Data Curation
- A visual and intuitive way for analysts, data scientists, and data engineers to transform data for the needs of a particular job, without making copies of the data.
- Data Lineage
- As data is used for multiple jobs, it is transformed, joined, and shared with other users, forming an implicit graph of relationships and dependencies. These relationships show how data is used and are essential for security, governance, and remediation.
- Recommendations
- As users interact with datasets, their behavior can serve as the basis for recommendations to other users, helping to build joins and transformations more easily.
- Apache Arrow-Based Execution
- Apache Arrow is a columnar standard for in-memory analytics. It provides significant advantages in terms of memory and CPU efficiency, and is designed to work well with GPUs and FPGAs.
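The "implicit graph of relationships and dependencies" described under Data Lineage can be illustrated with a small sketch. The dataset names and the `derived_from` mapping are hypothetical, and this is not how Dremio stores lineage. It only shows the two questions a lineage graph typically answers: what a dataset depends on, and what is affected when a dataset changes.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it was derived from.
derived_from = {
    "sales_by_region": ["clean_sales", "regions"],
    "clean_sales": ["raw_sales"],
    "regions": [],
    "raw_sales": [],
}


def upstream(dataset, graph):
    """Return every dataset that `dataset` depends on, directly or
    indirectly, using a breadth-first traversal."""
    seen = set()
    queue = deque(graph.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(graph.get(current, []))
    return seen


def downstream(dataset, graph):
    """Return every dataset that would be affected if `dataset` changed."""
    return {name for name in graph if dataset in upstream(name, graph)}


print(sorted(upstream("sales_by_region", derived_from)))
print(sorted(downstream("raw_sales", derived_from)))
```

The upstream view supports governance questions (where did this number come from), while the downstream view supports remediation (which datasets need attention after a source error).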
Feature Comparison
Feature | Dremio | SQL Execution Engines |
---|---|---|
Scale-out architecture | Yes | Yes |
Accelerates aggregation queries | Yes. Queries are written against the logical schema, and Dremio's query planner automatically rewrites the query to use Aggregation Reflections, invisible to the end user. | No. Requires a slow full table scan each time. |
Accelerates ad-hoc queries | Yes. Queries are written against the logical schema, and Dremio's query planner automatically rewrites the query to use Raw Reflections, invisible to the end user. | No. Requires a slow full table scan each time. |
Accelerates relational data sources | Yes. Dremio Reflections, and native optimizers with first-class push-downs of queries. | No. Varies by engine, but most require third-party ETL to move and prep data for HDFS or S3. |
Accelerates NoSQL data sources | Yes. Dremio Reflections, and native optimizers with first-class push-downs of queries. | No. Varies by engine, but most require third-party ETL to move and prep data for HDFS. |
Integrated data curation | Yes. Natural and intuitive UI for data discovery, curation, acceleration, and collaboration. | No. Requires a third-party tool or custom scripts written by data engineers. |
Integrated Data Lineage | Yes. Full visibility into data lineage and access patterns for governance and error remediation. | No. Requires a third-party tool or custom scripts written by data engineers. |
License | Apache | Apache |
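
The "Accelerates aggregation queries" row contrasts a full table scan with a query rewritten to use a precomputed aggregate. The sketch below illustrates that contrast in plain Python. It is an assumption-laden toy, not Dremio's Aggregation Reflection implementation: the `sales` rows and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical source table of (region, amount) rows.
sales = [("east", 10), ("west", 5), ("east", 7), ("west", 3), ("north", 4)]


def build_aggregation(rows):
    """Precompute SUM and COUNT per region once, ahead of query time."""
    aggregated = defaultdict(lambda: {"sum": 0, "count": 0})
    for region, amount in rows:
        aggregated[region]["sum"] += amount
        aggregated[region]["count"] += 1
    return dict(aggregated)


def total_by_full_scan(rows, region):
    """Answer the query by reading every source row."""
    return sum(amount for row_region, amount in rows if row_region == region)


def total_from_aggregation(aggregated, region):
    """Answer the same query with a single lookup in the precomputed data."""
    return aggregated.get(region, {"sum": 0})["sum"]


aggregated = build_aggregation(sales)

# Both paths return the same answer; only the amount of work differs.
print(total_by_full_scan(sales, "east"))
print(total_from_aggregation(aggregated, "east"))

# Storing SUM and COUNT also lets AVG be derived without rescanning.
east_average = aggregated["east"]["sum"] / aggregated["east"]["count"]
print(east_average)
```

The user still writes the query against the original table. In this sketch the choice between the two functions is made by hand, whereas the table above describes that substitution as automatic.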