[DaaS] dremio

Data as a Service Platform

https://www.dremio.com/
Open source (Community Edition / Enterprise Edition)
License : Apache 2.0
Main Language : Java
Reference Documents
- DremioArchitectureGuide.pdf
- dremio.pptx
Document Site
- https://docs.dremio.com/
Open Source Components Used
- Apache Arrow (in-memory analytics)
- Apache Drill (SQL execution engine)
- Apache Parquet (data format optimized for analytics)
- Apache Calcite (SQL parser)
- React (front-end JavaScript framework)
- Apache ZooKeeper (distributed key-value store)
- RocksDB (embedded key-value store)
- Ansible (CI/CD automation framework)
- git (source control, GitHub)
- Gerrit (code collaboration tool, code review)

Other Information

Headquarters Regions
San Francisco Bay Area, Silicon Valley, West Coast
Founded Date
Jun 9, 2015
Founders
Jacques Nadeau, Tomer Shiran
Funding Status
Early Stage Venture
Number of Employees
11-50

Dremio provides a quantum leap in performance, based on four areas of innovation.

Apache Arrow Execution
From 1 to 1000+ nodes, architected for cloud deployments: elastic compute, runs on object stores.
Data Reflections™
Accelerate data and queries automatically, up to 1000x faster, with the full power of relational algebra.
Native Push-Downs
Optimized query semantics for each data source – Amazon S3, ADLS, RDBMS, NoSQL, HDFS, and more.
Vertically Integrated Query Engine
Cost-based query planner automatically generates query plans to make optimal use of Data Reflections™ and push downs.
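
The idea of a cost-based planner choosing between a full scan, a push-down, and a reflection can be illustrated with a minimal sketch. This is not Dremio's planner: the `CandidatePlan` class, the `choose_plan` function, and all row counts and per-row costs are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePlan:
    """One way to answer a query (hypothetical model, not Dremio's)."""
    name: str
    rows_scanned: int    # assumed row estimate for this plan
    cost_per_row: float  # assumed relative cost of reading one row

    @property
    def cost(self) -> float:
        # Simplest possible cost model: rows read times cost per row.
        return self.rows_scanned * self.cost_per_row


def choose_plan(plans):
    """Return the candidate with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan.cost)


plans = [
    CandidatePlan("full scan of source", 10_000_000, 1.0),
    CandidatePlan("push-down filter to source", 500_000, 1.0),
    CandidatePlan("read aggregation reflection", 1_200, 0.2),
]
best = choose_plan(plans)
print(best.name, best.cost)
```

A real planner estimates costs from statistics and considers many more alternatives; the sketch only shows why a small, pre-computed reflection tends to win when one matches the query.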

◎ Key Features

Data Acceleration. Using columnar, compressed Apache Arrow for efficient in-memory analytical processing, and Apache Parquet for persistence of source data that is optimized for one or more query workloads through partitioning, sorting, aggregations, projections, and distributions.
Data Catalog. A searchable index of your data source metadata, as well as of virtual datasets created by Dremio users.
Integrated Data Curation. A GUI that is intuitive for business users yet powerful enough for data engineers, and fully integrated into Dremio.
Push-Downs On Any Data Source. Optimized push-downs and parallel connectivity to relational databases, non-relational systems such as MongoDB and Elasticsearch, as well as S3 and HDFS.
Cross-Data Source Joins. Execute high-performance joins across multiple disparate systems and technologies: relational, NoSQL, S3, HDFS, and more.
Data Lineage. Full visibility into your data lineage, from your data sources, through transformations, joining with other data sources, and sharing with other users.
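
The push-down and cross-source join features above can be sketched with the Python standard library. The sketch assumes an in-memory SQLite table standing in for a relational source and a list of dictionaries standing in for a NoSQL collection; the helper names `fetch_orders_pushed_down` and `hash_join` are hypothetical and do not correspond to any Dremio API.

```python
import sqlite3


def fetch_orders_pushed_down(conn, min_amount):
    """Push the filter into the source: SQLite evaluates the WHERE clause,
    so only matching rows leave the database."""
    cursor = conn.execute(
        "SELECT customer_id, amount FROM orders "
        "WHERE amount >= ? ORDER BY amount",
        (min_amount,),
    )
    return cursor.fetchall()


def hash_join(orders, customers):
    """Join relational rows with document-style records on customer id."""
    by_id = {doc["id"]: doc for doc in customers}
    return [
        (by_id[customer_id]["name"], amount)
        for customer_id, amount in orders
        if customer_id in by_id
    ]


# Stand-in for a relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (2, 250.0), (1, 900.0), (3, 40.0)],
)

# Stand-in for a NoSQL collection.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]

large_orders = fetch_orders_pushed_down(conn, 100.0)
joined = hash_join(large_orders, customers)
print(joined)
```

Filtering at the source before joining keeps the amount of data transferred small, which is the motivation for push-downs in a federated query.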

Terminology & Concepts

Data Reflections™: Physically optimized representations of source data that both offload operational systems, and optimize one or more analytical workloads. Reflections are transparent to end users, and automatically substituted by Dremio’s query planner. Reflections have a configurable TTL SLA, so you can trade off freshness and query latency.
Data Catalog: An index of source metadata, including the names of tables, views, columns, fields, collections, indexes and more. Users can issue Google-style searches to find datasets for a given job. The Data Catalog also includes all metadata from virtual datasets.
Data Curation: A visual and intuitive way for analysts, data scientists, and data engineers to transform data for the needs of a particular job, without making copies of the data.
Data Lineage: As data is used for multiple jobs, it is transformed, joined, and shared with other users, forming an implicit graph of relationships and dependencies. This graph shows how data is used and exposes the relationships that are essential for security, governance, and remediation.
Recommendations: As users interact with datasets, their behavior can serve as the basis for recommendations to other users, helping to build joins and transformations more easily.
Apache Arrow-Based Execution: Apache Arrow is a columnar standard for in-memory analytics. It provides significant advantages in terms of memory and CPU efficiency, and is designed to work well with GPUs and FPGAs.
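
To make the columnar idea concrete, here is a minimal sketch that converts row-oriented records into one typed, contiguous array per column. It uses Python's standard `array` module as a stand-in, not the Apache Arrow library or its memory format; the `to_columns` helper and the sample data are assumptions made for illustration.

```python
from array import array

# Row-oriented records: each row carries every field.
rows = [
    {"id": 1, "price": 9.5, "qty": 2},
    {"id": 2, "price": 3.0, "qty": 10},
    {"id": 3, "price": 12.25, "qty": 1},
]


def to_columns(records):
    """Store each field as one typed, contiguous array (columnar layout)."""
    return {
        "id": array("q", (r["id"] for r in records)),        # 64-bit ints
        "price": array("d", (r["price"] for r in records)),  # 64-bit floats
        "qty": array("q", (r["qty"] for r in records)),
    }


columns = to_columns(rows)

# An aggregate over one column touches only that column's memory.
total_qty = sum(columns["qty"])

# A two-column expression reads two arrays and ignores the rest.
revenue = sum(p * q for p, q in zip(columns["price"], columns["qty"]))

print(total_qty, revenue)
```

Analytical queries usually read a few columns across many rows, so keeping each column contiguous avoids loading unrelated fields and suits vectorized execution.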

Feature Comparison

Feature	Dremio	SQL Execution Engines
Scale-out architecture	Yes	Yes
Accelerates aggregation queries	Yes. Queries are written against the logical schema, and Dremio's query planner automatically rewrites the query to use Aggregation Reflections, invisible to the end user.	No. Requires a slow full table scan each time.
Accelerates ad-hoc queries	Yes. Queries are written against the logical schema, and Dremio's query planner automatically rewrites the query to use Raw Reflections, invisible to the end user.	No. Requires a slow full table scan each time.
Accelerates relational data sources	Yes. Dremio Reflections, and native optimizers with first-class push-downs of queries.	No. Varies by engine, but most require third-party ETL to move and prep data for HDFS or S3.
Accelerates NoSQL data sources	Yes. Dremio Reflections, and native optimizers with first-class push-downs of queries.	No. Varies by engine, but most require third-party ETL to move and prep data for HDFS.
Integrated data curation	Yes. Natural and intuitive UI for data discovery, curation, acceleration, and collaboration.	No. Requires a third-party tool or custom scripts written by data engineers.
Integrated data lineage	Yes. Full visibility into data lineage and access patterns for governance and error remediation.	No. Requires a third-party tool or custom scripts written by data engineers.
License	Apache	Apache
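
The aggregation-query row in the comparison can be illustrated with a toy model: the same logical query is answered from a pre-aggregated copy while that copy is within its TTL, and from a full scan otherwise. The `AggregationReflection` class, the `total_by_region` function, and the timestamps are hypothetical; this is a sketch of the substitution idea, not Dremio's implementation.

```python
from collections import defaultdict


def aggregate(rows):
    """Full scan: sum the amount per region over every source row."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


class AggregationReflection:
    """Hypothetical pre-aggregated copy of the source with a TTL."""

    def __init__(self, rows, ttl_seconds, built_at):
        self.totals = aggregate(rows)
        self.ttl_seconds = ttl_seconds
        self.built_at = built_at

    def is_fresh(self, now):
        return now - self.built_at <= self.ttl_seconds


def total_by_region(rows, reflection, now):
    """Answer the logical query, substituting the reflection when it is fresh."""
    if reflection is not None and reflection.is_fresh(now):
        return reflection.totals, "reflection"
    return aggregate(rows), "full scan"


rows = [("EU", 10.0), ("US", 5.0), ("EU", 2.5)]
reflection = AggregationReflection(rows, ttl_seconds=60, built_at=0.0)

fresh_result, fresh_path = total_by_region(rows, reflection, now=30.0)
stale_result, stale_path = total_by_region(rows, reflection, now=120.0)
print(fresh_path, fresh_result)
print(stale_path, stale_result)
```

The caller writes the same query in both cases; only the planner's choice of data changes, which is what "invisible to the end user" refers to. A longer TTL serves more queries from the reflection at the cost of staleness.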

Difference between Dremio vs Presto

Reference URL
