Imply Announces Apache Druid Automatic Schema Discovery

June 12, 2023

The third milestone of Imply’s Project Shapeshift brings industry-leading developer ease of use and operational efficiency to Apache Druid.

Imply, the company founded by the original creators of Apache Druid®, today unveiled the third milestone in Project Shapeshift, an initiative designed to evolve Apache Druid and solve the most pressing issues developers face when building real-time analytics applications. This milestone introduces the following:

  • Schema auto-discovery: the ability for Druid to discover data fields and data types and continuously update tables automatically as they change
  • Shuffle joins: the ability to join large distributed tables without impact to query performance, powered by the new multi-stage query engine
  • Global expansion and new enhancements to Imply Polaris, the cloud database service for Apache Druid

Apache Druid, the analytics database when real-time matters, is a popular open source database and 2022 Datanami Reader’s Choice winner used by developers at 1000s of companies including Confluent, Salesforce, and Target. Because of its performance at scale and under load – along with its comprehensive features for analyzing streaming data – Druid is relied on for operational visibility, rapid data exploration, customer-facing analytics, and real-time decisioning.

Project Shapeshift, announced at Druid Summit 2021, is a strategic initiative from Imply to transform the developer experience for Druid across three pillars: cloud-native, simple, and complete. In March 2022, Imply announced the first milestone with the introduction of Imply Polaris, a cloud database service for Druid. In September 2022, Imply announced the largest architectural expansion of Druid in its history with the addition of a multi-stage query engine.

“Druid has always been engineered for speed, scale, and streaming data. It’s why developers at Confluent, Netflix, Reddit and 1000s of other companies choose Druid over other database alternatives,” stated FJ Yang, Co-Founder and CEO of Imply. “For the past year, the community has come together to bring new levels of operational ease of use and expanded functionality. This makes Druid not only a powerful database, but one developers love to use too.”

Companies including Atlassian, Reddit, and PayTM utilize Imply for Druid because its commercial distribution, software, and services simplify operations, eliminate production risks, and lower the overall cost of running Druid. As a value-add to existing open source users, Imply guarantees a reduction in the cost of running Druid through its Total Value Guarantee. 

Project Shapeshift Milestone 3 includes the following major contributions to Apache Druid and new features for Imply Polaris:

Automatic Schema Discovery in Druid

Schema definition plays an essential role in query performance, as a strongly-typed data structure makes it possible to columnarize, index, and optimize compression. But defining the schema when loading data places an operational burden on engineering teams, especially with ever-changing event data flowing through Apache Kafka and Amazon Kinesis. Databases such as MongoDB use a schemaless data structure because it provides developer flexibility and ease of ingestion, but at a cost to query performance.

Today, Imply announces a new capability that makes Druid the first analytics database that can provide the performance of a strongly-typed data structure with the flexibility of a schemaless data structure. Schema auto-discovery, now available in Druid 26.0, is a new feature that enables Druid to automatically discover data fields and data types and update tables to match changing data without an administrator.

  • Auto detection of new tables: Druid can now auto-discover column names and data types during ingestion. For example, Druid will look at the ingested data and identify which dimensions need to be created and the data type for each dimension’s column (a configuration sketch follows this list).
  • Maintenance of existing tables: As schemas change – dimensions or data types are added, dropped, or changed in the source data – Druid will automatically discover the change and adjust its tables to match the new schema without requiring the existing data to be reprocessed.
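
For illustration only, the sketch below shows one way schema auto-discovery might be enabled in a Kafka ingestion spec submitted to Druid’s supervisor API. The datasource, topic, and broker address are placeholder values, and the exact spec fields should be verified against the Druid 26.0 documentation.

```python
# Sketch: enable schema auto-discovery in a Kafka ingestion spec.
# Datasource, topic, and broker address are placeholders (assumptions).
import json
import requests

DRUID_ROUTER = "http://localhost:8888"  # assumed local Druid router

supervisor_spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
            "topic": "clickstream-events",
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "clickstream",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            # With useSchemaDiscovery enabled, Druid infers column names and
            # types from incoming events instead of a hand-written dimension list.
            "dimensionsSpec": {"useSchemaDiscovery": True},
        },
        "tuningConfig": {"type": "kafka"},
    },
}

# Submit the spec to the supervisor endpoint (via the router).
resp = requests.post(
    f"{DRUID_ROUTER}/druid/indexer/v1/supervisor",
    data=json.dumps(supervisor_spec),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
print(resp.json())  # returns the supervisor id on success
```

If new fields later appear in the stream, the same spec lets Druid pick them up on subsequent ingestion without reprocessing existing data.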

“Now with Apache Druid you can have a schemaless experience in a high-performance, real-time analytics database,” said Gian Merlino, PMC Chair for Apache Druid and CTO of Imply. “You don’t have to give up having strongly-typed data in favor of flexibility as schema auto-discovery can do it for you. Net, you get great performance whether or not you define a schema ahead of time.”

“Druid handling real-time schema changes is a big step forward for the streaming ecosystem,” stated Anand Venugopal, Director of ISV Alliances at Confluent. “We see streaming data typically ingested in real-time and often coming from a variety of sources, which can lead to more frequent changes in data structure. Imply has now made Apache Druid simple and scalable to deliver real-time insights on those streams, even as data evolves.”

Large Complex Joins Now Supported in Druid During Ingestion

In Druid 26.0, Apache Druid has expanded join capabilities and now supports large, complex joins. While Druid has supported joins since version 0.18, the previous join capabilities were limited in order to maintain high CPU efficiency for query performance. When queries required joining large data sets, developers had to pre-join the data with external ETL tools.

Now, Druid supports large joins at ingestion time, implemented as shuffle joins. This simplifies data preparation, minimizes reliance on external tools, and adds to Druid’s capabilities for in-database data transformation. The new shuffle joins are powered by Druid’s multi-stage query engine, and in the future the community will extend shuffle joins to join large data sets at query time in addition to ingestion time.
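
As a rough illustration, an ingestion-time join can be expressed through SQL-based ingestion on the multi-stage query engine. The statement below is a sketch: the table names, columns, and join keys are invented for the example, and the exact syntax and context options should be checked against the Druid 26.0 documentation.

```python
# Sketch: submit an ingestion-time join as a SQL-based ingestion task.
# Table and column names (orders, customers, enriched_orders) are invented.
import requests

DRUID_ROUTER = "http://localhost:8888"  # assumed local Druid router

ingest_sql = """
REPLACE INTO enriched_orders OVERWRITE ALL
SELECT
  o.__time,
  o.order_id,
  o.amount,
  c.customer_name,
  c.country
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
PARTITIONED BY DAY
"""

resp = requests.post(
    f"{DRUID_ROUTER}/druid/v2/sql/task",
    json={
        "query": ingest_sql,
        # Assumed context flag selecting the shuffle-based join strategy
        # rather than the default broadcast join; verify the exact name.
        "context": {"sqlJoinAlgorithm": "sortMerge"},
    },
)
resp.raise_for_status()
print(resp.json())  # returns the task id for the ingestion job
```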

Continued Innovation for Imply Polaris

Imply Polaris, the cloud database service for Apache Druid, is the easiest deployment model for developers. It delivers all of Druid’s speed and performance without requiring expertise, management, or configuration of Druid or the underlying infrastructure.

This cloud database was built to do more than cloudify Druid; it also optimizes data operations and delivers an end-to-end service from stream ingestion to data visualization.

Today, Imply announces a series of product updates to Polaris that enhance the developer experience, including:

  • Global Expansion – In addition to the US region, Polaris is now available in Europe, enabling customers to run across multiple availability zones as well as across multiple regions for improved fault tolerance.
  • Enhanced Security – Polaris adds private networking options by ingesting data over AWS PrivateLink from customers’ Kafka or Confluent clusters in AWS. Customers who want to lower their data transfer costs can also choose VPC Peering for ingestion with Polaris.
  • Expanded Integrations – In addition to native, connectorless support for Confluent Cloud, Polaris adds the same native support for Apache Kafka and Amazon Kinesis, making it easy to ingest streaming data from anywhere. Polaris also now provides an API to export performance metrics to observability tools including Datadog, Prometheus, Elastic, and more.

