AN INTRODUCTION TO SINGLESTORE
according to ANTOINE STELMA
Erik Fransen & Antoine Stelma
Connected Data Group initiators are experts in the field of Business Intelligence, Data Warehousing, Big Data and Data Virtualization.
Connected Data Group offers Data & Analytics services for the digital transformation and is specialized in Agile Data Architecture and Data Management.
They blog about current topics and developments in the field. The intention of the blog is to share knowledge and engage with peers.
With an extensive background of more than 20 years in Data & Analytics, and having seen all kinds of new technologies and methods come by, I thought that the concept of the database had seen its last days in the data and analytics landscape. With all kinds of promising technologies, like Hadoop and Data Lakes, you would think that the days of loading data into predefined tables and handling performance issues are over.
But as data architects, how can we oversee and govern all of this data? Do we really think that storing data without any structure or governance will lead us anywhere? The amount of data is growing exponentially, while the data itself is becoming more varied.
More organizations are moving toward hybrid cloud solutions, searching for a balance between on-premises and cloud.
So, what if there is a highly performant unified database that can be used on-premises, on any of the leading public clouds, or in a hybrid mode to migrate existing databases? What if there is a database that can handle both OLTP and OLAP workloads with equal ease? What if we had a database that offers streaming ingestion together with very low-latency query performance, even with high concurrency requirements? What if it can be used to speed up our current and valued Data Warehouse implementations, offers an intelligent way to connect to data lakes, comes with onboard functionality for text search, geospatial data and time series, and supports both row and column storage? And, while I’m adding features to my wish list: how about horizontal and vertical scaling, and the use of ANSI SQL as a standard language for analytics, together with plenty of programming functionality? And finally, as data arises, gets integrated or federated and is used in a business context for making crucial decisions: lineage, metadata and all the relevant features to ensure that every user has insight into where the data comes from.
And then we were introduced to SingleStore, which offers all these capabilities and more, and we became a VAR partner.
In a series of five short blogs, I would like to highlight what SingleStore is all about and what it can do for your organization. This won’t be a technical blog, as I want to tell you about the functionality and how it can be used in everyday scenarios.
SingleStore presents itself as “the ideal unified database for fast analytics and AI/ML-powered applications that requires fast data ingest, high performance queries and elastic scaling with familiar relational SQL.”
Let’s take a closer look at what this means. SingleStore has its origins as a SQL-oriented in-memory database. It has evolved into a unified database that offers Universal Storage and supports two types of database tables: row store (better for transactional processing) and column store (better for analytical processing). You can use each type as needed and combine them for any goal.
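To make the difference between the two table types a bit more tangible, here is a minimal Python sketch. It is not SingleStore code and says nothing about how SingleStore stores data internally; the `orders` data and the function names are made up for illustration. It only shows why a row-oriented layout suits single-record lookups and a column-oriented layout suits aggregations.

```python
# Illustrative only: plain Python structures standing in for the two layouts.
# The sample data and all names below are hypothetical.
orders = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 42.0},
]

# Row-oriented layout: each record is kept together, keyed by its id.
row_store = {row["order_id"]: row for row in orders}

# Column-oriented layout: each column is kept as one contiguous list.
column_store = {name: [row[name] for row in orders] for name in orders[0]}


def lookup_order(order_id):
    """Transactional access: fetch one whole record by key."""
    return row_store[order_id]


def total_amount():
    """Analytical access: scan a single column and ignore the others."""
    return sum(column_store["amount"])
```

A lookup touches one complete record, while the total only reads the `amount` column. That is the trade-off behind choosing a table type per workload.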
Each type of table has its specific advantages. All databases run in the SingleStore instance. By default, SingleStore is deployed as a cluster of servers that together form a high-availability cluster. A cluster can be expanded with additional nodes, called leaves. This makes it possible to scale an environment, on-premises or in the cloud. Everything can be managed via the dashboards that are part of the software. And the moment you decide to go for the SaaS version, it’s a one-to-one migration: you will have the same functionality on-premises as in the SaaS version.
When we log on to SingleStore, we directly notice the similarity with MySQL, which makes the system feel familiar and easy to adopt.
The core features of SingleStore will be discussed in detail in the next blog. But let’s take a first look at the possibilities at a high level:
Hybrid Transaction/Analytical Processing (HTAP) architecture. As Gartner puts it, HTAP breaks the wall between transaction processing (OLTP) and analytics (OLAP). It is an architecture that makes copying multiple datasets for analytical purposes obsolete. HTAP will not eliminate your Data Warehouse, but if the source application runs in this architecture, there will be less need to copy data to other storage, as you can now directly use the existing, original data source. This is a direct benefit for your Data Management.
PROCEDURES, TRIGGERS & FUNCTIONS
Every mature database system needs a broad range of procedures, triggers and functions to run applications or support data transformation. As SingleStore supports dynamic SQL, automation of code (for Data warehouses or migration) can push your consistency and quality to the next level.
With the Time Series functionality and the column store tables, you are able to process real-time data and apply time series analysis.
Pipelines are a feature that natively ingests real-time and batch data from external sources. Pipelines are highly performant, scalable and support fully distributed workloads. They are based on exactly-once semantics, meaning that transactions are processed exactly once, even when a load is retried. This helps to ensure a good data policy. Pipelines natively support the JSON, Avro, Parquet and CSV data formats and can be used with Apache Kafka, Amazon S3, Azure Blob, Google Cloud Storage and HDFS.
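The idea behind exactly-once semantics can be sketched in a few lines of Python. This is a generic illustration of the technique (remember the last loaded source offset and skip anything at or below it), not SingleStore’s actual implementation; the class `ExactlyOnceLoader` and its fields are hypothetical, and the sketch assumes that source offsets only increase.

```python
class ExactlyOnceLoader:
    """Toy loader that keeps loaded rows and the last loaded offset together."""

    def __init__(self):
        self.rows = []         # stands in for the destination table
        self.last_offset = -1  # highest source offset already loaded

    def load_batch(self, batch):
        """Load (offset, record) pairs and skip offsets that were loaded before."""
        new_rows = [(offset, rec) for offset, rec in batch if offset > self.last_offset]
        if not new_rows:
            return 0
        # In a real database these two updates would share one transaction,
        # so data can never be stored without its offset being recorded.
        self.rows.extend(rec for _, rec in new_rows)
        self.last_offset = max(offset for offset, _ in new_rows)
        return len(new_rows)


loader = ExactlyOnceLoader()
first = loader.load_batch([(0, "a"), (1, "b")])
# A retry delivers offset 1 again, together with a new record at offset 2.
second = loader.load_batch([(1, "b"), (2, "c")])
```

After the retry, the record at offset 1 is stored once, not twice. The duplicate delivery is simply ignored.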
A beautiful feature: a pipeline can be attached to a procedure, so incoming data, whether streaming or batch, can be analysed up front and enriched with relevant metadata, such as a classification of the data or technical details like counts and timestamps. The result can be stored directly in a database table, which avoids all kinds of separate format transformations.
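What such an enrichment step does can be sketched in Python. This is only an illustration of the idea, not a SingleStore procedure; the function `enrich_batch` and the field names `source`, `classification`, `loaded_at` and `batch_count` are assumptions chosen for this example.

```python
from datetime import datetime, timezone


def enrich_batch(records, source, classification):
    """Attach a classification and technical metadata to each incoming record."""
    loaded_at = datetime.now(timezone.utc).isoformat()  # one load timestamp per batch
    batch_count = len(records)                          # number of records in this batch
    return [
        {
            **record,
            "source": source,
            "classification": classification,
            "loaded_at": loaded_at,
            "batch_count": batch_count,
        }
        for record in records
    ]


# Hypothetical incoming batch of two records.
enriched = enrich_batch([{"id": 1}, {"id": 2}], source="kafka", classification="internal")
```

Every record leaves the step with the same load timestamp and batch count, so the metadata can later be used to trace where and when the data came in.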
In order to manage data, we need generated metadata to oversee the data loads and apply rules for quality purposes. As more organizations have a strong demand for lineage over the whole data life cycle, a rich information schema is available to check and control your data. This metadata can be exposed to data catalog or reporting tools.
EXTRACT TRANSFORM LOAD / EXTRACT LOAD TRANSFORM (ETL/ELT)
With Pipelines, part of the ETL/ELT can be executed with clear insight into the processing of transactions. As SingleStore is MySQL-compatible, existing ETL tools can directly use SingleStore as a high-performance database.
In my next blog, I will dive deeper, with real examples, into the topics mentioned in this blog. If you want to try it out yourself, please visit the SingleStore Academy, where you can follow all the relevant training for free. There you can also find information and test a fully working version of SingleStore as a SaaS (30 days for free).
WANT TO KNOW MORE?
If you want to know more about SingleStore and what we can do for your organization, please contact me: email@example.com