In today’s world, the volume of data doubles roughly every two years. But data has little value if processing results take too long to arrive. In the US, over 43% of customers have switched banks because of slow digital services. In e-commerce, every second of delay on a website reduces conversion by 7%, and according to Amazon’s research, every 100 ms of delay cost the company 1% of sales.

Often, the bottleneck in IT infrastructure that makes users wait and costs businesses customers is an inefficiently performing database. As a result, companies invest billions of dollars each year in maintaining and optimizing their databases. Over the past five years, the database management systems market has more than doubled, growing from $46 billion in 2018 to $103.2 billion in 2023.

With over 10 years of experience in manufacturing analytics, where the speed of processing data from real-time IIoT sensors directly impacts production efficiency, I’ve seen firsthand how critical this is. In this article, I’ll share my observations on the most common issues that slow down query processing and on how to optimize database performance to deliver near-instant results to customers.

Challenges in working with large databases

A large industrial facility with a continuous production cycle may have over a hundred IIoT sensors installed, generating millions of data points every day. Processing such a volume of data in real time puts pressure on the system and can slow down even simple queries. The main issues that lead to this are inefficient queries, poorly balanced indexing, an architecture that doesn’t match the nature of the data, and a lack of headroom for scaling.

When the tool becomes part of the problem

The issues listed above can be resolved by optimizing queries, improving the architecture, or scaling the system. However, if you've chosen the wrong type of database, these efforts will be in vain.

I'll explain using the example of the Waites product. We work with data from industrial sensors that record vibration, temperature, and a dozen other parameters every second while the equipment is running. This is classic time series data: a continuous, time-stamped stream. Initially, we used file-based storage organized in a folder structure. However, as the data volume grew, this became inefficient: each query required scanning a large number of files and took up to 10 seconds. For a client monitoring how a piece of equipment is performing, that’s too long.

To speed up processing, we initially switched to InfluxDB and later moved to TimescaleDB, a PostgreSQL extension optimized for time series data. It scales well and allows for data compression and archiving. The result was an 80% increase in performance and a reduction in query time to 2 seconds.
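
To illustrate what this migration involves, here is a minimal sketch of turning a plain PostgreSQL table of sensor readings into a TimescaleDB hypertable with compression of older data. The table, columns, and intervals (sensor_readings, a one-day chunk, a seven-day compression policy) are hypothetical placeholders, not the actual Waites schema.

```sql
-- Hypothetical sensor-readings table; names and intervals are illustrative only.
CREATE TABLE sensor_readings (
    ts          TIMESTAMPTZ      NOT NULL,
    sensor_id   INTEGER          NOT NULL,
    vibration   DOUBLE PRECISION,
    temperature DOUBLE PRECISION
);

-- Convert the table into a hypertable partitioned by time, so queries over
-- recent intervals only touch the relevant chunks (TimescaleDB 2.x syntax).
SELECT create_hypertable('sensor_readings', 'ts',
                         chunk_time_interval => INTERVAL '1 day');

-- Enable native compression, segmented by sensor so per-sensor queries stay fast,
-- and automatically compress chunks older than seven days.
ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id'
);
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');
```

Time-based partitioning is what keeps queries over recent data fast, while the compression policy covers archiving without a separate storage layer.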

Choose a database based on the type of data you're working with and the tasks the system needs to accomplish:

  - Relational DBMSs (e.g., PostgreSQL) for structured, transactional records with strict consistency requirements.
  - Time-series databases (e.g., TimescaleDB, InfluxDB) for continuous streams of time-stamped measurements.
  - Document stores for semi-structured data with a flexible schema.
  - In-memory key-value stores for caching and low-latency lookups.

In real-world architectures, you don’t have to rely on a single database. On the contrary, combining several DBMSs, each handling the type of data it suits best, is often what keeps performance stable.

How to maintain stable data processing speed

For a database to work efficiently even with billions of records, in addition to choosing the right DBMS type, you need to follow several principles:

  1. Optimize queries. Tools like EXPLAIN (for SQL databases) help visualize how a query is executed and identify bottlenecks. For example, adding an index to frequently queried columns can speed up data retrieval by orders of magnitude (a short sketch of this, and of the next point, follows this list).
  2. Balance indexing. While indexes speed up searches, an excess of them can slow down data writing. It's crucial to strike a balance between read speed and data update efficiency.
  3. Plan for scalability. Data volumes often grow faster than available resources. In industrial settings, for example, during peak periods, such as holiday seasons when all equipment is operating at full capacity, data volumes can spike several-fold. Cloud solutions like Amazon Aurora Serverless enable automatic scaling to handle the load.
  4. Monitor performance. Use tools like Datadog or New Relic to detect slow queries, CPU overload, or memory shortages before users experience issues (a pg_stat_statements sketch after this list shows how the database itself can surface slow queries).
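
To make points 1 and 2 concrete, here is a PostgreSQL sketch that continues the hypothetical sensor_readings example from earlier: it inspects a slow aggregation with EXPLAIN ANALYZE, adds an index matched to the query, and drops an unused index to protect write speed. All object names are illustrative assumptions.

```sql
-- Inspect the actual execution plan and timing of a suspect query.
EXPLAIN ANALYZE
SELECT sensor_id, avg(temperature)
FROM sensor_readings
WHERE ts >= now() - INTERVAL '1 hour'
GROUP BY sensor_id;

-- If the plan shows a sequential scan over the whole table, a composite index
-- matching the filter and grouping columns usually speeds retrieval up dramatically.
CREATE INDEX IF NOT EXISTS idx_readings_sensor_ts
    ON sensor_readings (sensor_id, ts DESC);

-- The flip side (point 2): every index adds write overhead, so drop indexes
-- the planner no longer uses (idx_readings_unused is a hypothetical name).
DROP INDEX IF EXISTS idx_readings_unused;
```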
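
Beyond external monitoring tools, PostgreSQL can surface slow queries itself. Assuming the pg_stat_statements extension is available (it has to be preloaded via shared_preload_libraries), a sketch like the following lists the statements with the highest average execution time; the column names follow PostgreSQL 13 and later.

```sql
-- Enable the statistics extension (requires pg_stat_statements in shared_preload_libraries).
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by average execution time (PostgreSQL 13+ column names).
SELECT query,
       calls,
       round(mean_exec_time::numeric, 2)  AS avg_ms,
       round(total_exec_time::numeric, 2) AS total_ms
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```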

When working with billions of records, the key to maintaining stable performance lies in continuous architecture improvement. A well-chosen database, query optimization, and readiness for scaling form the foundation of an efficient system. This ensures that your data works for the client, rather than just accumulating in storage.