
Data: Wrangling Challenges, Opportunities, and a Herd of Cats

Businesses today use various data processes—like ETL pipelines, observability tools, and marketing platforms—to ensure smooth operations and gain insights. But are these processes truly effective?

Dashboards are commonly used to monitor performance, but they often provide only partial information when detailed insights are needed. Most of these processes confirm that everything is functioning as expected, but when systems break or new opportunities arise, it’s usually for NEW reasons that existing dashboards were never built to track. That leaves us with stale insights and unanswered questions, which turn into time-consuming requests to engineering teams that slow decision-making and disrupt planned deliverables.

What Do We Need?

We need access to raw, well-organized data that's easily queryable. This enables us to explore and uncover the root causes of new issues and opportunities.

With data lakes and elastic cloud computing, accessing raw data isn’t a problem. Nor are size and compute cost the real challenge: data can be filtered before it is scanned, and costs can be managed with spot instances and low storage fees. However, several challenges remain:

Disparate Systems: Data spread across different systems complicates access and analysis.
Inconsistent Schemas and Serializations: Varying schemas and serialization formats add complexity.
Complex Queries: Analyzing raw data often requires time-consuming and intricate queries.
Technical Expertise: Effective querying demands specialized skills.
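A minimal sketch of the schema and serialization problem: two hypothetical sources describe the same kind of event, one as JSON lines and one as CSV, each with its own field names. The alias table is an assumption you would maintain per source; unknown fields pass through unchanged so nothing is silently dropped.

```python
import csv
import io
import json
from typing import Any, Dict, List

# Hypothetical mapping from each source's field names to one shared vocabulary.
FIELD_ALIASES = {
    "ts": "timestamp",
    "event_time": "timestamp",
    "uid": "user_id",
    "userId": "user_id",
    "type": "event",
    "action": "event",
}


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known aliases; keep unknown fields as-is so no information is lost.

    If a record holds both an alias and its canonical name, the later key wins.
    """
    return {FIELD_ALIASES.get(k, k): v for k, v in record.items()}


def read_json_lines(text: str) -> List[Dict[str, Any]]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [normalize(json.loads(line)) for line in text.splitlines() if line.strip()]


def read_csv(text: str) -> List[Dict[str, Any]]:
    """Parse CSV with a header row; every value arrives as a string."""
    return [normalize(row) for row in csv.DictReader(io.StringIO(text))]


# Two hypothetical sources describing the same kind of event.
json_source = '{"ts": "2024-05-01T10:00:00", "userId": "u1", "type": "login"}\n'
csv_source = "event_time,uid,action\n2024-05-01T10:05:00,u1,purchase\n"

records = read_json_lines(json_source) + read_csv(csv_source)
for record in records:
    print(record)
```

After normalization both records share `timestamp`, `user_id`, and `event`, so one query can cover both sources. The mapping still has to be written and maintained by someone who knows each source, which is exactly the expertise barrier described above.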

The most challenging aspect of opening up raw data is writing complex queries on disparate datasets with varying schemas. If the querying process can be simplified through no-code tools or natural language interfaces, the vast potential of raw data can be unlocked for a broader range of users.

Democratizing access to raw data is key to unlocking its full potential.

Is Latency an Issue?

Latency isn’t usually a concern in exploratory analysis. The real issue is the time spent writing complex queries to search for data in index-based systems, which often still return incomplete results. These systems are also costly when every dataset has to be stored and indexed in them, and inefficient for data that is only queried occasionally.
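For occasional use, an alternative to maintaining an index is to scan raw files on demand and keep each scan cheap by pruning. As a minimal sketch, assume the raw files are laid out with Hive-style `date=YYYY-MM-DD` path segments, a common data lake convention; the object keys below are hypothetical. Filtering on the path lets a query skip whole partitions without opening any files:

```python
from datetime import date
from typing import List, Optional


def partition_date(key: str) -> Optional[date]:
    """Return the date from a Hive-style 'date=YYYY-MM-DD' path segment, or None."""
    for segment in key.split("/"):
        if segment.startswith("date="):
            return date.fromisoformat(segment[len("date="):])
    return None


def prune(keys: List[str], start: date, end: date) -> List[str]:
    """Keep only object keys whose partition date falls within [start, end]."""
    selected = []
    for key in keys:
        d = partition_date(key)
        if d is not None and start <= d <= end:
            selected.append(key)
    return selected


# Hypothetical object keys in a raw-events bucket.
keys = [
    "raw/events/date=2024-05-01/part-0000.json",
    "raw/events/date=2024-05-02/part-0000.json",
    "raw/events/date=2024-05-03/part-0000.json",
]
print(prune(keys, date(2024, 5, 2), date(2024, 5, 3)))
```

Query engines that understand Hive-style partitioning can perform this pruning themselves when the partition column appears in a filter; the sketch only shows why a date-bounded question touches a fraction of the stored data.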

Is Vendor Lock-In the Right Solution?

Businesses often rely on vendors for observability or marketing funnels, but this can limit flexibility. For example, extracting detailed insights, such as the sequence of log records within a long session, can be difficult because vendors aggregate the data. To maintain control, consider storing raw data in the cloud and querying it on demand for deeper analysis.
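To illustrate what keeping the raw data makes possible, here is a minimal sketch that rebuilds the ordered sequence of log records for each session. It assumes every raw event carries hypothetical `session_id` and ISO 8601 `timestamp` fields; real field names will vary by source.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def sessionize(events: List[dict]) -> Dict[str, List[dict]]:
    """Group raw events by session and order each session by event time."""
    sessions: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        sessions[event["session_id"]].append(event)
    for session_events in sessions.values():
        session_events.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]))
    return dict(sessions)


# Hypothetical raw log records, arriving out of order.
raw_events = [
    {"session_id": "s1", "timestamp": "2024-05-01T10:07:00", "message": "checkout failed"},
    {"session_id": "s2", "timestamp": "2024-05-01T10:01:00", "message": "page view"},
    {"session_id": "s1", "timestamp": "2024-05-01T10:00:00", "message": "login"},
    {"session_id": "s1", "timestamp": "2024-05-01T10:03:00", "message": "add to cart"},
]

for event in sessionize(raw_events)["s1"]:
    print(event["timestamp"], event["message"])
```

This sorts in memory for clarity; over a large data lake the same group-and-sort would run on elastic compute, but the point is the same: the full sequence is only recoverable if the individual records were kept.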

The Constraints of SQL-Style Joins and Tabular Visualization

Relational databases and SQL enforce a fixed-schema, relational approach that limits flexibility. This approach is particularly unsuitable for visualizing data in the order in which it was generated, especially when schemas vary. As data structures evolve and grow more complex, traditional tabular formats fail to represent such sequences effectively, which highlights the need for more dynamic and flexible visualization methods.

SQL's widespread influence has ingrained a row-and-column mindset in how we query data, often overshadowing more natural, linear approaches that could better represent the true sequence of events.
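To make the contrast concrete, here is a minimal sketch of the linear alternative: instead of joining sources into a wide table padded with empty columns, it merges per-source event streams into one time-ordered sequence in which each event keeps its own fields. The sources and field names are hypothetical, and it assumes each stream is already sorted and that all timestamps share one ISO 8601 format and time zone, so string comparison matches chronological order.

```python
import heapq
from typing import Iterable, Iterator


def timeline(*streams: Iterable[dict]) -> Iterator[dict]:
    """Lazily merge per-source streams (each sorted by timestamp) into one ordered sequence.

    Each event keeps its own fields, so no fixed schema or join is needed.
    """
    return heapq.merge(*streams, key=lambda e: e["timestamp"])


# Hypothetical sources with different shapes.
app_logs = [
    {"timestamp": "2024-05-01T10:00:00", "source": "app", "screen": "cart"},
    {"timestamp": "2024-05-01T10:04:00", "source": "app", "error": "timeout"},
]
payments = [
    {"timestamp": "2024-05-01T10:02:00", "source": "payments", "amount": 42.5, "status": "pending"},
]

for event in timeline(app_logs, payments):
    details = {k: v for k, v in event.items() if k not in ("timestamp", "source")}
    print(event["timestamp"], event["source"], details)
```

The printed sequence reads as the story of what happened (cart, pending payment, timeout) rather than as rows that must be mentally re-sorted across columns that only some events use.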

What’s the Way Forward?

Anticipating every possible issue in advance and preprocessing data for it isn’t feasible. Alongside preprocessing, it’s more effective to offer an easy-to-use query mechanism that gives users access to the raw, un-aggregated data. This way, users can leverage elastic compute to extract insights as needed.

Have you faced similar challenges? How have you overcome them?

Author: Arif Khan

Founder | Zinzu.io

Contact: team@zinzu.io

LinkedIn: Arif Khan

Passionate about solving complex problems in data using innovative solutions.