Big Data
The 3 Vs: what makes data "big"
The defining feature of Big Data is not a fixed size threshold but the point at which data outgrows the capacity of a single conventional server and relational database. The concept is commonly framed around three core dimensions, originally introduced by analyst Doug Laney.
- Volume — the sheer quantity of data, often measured in terabytes or petabytes, generated by transactions, sensors, logs, applications and user interactions.
- Velocity — the speed at which data arrives and must be processed, ranging from periodic batches to continuous real-time streams.
- Variety — the mix of formats: structured data (database tables), semi-structured data (JSON, XML, logs) and unstructured data (text, images, video, audio).
Two further Vs are frequently added in practice: veracity (the reliability and quality of the data) and value (the actual business benefit that can be extracted). A workload only qualifies as a Big Data problem when at least one of the three core dimensions (volume, velocity or variety) exceeds what classic tools can handle efficiently.
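The following back-of-the-envelope sizing sketch in Python shows how velocity turns into volume. The sensor count, event rate, event size and the 20 TB server capacity are hypothetical figures chosen for illustration, not benchmarks. Decimal units are assumed (1 GB = 10^9 bytes, 1 TB = 1,000 GB).

```python
# Back-of-the-envelope sizing: how velocity turns into volume.
# All input figures are hypothetical, chosen for illustration.
SECONDS_PER_DAY = 86_400

def daily_volume_gb(events_per_second: float, bytes_per_event: float) -> float:
    """Raw data generated per day, in decimal gigabytes (10**9 bytes)."""
    return events_per_second * bytes_per_event * SECONDS_PER_DAY / 1e9

def days_until_full(capacity_tb: float, daily_gb: float) -> float:
    """Days before a store of `capacity_tb` decimal terabytes fills up."""
    return capacity_tb * 1_000 / daily_gb

# Hypothetical fleet: 2,000 sensors, each sending 5 readings/s of ~200 bytes.
events_per_second = 2_000 * 5
daily = daily_volume_gb(events_per_second, 200)
print(f"{daily:.1f} GB/day")                                  # 172.8 GB/day
print(f"{days_until_full(20, daily):.0f} days to fill 20 TB")  # 116 days to fill 20 TB
```

At 10,000 events per second of about 200 bytes each, the stream produces roughly 173 GB of raw data per day. A hypothetical 20 TB server would therefore fill in under four months, and that estimate ignores indexes, replicas and any growth in the event rate.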
Traditional approach vs Big Data approach
The fundamental difference lies in scaling strategy. Traditional systems scale vertically (a bigger, more powerful machine), while Big Data systems scale horizontally by distributing storage and computation across many machines.
| Criterion | Traditional data processing | Big Data processing |
|---|---|---|
| Data volume | Gigabytes, fits on one server | Terabytes to petabytes, distributed |
| Scaling model | Vertical (upgrade the machine) | Horizontal (add more machines) |
| Data structure | Mostly structured, fixed schema | Structured, semi-structured and unstructured |
| Typical storage | Relational database (SQL) | Distributed file systems, NoSQL, data lakes |
| Processing | Centralised on a single server | Distributed/parallel across a cluster (e.g. MapReduce, Spark) |
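The MapReduce model mentioned in the table can be sketched on a single machine to show its three phases. Map emits key-value pairs, shuffle routes every key to one reducer by hashing it, and reduce aggregates the values for each key. This is a simplified, sequential simulation: in a real Hadoop or Spark job the phases run in parallel across many machines. The function names are illustrative and do not belong to any framework's API.

```python
import zlib
from collections import defaultdict
from collections.abc import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1

def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key with a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(
    pairs: Iterable[tuple[str, int]], num_reducers: int
) -> list[dict[str, list[int]]]:
    """Shuffle: group values by key, sending each key to exactly one reducer."""
    buckets: list[dict[str, list[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets

def reduce_phase(bucket: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: aggregate all values that share a key."""
    return {key: sum(values) for key, values in bucket.items()}

def word_count(lines: Iterable[str], num_reducers: int = 3) -> dict[str, int]:
    # In a cluster, map tasks run (where possible) on the nodes holding each
    # input block and reducers run in parallel; here every phase runs in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    result: dict[str, int] = {}
    for bucket in shuffle(mapped, num_reducers):
        result.update(reduce_phase(bucket))  # buckets never share a key
    return result

print(word_count(["big data big ideas", "data beats opinions"]))
```

For the two sample lines, `big` and `data` each get a count of 2 and every other word gets a count of 1. The partitioner uses `zlib.crc32` instead of Python's built-in `hash()` because string hashing is randomised per process. A distributed shuffle needs every worker to send a given key to the same reducer.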
Common technologies in this space include the Hadoop ecosystem and its HDFS distributed file system, Apache Spark for in-memory processing, NoSQL databases (such as document, column-family or key-value stores), and data lakes that retain raw data in its native format until it is needed.
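Data lakes typically follow a schema-on-read approach. Raw records are stored as they arrive, and a structure is imposed only when a particular question is asked. The Python sketch below illustrates the idea on a few JSON lines. The event fields (`user`, `event`, `sku`, `amount`) and the `Purchase` type are hypothetical. A production lake would keep such files in distributed storage and query them with an engine such as Spark, not a Python loop.

```python
import json
from dataclasses import dataclass

# Raw events as they might land in a lake: stored untouched, one JSON document
# per line. The field names are hypothetical.
RAW_EVENTS = """\
{"user": "u1", "event": "view", "sku": "A12", "ts": "2024-05-01T10:00:00"}
{"user": "u2", "event": "purchase", "sku": "B07", "amount": 49.9}
{"user": "u1", "event": "purchase", "sku": "A12", "amount": "19.90", "coupon": "SPRING"}
not valid json
"""

@dataclass
class Purchase:
    """The schema this particular query needs (hypothetical)."""
    user: str
    sku: str
    amount: float

def read_purchases(raw: str) -> list[Purchase]:
    """Schema-on-read: structure is imposed at query time, not at write time."""
    purchases = []
    for line in raw.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: skipped by this query, still kept in raw storage
        if not isinstance(record, dict) or record.get("event") != "purchase":
            continue  # not relevant to this query
        try:
            # Coerce types here: "19.90" (string) and 49.9 (number) both become floats.
            purchases.append(Purchase(record["user"], record["sku"], float(record["amount"])))
        except (KeyError, TypeError, ValueError):
            continue  # missing field or unusable value for this schema
    return purchases

purchases = read_purchases(RAW_EVENTS)
print(f"{len(purchases)} purchases, total {sum(p.amount for p in purchases):.2f}")
```

This query returns only the two well-formed purchase events, totalling 69.80. It ignores the view event and the malformed line, but both remain in raw storage, where a different query with a different schema can still use them.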
Business use cases
For an SME or mid-market company, Big Data becomes relevant once data sources multiply and a single database can no longer answer the questions the business asks of it. Concrete applications include:
- Predictive maintenance — analysing sensor and machine telemetry to anticipate equipment failures before they happen.
- Customer behaviour analysis — aggregating clickstreams, transactions and support interactions to segment audiences and personalise offers.
- Fraud and anomaly detection — processing high-velocity transaction streams to flag suspicious patterns in near real time.
- Supply chain and logistics optimisation — combining inventory, demand and external data to improve forecasting and routing.
- Recommendation engines — using large interaction histories to suggest products or content.
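For fraud and anomaly detection on a transaction stream, one of the simplest techniques is a rolling z-score. Each new value is compared with the mean and standard deviation of the values that preceded it. The Python generator below is a minimal sketch of that idea, not a production fraud model. The window size, threshold and minimum history are assumed defaults. Real systems usually run this kind of logic in a stream processor and combine many signals.

```python
from collections import deque
from collections.abc import Iterable, Iterator
from statistics import mean, pstdev

def flag_anomalies(
    amounts: Iterable[float],
    window: int = 20,
    threshold: float = 3.0,
    min_history: int = 5,
) -> Iterator[tuple[int, float]]:
    """Yield (index, amount) for values far from the rolling baseline.

    Each value is scored against the mean and population standard deviation
    of up to `window` preceding values, then appended to that window.
    """
    recent: deque[float] = deque(maxlen=window)
    for i, x in enumerate(amounts):
        if len(recent) >= min_history:  # wait for enough history to judge
            mu, sigma = mean(recent), pstdev(recent)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                yield i, x
        recent.append(x)  # the window slides: the oldest value drops out

amounts = [20, 22, 19, 21, 20, 23, 18, 950, 21, 20]
print(list(flag_anomalies(amounts)))  # [(7, 950)]
```

The flagged outlier then enters the window and inflates the baseline for the next few events. A natural refinement would exclude flagged values from the window.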
Big Data is also the foundation for machine learning and AI projects, which depend on large, varied training datasets. In practice, the technical challenge is rarely storing the data — it is engineering pipelines that turn raw, high-volume data into reliable, queryable information that supports decisions.
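Pipeline work of this kind usually follows the same extract, validate, transform and load pattern, whatever the scale. The Python sketch below runs that pattern in miniature:

- Raw CSV rows are parsed.
- Rows that fail basic veracity checks are set aside, not silently loaded.
- Clean records go into an in-memory SQLite table that can be queried with SQL.

The column layout and validation rules are hypothetical. SQLite stands in for whatever warehouse or lake query engine a real pipeline would target.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: day, store, sku, quantity, unit price.
RAW = """\
2024-05-01,store-1,A12,3,19.90
2024-05-01,store-2,A12,1,19.90
2024-05-01,store-1,B07,,49.90
2024-05-02,store-1,B07,2,49.90
2024-05-02,store-2,A12,-4,19.90
"""

def extract(raw: str) -> list[list[str]]:
    """Parse raw CSV text into rows of strings."""
    return list(csv.reader(io.StringIO(raw)))

def validate(rows: list[list[str]]) -> tuple[list[tuple], list[list[str]]]:
    """Split rows into typed clean records and rejects (basic veracity checks)."""
    clean, rejects = [], []
    for row in rows:
        try:
            day, store, sku, qty, price = row  # wrong field count -> ValueError
            quantity, unit_price = int(qty), float(price)  # empty or non-numeric -> ValueError
            if quantity <= 0 or unit_price < 0:
                raise ValueError("implausible quantity or price")
            clean.append((day, store, sku, quantity, unit_price))
        except ValueError:
            rejects.append(row)
    return clean, rejects

def load(records: list[tuple], conn: sqlite3.Connection) -> None:
    """Load clean records into a queryable table."""
    conn.execute(
        "CREATE TABLE sales (day TEXT, store TEXT, sku TEXT, qty INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", records)

conn = sqlite3.connect(":memory:")
clean, rejects = validate(extract(RAW))
load(clean, conn)
daily_revenue = conn.execute(
    "SELECT day, ROUND(SUM(qty * price), 2) FROM sales GROUP BY day ORDER BY day"
).fetchall()
print(daily_revenue)
print(f"{len(rejects)} rejected rows")
```

Two of the five raw rows are rejected: one has a missing quantity and one has a negative quantity. The query returns revenue of 79.6 for 2024-05-01 and 99.8 for 2024-05-02. Because the rejects are kept, not discarded, they can be used to track data quality over time.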