Airbnb’s Mussel V2 is a major revision of the company’s data infrastructure. Designed as a unified key-value storage system, it accepts both real-time streaming ingestion and bulk batch loads through one store, addressing the growing difficulty of managing heterogeneous data pipelines at scale.
At Airbnb’s scale, data arrives from multiple systems — transactional services, analytics pipelines, and machine learning models. The first version of Mussel served as a distributed KV store optimized for read-heavy workloads. However, as event streaming (via Kafka) and bulk imports grew in importance, Airbnb engineers faced a mismatch between streaming and batch ingestion capabilities.
Mussel V2 was built to unify these two worlds — providing consistent performance and storage semantics regardless of ingestion mode.
Mussel V2 supports both streaming and bulk ingestion through a shared commit protocol. The streaming path uses Kafka topics for continuous updates, while the batch path leverages distributed compaction for large-scale imports. Because both paths commit into the same store, there is no need to run separate systems for hot and cold data.
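To make the idea of a shared commit path concrete, here is a minimal in-memory sketch. It is not Mussel's actual protocol: the names `Write`, `CommitLog`, `ingest_stream`, and `ingest_bulk` are hypothetical, and the "newest timestamp wins" rule is an assumption made for illustration. The point is only that a streaming consumer and a bulk loader can funnel into the same `commit` call.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp_ms: int


class CommitLog:
    """Hypothetical single commit point shared by both ingestion paths."""

    def __init__(self) -> None:
        self.data: Dict[str, Write] = {}
        self.commits = 0

    def commit(self, writes: Iterable[Write]) -> int:
        """Apply a group of writes; return how many changed the stored state."""
        applied = 0
        for w in writes:
            current = self.data.get(w.key)
            # Assumption: a write only lands if it is at least as new as
            # what is already stored for that key.
            if current is None or w.timestamp_ms >= current.timestamp_ms:
                self.data[w.key] = w
                applied += 1
        self.commits += 1
        return applied


def ingest_stream(log: CommitLog, events: Iterable[Write]) -> None:
    """Streaming path: one small commit per event (e.g. per consumed message)."""
    for event in events:
        log.commit([event])


def ingest_bulk(log: CommitLog, rows: Iterable[Write], chunk_size: int = 2) -> None:
    """Batch path: the same commit call, but with many rows per commit."""
    rows = list(rows)
    for start in range(0, len(rows), chunk_size):
        log.commit(rows[start:start + chunk_size])
```

In this sketch the only difference between the two paths is commit granularity, so a reader sees the same state no matter which path wrote a key.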
Rather than maintaining two versions of the same dataset for different ingestion modes, Mussel V2 employs a unified record schema. Each key maintains a version history and timestamped metadata, ensuring consistency even during concurrent writes from different sources.
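A unified record with per-key version history might look roughly like the following. This is a sketch under stated assumptions, not Mussel's real schema: the field names (`timestamp_ms`, `source`, `version`) and the `VersionedStore` class are hypothetical, and picking the latest version by timestamp (with the version number as tiebreaker) is one plausible policy among several.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RecordVersion:
    value: str
    timestamp_ms: int   # event time attached by the writer
    source: str         # "stream" or "batch" (hypothetical labels)
    version: int        # per-key counter assigned on write


class VersionedStore:
    """Keeps every version of a key; both ingestion modes write the same shape."""

    def __init__(self) -> None:
        self._history: Dict[str, List[RecordVersion]] = {}

    def put(self, key: str, value: str, timestamp_ms: int, source: str) -> RecordVersion:
        history = self._history.setdefault(key, [])
        record = RecordVersion(value, timestamp_ms, source, version=len(history) + 1)
        history.append(record)
        return record

    def latest(self, key: str) -> Optional[RecordVersion]:
        history = self._history.get(key)
        if not history:
            return None
        # Assumption: newest event time wins; version breaks exact ties.
        return max(history, key=lambda r: (r.timestamp_ms, r.version))

    def history(self, key: str) -> List[RecordVersion]:
        return list(self._history.get(key, []))
```

Because a late-arriving batch row carries its own timestamp, it is recorded in the history without overriding a newer streaming update for the same key.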
The system integrates adaptive compaction — inspired by LSM-tree techniques — to reduce write amplification. Hot partitions receive more frequent compaction, while cold data segments are compacted lazily, reducing system overhead.
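The hot-versus-cold compaction policy can be illustrated with a small scheduling predicate. The thresholds, the `PartitionStats` fields, and the function names below are all assumptions chosen for the example; the article does not describe how Mussel V2 actually classifies partitions or picks intervals.

```python
from dataclasses import dataclass

# Illustrative thresholds only; real values would be tuned per workload.
HOT_WRITES_PER_SEC = 100.0
HOT_INTERVAL_S = 60.0       # hot partitions: compact about once a minute
COLD_INTERVAL_S = 3600.0    # cold partitions: compact lazily, about hourly
MIN_SEGMENTS = 2            # nothing to merge with fewer than two segments


@dataclass
class PartitionStats:
    name: str
    writes_per_sec: float
    pending_segments: int
    seconds_since_compaction: float


def compaction_interval(stats: PartitionStats) -> float:
    """Pick a shorter interval for partitions with a high write rate."""
    if stats.writes_per_sec >= HOT_WRITES_PER_SEC:
        return HOT_INTERVAL_S
    return COLD_INTERVAL_S


def should_compact(stats: PartitionStats) -> bool:
    """Compact only when there is work to do and the interval has elapsed."""
    if stats.pending_segments < MIN_SEGMENTS:
        return False
    return stats.seconds_since_compaction >= compaction_interval(stats)
```

With these example numbers, a partition taking 500 writes per second is eligible two minutes after its last compaction, while a partition taking one write per second waits a full hour.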
These optimizations enable Airbnb to support both analytical workloads (feature stores, recommendations) and operational services (inventory, pricing) from a unified infrastructure layer.
One of the biggest hurdles was maintaining ordering guarantees between streaming and batch updates. The team developed a hybrid timestamp reconciliation algorithm that merges event timelines deterministically across ingestion modes.
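The article does not spell out the reconciliation algorithm, so the following is only a sketch of one way to merge two timelines deterministically. It assumes every update carries an event timestamp and a per-source sequence number, and it assumes that a streaming update beats a batch update on an exact timestamp tie. `Update`, `SOURCE_RANK`, `reconcile`, and `materialize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Assumption: on identical timestamps, stream updates are applied last and win.
SOURCE_RANK = {"batch": 0, "stream": 1}


@dataclass(frozen=True)
class Update:
    key: str
    value: str
    event_time_ms: int
    source: str   # "batch" or "stream"
    seq: int      # per-source sequence, e.g. a log offset or a row index


def order_key(update: Update) -> Tuple[int, int, int, str]:
    """Total order: event time, then source rank, then sequence, then key."""
    return (update.event_time_ms, SOURCE_RANK[update.source], update.seq, update.key)


def reconcile(stream: Iterable[Update], batch: Iterable[Update]) -> List[Update]:
    """Merge both timelines into one order that ignores arrival order."""
    return sorted([*stream, *batch], key=order_key)


def materialize(timeline: Iterable[Update]) -> Dict[str, str]:
    """Replay the merged timeline; the last update per key is the final value."""
    state: Dict[str, str] = {}
    for update in timeline:
        state[update.key] = update.value
    return state
```

Since the ordering depends only on fields stored in each update, replaying the same inputs in any arrival order yields the same merged timeline and the same final state.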
Another challenge was ensuring backward compatibility for legacy services that still rely on Mussel V1 APIs. A compatibility shim allows gradual migration without service downtime.
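A compatibility shim of this kind is usually an adapter: it exposes the old call shape and translates each call to the new backend. The sketch below is purely illustrative. Neither the V1 nor the V2 API surface is described in the article, so `V2Client`, `V1Shim`, and their method signatures are hypothetical stand-ins, with the V2 side reduced to an in-memory dictionary.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class V2Client:
    """Hypothetical stand-in for a V2 client that stores timestamped rows."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], dict] = {}

    def write(self, namespace: str, key: str, value: str, timestamp_ms: int) -> None:
        self._rows[(namespace, key)] = {"value": value, "timestamp_ms": timestamp_ms}

    def read(self, namespace: str, key: str) -> Optional[dict]:
        return self._rows.get((namespace, key))


class V1Shim:
    """Exposes an assumed V1-style get/put API on top of the V2 client."""

    def __init__(self, v2: V2Client, clock: Optional[Callable[[], int]] = None) -> None:
        self._v2 = v2
        # Legacy callers pass no timestamp, so the shim supplies one.
        self._clock = clock or (lambda: int(time.time() * 1000))

    def put(self, table: str, key: str, value: str) -> None:
        self._v2.write(table, key, value, self._clock())

    def get(self, table: str, key: str) -> Optional[str]:
        row = self._v2.read(table, key)
        # Legacy callers expect a bare value, not the V2 metadata envelope.
        return None if row is None else row["value"]
```

A legacy service can keep calling `put` and `get` unchanged while its data already lives in the new store, which is what allows migration one caller at a time.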
Mussel V2 now underpins several mission-critical systems, including feature stores for personalization and near-real-time dashboards. It also integrates with Airbnb’s internal orchestration frameworks and metadata services, enabling easier data lineage tracking and observability.
Future plans for Mussel V2 include:
With Mussel V2, Airbnb has redefined its approach to large-scale data management. By merging streaming and bulk ingestion into one coherent system, the company gains performance, operational simplicity, and a clear path toward fully unified data infrastructure.