SEDA & Subsquid Dissect Web3 Data Differences

Mar 29, 2024

Learn the differences between common types of data projects with Subsquid & SEDA.

Data is one of the fundamental building blocks of any blockchain, so much so that we would not consider something a blockchain if it did not store its transaction history. A blockchain is itself a distributed ledger that records transaction data within each block. However, as blockchains have evolved from basic decentralized record-keeping networks into dApp ecosystems, the role of data has evolved too. This has led to various projects dedicated to blockchain data, covering on-chain data management, cross-chain data retrieval, off-chain data access, and data availability.

With all that variety, it’s easy to get confused about a given data project’s purpose, since it can seem to overlap with what other projects offer. In this blog, we will offer some clarity by breaking down the world of blockchain data and highlighting the specific roles of different data projects, including Subsquid and SEDA.

SEDA’s Modular Data Layer & Subsquid’s Web3 Data Lake: Leading Blockchain Data.

The History Of Data On The Blockchain

Blockchain Data Genesis | The Age of Bitcoin

The Bitcoin Network was the first decentralized blockchain. Back then, data requirements were relatively simple. The Bitcoin network itself stores settled transaction data within each of its blocks, and it only has, and only requires, access to the transaction data within its own execution environment (the Bitcoin chain). This is the most straightforward form of indexing data (storing raw data for easy retrieval) in a blockchain context. Note, however, that not every Bitcoin node stores the entire history: pruned nodes discard old blocks after validating them, and only nodes that keep the full ledger (often called archive nodes) can serve the complete record to other nodes that need a more extensive track record.

Blockchain Data Evolution V1 | The Ethereum Era

Fast-forward to the deployment of the Ethereum Network, which allowed developers to launch any type of smart contract to execute specific functions within the network. As applications managed by smart contracts were deployed across Ethereum, new data requirements emerged. The first oracles were launched to meet the demand for off-chain data, such as price feeds, allowing smart contracts to execute based on real-world input.

Blockchain Data Evolution V2 | The Rise Of L1s & L2s

As Ethereum saw early success as a leading L1, it was met with fierce competition from newer, more modern L1s looking to reign supreme, while other teams saw their chance to establish alternative blockchain ecosystems. At the same time, Ethereum encouraged the creation of Layer 2s to increase capacity. As more Layer 2s competed for Ethereum block space, the first data availability (DA) layers were conceptualized as an alternative. In the words of Mustafa Al-Bassam, Co-Founder of Celestia, DA layers are “optimized for solely ordering and guaranteeing the availability of transaction data.”

Blockchain Data Evolution V3 | The Fall Of Fragmentation

As Web3 moved into a multi-chain, multi-ecosystem environment from 2018 to 2023, it became strikingly apparent that networks were fragmenting into siloed tech stacks. Bridges and oracles desperately tried to integrate with every network, but with infrastructure completely saturated, integration delays grew to 6–9 months. As chains multiplied, the systems indexers used to hold raw on-chain data began to fail, and more efficient methods were needed to store terabytes of raw data. Clearly, an upgrade was required. In this environment, industry-defining data projects, including Subsquid and SEDA, were founded to create a more efficient, scalable, and accessible data world for blockchains.

Blockchain Data Evolution V4 | The Great Upgrade

With so many networks, ecosystems, and blockchains, data infrastructure needed an upgrade. SEDA, a data layer, is the natural evolution of oracles and has set a new standard for Web3 data transport, access, and configuration. Natively supporting 230+ blockchains, SEDA allows any off-chain or on-chain data type to be transported and configured directly from any source.

However, another important requirement for dApp developers is access to indexed on-chain data. This is where solutions like Subsquid come in handy. With Subsquid, developers do not have to go through the painful process of requesting blockchain data from RPC nodes; they can simply use Subsquid’s decentralized data lake. Unlike other indexing solutions, Subsquid takes an unopinionated approach, making data available as is. This gives developers maximum flexibility to define their own schemas, from simple to highly complex. As a data lake, it is horizontally scalable and chain-agnostic, meaning any chain’s data can be stored in and easily retrieved from Subsquid. Subsquid already supports more than 110 chains, and dApps that rely on it for on-chain data handle billions of dollars in trades.

Demystifying Confusion Between Different Data Infrastructure Types

Oracles

Oracles marked an incredible step forward for blockchain technology by giving smart contracts access to off-chain data. At its core, an oracle is a direct bridge between a smart contract and a data source, such as a BTC/USD price feed. A separate oracle bridge must be created for every data source to funnel that specific data feed to a smart contract. Historically, oracles have also been application-specific builds, such as a specific price feed or a bridge. Because each one is built for a single purpose, such as delivering BTC/USDT price updates on-chain, oracles today face a fundamental integration backlog.

As the number of blockchains rapidly increased, oracles struggled to keep up with deployment requests, resulting in extreme backlogs. Additionally, as developers build hyperspecialized apps, the need for configurable data feeds and permissionless choice of data has become more apparent. Oracles are traditionally permissioned tools that require oracle companies to build specific feeds for specific chains to meet developers’ configuration requirements, further delaying development and hindering dApp progress.

Data Transport Layers

A Data Transport Layer applies the concept of an oracle to a complete Layer 1 blockchain. Where an oracle is spun up as a direct bridge from one data source to one smart contract destination, a data layer acts as an intermediary between any data source and any smart contract requesting data. Ultimately, this makes data accessible from any chain through a single integration, reducing delays for builders.

In contrast to single-purpose, application-specific oracles, a data layer creates an entry point for off-chain data and makes that data available across its network so builders can develop feeds. Developers can then create their own data feeds on the data layer, consuming the data that providers have made available on the network.

SEDA, the first modular data layer of its kind, removes the need for native oracle deployments by connecting any chain to any data type on a 100% permissionless basis. Where an oracle must spin up a new API connection point for each new data source, data providers on SEDA use a single API connection for any data they wish to add to the network, meaning no integration delays.
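The backlog argument above is, at heart, a counting argument. As a simplified model (our assumption for illustration, not a description of SEDA’s internals), suppose each point-to-point oracle deployment serves one data source on one chain, while on a data layer each provider connects once and each chain integrates once, with one provider per data source. The sketch below counts the integrations each model needs; the function names are hypothetical.

```python
# Simplified integration-count model (an illustrative assumption, not SEDA's
# actual architecture): every "integration" is one connection someone must build.


def point_to_point_integrations(sources: int, chains: int) -> int:
    """Application-specific oracles: one bespoke deployment per (source, chain) pair."""
    return sources * chains


def data_layer_integrations(sources: int, chains: int) -> int:
    """Hub model: each provider (one per source, by assumption) connects once,
    and each chain integrates with the data layer once."""
    return sources + chains


if __name__ == "__main__":
    for sources, chains in [(10, 5), (100, 50), (500, 230)]:
        p2p = point_to_point_integrations(sources, chains)
        hub = data_layer_integrations(sources, chains)
        print(f"{sources} sources x {chains} chains: point-to-point={p2p}, data layer={hub}")
```

Under these assumptions, the point-to-point count grows multiplicatively with sources and chains while the hub count grows additively, which is one way to see why adding a chain or a data source no longer has to wait on the other.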

Specific to SEDA are ‘programs’ that let developers set the configuration parameters for the data they request, creating their own data feeds for any data available on the SEDA network. As a result, data layers such as SEDA offer permissionless plug-and-play integration: developers get same-day access to any data type on the network and can create their own programmed feeds for that data. Compared with outdated oracle tech, data layers are far more efficient, scalable, and customizable networks.
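To make the idea of a programmed feed more concrete, here is a minimal sketch of what a developer-defined configuration might do: take reports from several data providers, drop outliers relative to the median, and publish the median of what remains. `ProviderReport`, `FeedProgram`, `min_reports`, and `max_deviation` are hypothetical names chosen for illustration; this is not SEDA’s program API.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ProviderReport:
    provider: str   # hypothetical identifier of the data provider
    value: float    # e.g. a BTC/USD price reported by that provider


@dataclass(frozen=True)
class FeedProgram:
    """Hypothetical developer-defined feed configuration (not SEDA's real API)."""
    min_reports: int = 3          # refuse to publish with too few providers
    max_deviation: float = 0.02   # drop reports more than 2% away from the raw median

    def run(self, reports: list[ProviderReport]) -> float:
        if len(reports) < self.min_reports:
            raise ValueError(f"need at least {self.min_reports} reports, got {len(reports)}")
        raw_median = median(r.value for r in reports)
        # Keep only reports within the allowed relative deviation of the raw median.
        kept = [r.value for r in reports
                if abs(r.value - raw_median) <= self.max_deviation * abs(raw_median)]
        if len(kept) < self.min_reports:
            raise ValueError("too many outliers to publish a value")
        return median(kept)


reports = [ProviderReport("a", 64000.0), ProviderReport("b", 64100.0),
           ProviderReport("c", 63950.0), ProviderReport("d", 70000.0)]
print(FeedProgram().run(reports))  # 64000.0 (the 70000.0 report is dropped as an outlier)
```

Because the parameters live in the developer’s own configuration, a different protocol could tighten `max_deviation` or require more reports for the same data without anyone building a new oracle.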

Data Lakes

While sometimes confused with data availability layers, data lakes are a broader category of systems that store data in raw format at scale. In Web2, data lakes are widely adopted by businesses that generate large volumes of data, serving as a centralized repository for all their structured and unstructured data.

Businesses can import any amount of data into a data lake in real time and then run analytics on it to generate further insights. In the context of Web3, some projects have chosen to store their data in Web2 data lakes, breaking the decentralized ethos. (For example, Solana puts much of its historical data in Google BigQuery for ease of access.)

Subsquid aims to provide an alternative: a decentralized data lake that makes data access scalable. What differentiates Subsquid from a data availability solution is that its main goal is not to be a place for rollups to post transaction data for independent verification. Instead, Subsquid aims to empower builders with easy access to all the on-chain and, eventually, off-chain data they need.

You can imagine the Subsquid data lake as a library that contains all the data from EVM, Substrate, and Solana protocols: one spot for all the raw data, which devs can tap into by programming their own librarian (indexer).
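In the library analogy, the data lake holds raw records and the developer writes the librarian. The sketch below shows that division of labor under stated assumptions: the raw log format in `RAW_LOGS` is hypothetical (it is not Subsquid’s storage format), and `index_balances` is an illustrative indexer that keeps only `Transfer` events and shapes them into its own schema of net balance changes per account.

```python
from dataclasses import dataclass

# Hypothetical raw records as a data lake might hold them: unparsed and unopinionated.
RAW_LOGS = [
    {"block": 100, "event": "Transfer", "args": {"from": "0xA", "to": "0xB", "value": 50}},
    {"block": 101, "event": "Approval", "args": {"owner": "0xB", "spender": "0xC", "value": 10}},
    {"block": 102, "event": "Transfer", "args": {"from": "0xB", "to": "0xC", "value": 20}},
]


@dataclass
class Balance:
    account: str
    amount: int       # net change since indexing began (no starting balances assumed)
    last_block: int   # last block in which this account's balance changed


def index_balances(raw_logs):
    """Developer-defined 'librarian': select the events this schema cares about
    and reshape them into per-account net balance changes."""
    balances: dict[str, Balance] = {}
    for rec in sorted(raw_logs, key=lambda r: r["block"]):
        if rec["event"] != "Transfer":
            continue  # this particular schema ignores everything except transfers
        args = rec["args"]
        for account, delta in ((args["from"], -args["value"]), (args["to"], args["value"])):
            entry = balances.setdefault(account, Balance(account, 0, rec["block"]))
            entry.amount += delta
            entry.last_block = rec["block"]
    return balances


for balance in index_balances(RAW_LOGS).values():
    print(balance)
```

A different team could point another librarian at the same raw records, for instance one that indexes only `Approval` events, without the data lake changing at all.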

With Subsquid, developers can access on-chain data without going to archive nodes, because Subsquid separates compute from data. Its main goal is to allow reading and querying terabytes of data in seconds while offering guarantees of correctness.
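One common way to separate compute from data is to store raw blocks in immutable chunks keyed by block range and keep a small catalog of those ranges, so a query worker fetches only the chunks that overlap its request instead of replaying history from an archive node. The sketch below assumes that layout; `Chunk`, `ChunkCatalog`, and the storage paths are hypothetical and do not describe Subsquid’s actual chunk format.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    first_block: int   # inclusive
    last_block: int    # inclusive
    location: str      # hypothetical storage key, e.g. an object-store path


class ChunkCatalog:
    """Storage-side index of which chunk covers which block range.
    Assumes chunks are sorted and non-overlapping (a simplifying assumption)."""

    def __init__(self, chunks):
        self.chunks = sorted(chunks, key=lambda c: c.first_block)
        self.starts = [c.first_block for c in self.chunks]

    def chunks_for_range(self, from_block, to_block):
        # Start at the last chunk beginning at or before from_block, then walk forward
        # until chunks start after to_block; only overlapping chunks are returned.
        i = max(bisect.bisect_right(self.starts, from_block) - 1, 0)
        selected = []
        while i < len(self.chunks) and self.chunks[i].first_block <= to_block:
            if self.chunks[i].last_block >= from_block:
                selected.append(self.chunks[i])
            i += 1
        return selected


catalog = ChunkCatalog([
    Chunk(0, 999, "chunks/0000000000"),
    Chunk(1000, 1999, "chunks/0000001000"),
    Chunk(2000, 2999, "chunks/0000002000"),
])
# A query worker (the "compute" side) would download only these chunks.
print([c.location for c in catalog.chunks_for_range(1500, 2100)])
# ['chunks/0000001000', 'chunks/0000002000']
```

Because the catalog is tiny compared with the data it describes, many independent workers can share the same stored chunks and scale out horizontally.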

Data Availability Layers

Data availability (DA) is a crucial component in maintaining a blockchain’s trustless properties. While DA has only recently become a hot topic, it has always been part of every blockchain, whether monolithic or modular. In short, data availability layers guarantee that all nodes in a blockchain network have access to transaction data, allowing them to verify its correctness independently.

In monolithic blockchains, this is solved by requiring all full nodes to store a copy of the entire history at the expense of scalability. With the emergence of the modular paradigm, builders have started decoupling blockchains’ functions and created layers designed only to make blockchain data available to any interested parties.

As such, data availability layers take on the function of archival nodes in monolithic blockchains, offering a way to verify that transactions adhere to the consensus protocol.
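Independent verification usually rests on a commitment: the block header carries a hash of the transaction data (for example, a Merkle root), and any node that downloads the published data can recompute that hash and compare. The sketch below uses a simplified SHA-256 binary Merkle tree as an assumption; production DA layers such as Celestia use their own tree layouts and erasure coding, which this does not model.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Simplified binary Merkle tree: hash each transaction, then hash pairs upward.
    An odd node at any level is paired with itself."""
    if not leaves:
        raise ValueError("no transactions")
    level = [sha256(tx) for tx in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def verify_published_data(transactions: list[bytes], committed_root: bytes) -> bool:
    """A node that downloaded the published data recomputes the root and compares it
    with the commitment found in the block header."""
    return merkle_root(transactions) == committed_root


txs = [b"tx1", b"tx2", b"tx3"]
root = merkle_root(txs)  # stands in for the commitment a block producer would publish
print(verify_published_data(txs, root))                        # True
print(verify_published_data([b"tx1", b"tx2", b"forged"], root))  # False
```

The check only works if the data can actually be downloaded, which is exactly what a data availability layer is meant to guarantee.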

A Practical Project Walkthrough Using SEDA And Subsquid

To put the above into practical context, let’s examine why each type of data project is used. So that transactions can be verified, rollup nodes publish their block data to DA layers, where it can easily be accessed and verified. If developers need raw data from any blockchain (not just for transaction verification), they can integrate with Subsquid’s data lake and access the raw block data of more than 100 chains.

The more data Subsquid’s data lake holds, the more valuable it becomes to developers, including smart contract developers. Using SEDA, Subsquid can make its data lake available on the SEDA network, which would aggregate the data and allow builders to spin up feeds tailored directly to their protocol.

Thanks to modular data availability layers, more blockchains can be deployed. As more blockchains are deployed, more raw data must be indexed via Subsquid. SEDA, as a data layer, gives smart contracts chain-agnostic access to that data, whether they want to compute on it, configure it, or simply read it.


Written by SEDA
