How Nasdaq is using the power of the AWS Cloud

Milind Rastogi
6 min read · Sep 30, 2020


The Nasdaq Stock Market, also known as Nasdaq or NASDAQ, is an American stock exchange located at One Liberty Plaza in New York City. It is ranked second on the list of stock exchanges by market capitalization of shares traded, behind only the New York Stock Exchange. The exchange platform is owned by Nasdaq, Inc., which also owns the Nasdaq Nordic stock market network and several U.S. stock and options exchanges.

While most people know Nasdaq for its equities exchange, Nasdaq is also a technology and services provider to more than 120 different exchanges, regulators, and post-trade entities in more than 50 countries worldwide. Every evening, Nasdaq receives up to 60 billion records that must be loaded for billing, reporting, and other processes before the markets open the next morning. As that record volume increased over time, Nasdaq opted to migrate to a data lake solution in order to scale the ingest process and query data in a more parallel manner. Nasdaq moved its entire ingest process to Amazon Web Services (AWS), writing the data to Amazon Simple Storage Service (Amazon S3).

From there, Nasdaq chose Amazon Redshift Spectrum to enable its queries to run faster than in its existing data warehouse.

According to Robert Hunt, vice president of software engineering for global surrounding systems at Nasdaq, “We have one billing process that went from taking about 40 minutes to now running at four minutes — a 90 percent improvement in that one process. But across all of the processes, we’re seeing 60 to 70 percent improvement across the board.”

Why is Nasdaq using the AWS Cloud?

Nasdaq uses AWS services to make its data warehouse, known as the Nasdaq Data Warehouse (NDW), more efficient and better optimized.

What is NDW?

Nasdaq stores the most recent year of data from internal and external data sources in its data warehouse. It receives around 30–55 billion records per day from 1,000 data servers. The warehouse also supports business-critical functions such as billing, reporting, market surveillance, and data visualization tools. Currently, NDW runs on a 70-node ds2.8xlarge Amazon Redshift cluster with a total capacity of 1.12 petabytes. Nasdaq also uses NDW to generate reports and bills for its customers at the close of the stock market.
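The 1.12-petabyte figure follows from the node count. Here is a quick check, assuming the 16 TB of storage per node that AWS publishes for ds2.8xlarge (that number is an assumption brought in from outside this article):

```python
# Rough capacity check for the NDW cluster described above.
NODES = 70
TB_PER_NODE = 16  # assumption: published ds2.8xlarge storage per node


def cluster_capacity_pb(nodes: int, tb_per_node: int = TB_PER_NODE) -> float:
    """Return the total raw capacity in petabytes (1 PB = 1000 TB)."""
    return nodes * tb_per_node / 1000


print(cluster_capacity_pb(NODES))  # 1.12
```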

Nasdaq built an application named NDW Ingest, whose main purpose is to collect data from the different data sources and store it in an Amazon S3 bucket. The data is then loaded into Amazon Redshift, the AWS data warehousing service.
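The article does not say how the load from S3 into Redshift is performed. One common way is Redshift's COPY command, so the sketch below simply builds such a statement as a string. The table name, bucket, IAM role, and the Parquet file format are all hypothetical and are not Nasdaq's actual configuration.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads Parquet files from S3.

    Every argument is an illustrative placeholder, and Parquet is an
    assumed file format.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    table="ndw.trades",
    s3_prefix="s3://example-ndw-ingest/trades/2020-09-30/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
print(statement)
```

The resulting string would be run against the cluster with any SQL client.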

Nasdaq created another application, a billing system named RMS. Nasdaq uses RMS to query petabytes of data in Amazon Redshift.

Improved version of the NDW

The current version of NDW can store only one year of data. Nasdaq now wants to keep data older than one year and to expand storage outside of Amazon Redshift, which would be more cost-effective.

Nasdaq's solution was to take advantage of the cloud's scalability and flexibility by creating “a data warehouse in the sky”.

It can now store and query data and generate bills and reports entirely in the cloud. This solution reduces cost and makes querying and storing the data faster.

Problem with the NDW and RMS System

The current version of NDW uses Amazon Redshift for both storage and computation. As market volatility increases, so does the number of data records, and this growth demands an ever larger Redshift cluster.

But growing the Redshift cluster wastes compute power, because the main reason for launching more nodes is to store more data. And by storing an ever-growing amount of data, Nasdaq would one day reach the maximum size of a Redshift cluster, which is 128 nodes. Sooner or later this limit would no longer satisfy NDW's requirements, which would eventually mean the end of the project.
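To make the storage-driven scaling concrete, here is a minimal sketch of how many nodes a given data volume would need. It reads the 128 figure above as a per-cluster node limit, assumes 16 TB of storage per ds2.8xlarge node, and ignores compression and free-space headroom, so the numbers are only indicative.

```python
import math

TB_PER_NODE = 16  # assumption: ds2.8xlarge storage per node
MAX_NODES = 128   # the limit mentioned above, taken as nodes per cluster


def nodes_needed(data_tb: float) -> int:
    """Nodes required purely to hold data_tb terabytes of data."""
    return math.ceil(data_tb / TB_PER_NODE)


def fits_in_one_cluster(data_tb: float) -> bool:
    """True while the data still fits under the node limit."""
    return nodes_needed(data_tb) <= MAX_NODES


for data_tb in (1120, 2048, 2049):
    print(data_tb, nodes_needed(data_tb), fits_in_one_cluster(data_tb))
# 1120 70 True
# 2048 128 True
# 2049 129 False
```

Every node added this way also adds CPU that storage growth alone does not need, which is the waste described above.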

The Solution

So the Nasdaq team decided to separate storage from the Redshift cluster: one AWS service would be used only for storage, and Redshift would be used for querying and managing the data. The team also wanted to run the whole data warehouse in the cloud, which would make the system more scalable.

The query architecture constantly faced contention in Redshift: contention between the loading process and the querying processes, and contention between the billing and reporting processes. Contention was always present, and the team wanted to eliminate it quickly.

To solve the query architecture problem, the team had to stop relying on Amazon Redshift alone and select another AWS query option. The candidates were Athena, Spark, EMR, and Redshift Spectrum.

They found that Redshift Spectrum came close to solving their problem, and for the first time they were able to separate storage from computation. They concluded that Amazon Redshift on its own was promising but did not quite meet their needs. Still, the Nasdaq team was unsure whether it was heading in the right direction, so it decided to get help from the AWS Data Lab.
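To illustrate how Redshift Spectrum separates storage from compute, the sketch below builds the two SQL statements that typically expose S3 files to a Redshift cluster as an external table. The schema, table, columns, bucket, and IAM role are hypothetical, and Parquet is an assumed file format. None of this is Nasdaq's actual schema.

```python
def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """SQL that registers an external (data catalog) schema in Redshift."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(schema, table, columns, partition, s3_location):
    """SQL that maps Parquet files under s3_location to an external table."""
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns.items())
    part_name, part_type = partition
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n{cols}\n)\n"
        f"PARTITIONED BY ({part_name} {part_type})\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


# Hypothetical names, for illustration only.
print(external_schema_sql(
    "ndw_lake", "ndw", "arn:aws:iam::123456789012:role/ExampleSpectrumRole"))
print(external_table_sql(
    "ndw_lake", "trades",
    {"trade_id": "BIGINT", "symbol": "VARCHAR(16)", "quantity": "BIGINT"},
    ("trade_date", "DATE"),
    "s3://example-ndw-lake/trades/",
))
```

Once such a table exists, the data stays in S3 and the cluster only supplies compute at query time, so storage can grow without adding nodes.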

AWS Data Lab

AWS Data Lab is an AWS program that helps customers solve cloud architecture problems. Customers arrive with an idea of what they want and leave the Data Lab with a well-architected prototype or solution.

So Nasdaq went to the AWS Data Lab with its requirements and the problems it wanted the AWS experts to help solve. The Nasdaq team and the AWS experts worked together to find a solution that would satisfy Nasdaq's requirements.

The Data Lab gave the Nasdaq team two recommendations:

  1. Leverage Redshift Spectrum to separate storage and compute
  2. Leverage S3 Select and Redshift Spectrum to filter rows
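As a rough illustration of the row filtering in the second recommendation, the sketch below builds the request parameters for an S3 Select call, which asks S3 to return only the matching rows of an object instead of the whole file. The bucket, key, column name, and gzip-compressed CSV layout are all hypothetical. The code only constructs the request, and a comment shows where it would be passed to boto3.

```python
def build_s3_select_request(bucket: str, key: str, symbol: str) -> dict:
    """Parameters for an S3 Select call that keeps only rows for one symbol.

    Assumes a gzip-compressed CSV object with a header row that contains
    a "symbol" column (an illustrative layout, not Nasdaq's).
    """
    safe_symbol = symbol.replace("'", "''")  # escape quotes in the SQL literal
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": f"SELECT * FROM s3object s WHERE s.symbol = '{safe_symbol}'",
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "GZIP",
        },
        "OutputSerialization": {"CSV": {}},
    }


# Hypothetical bucket and key, for illustration only.
request = build_s3_select_request(
    "example-ndw-lake", "trades/2020-09-30/part-0000.csv.gz", "AAPL")
print(request["Expression"])

# With AWS credentials configured, the request would be sent like this:
#   import boto3
#   response = boto3.client("s3").select_object_content(**request)
```

Filtering at the storage layer means fewer bytes travel to the query engine, which is where the time savings come from.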

The Result

Finally, Nasdaq achieved what it really wanted: for the first time, instead of adding cluster nodes, it was able to reduce them, which also lowered its costs. It also succeeded in deploying the whole NDW and RMS system in the cloud, which results in faster querying and saves a lot of time.

Thank you for reading my article. If you liked it, please clap, share, and comment.
