An aspiring Big Data scientist should look no further than Chandra Sekhar Saripaka. He’s had an illustrious career since graduating from the Jawaharlal Nehru Technological University in Hyderabad, India.
Chandra is an experienced software engineer with more than 10 years in the IT field, including stints at major banks. As a Senior Data Engineer with DataSpark, he was one of two speakers from the company at Strata + Hadoop World 2016 in Singapore, a conference that drew top minds from around the world working on Big Data and analytics.
Chandra spoke on how to go “from telco data to spatial-temporal intelligence APIs” by “architecting through microservices”. He explained in detail DataSpark’s production architecture and how it processes terabytes of spatial-temporal telco data each day in PaaS mode. He also showed fellow data scientists attending his talk how the platform operates in SaaS mode.
Chandra began his career with Laser Soft Infosystems (a Polaris company) in 1995, where he rose rapidly to senior software engineer leading a team of 13, before joining Franklin Templeton Investments one and a half years later as a senior software analyst.
Chandra went on to join OpenText as an Advanced Software Engineer for two years, Standard Chartered Bank as a Senior Software Engineer and Framework Specialist for a further two years, and Barclays Investment Bank as a Big Data Lead for a year – before commencing his current job with Singtel and DataSpark in May 2014 as a Data Scientist.
Chandra’s in-depth knowledge of Big Data makes him well placed to give pointers to aspiring IT engineers in the field. I asked Chandra to share his insights on carving out a career in this intensely competitive arena.
What must today’s IT worker do to get a foothold in Big Data?
Chandra: There is a paradigm shift underway as traditional software is transformed into data-driven software. With the rise of many small and medium-sized startups, a wide range of cloud-ready tools for storage, compute, processing and visualisation has emerged.
Today’s IT workers should make a habit of moving from developing traditional software to writing software that runs at scale. Processing Big Data at scale with optimised compute and storage is the obvious challenge they will face. A good number of Apache projects, such as Spark and Flink, are a natural place to start learning.
Big Data analytics has established itself across many segments of the industry. In the Big Data space, you have to learn how the whole ecosystem works, focus on your core strengths and then match those strengths to the industry. The four Vs of Big Data (Volume, Velocity, Variety and Veracity) will force you to deal with scale and to handle real-time data. This in turn requires the ability to fuse different kinds of data sources so that they can be consumed effectively for data processing and for applying algorithms.
Security is one key challenge: you have to protect data-at-rest as well as data-in-motion (also called data-in-transit). There is also a great need to learn how to solve problems with secure analytics, since data is the key asset here. Big Data applications also have to be reactive, which makes the software resilient to failure and able to scale easily.
Why did you become a data engineer?
Chandra: There are many areas in the data engineering space, such as infrastructure, data cleansing, data storage, serving data through APIs, data governance and security. I am a data-savvy person, and I have always enjoyed spending time solving problems at scale. This is an emerging space that requires better data structures and storage formats, as well as faster ingestion. I want to leave my mark by architecting and building a good Data PaaS for both on-premise and cloud deployments.
How did you become a data engineer?
Chandra: I am a Computer Science engineer and a full stack developer by choice. I come from a diverse industry background that includes banking and financial services, content management services and telecoms. Having worked at various banks and investment firms, I have built several integration platforms that were transactional in nature. I also worked on analytics and Business Intelligence software before joining DataSpark.
I have gone through every piece of documentation and every article I could find on Hadoop and its ecosystem and their applications in Big Data analytics. I was an active community member, taking part in the Big Data Centre of Excellence (CoE) at previous companies and in BigData.SG. I even built a Hadoop cluster at home with two laptops and a desktop.
My first project was to build a graph database for an identity management system at terabyte scale. After that, I built a recommendation engine for a news app that consumes both data-at-rest and data-in-motion. This work was accepted as a paper at an IEEE conference on Big Data. I also created an MPEG-7-based image search for an e-commerce platform.
My thirst for solving Big Data problems at scale led me to convert DataSpark’s geo-analytics product from a purely Python-based MapReduce implementation to Spark in Java. We then successfully delivered our first terabyte-scale project and brought the query response time of a Big Data application down to the order of milliseconds.
Why did you join DataSpark?
Chandra: DataSpark is a geo-analytics company with access to the largest share of telco data in the region. It was one of the first companies to set up a Hadoop cluster in Singapore. I really value the team spirit and the chance to produce real, actionable insights from the company’s network data.
Here, we get the chance to build real-world applications that are close to the community and society, which always gets me excited. DataSpark lets me build a wide range of applications on its platform and deploy the platform at other telcos in the region. The company is a great place to work, and you can learn a lot from the world-class data scientists who work here.
What’s life like as a data engineer?
Chandra: A data engineer harnesses the gigabytes of data that flow through disks and pipes by cleansing it, storing and processing it efficiently, and delivering metrics on it.
He or she is responsible for building the data platform that ingests and transforms data and produces actionable insights as APIs, to be consumed as microservices by various client applications. A data engineer also owns the terabyte-scale dataflow pipeline, which is built on popular industry technologies and approaches such as Hadoop, Spark, Kafka, Docker, microservices and streaming.
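To make the ingest, cleanse, transform and serve stages described above more concrete, here is a toy, single-process sketch in Python. It is not DataSpark’s architecture: the record format (`user_id,cell_id,timestamp`), the sample records and every function name are hypothetical, and in production each stage would run on tools such as Kafka and Spark rather than on in-memory generators.

```python
import json
from collections import defaultdict
from collections.abc import Iterable, Iterator

# Hypothetical raw input lines in the assumed format "user_id,cell_id,timestamp".
RAW_EVENTS = [
    "u1,cellA,2016-12-06T08:00:00",
    "u2,cellA,2016-12-06T08:05:00",
    "bad line",                       # malformed: wrong number of fields
    "u1,cellB,2016-12-06T09:10:00",
    "u3,,2016-12-06T09:15:00",        # malformed: empty cell_id
]


def ingest(lines: Iterable[str]) -> Iterator[list[str]]:
    """Ingest stage: split each raw line into its comma-separated fields."""
    for line in lines:
        yield line.strip().split(",")


def cleanse(records: Iterable[list[str]]) -> Iterator[dict]:
    """Cleansing stage: keep only records with three non-empty fields."""
    for fields in records:
        if len(fields) == 3 and all(fields):
            user_id, cell_id, ts = fields
            yield {"user_id": user_id, "cell_id": cell_id, "ts": ts}


def distinct_users_per_cell(events: Iterable[dict]) -> dict[str, int]:
    """Transform stage: count the distinct users seen in each cell."""
    users: dict[str, set[str]] = defaultdict(set)
    for event in events:
        users[event["cell_id"]].add(event["user_id"])
    return {cell: len(ids) for cell, ids in sorted(users.items())}


def to_api_payload(metric: dict[str, int]) -> str:
    """Serving stage: serialise the metric as a JSON body an API could return."""
    return json.dumps({"metric": "distinct_users_per_cell", "data": metric}, sort_keys=True)


if __name__ == "__main__":
    # Chaining generators keeps the stages lazy, loosely mimicking a streaming pipeline.
    print(to_api_payload(distinct_users_per_cell(cleanse(ingest(RAW_EVENTS)))))
```

Keeping each stage as a separate function mirrors the idea that ingestion, cleansing, transformation and serving can be owned, scaled and replaced independently.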
How do you envisage your work going into the future?
Chandra: As a data engineer, I get the chance to work with lots of open source technologies and to collaborate with other open source projects originating from various well-known universities. Presenting our work at top-tier conferences is one of the opportunities I look forward to most.
Any advice for IT workers/students aspiring to be a data engineer?
Chandra: As an aspiring data engineer, you can choose one of several areas in the Big Data space, such as data infrastructure, data science consulting, data engineering or data platform engineering. Each has its own career path, so match your core strengths against these areas before making a choice.
Start with a single-node Hadoop cluster and install the full stack on it. Choose a programming language you are good at. Set up one of the processing frameworks, such as Spark, Apex, Beam or Flink, and look for some starter projects for learning Spark or Flink. There are thousands of datasets on the web: choose a good one with rich data points covering location, time and user events, then ask a few questions of the data and answer them by writing some simple analysis programs.
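As an illustration of “asking a few questions of the data”, here is a minimal Python sketch using only the standard library. The tiny dataset of (user, location, timestamp, event type) tuples and the two questions are hypothetical stand-ins for a real public dataset; on a Spark or Flink setup the same questions would become group-by-and-count jobs.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical toy dataset: (user_id, location, ISO-8601 timestamp, event_type).
EVENTS = [
    ("alice", "Orchard", "2016-12-06T08:15:00", "check_in"),
    ("bob", "Orchard", "2016-12-06T08:40:00", "check_in"),
    ("alice", "Raffles Place", "2016-12-06T12:05:00", "purchase"),
    ("carol", "Orchard", "2016-12-06T08:55:00", "purchase"),
    ("bob", "Raffles Place", "2016-12-06T17:45:00", "purchase"),
    ("bob", "Raffles Place", "2016-12-06T18:30:00", "check_in"),
    ("alice", "Orchard", "2016-12-06T19:10:00", "check_in"),
]


def busiest_hour(events):
    """Question 1: in which hour of the day do most events happen?"""
    hours = Counter(datetime.fromisoformat(ts).hour for _, _, ts, _ in events)
    hour, count = hours.most_common(1)[0]
    return hour, count


def favourite_location(events):
    """Question 2: at which location does each user generate the most events?"""
    per_user = defaultdict(Counter)
    for user, location, _, _ in events:
        per_user[user][location] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in sorted(per_user.items())}


if __name__ == "__main__":
    print("Busiest hour (hour, events):", busiest_hour(EVENTS))
    print("Favourite location per user:", favourite_location(EVENTS))
```

Once a question like this works on a small sample, rewriting it for a framework such as Spark on the full dataset is a natural next learning step.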
If you are interested in a career in Big Data analytics, contact Chandra or his fellow data scientists at DataSpark to find out what it takes and what to expect.
To find out more about Chandra’s work, visit the following: