Big Data Juggernaut IV

Author: Anil Vishnu Vaidya

Date: Tue, 2018-06-19 11:38

Big data is taking big leaps with Spark-based products. Cloud as well as on-premises solutions deploy Spark in their offerings. I wrote earlier that Spark is taking over from Hadoop as the big data mainstay. Spark needs to be supported by additional access mechanisms and programming languages. Not surprisingly, Python is rising to the occasion. PySpark, the Python API for Spark, does this very well. PySpark is the programming interface for accessing Spark-based datasets from Python. It has built-in libraries that allow programmers to perform computations on data held in Apache Spark.

This means that anyone using Spark is likely to work with PySpark too. Going further, since PySpark is based on Python, one needs to know a bit of Python as well. One of the easier ways to start working with PySpark is to use Jupyter Notebook. By now you will have gauged the number of different technologies that have to be integrated to get into a big data project. It is imperative to combine a business mindset with a liking for technological innovation.

Technology is developing at a rapid pace, beyond imagination. The number of people and companies working in this arena is phenomenally high, and they are spread all over the world. To be successful, we have to not only keep an eye on these developments but also upgrade ourselves all the time. Just think of how many different technologies I have brought together in this short blog, from Spark to Python to PySpark and Jupyter Notebook, all within the ambit of BIG DATA.



Dear Anil, I have gone through all four of your blogs on the “Big Data Juggernaut”, and all are very insightful about the “arrival of new technologies, databases, programming languages, and applications”. With big data growing from petabytes to exabytes, managing the knowledge associated with this huge volume of data efficiently is another business need that organizations will look at. This is exactly where big data analytics will play a major role. Big data analytics will cut across all fields of business, governments, and countries. The collaborative insight that will be generated will be of tremendous use in the coming days, whether for connected cars, factories, smart cities, airports, etc. Big Data analytics on the Big-IoT platform will be part of daily life. This pretty much highlights the need for a future generation with public and private clouds. It will also help to manage unstructured data, which is currently difficult to visualize. In this new wave of the Big IoT platform, Big Data and Machine Learning, the traditional digital Big Data of Engineering and R&D is less talked about at the moment. The power of machine learning and Big Data applied to engineering data can work miracles in speeding up product development, by throwing up many effective design patterns to pick and choose from among thousands of options that traditionally could not be thought of. This is a great opportunity to adopt and lead the Big Data Analytics journey.

Thank you, sir, for sharing this set of four insightful articles. It has been observed in the past 2-3 years that a lot of useful information can be extracted from data that is being generated in real time. In a country like India, with a population of 1.3 billion people, more than 50% of them below the age of 25 and more than 65% below the age of 35, we can only expect the number of internet users to grow at a higher rate every year. With a diverse population and varying needs and wants, companies will need to develop customized products to become market leaders in such a market. The “Unicorn” companies generate terabytes of data every day just by serving the Indian population, and optimize their products on a daily basis by processing this big data efficiently. Also, the majority of Indian startups can be seen pushing towards setting up a Big-data-Engineering division comprising people from business, IT, and even the arts domain, to make sense of the data generated by the company and its customers and to identify trends in consumers’ behavior with the companies’ products. A few years ago, Hadoop and the MapReduce framework were the only major technologies used to process big data, and now we have a whole “big-data stack”, with Spark, PySpark, Pig, Hive, Sqoop, Impala, etc. developed to process it according to our needs and more efficiently. Major players in the banking industry like Citibank, American Express, etc. use big-data technologies to process data and identify fraud in real time. By observing such trends in the usage of big data technology in the Indian context alone, we can surely conclude that big data is here to stay and that it will grow at an accelerated rate in the years to come, even more so at a global level.