This tutorial is purely informative. The next part in this tutorial series requires an active Google Cloud account with billing enabled.
Big Data processing and visualization
The Big Data ecosystem is huge, and it is constantly growing and evolving. On Monday Hadoop was da bomb. On Tuesday it was Spark. Now it’s Beam. But what about tomorrow? What about all the work and engineering that went into creating pipelines in the past?
Google Cloud offers a nice solution to this problem: an ecosystem of Big Data technologies whose products integrate painlessly with Hadoop, Spark and Beam. BigQuery and Data Studio are two of those products worth mentioning.
The first time I used them I felt right at home. Everything felt familiar.
With BigQuery I used ANSI-compliant SQL to query huge data sets, just as I did on Apache Spark with Zeppelin.
Data Studio was not as customizable as other Big Data visualization solutions, but it was enough for me. I could create beautiful interactive dashboards that reacted dynamically to the selected time period or time window, just as I did on Elasticsearch with Kibana.
Working with BigQuery and Data Studio felt a bit like having Apache Spark with Zeppelin and Elasticsearch with Kibana combined.
Google's BigQuery is an enterprise data warehouse for analytics that can handle data of any size. If you are in the real-time analytics at scale business then this product is right for you.
BigQuery is serverless, which means that there is no infrastructure to manage. It just spits out the results of your queries and you do not need to care what is going on under the hood. It is also blazing fast because it runs each query in parallel across many machines, reading columnar storage and shuffling intermediate data in memory.
The free tier of the service grants you 10 GB of data storage and 1 TB of processed query data per month at no charge.
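To get a feel for what those free-tier limits mean in practice, here is a small back-of-the-envelope sketch. The function names are made up for this illustration, and it assumes 1 TB = 1024 GB and that every query scans the same amount of data, which is a simplification.

```python
# Free-tier limits as quoted in this article.
FREE_STORAGE_GB = 10
FREE_QUERY_GB_PER_MONTH = 1024  # 1 TB, assuming 1 TB = 1024 GB


def fits_free_storage(stored_gb: float) -> bool:
    """Return True if the stored data fits within the free storage limit."""
    return stored_gb <= FREE_STORAGE_GB


def free_scans_per_month(gb_per_query: float) -> int:
    """Return how many queries of a given scan size fit in the monthly quota."""
    if gb_per_query <= 0:
        raise ValueError("gb_per_query must be positive")
    return int(FREE_QUERY_GB_PER_MONTH // gb_per_query)
```

For example, under these assumptions a query that scans 50 GB each time could be run 20 times per month before the free query quota runs out.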
The query language is ANSI-compliant SQL, which makes everything very familiar for the user. For more complex use cases you can call the BigQuery REST API through client libraries for Java, .NET, Python and other languages. BigQuery also integrates with third-party tools for loading, transforming and visualizing data.
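As a rough idea of what using the Python client library looks like, here is a minimal sketch. It assumes the google-cloud-bigquery package is installed and that default credentials are configured. The helper name run_query and the example query against a public dataset are my own choices for illustration, not something prescribed by BigQuery.

```python
from typing import Any, Dict, List, Optional

# Example standard SQL query against a public dataset (illustrative only).
TOP_NAMES_SQL = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total DESC
LIMIT 5
"""


def run_query(sql: str, client: Optional[Any] = None) -> List[Dict[str, Any]]:
    """Run a SQL query and return the result rows as plain dictionaries."""
    if client is None:
        # Imported lazily so this module loads without the library installed.
        # Needs `pip install google-cloud-bigquery` and default credentials.
        from google.cloud import bigquery

        client = bigquery.Client()

    query_job = client.query(sql)  # submits the query job
    rows = query_job.result()      # waits for the job to finish
    return [dict(row.items()) for row in rows]
```

Calling run_query(TOP_NAMES_SQL) with a billing-enabled project should return a list of dictionaries with the keys name and total. The optional client argument is there so that a preconfigured client (or a stub in tests) can be passed in.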
BigQuery also comes with a web UI, which is a convenient way to run queries and to load and export data.
How many times have you created a report or a dashboard full of charts and tables based on some processed data? I bet you stopped counting a while ago.
There is an abundance of technologies you can use for this. You can write a custom application that uses a charting library or use platforms like Kibana and Grafana. What if I told you that there is another way?
If you do not need all the advanced features and tweakability of the aforementioned solutions and just want to visualize some data, then you should check out Google Data Studio.
Some time ago there was a paid enterprise version of the product called Data Studio 360, but Google decided to make it free for everyone. Currently you can create and share an unlimited number of reports and dashboards. In my opinion this was a great move by Google and another gift for Google Cloud users. Gracias, Google!
Data Studio has the ability to use various data sources. You can choose from several built-in connectors (free) and a lot of community ones (not always free). Some of the available built-in options:
- SQL (BigQuery, MySQL, PostgreSQL, Cloud SQL)
- CSV files (direct file upload, Google Cloud Storage, Google Sheets)
- Google's services (AdWords, Google Analytics, YouTube Analytics)
If that is not enough then you can write your own connectors using Google Apps Script.
Connectors allow your dashboards to be updated dynamically. You can select a 24-hour window or a from-to period, and your dashboard's charts and tables update accordingly.
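Conceptually, a time window selection is just a filter applied to the rows before the charts are drawn. The sketch below illustrates that idea in plain Python. It is not how Data Studio works internally, and the function names and the "timestamp" key are assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_by_period(rows: List[Row], start: datetime, end: datetime) -> List[Row]:
    """Keep rows whose 'timestamp' falls within [start, end], both inclusive."""
    return [row for row in rows if start <= row["timestamp"] <= end]


def last_24_hours(rows: List[Row], now: datetime) -> List[Row]:
    """Keep rows from the 24-hour window that ends at 'now'."""
    return filter_by_period(rows, now - timedelta(hours=24), now)
```

A chart bound to the filtered rows would then show only the data from the selected window, which is what the dashboard appears to do when you change the date range.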
Reports created using Google Data Studio can be very visually appealing. On your Data Studio home page you can view example templates and see for yourself how sexy some of those reports are. Everyone knows that clients love beautiful, shiny, new things and ... pie charts. No client can resist a pie chart.
The report creation process is very intuitive. There are plenty of elements to choose from to present your data in a visual form. Among them are:
- Charts (Time series, Pie chart, Bar chart, Scatter chart, Bullet chart, Area chart)
- Table and pivot table (with sortable columns)
- Other (Geo map, Scorecard, Image)
You can create multipage interactive reports very effectively with this tool. Each page of the report can be an interactive dashboard if you so wish.
I have found this technology easy to learn. The tutorials are clear and very helpful. One of them has been created in the form of a Data Studio report, and I highly recommend finishing that one. There are also some video tutorials accessible from the home page that will introduce you to Data Studio.
Data Studio gains new features and functionality every now and then, so check the release notes from time to time.