If you’re a data analyst looking to harness the power of cloud-scale analytics, Google BigQuery offers an elegant yet robust SQL interface that can handle terabytes of data with speed and simplicity. In this guide, we’ll walk through real-world, hands-on examples using SQL in BigQuery—designed to help you unlock insights, streamline your queries, and get more done with less overhead. Whether you’re transitioning from traditional SQL environments or starting fresh in the cloud, this article gives you the practical knowledge to hit the ground running.
What Is BigQuery and Why Use It?
Google BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse. What sets it apart is its ability to process massive datasets in seconds using standard SQL syntax. Analysts can query petabytes of data with near-zero setup—no need for infrastructure management, performance tuning, or indexing.
Key Features:
- Standard SQL support
- Seamless integration with Google Cloud ecosystem
- Real-time analytics with streaming inserts
- Built-in machine learning (BigQuery ML)
- Pay-per-query or flat-rate pricing
BigQuery allows analysts to focus on insight, not infrastructure.
Setting Up: Your First Steps in BigQuery
Before diving into queries, let’s set up the basics.
1. Create a Google Cloud Project
- Visit console.cloud.google.com
- Set up a new project and billing account
2. Enable BigQuery API
- Go to APIs & Services > Enable APIs
- Search for BigQuery API and enable it
3. Open BigQuery Console
- Navigate to the BigQuery UI in Google Cloud Console
You’re now ready to write and run SQL queries in a serverless environment.
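A quick way to confirm everything is wired up is to run a trivial query in the editor, for example:
SELECT
  CURRENT_DATE() AS today,
  SESSION_USER() AS signed_in_as;
If this returns a row with today's date and your account email, your project and the BigQuery API are working.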
BigQuery SQL Basics: Query Structure
BigQuery uses ANSI-compliant SQL, so if you have prior SQL experience, the learning curve is shallow.
Basic Syntax:
SELECT column_name
FROM `project.dataset.table`
WHERE condition
ORDER BY column_name
LIMIT 10;
Let’s break that down using real examples.
Hands-On Example 1: Querying a Public Dataset
BigQuery hosts a wide array of public datasets. Let’s query data from the bigquery-public-data.covid19_jhu_csse.summary table.
Query: Top 5 countries by confirmed cases
SELECT country_region, SUM(confirmed) AS total_confirmed
FROM `bigquery-public-data.covid19_jhu_csse.summary`
GROUP BY country_region
ORDER BY total_confirmed DESC
LIMIT 5;
Explanation:
- SUM() aggregates confirmed cases
- GROUP BY groups the data per country
- ORDER BY sorts descending
This gives a snapshot of COVID-19’s impact by country.
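One caveat: the JHU summary data reports cumulative counts per reporting date, so summing across every date inflates the totals. Assuming the table exposes a date column (as the public JHU summary table does), a variant that keeps only the most recent date gives a cleaner point-in-time figure:
SELECT country_region, SUM(confirmed) AS total_confirmed
FROM `bigquery-public-data.covid19_jhu_csse.summary`
WHERE date = (SELECT MAX(date) FROM `bigquery-public-data.covid19_jhu_csse.summary`)
GROUP BY country_region
ORDER BY total_confirmed DESC
LIMIT 5;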
Hands-On Example 2: Filtering and Date Functions
Let’s explore U.S. weather data and apply filters.
Dataset: bigquery-public-data.ghcn_d (daily observations are split into per-year tables such as ghcnd_2023)
Query: Average temperature in New York for January 2023
SELECT
  date,
  AVG(value) / 10 AS avg_temp_celsius
FROM
  `bigquery-public-data.ghcn_d.ghcnd_2023`
WHERE
  id = 'USW00094728' -- NYC Central Park station
  AND element = 'TAVG'
  AND EXTRACT(YEAR FROM date) = 2023
  AND EXTRACT(MONTH FROM date) = 1
GROUP BY date
ORDER BY date;
This query demonstrates BigQuery’s robust date functions and numeric transformations.
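For reference, here is a small standalone sketch of a few of those date functions; the literal dates are just placeholders:
SELECT
  EXTRACT(YEAR FROM DATE '2023-01-15') AS year,                          -- 2023
  DATE_TRUNC(DATE '2023-01-15', MONTH) AS month_start,                   -- 2023-01-01
  DATE_DIFF(DATE '2023-01-31', DATE '2023-01-15', DAY) AS days_between,  -- 16
  FORMAT_DATE('%A, %d %B %Y', DATE '2023-01-15') AS pretty_date;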
Hands-On Example 3: Joining Tables
SQL joins are critical in analysis. Let’s join datasets for richer insights.
Query: Combining COVID and population data
SELECT
covid.country_region,
SUM(covid.confirmed) AS total_confirmed,
pop.population
FROM
`bigquery-public-data.covid19_jhu_csse.summary` AS covid
JOIN
`bigquery-public-data.world_bank_intl_education.population_total` AS pop
ON
LOWER(covid.country_region) = LOWER(pop.country_name)
WHERE
pop.year = 2020
GROUP BY
covid.country_region, pop.population
ORDER BY total_confirmed DESC;
Why this matters:
Joins in BigQuery allow analysts to merge datasets without ETL, enabling on-the-fly data enrichment.
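Building on the query above, a common next step is to derive a per-capita metric from the joined columns. A minimal sketch, assuming the same tables and column names as the example:
WITH joined AS (
  SELECT
    covid.country_region,
    SUM(covid.confirmed) AS total_confirmed,
    pop.population
  FROM
    `bigquery-public-data.covid19_jhu_csse.summary` AS covid
  JOIN
    `bigquery-public-data.world_bank_intl_education.population_total` AS pop
  ON
    LOWER(covid.country_region) = LOWER(pop.country_name)
  WHERE
    pop.year = 2020
  GROUP BY
    covid.country_region, pop.population
)
SELECT
  country_region,
  ROUND(total_confirmed / population * 100000, 1) AS confirmed_per_100k
FROM joined
ORDER BY confirmed_per_100k DESC;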
Hands-On Example 4: Window Functions for Advanced Analysis
Window functions offer a way to do rankings, percentiles, and running totals without collapsing groups.
Query: Rank states by COVID confirmed cases
SELECT
province_state,
SUM(confirmed) AS total_confirmed,
RANK() OVER (ORDER BY SUM(confirmed) DESC) AS rank
FROM
`bigquery-public-data.covid19_jhu_csse.summary`
WHERE
country_region = 'US'
GROUP BY province_state
ORDER BY rank;
This adds rankings dynamically, preserving each row while allowing comparison.
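RANK is only one of many window functions. The sketch below, written against a hypothetical project.dataset.daily_sales table with date and revenue columns, shows two other common analytic patterns: a running total and a 7-day moving average.
SELECT
  date,
  revenue,
  SUM(revenue) OVER (ORDER BY date) AS running_total,
  AVG(revenue) OVER (
    ORDER BY date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS moving_avg_7d
FROM
  `project.dataset.daily_sales`
ORDER BY date;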
Hands-On Example 5: Creating Custom Functions
BigQuery supports User-Defined Functions (UDFs) to extend SQL with JavaScript.
Query: Normalize values using a UDF
-- Named "standardize" to avoid any clash with BigQuery's built-in NORMALIZE function
CREATE TEMP FUNCTION standardize(value FLOAT64, mean FLOAT64, stddev FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return (value - mean) / stddev;
""";

SELECT
  id,
  value,
  standardize(value, AVG(value) OVER (), STDDEV(value) OVER ()) AS z_score
FROM
  `project.dataset.table`;
UDFs are especially useful when standard SQL isn’t enough.
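JavaScript isn’t the only option: BigQuery also supports SQL UDFs, which are typically lighter-weight. A minimal sketch of the same z-score as a SQL expression function:
CREATE TEMP FUNCTION standardize_sql(value FLOAT64, mean FLOAT64, stddev FLOAT64)
RETURNS FLOAT64
AS ((value - mean) / NULLIF(stddev, 0));  -- NULLIF guards against division by zero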
Hands-On Example 6: BigQuery ML for Predictive Analysis
BigQuery ML lets you train models and generate predictions directly in SQL.
Query: Create a linear regression model
CREATE OR REPLACE MODEL `project.dataset.sales_model`
OPTIONS(
  model_type='linear_reg',
  input_label_cols=['revenue']  -- the column the model learns to predict
) AS
SELECT
  product_category,
  marketing_spend,
  revenue
FROM
  `project.dataset.sales_data`;
Query: Predict revenue
SELECT
product_category,
marketing_spend,
predicted_revenue
FROM
ML.PREDICT(MODEL `project.dataset.sales_model`,
(SELECT * FROM `project.dataset.new_data`));
Now analysts can build ML workflows without leaving the SQL environment.
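Before relying on those predictions, it is worth checking model quality. A minimal sketch using ML.EVALUATE, where project.dataset.eval_data is a placeholder for your own held-out table:
SELECT
  *
FROM
  ML.EVALUATE(MODEL `project.dataset.sales_model`,
    (SELECT product_category, marketing_spend, revenue
     FROM `project.dataset.eval_data`));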
Performance Tips for SQL in BigQuery
- Avoid SELECT *: Specify only the columns you need to reduce read costs
- Use partitioned tables: Speed up time-based queries (see the sketch after this list)
- Leverage table preview: Inspect data without incurring costs
- Cache queries: BigQuery automatically caches query results for about 24 hours
- Watch slot usage: Especially in high-concurrency environments
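As an illustration of the partitioning tip above, here is a minimal sketch that creates a date-partitioned and clustered copy of an events table; the table and column names are placeholders:
CREATE TABLE `project.dataset.user_events_partitioned`
PARTITION BY DATE(event_time)      -- lets BigQuery prune partitions on time filters
CLUSTER BY user_id, event_type     -- co-locates rows that are commonly filtered together
AS
SELECT *
FROM `project.dataset.user_events`;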
Real-World Use Case: Marketing Funnel Analysis
Let’s say your marketing team tracks user interactions across email, web, and purchases.
Query: User funnel behavior
SELECT
user_id,
MIN(CASE WHEN event_type = 'email_open' THEN event_time END) AS opened_email,
MIN(CASE WHEN event_type = 'site_visit' THEN event_time END) AS visited_site,
MIN(CASE WHEN event_type = 'purchase' THEN event_time END) AS made_purchase
FROM
`project.dataset.user_events`
GROUP BY user_id;
This shows the earliest touchpoints for each user in their journey.
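To turn those touchpoints into funnel metrics, the query above can be wrapped in a CTE and aggregated into stage counts. A sketch under the same assumed schema:
WITH funnel AS (
  SELECT
    user_id,
    MIN(CASE WHEN event_type = 'email_open' THEN event_time END) AS opened_email,
    MIN(CASE WHEN event_type = 'site_visit' THEN event_time END) AS visited_site,
    MIN(CASE WHEN event_type = 'purchase' THEN event_time END) AS made_purchase
  FROM `project.dataset.user_events`
  GROUP BY user_id
)
SELECT
  COUNT(*) AS total_users,
  COUNTIF(opened_email IS NOT NULL) AS opened_email,
  COUNTIF(opened_email IS NOT NULL AND visited_site > opened_email) AS then_visited,
  COUNTIF(opened_email IS NOT NULL AND visited_site > opened_email
          AND made_purchase > visited_site) AS then_purchased
FROM funnel;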
Conclusion
BigQuery is transforming the way data analysts work. It combines the familiarity of SQL with the power of Google Cloud, making it possible to analyze billions of rows without needing to manage infrastructure. With hands-on examples ranging from basic queries to predictive modeling, this guide has provided a practical foundation to grow your skills.
Whether you’re analyzing trends, predicting outcomes, or simply automating reports, BigQuery lets you do more with data, faster and smarter.
FAQs
1. Do I need to know advanced SQL to use BigQuery effectively?
No. BigQuery supports standard SQL syntax, making it beginner-friendly. You can start with basic queries and gradually adopt more complex functions like joins, window functions, and machine learning models as you grow more comfortable.
2. Can I import my own datasets into BigQuery for analysis?
Yes. You can upload CSV, JSON, or Avro files directly through the BigQuery console or use Cloud Storage for larger files. Once uploaded, you can query them like any other table.
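For example, assuming a CSV file already sits in a Cloud Storage bucket (the bucket path and table name below are placeholders), a single SQL statement can load it:
LOAD DATA INTO `project.dataset.my_uploaded_table`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,                    -- skip the header row
  uris = ['gs://my-bucket/data/sales.csv']
);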
3. Does BigQuery store data permanently, and is it secure?
BigQuery stores your data in Google Cloud and encrypts it at rest and in transit. You can also configure access using IAM roles, set expiration times on tables, and use audit logs to track access.
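As one concrete example of those controls, a table's expiration can be set directly in SQL (the table name is a placeholder):
ALTER TABLE `project.dataset.temp_results`
SET OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
);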
4. Is there a free tier available for BigQuery?
Yes. BigQuery offers a generous free tier of 1 TB of query processing per month and 10 GB of storage. This is ideal for learning, prototyping, or light analysis work.
5. Can I automate and schedule SQL queries in BigQuery?
Absolutely. BigQuery lets you schedule queries to run at regular intervals—daily, hourly, or even by the minute—directly from the interface or using Cloud Scheduler for more complex workflows.