Workshop on databases and SQL. See the top menu for specific topics.
View the Project on GitHub berkeley-scf/scf-databases-2024
This project is maintained by berkeley-scf, the UC Berkeley Statistical Computing Facility
Hosted on GitHub Pages — Theme by orderedlist
In this session, we’ll work through grouping/aggregation/reduction, joins (including self-joins), set operations, and subqueries.
In small groups, discuss what these queries do.
select tag, count(*) as n from questions_tags \
group by tag
select tag, count(*) as n from questions_tags \
group by tag having n > 1000
select ownerid, count(*) as n from questions \
group by ownerid order by n desc limit 15
select ownerid, sum(viewcount) as viewed from questions \
group by ownerid
select *, sum(viewcount) as viewed from questions \
group by ownerid
select answercount, commentcount, count(*) as n from questions \
group by answercount, commentcount
select tag, count(*) as n from questions_tags \
where tag like 'python%' group by tag having n > 1000
The query above starting with select *, sum(viewcount)
behaves differently in SQLite and DuckDB.
In small groups, discuss what these queries do.
select * from questions join questions_tags \
on questions.questionid = questions_tags.questionid
select * from questions Q join questions_tags T \
on Q.questionid = T.questionid
select * from questions Q join questions_tags T \
using(questionid)
select * from questions Q, questions_tags T \
where Q.questionid = T.questionid
select * from questions Q left outer join answers A \
on Q.questionid = A.questionid
select * from questions Q left outer join answers A \
on Q.questionid = A.questionid \
where A.creationdate is NULL
# Note no right outer join in SQLite so here we reverse order of answers and questions \
select * from questions Q right outer join answers A \
on Q.questionid = A.questionid \
where Q.creationdate is NULL
select Q.questionid, count(*) as n_tags from questions Q join questions_tags T \
on Q.questionid = T.questionid \
group by Q.questionid
First we’ll set up a view (a temporary) table that combines questions and tags for ease of illustrating ideas around self joins.
create view QT as select * from questions join questions_tags using(questionid)
In small groups, discuss what these queries do.
select * from QT as QT1 join QT as QT2 \
using(questionid)
select * from QT as QT1 join QT as QT2 \
using(questionid) where QT1.tag < QT2.tag
select QT1.tag, QT2.tag, count(*) as n from QT as QT1 join QT as QT2 \
using(questionid) where QT1.tag < QT2.tag \
group by QT1.tag, QT2.tag order by n desc limit 10
select * from QT as QT1 join QT as QT2 using(ownerid)
In small groups, discuss what these queries do.
select ownerid from QT where tag='python' \
intersect \
select ownerid from QT where tag='r'
select ownerid from QT where tag='python' \
except \
select ownerid from QT where tag='r'
select ownerid from QT where tag='python' \
union \
select ownerid from QT where tag='r'
select userid, displayname, location from users \
where location like '%United States%' \
intersect \
select userid, displayname, location from users \
where reputation > 10
In small groups, discuss what these queries do.
select * from \
answers A \
join \
( select ownerid, count(*) as n_answered from answers \
group by ownerid order by n_answered desc limit 1000 ) most_responsive \
on A.ownerid = most_responsive.ownerid
select avg(upvotes) from users \
where userid in \
( select distinct ownerid from \
questions join questions_tags using(questionid) \
where tag = 'python' )