26.03.2013 Views

Stanford - CS145 - Fall 2012 - The Stanford University InfoLab


Wednesday, October 10, 2012

Help Session 2

The SQL

Stanford - CS145 - Fall 2012


Poll

- Who has used SQL before?
- Who has never used SQL before?
- And lastly...
- Anyone secretly a backend ninja merely taking this class for the easy units? :)


Goal for Today

- No new material, only new insights about how to construct queries and avoid common pitfalls


Agenda

- Quick syntax review
- Sample problems from “(extras)”
- Assignment #2 Challenge Problem
  · Debugging on the command line
- SQLite vs. MySQL vs. PostgreSQL


Digression: SQL Joke!

- A database engineer walks into a bar and sees two tables.
- Says, “Can I join you?” ^_^

(Source: http://ask.sqlservercentral.com/questions/3898/which-is-the-best-sql-joke.html)

- Moving on...


Quick Syntax Review


SQL Query Keywords

Select From Where Group By Having
Order By Distinct
union intersect except
exists is NULL is not NULL
{=, <, >, <=, >=} any all
count() avg() sum()
max() min() first() last()
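As a rough illustration of how several of these keywords combine in one query, the sketch below runs a single statement through Python's built-in sqlite3 module. The table Score and its rows are made up for this example and are not part of the course materials.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("Create Table Score(player text, points integer)")
conn.executemany(
    "Insert Into Score Values (?, ?)",
    [("ann", 3), ("ann", 5), ("bob", 2), ("cat", 4), ("cat", 5)],
)

# Where filters rows first, Group By forms one group per player,
# Having filters whole groups, and Order By sorts the final result.
keyword_rows = conn.execute(
    """
    Select player, count(*), sum(points)
    From Score
    Where points > 2
    Group By player
    Having count(*) >= 2
    Order By sum(points) desc
    """
).fetchall()
print(keyword_rows)
```

Note the order of evaluation: bob's only row is dropped by Where before any grouping happens, so Having never sees a group for bob.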


SQL Modification Keywords

Create Table T1(A integer, ...);
Drop Table T1;
Delete From T1 Where [Condition];
Update T1 Set A = [Value/Subquery];
Insert Into T1 [Subquery]; or Insert Into T1 Values([Tuple]);
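A minimal sketch of these statements in sequence, run through Python's built-in sqlite3 module. The concrete values and the Where clause on the Update are assumptions added for the example; the slide only gives the statement templates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create Table, then Insert with explicit tuples.
conn.execute("Create Table T1(A integer)")
conn.execute("Insert Into T1 Values (1)")
conn.execute("Insert Into T1 Values (2)")

# Insert with a subquery: copy every existing row, shifted by 10.
conn.execute("Insert Into T1 Select A + 10 From T1")

# Update only the copied rows (illustrative condition), then delete one row.
conn.execute("Update T1 Set A = A * 2 Where A > 10")
conn.execute("Delete From T1 Where A = 1")

t1_rows = [row[0] for row in conn.execute("Select A From T1 Order By A")]
print(t1_rows)

# Drop Table removes the relation entirely.
conn.execute("Drop Table T1")
```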


Quick Syntax Review

Done!


Sample Problems


Extra Movie-Rating Query Exercises

Question 1

- Find the names of all reviewers who rated Gone with the Wind.

Step 1: Look at the schema.

Movie(mID, title, year, director)
Reviewer(rID, name)
Rating(rID, mID, stars, ratingDate)


Step 2: Find your targets. The output column is Reviewer.name, and the filter applies to Movie.title.


Step 3: Trace a “join route”.

Movie joins Rating on mID; Rating joins Reviewer on rID.


Step 4: Write that join out.

Select *
From Reviewer, Rating, Movie
Where Reviewer.rID = Rating.rID
  and Rating.mID = Movie.mID;


Step 5: Add the selection condition(s).

Select *
From Reviewer, Rating, Movie
Where Reviewer.rID = Rating.rID
  and Rating.mID = Movie.mID
  and Movie.title = 'Gone with the Wind';


Step 6: Project only the columns you need.

Select Reviewer.name
From Reviewer, Rating, Movie
Where Reviewer.rID = Rating.rID
  and Rating.mID = Movie.mID
  and Movie.title = 'Gone with the Wind';


Step 7: Resolve duplicates! A reviewer who rated the movie more than once would otherwise appear once per rating.

Select Distinct Reviewer.name
From Reviewer, Rating, Movie
Where Reviewer.rID = Rating.rID
  and Rating.mID = Movie.mID
  and Movie.title = 'Gone with the Wind';
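To see why the last step matters, here is a small sketch that runs the Question 1 query with and without Distinct using Python's built-in sqlite3 module. The sample rows are made up for illustration (one reviewer rates the movie twice); only the schema and the query come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    Create Table Movie(mID integer, title text, year integer, director text);
    Create Table Reviewer(rID integer, name text);
    Create Table Rating(rID integer, mID integer, stars integer, ratingDate text);

    -- Made-up sample rows; reviewer 201 rates the same movie twice.
    Insert Into Movie Values (101, 'Gone with the Wind', 1939, 'Victor Fleming');
    Insert Into Movie Values (102, 'Star Wars', 1977, 'George Lucas');
    Insert Into Reviewer Values (201, 'Sarah Martinez');
    Insert Into Reviewer Values (202, 'Chris Jackson');
    Insert Into Rating Values (201, 101, 2, '2011-01-22');
    Insert Into Rating Values (201, 101, 4, '2011-01-27');
    Insert Into Rating Values (202, 102, 4, '2011-01-12');
    """
)

# The slide's query, with the Distinct keyword left as a placeholder.
query = """
    Select {distinct} Reviewer.name
    From Reviewer, Rating, Movie
    Where Reviewer.rID = Rating.rID
      and Rating.mID = Movie.mID
      and Movie.title = 'Gone with the Wind'
"""
without_distinct = conn.execute(query.format(distinct="")).fetchall()
with_distinct = conn.execute(query.format(distinct="Distinct")).fetchall()
print(without_distinct)
print(with_distinct)
```

Note that the string literal uses straight single quotes; the curly quotes that slide software tends to insert are not valid SQL string delimiters.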


Will it always be that involved?

- At first, probably...but you’ll get better
- Have a process, and you’ll rarely go wrong!
- At least, you’ll rarely get stuck (just accept now that your first attempt might not always work)


Extra Movie-Rating Query Exercises

Question 4

- Find the titles of all movies not reviewed by Chris Jackson.
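The slides pose Question 4 without showing a solution. One possible approach, sketched below with Python's built-in sqlite3 module, is to first find the movies Chris Jackson did review and then exclude them with not in. The sample rows are made up for illustration. A tempting alternative, joining all three tables and filtering on name <> 'Chris Jackson', answers a different question: it returns movies reviewed by someone else, and it misses movies with no ratings at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    Create Table Movie(mID integer, title text, year integer, director text);
    Create Table Reviewer(rID integer, name text);
    Create Table Rating(rID integer, mID integer, stars integer, ratingDate text);

    -- Made-up sample rows; movie 103 has no ratings at all.
    Insert Into Movie Values (101, 'Gone with the Wind', 1939, 'Victor Fleming');
    Insert Into Movie Values (102, 'Star Wars', 1977, 'George Lucas');
    Insert Into Movie Values (103, 'The Sound of Music', 1965, 'Robert Wise');
    Insert Into Reviewer Values (201, 'Sarah Martinez');
    Insert Into Reviewer Values (202, 'Chris Jackson');
    Insert Into Rating Values (201, 101, 2, '2011-01-22');
    Insert Into Rating Values (202, 102, 4, '2011-01-12');
    """
)

# Inner query: movies Chris Jackson rated. Outer query: everything else.
not_reviewed = [
    row[0]
    for row in conn.execute(
        """
        Select title
        From Movie
        Where mID not in (
            Select Rating.mID
            From Rating, Reviewer
            Where Rating.rID = Reviewer.rID
              and Reviewer.name = 'Chris Jackson'
        )
        Order By title
        """
    )
]
print(not_reviewed)
```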


Sample Problems

Done!


A2 Challenge Problem


Digression: SQL Joke!

- A child goes down to breakfast one morning and tells his parents, “I prefer Count Star to Count Dracula...
- ...because I don’t like having to consider which Draculas are NULL.” ^_^

In SQL terms: “I prefer count(*) to count(Dracula)...”

(Source: http://ask.sqlservercentral.com/questions/3898/which-is-the-best-sql-joke.html)

- Moving on...
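The joke rests on a real difference: count(*) counts rows, while count(column) skips rows where that column is NULL. A minimal sketch using Python's built-in sqlite3 module, with a made-up table Castle:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for illustration; Python's None is stored as SQL NULL.
conn.execute("Create Table Castle(resident text)")
conn.executemany(
    "Insert Into Castle Values (?)",
    [("Dracula",), (None,), ("Dracula",), (None,)],
)

# count(*) sees all four rows; count(resident) ignores the two NULLs.
count_star, count_resident = conn.execute(
    "Select count(*), count(resident) From Castle"
).fetchone()
print(count_star, count_resident)
```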


Create a Test Relation

- Install SQLite3 (http://mislav.uniqpath.com/rails/install-sqlite3/)
- Running Mac OS X? You already have it!
- Run sqlite3 from the command line
- >> Create Table Edge(n1 integer, n2 integer);

Don’t forget the semicolon!


Insert Some Data

>> Insert into Edge values(1,2);
>> Insert into Edge values(1,3);
>> Insert into Edge values(1,4);
>> Insert into Edge values(2,1);
>> Insert into Edge values(2,3);
>> Insert into Edge values(3,4);

[Figure: directed graph on nodes 1, 2, 3, 4 with the edges inserted above]
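If you would rather script the setup than type it at the sqlite3 prompt, the same relation can be built with Python's built-in sqlite3 module. This is only a convenience sketch; the in-memory database and the variable names are choices made for the example.

```python
import sqlite3

# An in-memory database is discarded when the connection closes,
# which makes it handy for throwaway test relations.
conn = sqlite3.connect(":memory:")
conn.execute("Create Table Edge(n1 integer, n2 integer)")

# The six edges from the slide, as (n1, n2) pairs.
edges = [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (3, 4)]
conn.executemany("Insert Into Edge Values (?, ?)", edges)

# Read the rows back to confirm the inserts landed.
stored = conn.execute("Select n1, n2 From Edge Order By n1, n2").fetchall()
print(stored)
```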


Let’s Try Something

- Write a SQL query to find the average out-degree of nodes in the graph.

Step 1: Create a relation mapping node IDs to their out-degrees.


>> Select *
   From Edge
   Group By n1;

Bad News Bears! In SQLite, this returns for each n1 a single, arbitrary tuple having that n1 value. (Stricter systems such as PostgreSQL reject the query instead.)


>> Select n1, count(*)
   From Edge
   Group By n1;

Fix: Use an aggregate function to map each n1 to a meaningful summative value.
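A quick way to try the aggregate version on the test data, sketched with Python's built-in sqlite3 module. The Order By is an addition so the printed rows come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("Create Table Edge(n1 integer, n2 integer)")
conn.executemany(
    "Insert Into Edge Values (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (3, 4)],
)

# One row per source node, paired with the number of edges leaving it.
out_degrees = conn.execute(
    """
    Select n1, count(*)
    From Edge
    Group By n1
    Order By n1
    """
).fetchall()
print(out_degrees)
```

Node 4 does not show up at all, because it never appears in the n1 column. That gap is what the later steps have to work around.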


- Now, how do we aggregate data from this new relation?

Bring on the FROM clause subquery!

>> Select *
   From (
     Select n1, count(*) as outDegree
     From Edge
     Group By n1
   ) R;

We need to give this a name (R) so we can refer to it outside of the subquery.


- Selecting the avg(outDegree) might sound like the next step, but that’s actually incorrect!
- It would fail to consider nodes that have zero outgoing edges.
- How do we account for those?
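The pitfall is easy to reproduce. The sketch below (Python's built-in sqlite3 module, same test data) takes avg over the per-node counts and gets the wrong answer, because node 4 has no outgoing edges and so contributes no row to the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("Create Table Edge(n1 integer, n2 integer)")
conn.executemany(
    "Insert Into Edge Values (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (3, 4)],
)

# Averages over nodes 1, 2, 3 only: (3 + 2 + 1) / 3.
naive_avg = conn.execute(
    """
    Select avg(R.outDegree)
    From (
        Select n1, count(*) as outDegree
        From Edge
        Group By n1
    ) R
    """
).fetchone()[0]
print(naive_avg)
```

The result is 2.0, whereas the true average over all four nodes is 6 / 4 = 1.5.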


- We need another subquery to count the number of unique node IDs in the table:

...
(
  Select count(*)
  From (
    Select n1
    From Edge
    union -- Automatically eliminates duplicates.
    Select n2
    From Edge
  )
)
...
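The node-counting subquery can be checked on its own. In this sketch (Python's built-in sqlite3 module, same test data) the union all variant is included only for contrast, to show what the duplicate elimination buys you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("Create Table Edge(n1 integer, n2 integer)")
conn.executemany(
    "Insert Into Edge Values (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (3, 4)],
)

# union removes duplicates, so each node ID is counted once.
node_count = conn.execute(
    """
    Select count(*)
    From (
        Select n1 From Edge
        union
        Select n2 From Edge
    )
    """
).fetchone()[0]

# union all keeps duplicates: six n1 values plus six n2 values.
endpoint_count = conn.execute(
    """
    Select count(*)
    From (
        Select n1 From Edge
        union all
        Select n2 From Edge
    )
    """
).fetchone()[0]
print(node_count, endpoint_count)
```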


- We then use our two subquery results to compute the average:

Select
  (sum(R.outDegree) + 0.0) /   -- Hacky way to cast the sum as a float.
  (
    Select count(*)
    From (
      Select n1
      From Edge
      union -- Automatically eliminates duplicates.
      Select n2
      From Edge
    )
  )...


The Final Query...

Select
  (sum(R.outDegree) + 0.0) /
  (
    Select count(*)
    From (
      Select n1
      From Edge
      union -- Automatically eliminates duplicates.
      Select n2
      From Edge
    )
  )
From
  (
    Select n1, count(*) as outDegree
    From Edge
    Group By n1
  ) R;
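Here is a sketch that runs the final query against the test data with Python's built-in sqlite3 module. It also runs a variant without the "+ 0.0" trick, to show why the cast is there: in SQLite, dividing one integer by another truncates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("Create Table Edge(n1 integer, n2 integer)")
conn.executemany(
    "Insert Into Edge Values (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (3, 4)],
)

# The slide's query, with the float cast left as a placeholder.
final_query = """
    Select (sum(R.outDegree) {cast}) /
           (Select count(*)
            From (Select n1 From Edge
                  union
                  Select n2 From Edge))
    From (Select n1, count(*) as outDegree
          From Edge
          Group By n1) R
"""
average = conn.execute(final_query.format(cast="+ 0.0")).fetchone()[0]
truncated = conn.execute(final_query.format(cast="")).fetchone()[0]
print(average, truncated)
```

With the cast, the division is 6.0 / 4; without it, SQLite computes the integer quotient of 6 and 4.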


Last Step: CHECK YOUR ANSWER!

- Node 1: 3 out-edges +
  Node 2: 2 out-edges +
  Node 3: 1 out-edge +
  Node 4: 0 out-edges = 6 total / 4 = 1.5
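The hand calculation above can also be done independently of SQL, which makes for a useful cross-check when the data is too large to count by eye. A small sketch in plain Python, with variable names chosen for this example:

```python
from collections import Counter

# The same six edges as the test relation, as (n1, n2) pairs.
edges = [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (3, 4)]

# Count outgoing edges per source node; a Counter reports 0 for
# nodes it has never seen, which covers node 4.
out_degree = Counter(n1 for n1, _ in edges)

# Every node that appears at either end of any edge.
nodes = {node for edge in edges for node in edge}

expected_average = sum(out_degree[node] for node in nodes) / len(nodes)
print(expected_average)
```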


A2 Challenge Problem

Done!


MySQL vs. SQLite vs. PostgreSQL


[Venn diagram comparing keyword support in MySQL, SQLite, and PostgreSQL. Keywords shown: in, all, union, exists, any, except, intersect.]
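The diagram's point is that not every system accepts every keyword. To the best of my knowledge, SQLite has no any/all, and the MySQL versions current when these slides were written had no intersect/except; check your own system's documentation before relying on either. When a keyword is missing, the query can usually be rewritten with ones that are widely supported. The sketch below (Python's built-in sqlite3 module, same Edge data) shows three such rewrites; the helper column() is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("Create Table Edge(n1 integer, n2 integer)")
conn.executemany(
    "Insert Into Edge Values (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (3, 4)],
)


def column(sql):
    """Run a one-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# intersect: nodes that are both a source and a target.
with_intersect = column("Select n1 From Edge intersect Select n2 From Edge")
with_in = column(
    "Select distinct n1 From Edge Where n1 in (Select n2 From Edge)"
)

# except: nodes that are a target but never a source.
with_except = column("Select n2 From Edge except Select n1 From Edge")
with_not_in = column(
    "Select distinct n2 From Edge Where n2 not in (Select n1 From Edge)"
)

# Stand-in for "n1 >= all (Select n1 From Edge)", which SQLite rejects.
largest_source = column(
    "Select distinct n1 From Edge Where n1 = (Select max(n1) From Edge)"
)
print(with_intersect, with_except, largest_source)
```

These rewrites agree here because the data contains no NULLs; with NULLs present, not in and the set operators can behave differently, so treat the equivalence as an assumption to verify on your own data.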


MySQL vs. SQLite vs. PostgreSQL

Done!


Agenda

- Quick syntax review
- Sample Problems
- Assignment #2 Challenge Problem
- SQLite vs. MySQL vs. PostgreSQL

Done!


Questions?


Thanks for coming!
