帮我得到第一份数据分析师工作的9个SQL核心概念

帮我得到第一份数据分析师工作的9个SQL核心概念

如果我必须重新开始我的数据科学之旅,我会先学习SQL。

我不算SQL的拥趸,但你确实很需要SQL去通过数据科学/分析师面试,无论你的职位是什么,你都会经常使用SQL创建查询并与公司的数据库进行交互。

下面是帮助我通过第一次数据分析师面试的9个SQL核心概念。如果你想了解更多关于数据分析的相关内容,可以阅读以下这些文章:
只会Pandas?来学习这25种Pandas变SQL的方法,让你的数据分析更得心应手!
数据分析求职最常用的30种大数据工具,你掌握几个了?
数据科学家V.S数据分析师面试全对比
数据分析在Supply Chain方向有哪些应用?

// 数据

为了展示这些概念,我们将使用一个包含100多名顶级足球运动员信息的表。你可以在这里(https://drive.google.com/file/d/1ZpBC71J-SFbA0tgJjWiNy58QA57YT1I8/view?usp=sharing)下载此数据并将其导入你的数据库。

下面是我们将使用的表格,我将其命名为players_description。

Initial table: players_description
+----+-------------------+------------+-----+-----+---------------+
| ID |     FirstName     |  LastName  | Age | ... | Team_Position |
+----+-------------------+------------+-----+-----+---------------+
|  1 | Lionel Andrés     | Messi      |  32 | ... | RW            |
|  2 | Cristiano Ronaldo | dos Santos |  34 | ... | LW            |
|  3 | Neymar            | da Silva   |  27 | ... | CAM           |
|  4 | Jan               | Oblak      |  26 | ... | GK            |
|  5 | Eden              | Hazard     |  28 | ... | LW            |
|  6 | Kevin             | De Bruyne  |  28 | ... | CAM           |
|  7 | Marc-André        | ter Stegen |  27 | ... | GK            |
|  8 | Virgil            | van Dijk   |  27 | ... | LF            |
|  9 | Luka              | Modrić     |  33 | ... | CM            |
| 10 | Mohamed           | Salah      |  27 | ... | RW            |
| .. |     ...           |   ...      | ... | ... | ...           |
+----+-------------------+------------+-----+-----+---------------+

// 1、SELECT语句

SELECT语句允许我们从数据库中选择数据,是SQL中最常见的语句,因此请记住下面的语法。

让我们从players_description表中选择列FirstName、LastName和Height。

Query:SELECT FirstName, LastName, Height
FROM players_description;Output:+---------------------+--------------+--------+
|      FirstName      |   LastName   | Height |
+---------------------+--------------+--------+
| "Lionel Andrés"     | Messi        |    170 |
| "Cristiano Ronaldo" | "dos Santos" |    187 |
| Neymar              | "da Silva"   |    175 |
| Jan                 | Oblak        |    188 |
| Eden                | Hazard       |    175 |
| Kevin               | "De Bruyne"  |    181 |
| Marc-André          | "ter Stegen" |    187 |
| Virgil              | "van Dijk"   |    193 |
| Luka                | Modrić       |    172 |
| Mohamed             | Salah        |    175 |
+---------------------+--------------+--------+

// 2、SELECT DISTINCT语句

每当要从列中选择唯一元素时,我们都会使用SELECT DISTINCT语句。

让我们看看players_description表中唯一的Team_Position。

Query:SELECT DISTINCT Team_Position
FROM players_description;Output:+---------------+
| Team_Position |
+---------------+
| RW            |
| LW            |
| CAM           |
| GK            |
| LF            |
| CM            |
| ST            |
| CDM           |
| LM            |
| RM            |
| RB            |
| CF            |
| CB            |
| RAM           |
| LAM           |
+---------------+

// 3、WHERE子句

WHERE子句允许我们向“查询”添加条件,只有那些满足条件的值才会在输出表中显示。

让我们创建一个查询,过滤掉身高Height超过180厘米的球员。

Query:SELECT FirstName, LastName, Height
FROM players_description
WHERE Height>180;Output:+---------------------+--------------+--------+
|      FirstName      |   LastName   | Height |
+---------------------+--------------+--------+
| "Cristiano Ronaldo" | "dos Santos" |    187 |
| Jan                 | Oblak        |    188 |
| Kevin               | "De Bruyne"  |    181 |
| Marc-André          | "ter Stegen" |    187 |
| Virgil              | "van Dijk"   |    193 |
| Kalidou             | Koulibaly    |    187 |
| Harry               | Kane         |    188 |
| Alisson             | Becker       |    191 |
| David               | "De Gea"     |    192 |
| Giorgio             | Chiellini    |    187 |
+---------------------+--------------+--------+

// 4、COUNT()、AVG()和SUM()函数

SQL中最常用的函数是:count“COUNT()”、average“AVG()”和sum“SUM()”。

用上我们刚学过的WHERE子句,下面的就是COUNT()、AVG()和SUM()的语法。

让我们看看我们可以从FC Barcelona俱乐部获得的见解。

Query 1: Let's see how many Barcelona players are in our tableSELECT COUNT(Playerid)
FROM players_description
WHERE Club="FC Barcelona";Output 1:+-----------------+
| COUNT(Playerid) |
+-----------------+
|              15 |
+-----------------+Query 2: Calculate the average height of Barcelona playersSELECT AVG(Height)
FROM players_description
WHERE Club="FC Barcelona";Output 2:+-------------+
| AVG(Height) |
+-------------+
|    181.2667 |
+-------------+Query 3: Sum the wage of Barcelona playersSELECT SUM(Wage_eur)
FROM players_description
WHERE Club="FC Barcelona";Output 3:+---------------+
| SUM(Wage_eur) |
+---------------+
|          3945 |
+---------------+

// 5、ORDER BY

每当我们想要按升序或降序对输出表进行排序时,我们都会使用ORDER BY关键字。

让我们看看谁是我们表中最高的运动员。

Query:SELECT FirstName, LastName, Height
FROM players_description
ORDER BY Height DESC;Output:+-----------+-------------+--------+
| FirstName |  LastName   | Height |
+-----------+-------------+--------+
| Thibaut   | Courtois    |    199 |
| Gianluigi | Donnarumma  |    196 |
| Jiří      | Pavlenka    |    196 |
| Wojciech  | Szczęsny    |    195 |
| Niklas    | Süle        |    195 |
| Zlatan    | Ibrahimović |    195 |
| Gerard    | Piqué       |    194 |
| Virgil    | "van Dijk"  |    193 |
| Samir     | Handanovič  |    193 |
| Manuel    | Neuer       |    193 |
+-----------+-------------+--------+

// 6、AND/OR运算符

我们可以将WHERE子句与不同的运算符(如AND/OR)组合在一起。

下面是要使用的语法。

它们之间的区别是:

  • 如果所有条件都为真,则是AND
  • 如果任何一个条件为真,则是OR

让我们看一些AND/OR的示例。

Query: Show players from France (FRA) with height greater than 180cm.SELECT FirstName, LastName, Height, Nationality
FROM players_description
WHERE Height>180 AND Nationality="FRA";Output:+-----------+-----------+--------+-------------+
| FirstName | LastName  | Height | Nationality |
+-----------+-----------+--------+-------------+
| Paul      | Pogba     |    191 | FRA         |
| Hugo      | Lloris    |    188 | FRA         |
| Aymeric   | Laporte   |    189 | FRA         |
| Karim     | Benzema   |    185 | FRA         |
| Samuel    | Umtiti    |    182 | FRA         |
| Raphaël   | Varane    |    191 | FRA         |
| Clément   | Lenglet   |    186 | FRA         |
| Lucas     | Hernández |    182 | FRA         |
| Stéphane  | Ruffier   |    188 | FRA         |
+-----------+-----------+--------+-------------+Query: Show players that are from France (FRA) or that have a height greater than 180cm.SELECT FirstName, LastName, Height, Nationality
FROM players_description
WHERE Height>180 OR Nationality="FRA";Output:+---------------------+--------------+--------+-------------+
|      FirstName      |   LastName   | Height | Nationality |
+---------------------+--------------+--------+-------------+
| "Cristiano Ronaldo" | "dos Santos" |    187 | PRT         |
| Jan                 | Oblak        |    188 | SGP         |
| Kevin               | "De Bruyne"  |    181 | BLX         |
| Marc-André          | "ter Stegen" |    187 | DEU         |
| Virgil              | "van Dijk"   |    193 | NPL         |
| Kylian              | Mbappé       |    178 | FRA         |
| Kalidou             | Koulibaly    |    187 | SEN         |
| Harry               | Kane         |    188 | UK          |
| Alisson             | Becker       |    191 | BRA         |
| David               | "De Gea"     |    192 | ESP         |
+---------------------+--------------+--------+-------------+

// 7、BETWEEN运算符

BETWEEN运算符可以选择给定范围内的值(如数字、文本或日期)。此运算符是包含性的,从开始值到结束值都将包含在内。

让我们来看看哪些运动员的年龄在20到30岁之间。

Query:SELECT FirstName, LastName, Age
FROM players_description
WHERE Age BETWEEN 20 AND 30;Output:+------------+--------------+-----+
| FirstName  |   LastName   | Age |
+------------+--------------+-----+
| Neymar     | "da Silva"   |  27 |
| Jan        | Oblak        |  26 |
| Eden       | Hazard       |  28 |
| Kevin      | "De Bruyne"  |  28 |
| Marc-André | "ter Stegen" |  27 |
| Virgil     | "van Dijk"   |  27 |
| Mohamed    | Salah        |  27 |
| Kylian     | Mbappé       |  20 |
| Kalidou    | Koulibaly    |  28 |
| Harry      | Kane         |  25 |
+------------+--------------+-----+

// 8、IN/NOT IN运算符

当我们想在WHERE子句中指定多个值时,我们使用IN运算符。

让我们列出一份效力于FC Barcelona俱乐部或Real Madrid俱乐部的运动员名单。

Query:SELECT FirstName, LastName, Age, Club
FROM players_description
WHERE Club IN ('FC Barcelona', 'Real Madrid');Output:+-----------------+--------------+-----+----------------+
|    FirstName    |   LastName   | Age |      Club      |
+-----------------+--------------+-----+----------------+
| "Lionel Andrés" | Messi        |  32 | "FC Barcelona" |
| Eden            | Hazard       |  28 | "Real Madrid"  |
| Marc-André      | "ter Stegen" |  27 | "FC Barcelona" |
| Luka            | Modrić       |  33 | "Real Madrid"  |
| Sergio          | Ramos        |  33 | "Real Madrid"  |
| Luis            | Alberto      |  32 | "FC Barcelona" |
| Sergio          | Busquets     |  30 | "FC Barcelona" |
| Antoine         | Griezmann    |  28 | "FC Barcelona" |
| Thibaut         | Courtois     |  27 | "Real Madrid"  |
| Gerard          | Piqué        |  32 | "FC Barcelona" |
+-----------------+--------------+-----+----------------+

// 9、LIKE-NOT LIKE运算符

当我们想要在列中搜索特定模式时,我们使用LIKE运算符。

我们可以在模式中使用不同的通配符。下面是两个最常见的通配符:

  • %:表示零个、一个或多个字符
  • _:表示单个字符

让我们过滤名字以字母“S”开头的球员。

Query:SELECT FirstName, LastName
FROM players_description
WHERE FirstName LIKE 's%';Output:+-----------+------------------+
| FirstName |     LastName     |
+-----------+------------------+
| Sergio    | Agüero           |
| Sergio    | Ramos            |
| Sergio    | Busquets         |
| Samir     | Handanovič       |
| Sadio     | Mané             |
| Samuel    | Umtiti           |
| Sergej    | Milinković-Savić |
| Saúl      | Ñíguez           |
| Serge     | Gnabry           |
| Stefan    | "de Vrij"        |
| Stéphane  | Ruffier          |
| Salvatore | Sirigu           |
| Sokratis  | Papastathopoulos |
| Sergio    | Canales          |
+-----------+------------------+

利用好%和_,你可以在各种模式里驰骋(详见:https://www.w3schools.com/sql/sql_like.asp

感谢阅读。你还可以订阅我们的YouTube频道,观看大量大数据行业相关公开课:https://www.youtube.com/channel/UCa8NLpvi70mHVsW4J_x9OeQ;在LinkedIn上关注我们,扩展你的人际网络!https://www.linkedin.com/company/dataapplab/

原文作者:Frank Andrade
翻译作者:高佑兮
美工编辑:过儿
校对审稿:Chuang
原文链接:https://towardsdatascience.com/9-sql-core-concepts-that-helped-me-get-my-first-data-analyst-job-a582f892276f