
CDA-数据库应用
前面提到过一些数据库应用的基础知识,数据分析师经常和数据库打交道,在查询这块整理了必备技巧如下:
📌 基础优化技巧1. 善用EXPLAIN分析执行计划
EXPLAINSELECT * FROM orders WHERE order_date > '2024-01-01';
2. 避免SELECT * 的陷阱
SELECT order_id, customer_name, order_amountFROM orders;
3. 给表取简洁别名
SELECT o.order_id, c.customer_nameFROM orders AS oJOIN customers AS c ON o.customer_id = c.id;
📅 日期处理技巧
4. 安全转换日期格式
SELECT DATE_TRUNC('month', order_date)FROM orders;SELECT DATE_FORMAT(order_date,'%Y-%m')FROM orders;
SELECT name, AGE(birth_date)FROM employees;
6. 获取当前时间戳
SELECT CURRENT_TIMESTAMP AS now;
🚀 高级查询技巧
7. 窗口函数实战
SELECT product_id, sale_date, SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS running_totalFROM sales;
8. CTE递归查询
WITH RECURSIVE org_tree AS ( SELECT id, name, parent_id FROM departments WHERE parent_id IS NULL UNION ALL SELECT d.id, d.name, d.parent_id FROM departments d JOIN org_tree ot ON d.parent_id = ot.id)SELECT * FROM org_tree;
🧹 数据清洗技巧
9. 缺失值处理
SELECT COALESCE(address, '未填写') AS safe_addressFROM customers;
10. 数据脱敏处理
SELECT name, CONCAT('****', RIGHT(phone, 4)) AS secure_phoneFROM users;
11. 快速去重
SELECT DISTINCT departmentFROM employees;
⚡ 性能优化技巧
12. 索引使用原则
CREATE INDEX idx_orders_date_statusON orders (order_date, order_status);
13. 分区表优化
CREATE TABLE sales ( id INT, sale_date DATE) PARTITION BY RANGE (EXTRACT(YEAR FROM sale_date));
14. 批处理代替循环
UPDATE productsSET price = price * 1.1WHERE category = '电子产品';
🔍 实战技巧
15. 快速抽样
SELECT *FROM usersORDER BY RANDOM()LIMIT 100;
16. 行列转换
SELECT product_id, SUM(CASE WHEN quarter = 'Q1' THEN sales END) AS Q1, SUM(CASE WHEN quarter = 'Q2' THEN sales END) AS Q2FROM sales_dataGROUP BY product_id;
17. 连续登录检测
SELECT user_id, login_date, LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date) AS prev_loginFROM login_records;
🛠️ 实用函数
18. 字符串分割
SELECT SPLIT_PART(full_name, ' ', 1) AS first_name, SPLIT_PART(full_name, ' ', 2) AS last_nameFROM contacts;
19. JSON解析
SELECT JSON_EXTRACT(user_data, '$.address.city') AS cityFROM user_profiles;
20. 正则表达式匹配
SELECT *FROM logsWHERE message ~ 'ERROR [1-5]{3}';
🎯 总结精要