ClickHouse的安装与配置

艺帆风顺 发布于 2025-04-03 17 次阅读


ClickHouse 是一个真正的列式数据库管理系统(DBMS)。在 ClickHouse 中,数据始终是按列存储的,包括矢量(向量或列块)执行的过程。只要有可能,操作都是基于矢量进行分派的,而不是单个的值,这被称为«矢量化查询执行»,它有利于降低实际的数据处理开销。通常有两种不同的加速查询处理的方法:矢量化查询执行和运行时代码生成。在后者中,动态地为每一类查询生成代码,消除了间接分派和动态分派。这两种方法中,并没有哪一种严格地比另一种好。运行时代码生成可以更好地将多个操作融合在一起,从而充分利用 CPU 执行单元和流水线。矢量化查询执行不是特别实用,因为它涉及必须写到缓存并读回的临时向量。

ClickHouse版本记录

1. 21.3.12.2-lts版本中解决了MaterializeMySQL同步中二级索引键非空的问题。

常用命令

--安装CK
sh ClickHouse_INSTALL_20.8.12.2-lts_xxx.bin

--卸载CK
sh ClickHouse_INSTALL_20.8.12.2-lts_xxx.bin -e

--启停CK
systemctl start/stop clickhouse-server

--客户端登陆
clickhouse-client -uroot --password='Infra5@Gep0int' -h127.0.0.1 --port=9000

--MaterializeMySQL引擎
SET allow_experimental_database_materialize_mysql=1;
CREATE DATABASE temp ENGINE = MaterializeMySQL('127.0.0.1:3306', 'temp', 'root', 'Gepoint');

--create table as select from mysql
create table a ENGINE=MergeTree order by id as select * from mysql('127.0.0.1:3306','test','a','root','Gepoint') ;

--强制合并
OPTIMIZE TABLE default.project_count FINAL;

--查询中合并去重
select * from table final;

--历史SQL
select type,address,query_start_time,query_duration_ms,current_database,query from system.query_log where type = 'QueryStart' order by query_start_time desc limit 20;

--查看表行数
select database,name,engine,data_paths,partition_key,total_rows from system.tables t where t.database = 'njzwfw' order by total_rows desc

--查看数据量
select
database as "库名",
table as "表名",
sum(rows) as "总行数",
formatReadableSize(sum(data_uncompressed_bytes)) as "原始大小",
forgrep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"matReadableSize(sum(data_compressed_bytes)) as "压缩大小",
concat(toString(round(sum(data_compressed_bytes) / sum(data_uncompressed_bytes) * 100, 0)),'%') "压缩率" -- 压缩率越小越好
from system.parts
where database = 'bjksjjs_59' -- 修改库名
group by database,table order by sum(data_uncompressed_bytes) desc

--分区数
select database,table,count(*) from system.parts where active = 1
group by database,table order by count(*) desc

-- 历史SQL开销查询(内存、耗时)
select type,event_time,address,query,query_duration_ms,memory_usage,exception_code ,exception,
read_rows ,read_bytes ,written_rows ,written_bytes ,result_rows ,result_bytes ,current_database
from system.query_log
where type in ('QueryFinish','ExceptionBeforeStart','ExceptionWhileProcessing')
and dateDiff('day', event_time, now()) order by memory_usage desc