Hive: Bucketing

Published: 2023-04-22

Creating a bucketed table

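-- Enforce bucketing on insert (only needed before Hive 2.0; later versions always enforce it)
set hive.enforce.bucketing=true;
-- One reducer per bucket
set mapreduce.job.reduces=4;
-- The insert below uses a fully dynamic partition (sex), which requires nonstrict mode
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

drop table if exists person_buck;

-- Partitioned by sex; within each partition, rows are hashed on sid into 4 buckets,
-- and each bucket is sorted by sid in descending order
create table person_buck(sid int, sname string)
partitioned by(sex string)
clustered by(sid)
sorted by(sid DESC)
into 4 buckets
row format delimited
fields terminated by ',';

-- The dynamic partition column (sex) must come last in the select list
insert into person_buck partition(sex) select sid, sname, sex from person_p;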
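
To make the effect of clustered by(sid) into 4 buckets more concrete, here is a small Python sketch of how rows could be assigned to buckets. It assumes the legacy Hive hash, where an int column hashes to its own value and the bucket index is the non-negative hash modulo the bucket count; newer Hive versions may use a different hash (bucketing version 2), so actual file placement can differ. The function name bucket_for and the sample rows are hypothetical and only serve as illustration.

```python
def bucket_for(sid: int, num_buckets: int = 4) -> int:
    """Return the 0-based bucket index for an int key (assumed legacy hash)."""
    hash_code = sid  # assumption: an int column hashes to its own value
    # Mask the sign bit so the index is never negative, then take the modulo.
    return (hash_code & 0x7FFFFFFF) % num_buckets


# Hypothetical (sid, sname) rows belonging to one sex partition.
rows = [(1, "a"), (2, "b"), (5, "c"), (8, "d"), (11, "e")]

buckets = {}
for sid, _sname in rows:
    buckets.setdefault(bucket_for(sid), []).append(sid)

# Mirror "sorted by(sid DESC)": each bucket keeps its rows in descending sid order.
for sids in buckets.values():
    sids.sort(reverse=True)

for index in sorted(buckets):
    print(index, buckets[index])
```

Under these assumptions, sid 1 and sid 5 land in the same bucket because both leave remainder 1 when divided by 4.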

Sampling queries on bucketed tables

select * from table_name tablesample(bucket X out of Y on field);

select * from person_buck tablesample(bucket 1 out of 2 on sid);

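X is the bucket to start sampling from (buckets are numbered from 1, and X must not exceed Y)
Y is the sampling interval: starting at bucket X, every Y-th bucket is sampled
- Y should be a multiple or a factor of the number of buckets, so that Hive can read only the buckets it needs. For example, with 6 buckets, Y = 6 samples 6/6 = 1 bucket, and Y = 3 samples 6/3 = 2 buckets. If Y is a multiple of the bucket count, such as 12, only a fraction of one bucket (6/12, i.e. half) is sampled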
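
The rule for X and Y can be sketched in Python as follows. This is only an illustration of the arithmetic, not Hive code: the function name sampled_buckets is hypothetical, and it assumes the table is bucketed on the sampled column and that a row matches when its hash modulo Y equals X - 1.

```python
from fractions import Fraction


def sampled_buckets(x: int, y: int, num_buckets: int) -> dict:
    """Map each sampled bucket (1-based) to the fraction of it that is read."""
    if not 1 <= x <= y:
        raise ValueError("X must be between 1 and Y")
    if num_buckets % y == 0:
        # Y is a factor of the bucket count: whole buckets X, X+Y, X+2Y, ...
        return {b: Fraction(1) for b in range(x, num_buckets + 1, y)}
    if y % num_buckets == 0:
        # Y is a multiple of the bucket count: a slice of a single bucket.
        return {(x - 1) % num_buckets + 1: Fraction(num_buckets, y)}
    raise ValueError("Y is neither a multiple nor a factor of the bucket count")


print(sampled_buckets(1, 2, 4))   # person_buck example: buckets 1 and 3
print(sampled_buckets(1, 6, 6))   # one whole bucket
print(sampled_buckets(1, 3, 6))   # two whole buckets: 1 and 4
print(sampled_buckets(1, 12, 6))  # half of bucket 1
```

For the query on person_buck above (bucket 1 out of 2 with 4 buckets), the sketch selects buckets 1 and 3, which is 4/2 = 2 buckets.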

钱不寒

https://jxch.github.io/2023/04/22/architect/hadoop/hive-fen-tong/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit 钱不寒 as the source when reposting.
