Hive Basics (Part 3)

你我皆温柔 2023-12-14

Complex types and queries

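Review

Bucketing: splits a table's very large data file into multiple smaller files.

Syntax: clustered by (column name) into N buckets;

Loading data: after the data is loaded, the table's directory in the HDFS web UI shows the result as multiple files.

hashcode % number of buckets = bucket number; rows that produce the same result are placed in the same bucket.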
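
As a rough sketch of the bucketing rule above, the following HiveQL creates a bucketed table and estimates the bucket number of each row. The table names `user_bucketed` and `user_staging` and their columns are hypothetical. `pmod(hash(id), 4)` only approximates the assignment, since the hash Hive applies when writing bucket files can differ between versions.

```sql
-- Hypothetical table: rows are distributed into 4 buckets by id
create table user_bucketed(
    id int,
    name string
)
clustered by (id) into 4 buckets
row format delimited
fields terminated by ',';

-- Assumes a staging table user_staging(id int, name string) already exists
insert into table user_bucketed
select id, name from user_staging;

-- Estimated bucket number per row: hash(id) modulo the bucket count
select id, name, pmod(hash(id), 4) as bucket_no
from user_bucketed;
```

On Hive versions before 2.0 you may also need `set hive.enforce.bucketing = true;` before the insert so that the bucket files are actually produced.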

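The SerDe mechanism

What is SerDe

Serialization: converting an object or data into bytes.

Deserialization: converting bytes back into an object or data.

Purpose

Hive supports the following complex data types: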

Data type	Meaning	Declaration	Example
array	array	array<element type>	[1,2,3]
struct	struct	struct<field name:type, field name:type, ...>	{"张三", 18}
map	map (key-value pairs)	map<key type, value type>	{"name":"zhangshan", "age":18}

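Usage

Each level needs its own delimiter character, for example:

Field delimiter: row format delimited fields terminated by ",";

Collection/array item delimiter: collection items terminated by "#";

Map key-value delimiter: map keys terminated by ":";

Line delimiter: lines terminated by "\n"; (Hive only accepts "\n" here)

Hive uses a SerDe (together with a FileFormat) to read and write row objects.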

# Reading a file
HDFS files --> InputFileFormat --> <key, value> --> Deserializer --> Row object

# Writing a file
Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS files
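
To make the read/write pipeline more concrete, here is a minimal sketch of a table that names its SerDe explicitly instead of relying on `row format delimited`. The table `serde_demo` is hypothetical. `LazySimpleSerDe` is the SerDe Hive uses for delimited text, and `field.delim` is its field-delimiter property.

```sql
-- Hypothetical table that declares its SerDe and file format explicitly
create table serde_demo(
    id int,
    name string
)
row format serde 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
with serdeproperties ('field.delim' = ',')
stored as textfile;

-- Lists the SerDe library, InputFormat and OutputFormat recorded for the table
describe formatted serde_demo;
```

The output of `describe formatted` should show the three components from the pipeline above: the input format, the output format, and the SerDe that sits between the key-value records and the row objects.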

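Complex types

Creating a table

-- Create a table
create table tset01(
    name string,
    citys array<string>
)
row format delimited
fields terminated by "\t"             -- delimiter between fields
collection items terminated by ","    -- delimiter between array elements
map keys terminated by ":"            -- delimiter between a map key and its value
lines terminated by "\n";             -- Hive only accepts "\n" as the line delimiter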

array

create table complex_array(
    name string,
    city_array array<string>
)
row format delimited
fields terminated by "\t"   -- delimiter between fields
collection items terminated by ",";

-- Load data
load data local inpath '/root/day09_hive/array/01-data_for_array_type.txt' into table complex_array;

select * from complex_array;

-- Work with the array data
select *,city_array[0] from complex_array;   -- first element of the array
select *,size(city_array) from complex_array;  -- number of elements in the array
select *,array_contains(city_array,"beijing") from complex_array;  -- whether the array contains beijing
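
Besides indexing, an array column can be flattened into one row per element. The sketch below uses the `complex_array` table from above; the sample line in the comment is an assumed format, not the actual content of the data file.

```sql
-- Assumed shape of one input line (tab between fields, commas inside the array):
-- zhangsan<TAB>beijing,shanghai,tianjin

-- One output row per (name, city) pair
select name, city
from complex_array
lateral view explode(city_array) t as city;

-- Keep only the rows whose array contains "beijing"
select name, city_array
from complex_array
where array_contains(city_array, "beijing");
```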

struct

-- Create the table
create table complex_struct(
    name string,
    city_struct struct<name:string,age:int>
)
row format delimited
fields terminated by "#"   -- delimiter between fields
collection items terminated by ":";

-- View the data (empty until the file is loaded)
select * from complex_struct;

-- Load data (the path is on HDFS, so there is no "local" keyword)
load data inpath '/input/02-data_for_struct_type.txt' into table complex_struct;

-- View the data, including individual struct members
select *,city_struct.name,city_struct.age from complex_struct;
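
Struct members can also be used in filters and sorting like ordinary columns. A small sketch against the `complex_struct` table above; the sample line in the comment is an assumed format, and the aliases `struct_name` and `struct_age` are introduced here only for illustration.

```sql
-- Assumed shape of one input line ("#" between fields, ":" between struct members):
-- zhangsan#lisi:18

-- Filter and sort on a struct member
select name,
       city_struct.name as struct_name,
       city_struct.age  as struct_age
from complex_struct
where city_struct.age >= 18
order by struct_age desc;
```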

map

-- Create the table
create table complex_map(
    id int,
    name string,
    homeInfo map<string,string>,
    age int
)
row format delimited
fields terminated by ","   -- delimiter between fields
collection items terminated by "#"
map keys terminated by ":";

-- View the data
select * from complex_map;

/*
-- Common functions
column[key]                  : value stored under key
map_keys(column)             : array of all keys in the map
map_values(column)           : array of all values in the map
array_contains(array, value) : whether the array contains value
*/
select name,homeInfo["father"],homeInfo["mother"] from complex_map;
select map_keys(homeInfo),map_values(homeInfo) from complex_map;
select * from complex_map where array_contains(map_keys(homeInfo),"sister");
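
A map column can also be counted and flattened. The sketch below assumes the `complex_map` table above has been loaded; the sample line in the comment is an assumed format that matches the table's delimiters, and the aliases `relation` and `person` are made up for illustration.

```sql
-- Assumed shape of one input line
-- ("," between fields, "#" between map entries, ":" between key and value):
-- 1,zhangsan,father:xiaoming#mother:xiaohua,28

-- Number of entries in each map
select name, size(homeInfo) as relatives
from complex_map;

-- One output row per key-value pair
select name, relation, person
from complex_map
lateral view explode(homeInfo) t as relation, person;
```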

Queries

Basic queries

Simple queries

-- Removing duplicates: distinct, group by, partition by
-- Operators: range, comparison, logical, fuzzy matching (like), non-null checks
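
The techniques listed above could look like the following. The table `emp` and its columns are hypothetical and exist only to make the statements concrete.

```sql
-- Hypothetical table used for illustration
create table emp(
    name string,
    dept string,
    salary int
);

-- Remove duplicates with distinct
select distinct dept from emp;

-- Remove duplicates with group by
select dept from emp group by dept;

-- Remove duplicates with partition by: keep the top-paid row of each dept
select name, dept, salary
from (
    select name, dept, salary,
           row_number() over (partition by dept order by salary desc) as rn
    from emp
) t
where rn = 1;

-- Range, comparison, logical, fuzzy-match and non-null conditions
select *
from emp
where salary between 3000 and 8000
  and (dept = 'sales' or dept <> 'hr')
  and name like 'zhang%'
  and dept is not null;
```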

Aggregate queries

count()
avg()
max()
min()
sum()
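
A short sketch that uses all five aggregate functions in one statement, assuming a hypothetical table `emp(name string, dept string, salary int)`:

```sql
-- Assumes a hypothetical table emp(name string, dept string, salary int)
-- count(*) counts every row; count(dept) skips rows where dept is null
select count(*)    as row_cnt,
       count(dept) as dept_cnt,
       avg(salary) as avg_salary,
       max(salary) as max_salary,
       min(salary) as min_salary,
       sum(salary) as total_salary
from emp;
```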
Sorting

order by column [asc | desc]
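
For illustration, assuming a hypothetical table `emp(name string, dept string, salary int)`, sorting could be written as:

```sql
-- Assumes a hypothetical table emp(name string, dept string, salary int)
-- Ascending order is the default when no direction is given
select name, salary from emp order by salary;

-- Highest salary first; ties are ordered by name ascending
select name, salary from emp order by salary desc, name asc;
```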

Pagination

-- m: number of rows to skip, n: number of rows to return
limit m, n;
-- 10 rows per page, page 5
-- limit (page-1)*10, 10
limit 40, 10;
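
As a sketch of the paging formula in a full statement, assuming a hypothetical table `emp(name string, dept string, salary int)` and a Hive version that accepts the two-argument form of `limit` (2.0 or later, as far as I know):

```sql
-- Assumes a hypothetical table emp(name string, dept string, salary int)
-- Page 1: the first 10 rows
select name, salary
from emp
order by salary desc, name
limit 10;

-- Page 5 at 10 rows per page: skip (5-1)*10 = 40 rows, then return 10
select name, salary
from emp
order by salary desc, name
limit 40, 10;
```

Paging is only stable when the query has a deterministic `order by`, which is why both statements sort before limiting.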

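Grouped queries

-- SQL execution order
from --> where --> group by --> having --> select --> order by --> limit

-- having vs. where
Similarity: both filter data
Differences: where is evaluated before having
             having is used together with group by
             having is evaluated after grouping, so it can filter on aggregate functions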
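
The difference between the two filters can be seen in one statement. This is a minimal sketch, assuming a hypothetical table `emp(name string, dept string, salary int)`:

```sql
-- Assumes a hypothetical table emp(name string, dept string, salary int)
-- where  : drops individual rows before grouping (no aggregate functions allowed)
-- having : drops whole groups after grouping (aggregate functions allowed)
select dept,
       count(*)    as headcount,
       avg(salary) as avg_salary
from emp
where salary > 3000
group by dept
having avg(salary) > 5000
order by avg_salary desc;
```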