HDFS Heterogeneous Storage

liubang01
Last modified 2017-07-28 11:42:29

Requirements

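Hadoop has supported heterogeneous storage since version 2.4. The feature addresses the storage demand created by explosive growth in both data volume and computing power. Once hot data has been processed into new results, the original data may turn cold, and as data keeps growing, tiered storage becomes pressing. Hot data that is read or computed on frequently should sit on fast storage devices to preserve performance. When data turns cold and is rarely used, it becomes archival data and can be kept on high-capacity, lower-performance devices to reduce storage cost. HDFS can place data on these tiers according to configurable rules. The overall architecture is as follows:

(Figure: HDFS heterogeneous storage architecture)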

Storage Types and Storage Policies

Storage Types

  • RAM_DISK: in-memory file system (RAM disk)
  • SSD: solid-state drive
  • DISK: regular hard disk
  • ARCHIVE: archival storage

Storage Policies

| Policy ID | Policy name | Block placement (n replicas) | creationFallbacks | replicationFallbacks |
|---|---|---|---|---|
| 15 | Lazy_Persist | RAM_DISK: 1, DISK: n-1 | DISK | DISK |
| 12 | All_SSD | SSD: n | DISK | DISK |
| 10 | One_SSD | SSD: 1, DISK: n-1 | SSD, DISK | SSD, DISK |
| 7 | Hot (default) | DISK: n | <none> | ARCHIVE |
| 5 | Warm | DISK: 1, ARCHIVE: n-1 | ARCHIVE, DISK | ARCHIVE, DISK |
| 2 | Cold | ARCHIVE: n | <none> | <none> |

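The policies from Lazy_Persist down to Cold correspond to storage media ordered from fastest to slowest: the in-memory file system is fastest, followed by SSD, then regular disk, and finally archival storage. These policies can be used to control where data is placed and so reduce cost.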
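To make the block placement column concrete, here is a minimal sketch, assuming only the storage type lists from the policy table. The names `POLICIES` and `expected_placement` are illustrative and not part of HDFS; the rule modelled is simply that the last listed type covers all remaining replicas.

```python
# Storage type lists per policy, copied from the policy table in this article.
POLICIES = {
    "Lazy_Persist": ["RAM_DISK", "DISK"],
    "All_SSD": ["SSD"],
    "One_SSD": ["SSD", "DISK"],
    "Hot": ["DISK"],
    "Warm": ["DISK", "ARCHIVE"],
    "Cold": ["ARCHIVE"],
}


def expected_placement(policy: str, replication: int) -> list[str]:
    """Return the storage type expected for each of the `replication` replicas."""
    types = POLICIES[policy]
    # Replica i uses the i-th listed type; the last type covers the rest.
    return [types[min(i, len(types) - 1)] for i in range(replication)]


for name in POLICIES:
    print(name, expected_placement(name, 3))
```

With the replication factor of 2 used later in the test cluster, Warm therefore asks for one DISK replica and one ARCHIVE replica per block.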

creationFallbacks

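The fallback storage types used when a block is first created (file creation) and the storage type requested by the policy is unavailable.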

replicationFallbacks

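The fallback storage types used for the remaining replicas of a block (replication). A fallback occurs when the storage type the policy asks for is unavailable, for example because it has run out of space; HDFS then steps down to the next usable storage type in the fallback list.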
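The fallback rule can be sketched as follows. This is a simplified model rather than the HDFS implementation: the fallback lists are copied from the policy table for three of the policies, and `FALLBACKS` and `resolve` are hypothetical names. A replica whose storage type is unavailable takes the first usable fallback, and is dropped if there is none.

```python
# Fallback lists copied from the policy table; an empty list means "<none>".
FALLBACKS = {
    "Hot": {"creation": [], "replication": ["ARCHIVE"]},
    "Warm": {"creation": ["ARCHIVE", "DISK"], "replication": ["ARCHIVE", "DISK"]},
    "Cold": {"creation": [], "replication": []},
}


def resolve(policy, wanted, unavailable, phase):
    """Replace each unavailable storage type with the first usable fallback.

    `phase` is "creation" or "replication". A replica whose type has no
    usable fallback is dropped from the result.
    """
    resolved = []
    for storage_type in wanted:
        if storage_type not in unavailable:
            resolved.append(storage_type)
            continue
        usable = [t for t in FALLBACKS[policy][phase] if t not in unavailable]
        if usable:
            resolved.append(usable[0])
    return resolved


# Cold has no fallbacks, so nothing can be placed when ARCHIVE is unavailable.
print(resolve("Cold", ["ARCHIVE", "ARCHIVE"], {"ARCHIVE"}, "creation"))
```

Under this model an empty result means the block cannot be placed at all, which is one way to read why a policy without fallbacks is strict about the storage it needs.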

Verification in a Test Environment

Environment Setup

  • Replication factor: 2
  • DataNode information

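| DataNode | Physical medium | Initial capacity | Storage type configured in HDFS |
|---|---|---|---|
| 100.67.57.220 | SSD | 100G | DISK |
| 100.67.57.221 | SSD | 100G | DISK |
| 100.67.57.222 | SSD | 100G | DISK |
| 10.108.100.24 | Regular disk | 100G | ARCHIVE |
| 10.108.100.71 | Regular disk | 100G | ARCHIVE |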

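The initial cluster had only three storage nodes: 220, 221 and 222. Their storage type was left unset, so it defaults to DISK (the physical disks are actually SSDs). Nodes 24 and 71 were added later; their physical disks are mechanical drives, and their storage type in HDFS is set to ARCHIVE.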
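The article does not show how the storage type is assigned. In stock HDFS this is done by prefixing each entry of `dfs.datanode.data.dir` in hdfs-site.xml with a tag such as `[ARCHIVE]`; untagged entries default to DISK. The sketch below only builds such a value. The helper name `data_dir_value` and the mount point are hypothetical, since the real paths on nodes 24 and 71 are not given.

```python
def data_dir_value(volumes):
    """Build a dfs.datanode.data.dir value from (storage_type, path) pairs."""
    return ",".join(f"[{storage_type}]{path}" for storage_type, path in volumes)


# Hypothetical mount point; the article does not show the real paths.
print(data_dir_value([("ARCHIVE", "file:///data1/hdfs/dn")]))
```

The resulting string would go into the `dfs.datanode.data.dir` property on the new DataNodes before they are started.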

  • Initial file information

bin/hadoop fs -ls / |awk '{print $8}'|xargs bin/hadoop fs -du -s -h

Under the default HDFS policy there is a hot directory containing 1 GB of files.

  • Initial block distribution
[hadoop@100 /usr/local/service/40028/hadoop]$ bin/hdfs fsck /hot  -files -blocks -locations 

0\. BP-983125464-100.67.159.132-1474351508701:blk_1073742694_1878 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.220:4028]

1\. BP-983125464-100.67.159.132-1474351508701:blk_1073742695_1879 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.221:4028]

2\. BP-983125464-100.67.159.132-1474351508701:blk_1073742696_1880 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.221:4028]

3\. BP-983125464-100.67.159.132-1474351508701:blk_1073742697_1881 len=67108864 repl=2 [100.67.57.221:4028, 100.67.57.222:4028]

4\. BP-983125464-100.67.159.132-1474351508701:blk_1073742698_1882 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.221:4028]

5\. BP-983125464-100.67.159.132-1474351508701:blk_1073742699_1883 len=67108864 repl=2 [100.67.57.221:4028, 100.67.57.222:4028]

6\. BP-983125464-100.67.159.132-1474351508701:blk_1073742700_1884 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.220:4028]

7\. BP-983125464-100.67.159.132-1474351508701:blk_1073742701_1885 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.221:4028]

8\. BP-983125464-100.67.159.132-1474351508701:blk_1073742702_1886 len=67108864 repl=2 [100.67.57.221:4028, 100.67.57.220:4028]

9\. BP-983125464-100.67.159.132-1474351508701:blk_1073742703_1887 len=67108864 repl=2 [100.67.57.220:4028, 100.67.57.221:4028]

10\. BP-983125464-100.67.159.132-1474351508701:blk_1073742704_1888 len=67108864 repl=2 [100.67.57.220:4028, 100.67.57.222:4028]

11\. BP-983125464-100.67.159.132-1474351508701:blk_1073742705_1889 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.220:4028]

12\. BP-983125464-100.67.159.132-1474351508701:blk_1073742706_1890 len=67108864 repl=2 [100.67.57.220:4028, 100.67.57.222:4028]

13\. BP-983125464-100.67.159.132-1474351508701:blk_1073742707_1891 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.220:4028]

14\. BP-983125464-100.67.159.132-1474351508701:blk_1073742708_1892 len=67108864 repl=2 [100.67.57.220:4028, 100.67.57.221:4028]

15\. BP-983125464-100.67.159.132-1474351508701:blk_1073742709_1893 len=67108864 repl=2 [100.67.57.220:4028, 100.67.57.221:4028]

This confirms that the 16 blocks are spread roughly evenly across the three storage nodes 220 to 222.
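Counting replicas by eye gets tedious, so a small parser helps. The sketch below assumes the block-line format shown in this article's fsck output (newer Hadoop releases print locations differently) and uses only the first three block lines as sample input, with the markdown escapes removed. `replicas_per_node` is an illustrative name.

```python
import re
from collections import Counter

# First three block lines from the fsck output in this article.
SAMPLE = """\
0. BP-983125464-100.67.159.132-1474351508701:blk_1073742694_1878 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.220:4028]
1. BP-983125464-100.67.159.132-1474351508701:blk_1073742695_1879 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.221:4028]
2. BP-983125464-100.67.159.132-1474351508701:blk_1073742696_1880 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.221:4028]
"""

BLOCK_LINE = re.compile(r"(blk_\d+)_\d+ len=\d+ repl=\d+ \[(.*)\]")


def replicas_per_node(fsck_text):
    """Count how many block replicas each DataNode host holds."""
    counts = Counter()
    for line in fsck_text.splitlines():
        match = BLOCK_LINE.search(line)
        if not match:
            continue  # skip summary lines and anything that is not a block line
        for location in match.group(2).split(","):
            host = location.strip().rsplit(":", 1)[0]  # drop the port
            counts[host] += 1
    return counts


print(replicas_per_node(SAMPLE))
```

Feeding it the full fsck output instead of `SAMPLE` gives the per-node replica counts for all 16 blocks.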

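  • Set a different policy on each directory

The hot directory keeps the default policy, so nothing needs to change.

Set the policy of the warm directory to Warm: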

[hadoop@100 /usr/local/service/40028/hadoop]$ bin/hdfs dfsadmin -setStoragePolicy /warm Warm

Set storage policy Warm on /warm

Set the policy of the cold directory to Cold:

[hadoop@100 /usr/local/service/40028/hadoop]$ bin/hdfs dfsadmin -setStoragePolicy /cold Cold

Set storage policy Cold on /cold

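At this point, before any nodes of storage type ARCHIVE have joined the cluster, writing data to /cold throws an exception, since the Cold policy accepts only ARCHIVE storage and has no fallback types.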

Cooling Data: Hot to Warm

Check the storage policy of the warm directory
[hadoop@100 /usr/local/service/40028/hadoop]$ bin/hdfs dfsadmin -getStoragePolicy /warm

The storage policy of /warm:

BlockStoragePolicy{WARM:5, storageTypes=[DISK, ARCHIVE], creationFallbacks=[DISK, ARCHIVE], replicationFallbacks=[DISK, ARCHIVE]}
Move the data to the warm directory and list the result
[hadoop@100 /usr/local/service/40028/hadoop]$ bin/hadoop fs -ls /hot /warm 

Found 1 items

drwxr-xr-x   \- hadoop supergroup          0 2016-09-27 14:29 /warm/data
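Run the mover

The documented usage is hdfs mover [-p <files/dirs> | -f <local file>]. When neither -p nor -f is given, the mover defaults to the root directory, which appears consistent with the null path list in the first log line below.
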
[hadoop@100 /usr/local/service/40028/hadoop]$ bin/hdfs mover /warm  /hot

16/09/28 10:24:20 INFO mover.Mover: namenodes = {hdfs://HDFS40028=null}

16/09/28 10:24:21 INFO net.NetworkTopology: Adding a new node: /default-rack/100.67.57.220:4028

16/09/28 10:24:21 INFO net.NetworkTopology: Adding a new node: /default-rack/100.67.57.222:4028

16/09/28 10:24:21 INFO net.NetworkTopology: Adding a new node: /default-rack/10.108.100.24:4000

16/09/28 10:24:21 INFO net.NetworkTopology: Adding a new node: /default-rack/100.67.57.221:4028

16/09/28 10:24:21 INFO net.NetworkTopology: Adding a new node: /default-rack/10.108.100.71:4000

16/09/28 10:24:46 INFO balancer.Dispatcher: Successfully moved blk_1073742694_1878 with size=67108864 from 100.67.57.220:4028:DISK to 10.108.100.24:4000:ARCHIVE through 100.67.57.220:4028

16/09/28 10:24:50 INFO balancer.Dispatcher: Successfully moved blk_1073742703_1887 with size=67108864 from 100.67.57.220:4028:DISK to 10.108.100.71:4000:ARCHIVE through 100.67.57.220:4028

16/09/28 10:24:50 INFO balancer.Dispatcher: Successfully moved blk_1073742702_1886 with size=67108864 from 100.67.57.221:4028:DISK to 10.108.100.71:4000:ARCHIVE through 100.67.57.221:4028

16/09/28 10:24:50 INFO balancer.Dispatcher: Successfully moved blk_1073742700_1884 with size=67108864 from 100.67.57.222:4028:DISK to 10.108.100.71:4000:ARCHIVE through 100.67.57.222:4028

16/09/28 10:24:52 INFO balancer.Dispatcher: Successfully moved blk_1073742697_1881 with size=67108864 from 100.67.57.222:4028:DISK to 10.108.100.24:4000:ARCHIVE through 100.67.57.222:4028

16/09/28 10:24:52 INFO balancer.Dispatcher: Successfully moved blk_1073742701_1885 with size=67108864 from 100.67.57.221:4028:DISK to 10.108.100.71:4000:ARCHIVE through 100.67.57.221:4028

16/09/28 10:24:52 INFO balancer.Dispatcher: Successfully moved blk_1073742698_1882 with size=67108864 from 100.67.57.221:4028:DISK to 10.108.100.24:4000:ARCHIVE through 100.67.57.221:4028

16/09/28 10:24:52 INFO balancer.Dispatcher: Successfully moved blk_1073742696_1880 with size=67108864 from 100.67.57.222:4028:DISK to 10.108.100.24:4000:ARCHIVE through 100.67.57.222:4028

16/09/28 10:24:53 INFO balancer.Dispatcher: Successfully moved blk_1073742695_1879 with size=67108864 from 100.67.57.221:4028:DISK to 10.108.100.24:4000:ARCHIVE through 100.67.57.221:4028

16/09/28 10:24:53 INFO balancer.Dispatcher: Successfully moved blk_1073742699_1883 with size=67108864 from 100.67.57.221:4028:DISK to 10.108.100.71:4000:ARCHIVE through 100.67.57.221:4028

16/09/28 10:25:21 WARN hdfs.DFSClient: Slow ReadProcessor read fields took 60116ms (threshold=30000ms); ack: seqno: 1 status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 146874, targets: [100.67.57.221:4028, 100.67.57.220:4028]

Sep 28, 2016 10:25:29 AM Mover took 1mins, 8sec
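The mover log can be summarized by storage type rather than read line by line. This is a minimal sketch that assumes the "Successfully moved" line format shown in the log above and uses two of those lines as sample input; `summarize_moves` is an illustrative name.

```python
import re
from collections import Counter

# Two "Successfully moved" lines copied from the mover log in this article.
SAMPLE_LOG = """\
16/09/28 10:24:46 INFO balancer.Dispatcher: Successfully moved blk_1073742694_1878 with size=67108864 from 100.67.57.220:4028:DISK to 10.108.100.24:4000:ARCHIVE through 100.67.57.220:4028
16/09/28 10:24:50 INFO balancer.Dispatcher: Successfully moved blk_1073742703_1887 with size=67108864 from 100.67.57.220:4028:DISK to 10.108.100.71:4000:ARCHIVE through 100.67.57.220:4028
"""

MOVE = re.compile(
    r"Successfully moved (blk_\d+_\d+) with size=(\d+) "
    r"from \S+:(\w+) to \S+:(\w+) through"
)


def summarize_moves(log_text):
    """Return (move counts, bytes moved), both keyed by (source type, target type)."""
    moves, moved_bytes = Counter(), Counter()
    for match in MOVE.finditer(log_text):
        _, size, src_type, dst_type = match.groups()
        moves[(src_type, dst_type)] += 1
        moved_bytes[(src_type, dst_type)] += int(size)
    return moves, moved_bytes


print(summarize_moves(SAMPLE_LOG))
```

Every move in this log goes from DISK to ARCHIVE, which is what the Warm policy asks for when both replicas start on DISK.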
Check the block distribution
bin/hdfs fsck /warm  -files -blocks -locations

0\. BP-983125464-100.67.159.132-1474351508701:blk_1073742694_1878 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.24:4000]

1\. BP-983125464-100.67.159.132-1474351508701:blk_1073742695_1879 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.24:4000]

2\. BP-983125464-100.67.159.132-1474351508701:blk_1073742696_1880 len=67108864 repl=2 [10.108.100.24:4000, 100.67.57.221:4028]

3\. BP-983125464-100.67.159.132-1474351508701:blk_1073742697_1881 len=67108864 repl=2 [100.67.57.221:4028, 10.108.100.24:4000]

4\. BP-983125464-100.67.159.132-1474351508701:blk_1073742698_1882 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.24:4000]

5\. BP-983125464-100.67.159.132-1474351508701:blk_1073742699_1883 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.222:4028]

6\. BP-983125464-100.67.159.132-1474351508701:blk_1073742700_1884 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.220:4028]

7\. BP-983125464-100.67.159.132-1474351508701:blk_1073742701_1885 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.71:4000]

8\. BP-983125464-100.67.159.132-1474351508701:blk_1073742702_1886 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.220:4028]

9\. BP-983125464-100.67.159.132-1474351508701:blk_1073742703_1887 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.221:4028]

10\. BP-983125464-100.67.159.132-1474351508701:blk_1073742704_1888 len=67108864 repl=2 [100.67.57.220:4028, 100.67.57.222:4028, 10.108.100.24:4000]

11\. BP-983125464-100.67.159.132-1474351508701:blk_1073742705_1889 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.24:4000]

12\. BP-983125464-100.67.159.132-1474351508701:blk_1073742706_1890 len=67108864 repl=2 [100.67.57.220:4028, 100.67.57.222:4028, 10.108.100.24:4000]

13\. BP-983125464-100.67.159.132-1474351508701:blk_1073742707_1891 len=67108864 repl=2 [100.67.57.222:4028, 100.67.57.220:4028, 10.108.100.24:4000]

14\. BP-983125464-100.67.159.132-1474351508701:blk_1073742708_1892 len=67108864 repl=2 [100.67.57.220:4028, 10.108.100.24:4000]

15\. BP-983125464-100.67.159.132-1474351508701:blk_1073742709_1893 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.221:4028]

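The output shows that each block now has one replica on SSD (DISK type) and one on regular disk (ARCHIVE type). Blocks 10, 12 and 13 still list a third location, presumably because the excess replica had not yet been removed when fsck ran.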
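Whether a block satisfies its policy can be checked mechanically by mapping each replica location to the storage type of its DataNode. The sketch below takes the node-to-type mapping from the environment table and two blocks from the fsck output above; `NODE_TYPE`, `BLOCKS` and `placement_report` are illustrative names.

```python
from collections import Counter

# Storage type per DataNode, from the environment table in this article.
NODE_TYPE = {
    "100.67.57.220": "DISK",
    "100.67.57.221": "DISK",
    "100.67.57.222": "DISK",
    "10.108.100.24": "ARCHIVE",
    "10.108.100.71": "ARCHIVE",
}

# Replica locations of blocks 0 and 10, copied from the fsck output above.
BLOCKS = {
    "blk_1073742694": ["100.67.57.222", "10.108.100.24"],
    "blk_1073742704": ["100.67.57.220", "100.67.57.222", "10.108.100.24"],
}


def placement_report(blocks, expected):
    """Map each block to True if its replica storage types equal `expected`."""
    want = Counter(expected)
    return {
        block: Counter(NODE_TYPE[host] for host in hosts) == want
        for block, hosts in blocks.items()
    }


# Warm with replication 2 expects one DISK and one ARCHIVE replica.
print(placement_report(BLOCKS, ["DISK", "ARCHIVE"]))
```

Block 10 is reported as not matching because it still carries an extra DISK replica in this capture.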

Cooling Data: Warm to Cold

Check the storage policy of the cold directory

[hadoop@100 /usr/local/service/40028/hadoop]$ bin/hdfs dfsadmin -getStoragePolicy /cold

The storage policy of /cold:

BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}

Move the data to the cold directory

[hadoop@100 /usr/local/service/40028/hadoop]$ bin/hadoop fs -mv /warm/data /cold

[hadoop@100 /usr/local/service/40028/hadoop]$ bin/hadoop fs -ls /warm /cold       

Found 1 items

drwxr-xr-x   \- hadoop supergroup          0 2016-09-27 14:29 /cold/data

Run the mover

bin/hdfs mover /warm /cold

Check the block distribution

bin/hdfs fsck /cold  -files -blocks -locations

0\. BP-983125464-100.67.159.132-1474351508701:blk_1073742694_1878 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

1\. BP-983125464-100.67.159.132-1474351508701:blk_1073742695_1879 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

2\. BP-983125464-100.67.159.132-1474351508701:blk_1073742696_1880 len=67108864 repl=2 [10.108.100.24:4000, 10.108.100.71:4000]

3\. BP-983125464-100.67.159.132-1474351508701:blk_1073742697_1881 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

4\. BP-983125464-100.67.159.132-1474351508701:blk_1073742698_1882 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

5\. BP-983125464-100.67.159.132-1474351508701:blk_1073742699_1883 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

6\. BP-983125464-100.67.159.132-1474351508701:blk_1073742700_1884 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

7\. BP-983125464-100.67.159.132-1474351508701:blk_1073742701_1885 len=67108864 repl=2 [10.108.100.24:4000, 10.108.100.71:4000]

8\. BP-983125464-100.67.159.132-1474351508701:blk_1073742702_1886 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

9\. BP-983125464-100.67.159.132-1474351508701:blk_1073742703_1887 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

10\. BP-983125464-100.67.159.132-1474351508701:blk_1073742704_1888 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

11\. BP-983125464-100.67.159.132-1474351508701:blk_1073742705_1889 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

12\. BP-983125464-100.67.159.132-1474351508701:blk_1073742706_1890 len=67108864 repl=2 [10.108.100.71:4000, 10.108.100.24:4000]

13\. BP-983125464-100.67.159.132-1474351508701:blk_1073742707_1891 len=67108864 repl=2 [10.108.100.24:4000, 10.108.100.71:4000]

14\. BP-983125464-100.67.159.132-1474351508701:blk_1073742708_1892 len=67108864 repl=2 [100.67.57.220:4028, 10.108.100.24:4000]

15\. BP-983125464-100.67.159.132-1474351508701:blk_1073742709_1893 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.221:4028]

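The output shows that almost all replicas now sit on the cold (ARCHIVE) nodes. In this capture, blocks 14 and 15 still list one replica on a DISK node (100.67.57.220 and 100.67.57.221 respectively).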
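Conceptually, the mover compares where a block's replicas are with where the policy wants them and moves only the difference. The following is a simplified sketch of that comparison using multiset subtraction; it is not the Mover's actual code, and `pending_moves` is a hypothetical name.

```python
from collections import Counter


def pending_moves(current, expected):
    """Pair surplus storage types with missing ones, as (from, to) tuples."""
    surplus = Counter(current) - Counter(expected)  # replicas on the wrong type
    missing = Counter(expected) - Counter(current)  # types still lacking a replica
    return list(zip(sorted(surplus.elements()), sorted(missing.elements())))


# Block 14 in the fsck output above: one replica on a DISK node, one on ARCHIVE.
print(pending_moves(["DISK", "ARCHIVE"], ["ARCHIVE", "ARCHIVE"]))
```

A block that already matches its policy yields an empty list, so rerunning the comparison after a completed migration schedules nothing.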

Warming Data: Cold to Warm

Move the data to the warm directory

bin/hadoop fs -mv /cold/data /warm

Run the mover

bin/hdfs mover

Check the block distribution

bin/hdfs fsck /warm  -files -blocks -locations  

0\. BP-983125464-100.67.159.132-1474351508701:blk_1073742694_1878 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.220:4028]

1\. BP-983125464-100.67.159.132-1474351508701:blk_1073742695_1879 len=67108864 repl=2 [100.67.57.220:4028, 10.108.100.24:4000]

2\. BP-983125464-100.67.159.132-1474351508701:blk_1073742696_1880 len=67108864 repl=2 [10.108.100.24:4000, 100.67.57.220:4028]

3\. BP-983125464-100.67.159.132-1474351508701:blk_1073742697_1881 len=67108864 repl=2 [100.67.57.220:4028, 10.108.100.24:4000]

4\. BP-983125464-100.67.159.132-1474351508701:blk_1073742698_1882 len=67108864 repl=2 [100.67.57.220:4028, 10.108.100.24:4000]

5\. BP-983125464-100.67.159.132-1474351508701:blk_1073742699_1883 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.222:4028]

6\. BP-983125464-100.67.159.132-1474351508701:blk_1073742700_1884 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.24:4000]

7\. BP-983125464-100.67.159.132-1474351508701:blk_1073742701_1885 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.71:4000]

8\. BP-983125464-100.67.159.132-1474351508701:blk_1073742702_1886 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.222:4028]

9\. BP-983125464-100.67.159.132-1474351508701:blk_1073742703_1887 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.24:4000]

10\. BP-983125464-100.67.159.132-1474351508701:blk_1073742704_1888 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.222:4028]

11\. BP-983125464-100.67.159.132-1474351508701:blk_1073742705_1889 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.24:4000]

12\. BP-983125464-100.67.159.132-1474351508701:blk_1073742706_1890 len=67108864 repl=2 [100.67.57.222:4028, 10.108.100.24:4000]

13\. BP-983125464-100.67.159.132-1474351508701:blk_1073742707_1891 len=67108864 repl=2 [10.108.100.24:4000, 100.67.57.222:4028]

14\. BP-983125464-100.67.159.132-1474351508701:blk_1073742708_1892 len=67108864 repl=2 [100.67.57.220:4028, 10.108.100.24:4000]

15\. BP-983125464-100.67.159.132-1474351508701:blk_1073742709_1893 len=67108864 repl=2 [10.108.100.71:4000, 100.67.57.221:4028]

The output shows that each block again has one replica on SSD and one on regular disk.

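Original-content notice: this article is published in the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

To report infringement, contact cloudcommunity@tencent.com for removal.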
