avicom's miscellany


Linux / Storage

fastdfs

avicom 2010. 12. 20. 11:53

Overview

A structurally simple distributed storage system (not a file system but a storage system... the developer insists it is a file system, but to my eyes it is just a storage system), made up of multiple tracker (metadata) servers and multiple storage servers. Depending on how it is configured it can implement the equivalent of RAID 0, RAID 1, or RAID 10, and store/retrieve speed is fairly fast. PHP and Java bindings are provided, and files are accessed in an upload/download fashion.

Components

tracker

Controls scheduling and load balancing for file reads and writes, and keeps the metadata for each file in key-value form.

storage

The server where files are actually stored. A file uploaded by a client is renamed to a unique filename and stored together with its metadata.




Installation

Both tracker and storage are installed from the same source; the daemon you start depends on the role (tracker -> fdfs_trackerd, storage -> fdfs_storaged).

Download the source ( http://fastdfs.googlecode.com/files/FastDFS_v2.05.tar.gz )
Extract it to a convenient location and run ./make.sh && ./make.sh install
Once the build and install complete, configuration files are created under /etc/fdfs. Edit the file that matches the server's role (tracker -> /etc/fdfs/tracker.conf, storage -> /etc/fdfs/storage.conf)
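The installation steps above can be sketched as a short script. This is only a sketch: DRY_RUN=1 (the default here) merely echoes each command so you can review the sequence; set DRY_RUN=0 to actually execute it. The download URL is the one given in this post.

```shell
# Dry-run sketch of the FastDFS v2.05 install steps described above.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run wget http://fastdfs.googlecode.com/files/FastDFS_v2.05.tar.gz
run tar xzf FastDFS_v2.05.tar.gz
run cd FastDFS
run ./make.sh
run ./make.sh install
run ls /etc/fdfs    # tracker.conf and storage.conf should appear here
```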
Configuration

tracker.conf

# is this config file disabled
# false for enabled
# true for disabled
disabled=false

# bind an address of this host
# empty for bind all addresses of this host
bind_addr=

# the tracker server port
port=22122

# connect timeout in seconds
# default value is 30s
connect_timeout=30

# network timeout in seconds
# default value is 30s
network_timeout=60

# the base path to store data and log files
base_path=/home/fdfs_tracker_store

# max concurrent connections this server supported
max_connections=256

# work thread count, should <= max_connections
# default value is 4
# since V2.00
work_threads=4

# the method of selecting group to upload files
# 0: round robin
# 1: specify group
# 2: load balance, select the max free space group to upload file
store_lookup=0

# which group to upload file
# when store_lookup set to 1, must set store_group to the group name
#store_group=group2
store_group=

# which storage server to upload file
# 0: round robin (default)
# 1: the first server order by ip address
# 2: the first server order by priority (the minimal)
store_server=2

# which path(means disk or mount point) of the storage server to upload file
# 0: round robin
# 2: load balance, select the max free space path to upload file
store_path=2

# which storage server to download file
# 0: round robin (default)
# 1: the source storage server which the current file uploaded to
download_server=0

# reserved storage space for system or other applications.
# if the free(available) space of any storage server in
# a group <= reserved_storage_space,
# no file can be uploaded to this group.
# bytes unit can be one of follows:
### G or g for gigabyte(GB)
### M or m for megabyte(MB)
### K or k for kilobyte(KB)
### no unit for byte(B)
reserved_storage_space = 4GB

#standard log level as syslog, case insensitive, value list:
### emerg for emergency
### alert
### crit for critical
### error
### warn for warning
### notice
### info
### debug
log_level=debug

#unix group name to run this program,
#not set (empty) means run by the group of current user
run_by_group=

#unix username to run this program,
#not set (empty) means run by current user
run_by_user=

# allow_hosts can occur more than once, host can be hostname or ip address,
# "*" means match all ip addresses, can use range like this: 10.0.1.[1-15,20] or
# host[01-08,20-25].domain.com, for example:
# allow_hosts=10.0.1.[1-15,20]
# allow_hosts=host[01-08,20-25].domain.com
allow_hosts=*

# sync log buff to disk every interval seconds
# default value is 10 seconds
sync_log_buff_interval = 10

# check storage server alive interval seconds
check_active_interval = 120

# thread stack size, should >= 64KB
# default value is 64KB
thread_stack_size = 64KB

# auto adjust when the ip address of the storage server changed
# default value is true
storage_ip_changed_auto_adjust = true

# storage sync file max delay seconds
# default value is 86400 seconds (one day)
# since V2.00
storage_sync_file_max_delay = 86400

# the max time of storage sync a file
# default value is 300 seconds
# since V2.00
storage_sync_file_max_time = 300

# HTTP settings
http.disabled=false

# HTTP port on this tracker server
http.server_port=8080

# check storage HTTP server alive interval seconds
# <= 0 for never check
# default value is 30
http.check_alive_interval=30

# check storage HTTP server alive type, values are:
#   tcp : connect to the storage server with HTTP port only,
#        do not request and get response
#   http: storage check alive url must return http status 200
# default value is tcp
http.check_alive_type=tcp

# check storage HTTP server alive uri/url
# NOTE: storage embed HTTP server support uri: /status.html
http.check_alive_uri=/status.html

# if need find content type from file extension name
http.need_find_content_type=true

#use "#include" directive to include http other settings
##include http.conf


storage.conf

# is this config file disabled
# false for enabled
# true for disabled
disabled=false

# the name of the group this storage server belongs to
group_name=group3

# bind an address of this host
# empty for bind all addresses of this host
bind_addr=

# if bind an address of this host when connect to other servers
# (this storage server as a client)
# true for binding the address configured by above parameter: "bind_addr"
# false for binding any address of this host
client_bind=true

# the storage server port
port=23000

# connect timeout in seconds
# default value is 30s
connect_timeout=30

# network timeout in seconds
# default value is 30s
network_timeout=60

# heart beat interval in seconds
heart_beat_interval=30

# disk usage report interval in seconds
stat_report_interval=60

# the base path to store data and log files
base_path=/home/fdfs

# max concurrent connections server supported
# max_connections worker threads start when this service startup
max_connections=256

# the buff size to recv / send data
# default value is 64KB
# since V2.00
buff_size = 256KB

# work thread count, should <= max_connections
# work thread deal network io
# default value is 4
# since V2.00
work_threads=4

# if disk read / write separated
##  false for mixed read and write
##  true for separated read and write
# default value is true
# since V2.00
disk_rw_separated = true

# disk reader thread count per store base path
# for mixed read / write, this parameter can be 0
# default value is 1
# since V2.00
disk_reader_threads = 1

# disk writer thread count per store base path
# for mixed read / write, this parameter can be 0
# default value is 1
# since V2.00
disk_writer_threads = 1

# when no entry to sync, try read binlog again after X milliseconds
# 0 for try again immediately (not need to wait)
sync_wait_msec=200

# after sync a file, usleep milliseconds
# 0 for sync successively (never call usleep)
sync_interval=0

# sync start time of a day, time format: Hour:Minute
# Hour from 0 to 23, Minute from 0 to 59
sync_start_time=00:00

# sync end time of a day, time format: Hour:Minute
# Hour from 0 to 23, Minute from 0 to 59
sync_end_time=23:59

# write to the mark file after sync N files
# default value is 500
write_mark_file_freq=500

# path(disk or mount point) count, default value is 1
store_path_count=1

# store_path#, based 0, if store_path0 not exists, its value is base_path
# the paths must be exist
store_path0=/home/fdfs
#store_path1=/home/yuqing/fastdfs2

# subdir_count  * subdir_count directories will be auto created under each
# store_path (disk), value can be 1 to 256, default value is 256
subdir_count_per_path=256

# tracker_server can occur more than once, and tracker_server format is
#  "host:port", host can be hostname or ip address
#tracker_server=192.168.111.81:22122
tracker_server=192.168.111.85:22122

#standard log level as syslog, case insensitive, value list:
### emerg for emergency
### alert
### crit for critical
### error
### warn for warning
### notice
### info
### debug
log_level=debug

#unix group name to run this program,
#not set (empty) means run by the group of current user
run_by_group=

#unix username to run this program,
#not set (empty) means run by current user
run_by_user=

# allow_hosts can occur more than once, host can be hostname or ip address,
# "*" means match all ip addresses, can use range like this: 10.0.1.[1-15,20] or
# host[01-08,20-25].domain.com, for example:
# allow_hosts=10.0.1.[1-15,20]
# allow_hosts=host[01-08,20-25].domain.com
allow_hosts=*

# the mode of the files distributed to the data path
# 0: round robin(default)
# 1: random, distributed by hash code
file_distribute_path_mode=1

# valid when file_distribute_path_mode is set to 0 (round robin),
# when the written file count reaches this number, then rotate to next path
# default value is 100
file_distribute_rotate_count=100

# call fsync to disk when write big file
# 0: never call fsync
# other: call fsync when written bytes >= this bytes
# default value is 0 (never call fsync)
fsync_after_written_bytes=0

# sync log buff to disk every interval seconds
# default value is 10 seconds
sync_log_buff_interval=10

# sync binlog buff / cache to disk every interval seconds
# this parameter is valid when write_to_binlog set to 1
# default value is 60 seconds
sync_binlog_buff_interval=60

# sync storage stat info to disk every interval seconds
# default value is 300 seconds
sync_stat_file_interval=300

# thread stack size, should >= 512KB
# default value is 512KB
thread_stack_size=512KB

# the priority as a source server for uploading file.
# the lower this value, the higher its uploading priority.
# default value is 10
upload_priority=10

# the NIC alias prefix, such as eth in Linux, you can see it by ifconfig -a
# multi aliases split by comma. empty value means auto set by OS type
# default value is empty
if_alias_prefix=

# if check file duplicate, when set to true, use FastDHT to store file indexes
# 1 or yes: need check
# 0 or no: do not check
# default value is 0
check_file_duplicate=0

# namespace for storing file indexes (key-value pairs)
# this item must be set when check_file_duplicate is true / on
key_namespace=FastDFS

# set keep_alive to 1 to enable persistent connection with FastDHT servers
# default value is 0 (short connection)
keep_alive=0

# you can use "#include filename" (not include double quotes) directive to
# load FastDHT server list, when the filename is a relative path such as
# pure filename, the base path is the base path of current/this config file.
# must set FastDHT server list when check_file_duplicate is true / on
# please see INSTALL of FastDHT for detail
##include /home/yuqing/fastdht/conf/fdht_servers.conf


#HTTP settings
http.disabled=false

# use the ip address of this storage server if domain_name is empty,
# else this domain name will occur in the url redirected by the tracker server
http.domain_name=192.168.111.83

# the port of the web server on this storage server
http.server_port=8888

http.trunk_size=256KB

# if need find content type from file extension name
http.need_find_content_type=true

#use "#include" directive to include HTTP other settings
##include http.conf

duplication / load balance configuration

Among the storage.conf settings there is an item called group_name. FastDFS distinguishes volumes by this group_name: storage servers sharing the same group_name synchronize files with one another (mirroring), while files are distributed across servers with different group_names. 
Volume servers with the same group_name therefore all hold identical file contents, while the tracker spreads different files across volume 1 ~ volume n. Heartbeat checking is also fairly quick: if one of the storage servers goes offline, it is detected within about a second and access to that server is disabled.
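As a concrete sketch (the tracker address is reused from this post; the two-group layout itself is hypothetical), mirroring versus distribution falls out of the group assignment alone:

```
# storage.conf on two servers that should mirror each other
group_name=group1
tracker_server=192.168.111.85:22122

# storage.conf on a third server: a separate volume, so the tracker
# sends it a different subset of the uploaded files
group_name=group2
tracker_server=192.168.111.85:22122
```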


php binding 

Bindings are provided for PHP and Java; only PHP is covered here. 
Download and extract the FastDFS source and you will find a directory named php_client. Change into it and run:

phpize
./configure --with-php-config=path/to/php-config
make && make install
cp fastdfs_client.ini /etc/php.d/   # if PHP was compiled from source, append the contents
                                    # of fastdfs_client.ini to php.ini instead
[root@test5 etc]# php -i |grep -i fastdfs
/etc/php.d/fastdfs_client.ini,
fastdfs_client
fastdfs_client support => enabled
OLDPWD => /root/src/FastDFS/php_client
_SERVER["OLDPWD"] => /root/src/FastDFS/php_client
_ENV["OLDPWD"] => /root/src/FastDFS/php_client
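For reference, fastdfs_client.ini is essentially a PHP ini fragment along these lines. The exact keys below are from memory and may differ by version, so treat this as an assumed sketch and check the file shipped in php_client:

```
extension = fastdfs_client.so
fastdfs_client.base_path = /tmp
fastdfs_client.tracker_group_count = 1
fastdfs_client.tracker_group0 = /etc/fdfs/client.conf
```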

Startup

The daemon to start depends on the server's role. Installation creates init scripts under /etc/init.d. 

tracker

service fdfs_trackerd start
ps -ef |grep tracker
root      2530  2499  0 15:37 pts/1    00:00:00 grep tracker
root     27484     1  0 Dec17 ?        00:00:00 /usr/local/bin/fdfs_trackerd /etc/fdfs/tracker.conf

storage

service fdfs_storaged start
ps -ef |grep storage
root      2586  2556  0 15:36 pts/0    00:00:00 grep storage
root     27598     1  0 Dec17 ?        00:00:00 /usr/local/bin/fdfs_storaged /etc/fdfs/storage.conf

Test

upload

[root@test5 walkholic]# dd if=/dev/zero of=./testfile.dat bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.191039 seconds, 549 MB/s

[root@test5 walkholic]# /usr/local/bin/fdfs_test /etc/fdfs/storage.conf upload testfile.dat
This is FastDFS client test program v2.05

Copyright (C) 2008, Happy Fish / YuQing

FastDFS may be copied only under the terms of the GNU General
Public License V3, which may be found in the FastDFS source kit.
Please visit the FastDFS Home Page http://www.csource.org/
for more detail.

[2010-12-20 15:42:10] INFO - base_path=/home/fdfs, connect_timeout=30, network_timeout=60, tracker_server_count=1, anti_steal_token=0, anti_steal_secret_key length=0

tracker_query_storage_store_list_without_group:
        server 1. group_name=group2, ip_addr=192.168.111.82, port=23000

group_name=group3, ip_addr=192.168.111.83, port=23000
storage_upload_by_filename
group_name=group3, remote_filename=M00/23/43/wKhvU00O-sMAAAAABkAAAEWMa5c267.dat
source ip address: 192.168.111.83
file timestamp=2010-12-20 15:42:11
file size=104857600
file crc32=1166830487
file url: http://192.168.111.85/group3/M00/23/43/wKhvU00O-sMAAAAABkAAAEWMa5c267.dat
storage_upload_slave_by_filename
group_name=group3, remote_filename=M00/23/43/wKhvU00O-sMAAAAABkAAAEWMa5c267_big.dat
[2010-12-20 15:42:11] ERROR - recv data from storage server 192.168.111.83:23000 fail, recv bytes: 16 != 24
source ip address: 192.168.111.83
file timestamp=2010-12-20 15:42:11
file size=104857600
file crc32=4286513152
file url: http://192.168.111.85/group3/M00/23/43/wKhvU00O-sMAAAAABkAAAEWMa5c267_big.dat
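The file ID returned above maps directly onto the storage server's disk layout: the leading group name is only meaningful to the tracker, M00 selects store_path0 (M01 would be store_path1, and so on), and the next two hex components are subdirectories under data/ (see subdir_count_per_path). A small helper sketch, with the store path taken from the storage.conf above:

```shell
# Hypothetical helper: translate a FastDFS file ID into the on-disk path,
# e.g. "group3/M00/23/43/<name>" -> "<store_path>/data/23/43/<name>".
fdfs_local_path() {
  local store_path="$1" file_id="$2"
  local rest="${file_id#*/}"   # strip the group name -> M00/23/43/<name>
  local sub="${rest#*/}"       # strip the path index -> 23/43/<name>
  echo "${store_path}/data/${sub}"
}

fdfs_local_path /home/fdfs "group3/M00/23/43/wKhvU00O-sMAAAAABkAAAEWMa5c267_big.dat"
# -> /home/fdfs/data/23/43/wKhvU00O-sMAAAAABkAAAEWMa5c267_big.dat
```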


download

[root@test5 walkholic]# /usr/local/bin/fdfs_test /etc/fdfs/storage.conf download group3 M00/23/43/wKhvU00O-sMAAAAABkAAAEWMa5c267_big.dat
This is FastDFS client test program v2.05

Copyright (C) 2008, Happy Fish / YuQing

FastDFS may be copied only under the terms of the GNU General
Public License V3, which may be found in the FastDFS source kit.
Please visit the FastDFS Home Page http://www.csource.org/
for more detail.

[2010-12-20 15:42:43] INFO - base_path=/home/fdfs, connect_timeout=30, network_timeout=60, tracker_server_count=1, anti_steal_token=0, anti_steal_secret_key length=0

storage=192.168.111.83:23000
download file success, file size=104857600, file save to wKhvU00O-sMAAAAABkAAAEWMa5c267_big.dat

[root@test5 walkholic]# ls -al wKhvU00O-sMAAAAABkAAAEWMa5c267_big.dat
-rw-r--r-- 1 root root 104857600 Dec 20 15:42 wKhvU00O-sMAAAAABkAAAEWMa5c267_big.dat