As many know, there is such an OS: FreeBSD. Whether it is good or bad does not matter; that is beyond the scope of this questionnaire. Those who feel like writing something along the lines of "FreeBSD - RIP" are kindly asked to follow the link and leave that remark there.
There is also a file system called ZFS, developed by the recently swallowed-up Sun Microsystems. The file system is extremely interesting and quite remarkable.
I am the system administrator of HabraHabr, and I am planning a rather serious upgrade of our server fleet soon. One of the ideas is to use ZFS. I recently started testing ZFS on FreeBSD 8.1-RELEASE. So far so good: not a single kernel panic, and the speed is satisfactory. But the reviews on the Internet vary wildly, some of them simply inadequate. The file system's level of abstraction is simply amazing; you can juggle partitions on the fly however you like; the speed is good, in places faster than UFS2 + SU, and it is also very easy to deploy. Out-of-the-box compression, snapshots, and other goodies are a pleasure. I set it up on my test server: everything works fine, and I have not noticed any problems.
Still, I want to hear from those who have actually deployed ZFS on a production server running FreeBSD and have used such a setup under real load for a reasonably long time. Synthetic tests are also interesting, but to a lesser extent, since synthetics are synthetics. And yes: I only use stable OS releases, so the survey mostly concerns them.
Answers
[email protected]:/usr/local/etc (1768) zpool status
  pool: storage
 state: ONLINE
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        storage           ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            gpt/storage0  ONLINE       0     0     0
            gpt/storage3  ONLINE       0     0     0
          mirror          ONLINE       0     0     0
            gpt/storage1  ONLINE       0     0     0
            gpt/storage2  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
 scrub: none requested
config:

        NAME             STATE     READ WRITE CKSUM
        zroot            ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            gpt/system0  ONLINE       0     0     0
            gpt/system2  ONLINE       0     0     0
            gpt/system3  ONLINE       0     0     0
            gpt/system1  ONLINE       0     0     0

errors: No known data errors
The disks are partitioned like this (with a 2 MB offset at the start of each disk to work around the Advanced Format issues on the WD EARS drives):
[email protected]:/usr/local/etc (1771) gpart show
=>        34  3907029101  ada0  GPT  (1.8T)
          34        2014        - free -  (1.0M)
        2048         128     1  freebsd-boot  (64K)
        2176     8388608     2  freebsd-swap  (4.0G)
     8390784    41943040     3  freebsd-zfs  (20G)
    50333824  3856695311     4  freebsd-zfs  (1.8T)

=>        34  3907029101  ada1  GPT  (1.8T)
          34        2014        - free -  (1.0M)
        2048         128     1  freebsd-boot  (64K)
        2176     8388608     2  freebsd-swap  (4.0G)
     8390784    41943040     3  freebsd-zfs  (20G)
    50333824  3856695311     4  freebsd-zfs  (1.8T)

=>        34  3907029101  ada2  GPT  (1.8T)
          34        2014        - free -  (1.0M)
        2048         128     1  freebsd-boot  (64K)
        2176     8388608     2  freebsd-swap  (4.0G)
     8390784    41943040     3  freebsd-zfs  (20G)
    50333824  3856695311     4  freebsd-zfs  (1.8T)

=>        34  3907029101  ada3  GPT  (1.8T)
          34        2014        - free -  (1.0M)
        2048         128     1  freebsd-boot  (64K)
        2176     8388608     2  freebsd-swap  (4.0G)
     8390784    41943040     3  freebsd-zfs  (20G)
    50333824  3856695311     4  freebsd-zfs  (1.8T)
Problem: ZFS RAID10 has low read and write speed:
For example, writing:
dd if=/dev/zero of=/storage/test.file bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes transferred in 33.316996 secs (31472705 bytes/sec)
Or reading:
dd if=/storage/test.file of=/dev/null bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes transferred in 13.424865 secs (78107005 bytes/sec)
systat looks like this:
 2 users    Load  0,29  0,12  0,04                  19 Oct 14:27

Mem:KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act 1048432    7548  2771456    11732   87616  count
All 1232436   10608 1076589k    29964          pages
Proc:                                                          Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt      cow    4770 total
             69       8556  20k  517  776  798  20k   20581 zfod    104 em0 uhci0
                                                           2 ozfod    5 uhci3 ehci
 9,7%Sys   0,0%Intr  0,0%User  0,0%Nice  90,3%Idle          %ozfod  1997 cpu0: time
|    |    |    |    |    |    |    |    |    |    |         daefr        hdac0 257
=====                                                       prcfr    667 ahci0 259
                                        dtbuf        3762 totfr   1997 cpu1: time
Namei     Name-cache   Dir-cache    100000 desvn          react
   Calls    hits   %    hits   %     26371 numvn          pdwak
       2       2 100                 24996 frevn          pdpgs
                                                          intrn
Disks   ada0   ada1   ada2   ada3    da0  pass0  pass1   429056 wire
KB/t     128    128    128    127   0,00   0,00   0,00  1103516 act
tps      156    173    188    145      0      0      0   368484 inact
MB/s   19,51  21,62  23,48  18,03   0,00   0,00   0,00          cache
%busy     18     35     35     16      0      0      0    87616 free
                                                                buf
Yet reading from the disks themselves is quite acceptable:
1073741824 bytes transferred in 9.673196 secs (111001764 bytes/sec)
[email protected]:/usr/home/dyr (1769) dd if=/dev/gpt/storage1 of=/dev/null bs=1024M count=1
1+0 records in
1+0 records out
1073741824 bytes transferred in 9.887180 secs (108599400 bytes/sec)
[email protected]:/usr/home/dyr (1770) dd if=/dev/gpt/storage2 of=/dev/null bs=1024M count=1
1+0 records in
1+0 records out
1073741824 bytes transferred in 9.736273 secs (110282635 bytes/sec)
[email protected]:/usr/home/dyr (1772) dd if=/dev/gpt/storage3 of=/dev/null bs=1024M count=1
1+0 records in
1+0 records out
1073741824 bytes transferred in 11.112231 secs (96627025 bytes/sec)
I do not understand what the reason is.
vfs.zfs.l2c_only_size: 3535428608
vfs.zfs.mfu_ghost_data_lsize: 23331328
vfs.zfs.mfu_ghost_metadata_lsize: 20963840
vfs.zfs.mfu_ghost_size: 44295168
vfs.zfs.mfu_data_lsize: 0
vfs.zfs.mfu_metadata_lsize: 0
vfs.zfs.mfu_size: 11698176
vfs.zfs.mru_ghost_data_lsize: 22306304
vfs.zfs.mru_ghost_metadata_lsize: 8190464
vfs.zfs.mru_ghost_size: 30496768
vfs.zfs.mru_data_lsize: 512
vfs.zfs.mru_metadata_lsize: 0
vfs.zfs.mru_size: 20443648
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 1048576
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 0
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_limit: 106137600
vfs.zfs.arc_meta_used: 104179208
vfs.zfs.mdcomp_disable: 0
vfs.zfs.arc_min: 53068800
vfs.zfs.arc_max: 424550400
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.check_hostid: 1
vfs.zfs.recover: 0
vfs.zfs.txg.write_limit_override: 0
vfs.zfs.txg.synctime: 5
vfs.zfs.txg.timeout: 10
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 10
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_disable: 0
vfs.zfs.zio.use_uma: 0
vfs.zfs.version.zpl: 4
vfs.zfs.version.spa: 15
vfs.zfs.version.dmu_backup_stream: 1
vfs.zfs.version.dmu_backup_header: 2
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
The presence of the sharesmb and sharenfs file system options is a bit annoying - it's clear what they do on Solaris, but in FreeBSD, as I understand it, they just don't work.
Try
                      capacity     operations    bandwidth
pool               used  avail   read  write   read  write
-----------------  -----  -----  -----  -----  -----  -----
storage             652G  2,93T    505     20  61,3M   117K
  mirror            504G  1,29T    504      6  61,3M  39,1K
    gpt/storage0       -      -    495      5  61,5M  39,5K
    gpt/storage3       -      -    495      6  61,5M  39,5K
  mirror            148G  1,64T      0     13   1023  78,2K
    gpt/storage1       -      -      0     10      0  78,6K
    gpt/storage2       -      -      0     10      0  78,6K
-----------------  -----  -----  -----  -----  -----  -----
- gary bunker
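For reference, per-vdev numbers like the table above can be gathered with zpool iostat; a generic sketch (not necessarily the exact invocation used here), with the pool name taken from this thread and an arbitrary refresh interval:

# one-shot per-vdev capacity, IOPS and bandwidth
zpool iostat -v storage
# or refresh every 5 seconds to watch the load while a test is running
zpool iostat -v storage 5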
vm.kmem_size = "999M"
vm.kmem_size_max = "999M"
vfs.zfs.arc_max = "160M"
vfs.zfs.prefetch_disable = 1
kern.ipc.nsfbufs = 10240 - ryon
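For context, these are boot-time tunables; in /boot/loader.conf they would normally be written without spaces around the equals sign, roughly like this (just the values suggested above, not an endorsement of the numbers themselves):

# /boot/loader.conf - cap kernel memory and the ZFS ARC
vm.kmem_size="999M"
vm.kmem_size_max="999M"
vfs.zfs.arc_max="160M"
vfs.zfs.prefetch_disable=1
kern.ipc.nsfbufs=10240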
# dd if=/storage/test.file of=/dev/null bs=1000M count=1
1+0 records in
1+0 records out
1048576000 bytes transferred in 13.490830 secs (77725092 bytes/sec)
I could not set kern.ipc.nsfbufs=10240 either from /etc/sysctl.conf or from /boot/loader.conf; the current value is "0". - oren
read hardforum.com/showthread.php?t=1546137 - venita
I have read that forum thread. I am surprised, by the way, that the author of that topic did not try the RAID10 equivalent, since I consider it the most suitable layout for four-disk configurations on a number of counts.
Moreover, I additionally tested the server, adding 1 GB of memory to it (that is, a total of 3 GB) and replacing the processor in it with a Core2Quad.
Here is the link where I published how I tested and the test results.
Moreover, I tried testing a Seagate 7200.10 on a desktop machine with a Core2Duo and 2 GB of memory from an OpenIndiana LiveCD, and the speed was also quite low, around 40 MB/s. - richard ellis
But this is foronyx. And the test, as usual, has little to do with reality)
In general, I have one backup server running ZFS. I put it there mostly out of curiosity; it is not really needed. Well, it seems to work, what else is there to say: there is no particular load, there are no power cuts, FreeBSD does not hang.
It seems to me that if you really need the advantages ZFS provides, then you should use it. If you do not particularly need them, then it is not worth it: it is still experimental.
The thing is, according to the developers it is not experimental at all: ZFS v14 is positioned as a fully production-ready solution. - ginbquik
The load is small (it is a backup server, after all), and there have been zero problems.
And how reliable is it when, say, 2 disks fail at the same time? - alana himber
On Solaris everything is great and ZFS can be trusted, but will it be just as great on FreeBSD?
Backups are of course everything, but first of all I am interested in how well the system survives the failure of one of the disks.
raidz2 of 6 disks + a separate system disk - mainly file storage, torrent storage and a backup server reachable over FTP
RAID 1 with booting from ZFS - nginx + php + mysql
ZFS on a single disk - nginx + php + mysql
When copying a large number of small files (~30 GB) within one pool, the system starts to get sluggish.
Here you have to put up with one or the other: either high I/O speed, or temporary stalls when vfs.zfs.arc overflows…
Except maybe to show off to the guys: "it sets up in half a kick!" :)
And the guys around me are the sort who set up AD on their Windows boxes by following manuals. :( - japdo
But honestly, in the real world I have never needed to "shrink a pool." Expand - yes, often. As for the opposite direction - well, in some hypothetical, made-up situation it could happen. But even then it is easier to attach smaller disks and push a snapshot onto them via zfs send | zfs recv - celica jones
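A minimal sketch of that send/receive migration, assuming a hypothetical dataset bigpool/data and a new pool smallpool already built on the smaller disks (all names are invented for illustration):

# snapshot the data, then stream it into the new, smaller pool
zfs snapshot bigpool/data@migrate
zfs send bigpool/data@migrate | zfs recv smallpool/data
# once the copy is verified, the old pool can be retired
zpool destroy bigpool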
Why shrink? For example, on LVM there was an array of 500 + 750 + 750. One 750 started to fail, there were no spare 750s, only a 500: we attach the 500, pvmove the data off the 750, detach it, and the array shrinks by that 750.
How can I handle that situation on ZFS? Or should I just give up and spend a thousand dollars or two on a proper home file server? - emerson probst
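Roughly, the LVM procedure described above looks like this (device names are invented for the example; /dev/sdc is the failing 750, /dev/sdd the spare 500, vg0 the volume group):

pvcreate /dev/sdd        # prepare the spare 500 GB disk
vgextend vg0 /dev/sdd    # add it to the volume group
pvmove /dev/sdc          # migrate extents off the dying 750
vgreduce vg0 /dev/sdc    # drop the 750 from the group
pvremove /dev/sdc        # and wipe its LVM label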
ZFS still does not have that killer feature.
As the administrator of a large project, you should make it a rule to introduce into production only well-tried and well-tested technologies that you know thoroughly and about which you can form an informed opinion.
And the approach of "advise me on all sorts of things and I will throw them straight into production" is a dead end.
ZFS on x86 is still too green for enterprise production. If you want ZFS, take Solaris on a T2: for a multi-threaded web workload it is just the thing. Well, obviously you also need to know Solaris))).
www.unixconsult.org/zfs_vs_lvm.html - they are compared quite evenhandedly here. - amanda callendrier
Is ZFS really unable to shrink a pool? LVM does it in no time via pvmove. :(
Thanks.
Because if you have a mirror or raidz, it will in any case use only 500 of each disk (the minimum in the pool). In that case there is no problem at all: zpool replace and so on.
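In ZFS terms that boils down to zpool replace; a minimal sketch with made-up names (pool "tank", old disk ada1, replacement ada4):

# swap a failing member for a new one; resilvering starts automatically
zpool replace tank ada1 ada4
# watch the resilver progress
zpool status tank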
And yes, the ZFS pool I threw together for testing really was "everything in one heap":
ad1: 76319MB <Seagate ST380011A 3.06> at ata0-slave UDMA100
ad2: 117245MB <Maxtor 6Y120L0 YAR41BW0> at ata1-master UDMA133
ad3: 152627MB <WDC WD1600JB-00REA0 20.00K20> at ata1-slave UDMA100
And it turned out:
reserve# zfs list pool
NAME USED AVAIL REFER MOUNTPOINT
pool 388M 332G 388M /pool
I immediately enabled compression on the ports collection and the sources. I am already starting to like it.
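Enabling compression is one command per dataset; a sketch with assumed dataset names (adjust to your own layout):

# turn on compression for the ports tree and the sources
zfs set compression=on pool/usr/ports
zfs set compression=on pool/usr/src
# see what it actually bought you
zfs get compressratio pool/usr/ports pool/usr/src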
Still, it looks like things will stay as they are for now, and the first purchase for home will be five or six large identical disks for raidz3, which is promised in the v28 code for FreeBSD. In the meantime I am trying the stock 8.2 with v14 out of the box. - jessikitty
Create several files (they can be of different sizes, in whatever proportions interest you).
And build a zpool out of these files :) zpool create testpool /path/to/file10G /path/to/file20G /path/to/file20G
Then try to detach a "disk" from the pool, attach it back, take it offline, bring it back online, and replace one with another. See in practice what it lets you do and what it refuses. You will not break anything, guaranteed :) And no physical disks are needed at all. A sketch of such a session follows below.
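A sketch of that file-backed experiment, with arbitrary paths and sizes (raidz is used here so that offline/replace have some redundancy to play with; on FreeBSD, truncate(1) creates the sparse backing files):

# create a few backing files
truncate -s 1G /tmp/vdev1 /tmp/vdev2 /tmp/vdev3
# build a throwaway pool out of them
zpool create testpool raidz /tmp/vdev1 /tmp/vdev2 /tmp/vdev3
# play: take a "disk" offline and online, then replace it with another file
zpool offline testpool /tmp/vdev1
zpool online testpool /tmp/vdev1
truncate -s 1G /tmp/vdev4
zpool replace testpool /tmp/vdev1 /tmp/vdev4
zpool status testpool
# clean up when done
zpool destroy testpool
rm /tmp/vdev1 /tmp/vdev2 /tmp/vdev3 /tmp/vdev4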
It just so happens that I have these disks available for testing. Now I will partition the remaining space on the system disk, attach more disks, put them all into different file systems, for example, and start experimenting in earnest, writing every action down in a notebook.
Incredibly interesting thing this ZFS, by the way. - dylan quarles
I have a pool of approximately 16 terabytes in which about three and a half thousand file systems have been carved out :) - kathy
Now I really want to build a pool of roughly the same number of terabytes on modern hardware. It is just a bomb! Indeed, as promised in the article, partition management has become as easy as managing directories. And why did I bother with geom concat and LVM before, when ZFS has been in FreeBSD since the 7.x branch?.. - christa hogan
copy on write, atomic metadata, blablabla :)
> Compression ratio on ports + sources: 2.39x
and in principle there are still plenty of knobs you can tweak :)
> partition management is as easy as managing directories.
yes it is; on my FTP server every user gets their own file system instead of a directory :) why would you need ftpd quotas when you can set them right on the file system? :)
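A sketch of that per-user layout, with hypothetical pool and user names:

# one file system per FTP user instead of a plain directory
zfs create storage/ftp
zfs create -o quota=10G storage/ftp/alice
zfs create -o quota=20G storage/ftp/bob
# quotas can be changed later on the fly
zfs set quota=15G storage/ftp/alice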
and on Solaris, zfs {sharenfs, sharesmb, shareiscsi} also tie into all of this, which is not available on freebsd :) - rosie frascella
About the knobs: my head spins from the number of knobs in the manual, honestly. :))
About partition management and a file system per user: that is a consequence of the ideology of ZFS, right? :) It turns out to be a very liberating ideology: a couple of commands solve everything, and I keep looking for the catch, damn it. :))
I have already read about zfs {sharenfs, sharesmb, shareiscsi}. And now I am bringing up all three from ports, damn it. :)))) - uht
Nothing :) Suddenly :)
> this is a consequence of the ideology of ZFS, right?
Right
> And now I am bringing up all three from ports, damn it.
And will that work on BSD? As far as I know, those commands are simply not implemented in the FreeBSD port. Well, except maybe sharenfs. Though I have not looked in a long time at what has been ported there; it was not there before. - elizabeth nguyen
If in doubt, you can always force a check with zpool scrub mysuperpuperpool. - alison gettler
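A tiny sketch (the pool name is taken from the comment above):

# kick off a full integrity check of the whole pool
zpool scrub mysuperpuperpool
# progress and any repaired or unrecoverable errors show up here
zpool status -v mysuperpuperpool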
In the meantime, run U3 in a virtual machine. - regina ligon
Well, even installing the minimal OS (something like the Base System on FreeBSD, about 200 MB without extra junk) will also take a certain amount of perseverance and a desire to figure things out. Or, as an option, OpenSolaris in the default installation, but that is 4 gigabytes of "everything at once" along with X and other cruft.
They sent me two disks a long time ago: 11/06 Solaris 10 Operating System. One is, of course, for SPARC, but the second DVD is for our x86. Is that the one? - lauren hough
The file will be called sol-10-u8-ga-x86-dvd.iso (md5 9df7fd02d82976fd3ec38a18d1a57335). Grab it at the link. - katherine williams
Now I will ask friends with a fat Internet pipe to fetch the image. :) - tamra king
thanks
Among the minuses: you need knowledge and resources to tune it for high performance. - jennifer phelps