Seedbank
Or, more specifically, the Storinator Q30 enhanced.
Also known as seedbank, and boy is she a mysterious one.
Related Projects
- Part of: Earthseed
- Mirror DANDI
- Calcium Imaging Analysis
Hardware
- 30 16.38TB hard drives (1 used for OS, so 29)
Web Administration (IPMI)
There are two subsystems on the NAS, the IPMI system and the main operating system. The IPMI system can be used to configure the system before an OS is present and manage other administration tasks.
- Log into the web console through its IP (currently 192.168.1.28), but check the DHCP server in case that has changed (or see the ipmitool sketch after this list)
- The default creds are
- Username: ADMIN
- Password: (on side of server)
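If the OS is already up and you'd rather not dig through the DHCP leases, ipmitool can print the BMC's network settings from the server side (a sketch: assumes the ipmitool package is installed and the BMC is on LAN channel 1, which is typical but unverified here):
# print IP, MAC, gateway, etc. for the BMC's LAN channel 1
sudo ipmitool lan print 1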
Install Debian
See: https://knowledgebase.45drives.com/kb/kb450289-ubuntu-20-04-redundant-os-installation/
- Get yourself a copy of debian
- Specifically, a full installation image
- Open the IPMI control panel (see above)
- Launch a virtual console either with the HTML5 or Java plugin
- To use Java, you'll need OpenJDK, and since Java Web Start has been deprecated you'll also need OpenWebStart
- Then open the launch.jnlp file with OpenWebStart (not sure how to do this via the CLI; right click and "Open with...")
- It seems like the HTML5 console can do everything the Java version does without any of the Java hassle, so you might as well use that
- The "server" power state is separate from the IPMI subsystem, so you may need to power on the server from the Remote Control -> Power Control menu
- Wait, this thing comes with Ubuntu preinstalled... never mind for now
Config
See the ansible configuration for the seedbank host.
Security
- Users
- Root password changed
- User password changed
- Made user jonny, who is in sudoers
- SSH
- Root access disabled
- Password access disabled
- Firewall
- Disable all incoming connections, except LAN to port 22 (a ufw sketch follows this list).
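The actual firewall rules live in the ansible config, and which firewall frontend is in use isn't recorded here. If it's ufw, the LAN-only SSH policy would look roughly like this sketch:
# deny everything inbound, allow outbound, then punch a hole for SSH from the LAN
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow from 192.168.1.0/24 to any port 22 proto tcp
sudo ufw enable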
ZFS
There is a lot to know about ZFS; lots of docs are at the bottom of the page. It is late, and this is just the minimal amount needed to recreate what we did.
The ansible config is as follows:
zfs_create_pools: true
zfs_create_volumes: true
zfs_enable_iscsi: false # handle iscsi separately
zfs_iscsitarget_iqn: iqn.2023-10.com.aharoni-lab.seed
zfs_iscsitarget_target_portals:
  - "192.168.1.0/24"
zfs_install_update: false
zfs_pools:
  - name: P2P
    action: create
    compression: lz4
    type: raidz2
    state: present
    devices:
      - 'scsi-35000c500db4dca84'
      - 'scsi-35000c500db76a540'
      - 'scsi-35000c500db76bea3'
      - 'scsi-35000c500db779f0a'
      - 'scsi-35000c500db77ecb8'
      - 'scsi-35000c500db786ba1'
  - name: P2P
    action: add
    compression: lz4
    type: raidz2
    state: present
    devices:
      - 'scsi-35000c500db79489a'
      - 'scsi-35000c500db7a4a5e'
      - 'scsi-35000c500db7aa844'
      - 'scsi-35000c500db7b3ffd'
      - 'scsi-35000c500db7cb67e'
      - 'scsi-35000c500db7d3881'
  - name: P2P
    action: add
    compression: lz4
    type: raidz2
    state: present
    devices:
      - 'scsi-35000c500db84c49c'
      - 'scsi-35000c500db85463a'
      - 'scsi-35000c500db8584e5'
      - 'scsi-35000c500db867df0'
      - 'scsi-35000c500db86b99d'
      - 'scsi-35000c500db86cd45'
  - name: P2P
    action: add
    compression: lz4
    type: raidz2
    state: present
    devices:
      - 'scsi-35000c500db8ccc4e'
      - 'scsi-35000c500db915eb7'
      - 'scsi-35000c500dba3809b'
      - 'scsi-35000c500dba3cce2'
      - 'scsi-35000c500dba3ea10'
      - 'scsi-35000c500dba55931'
zfs_volumes:
  - name: DANDI
    pool: P2P
    shareiscsi: on
    iscsi_name: "{{ zfs_iscsitarget_iqn }}:P2P.DANDI"
    volsize: 250T
    lun: 1
    state: present
    allow:
      - "192.168.1.0/24"
Which makes:
- a ZFS volume named DANDI, which lives within a...
- ZFS pool named P2P, which is composed of...
- 4 vdevs, each of which has...
- 6 physical drives in a raidz2 configuration.
(This is roughly what the raw zpool/zfs commands sketched below would do by hand.)
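For the record, that ansible config boils down to roughly these raw zpool/zfs commands (a sketch, not what was literally run; only the first two vdevs are shown, and the third and fourth 'zpool add' calls follow the same pattern with the remaining device IDs from the config above):
# create the pool with its first raidz2 vdev; lz4 compression is inherited by child datasets
sudo zpool create -O compression=lz4 P2P raidz2 \
    /dev/disk/by-id/scsi-35000c500db4dca84 /dev/disk/by-id/scsi-35000c500db76a540 \
    /dev/disk/by-id/scsi-35000c500db76bea3 /dev/disk/by-id/scsi-35000c500db779f0a \
    /dev/disk/by-id/scsi-35000c500db77ecb8 /dev/disk/by-id/scsi-35000c500db786ba1
# grow the pool with another raidz2 vdev (repeat twice more for raidz2-2 and raidz2-3)
sudo zpool add P2P raidz2 \
    /dev/disk/by-id/scsi-35000c500db79489a /dev/disk/by-id/scsi-35000c500db7a4a5e \
    /dev/disk/by-id/scsi-35000c500db7aa844 /dev/disk/by-id/scsi-35000c500db7b3ffd \
    /dev/disk/by-id/scsi-35000c500db7cb67e /dev/disk/by-id/scsi-35000c500db7d3881
# carve out the 250T zvol that gets exported over iSCSI (-s = sparse, no upfront reservation)
sudo zfs create -s -V 250T P2P/DANDI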
Useful commands for zfs include...
- zpool status - show what pools exist
- zfs get all - show all the configuration
iSCSI
Why is this so hard, for the love of god?
- Open port 3260
- Make the damn logical unit (don't ask me about this command; note it assumes the target itself already exists, see the sketch after the output below):
sudo tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --backing-store /dev/P2P/DANDI
Which should look like this:
>>> sudo tgtadm --lld iscsi --op show --mode target
Target 1: iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET 00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET 00010001
            SCSI SN: beaf11
            Size: 274877907 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /dev/P2P/DANDI
            Backing store flags:
    Account information:
    ACL information:
        192.168.1.0/24
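Note that the LUN hangs off "Target 1", which has to exist before the logicalunit command above will do anything. Creating the target and binding the LAN ACL by hand would look roughly like this (a sketch; in practice the targets.conf file below recreates all of this whenever tgt restarts):
# create iSCSI target 1 with the IQN used above
sudo tgtadm --lld iscsi --op new --mode target --tid 1 --targetname iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI
# allow initiators from the LAN subnet
sudo tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address 192.168.1.0/24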
- That should correspond to a config file in /etc/tgt/targets.conf:
default-driver iscsi

<target iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI>
    backing-store /dev/P2P/DANDI
    initiator-address 192.168.1.0/24
    incominguser earthseed SOME_PASSWORD
</target>
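The config file only takes effect when tgt reloads it, presumably via something like (assuming the Debian tgt package's systemd unit):
sudo systemctl restart tgt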
Then from the client machine, using open-iscsi, discover the shared iSCSI volumes on the server:
>>> sudo iscsiadm --mode discovery --portal 192.168.1.30 --type sendtargets
192.168.1.30:3260,1 iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI
Then configure your creds in the /etc/iscsi/iscsid.conf file like:
node.session.auth.authmethod = CHAP
node.session.auth.username = earthseed
node.session.auth.password = SOME_PASSWORD
discovery.sendtargets.auth.authmethod = CHAP
discovery.sendtargets.auth.username = earthseed
discovery.sendtargets.auth.password = SOME_PASSWORD
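The login step isn't written down above, but after discovery the client still has to log in to the target before any block device shows up; something like:
sudo iscsiadm --mode node --targetname iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI --portal 192.168.1.30:3260 --login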
And that should leave you with an unlabeled drive on the client:
jonny@earthseed:~$ sudo lsblk -e7 -d
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda       8:0    0   9.1T  0 disk /mnt/dataata
sdb       8:16   0   9.1T  0 disk /mnt/databta
sdc       8:32   0   9.1T  0 disk /mnt/datacta
sdd       8:48   0   250T  0 disk
nvme0n1 259:0    0 931.5G  0 disk
Formatting the Partition
Then, once you have gone through all of that, you need to format the thing you just made.
First make a partition table with parted
>>> sudo parted /dev/sdd
GNU Parted 3.5
Using /dev/sdd
Welcome to GNU Parted! Type 'help' to view a list of commands.
>>> (parted) mklabel gpt
>>> (parted) print
Model: IET VIRTUAL-DISK (scsi)
Disk /dev/sdd: 275TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start  End  Size  File system  Name  Flags

>>> (parted) mkpart primary ext4 0% 100%
>>> (parted) print
Model: IET VIRTUAL-DISK (scsi)
Disk /dev/sdd: 275TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End    Size   File system  Name     Flags
 1      1049kB  275TB  275TB  ext4         primary

(parted) quit
and then format that bad boy:
sudo mkfs.ext4 /dev/sdd1
Get the ID by doing ls -l /dev/disk/by-id and finding the one pointing to your partition:
>>> ls -l /dev/disk/by-id
lrwxrwxrwx 1 root root 9 Oct 23 22:06 scsi-360000000000000000e00000000010001 -> ../../sdd
lrwxrwxrwx 1 root root 10 Oct 23 22:06 scsi-360000000000000000e00000000010001-part1 -> ../../sdd1
and add it to /etc/fstab:
/dev/disk/by-id/scsi-360000000000000000e00000000010001-part1 /mnt/seedbank/p2p/dandi ext4 _netdev 0 0
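Then make the mountpoint and mount it (the obvious next step, not recorded above):
sudo mkdir -p /mnt/seedbank/p2p/dandi
sudo mount /mnt/seedbank/p2p/dandi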
Maintenance
Monitoring
Interactively
Take 10 second samples of I/O to and from individual devices:
zpool iostat -v 10
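Nothing here says whether scheduled scrubs are set up (see the zpool-scrub man page under Documentation), but kicking one off and checking on it by hand looks like:
sudo zpool scrub P2P
zpool status P2P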
Busted Drives
Check the status of your ZFS pool with zpool status, which might show you something like this:
jonny@seedbank:~$ zpool status
  pool: P2P
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
config:

        NAME                        STATE     READ WRITE CKSUM
        P2P                         DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            scsi-35000c500db4dca84  FAULTED      0    22     0  too many errors
            scsi-35000c500db76a540  ONLINE       0     0     0
            scsi-35000c500db76bea3  ONLINE       0     0     0
            scsi-35000c500db779f0a  ONLINE       0     0     0
            scsi-35000c500db77ecb8  ONLINE       0     2     0
            scsi-35000c500db786ba1  ONLINE       0     0     0
          raidz2-1                  DEGRADED     0     0     0
            scsi-35000c500db79489a  ONLINE       0     0     0
            scsi-35000c500db7a4a5e  FAULTED      0    99     0  too many errors
            scsi-35000c500db7aa844  ONLINE       0     0     0
            scsi-35000c500db7b3ffd  ONLINE       0     0     0
            scsi-35000c500db7cb67e  ONLINE       0     0     0
            scsi-35000c500db7d3881  ONLINE       0     0     0
          raidz2-2                  DEGRADED     0     0     0
            scsi-35000c500db84c49c  ONLINE       0     0     0
            scsi-35000c500db85463a  FAULTED      0    36     0  too many errors
            scsi-35000c500db8584e5  ONLINE       0     0     0
            scsi-35000c500db867df0  ONLINE       0     0     0
            scsi-35000c500db86b99d  FAULTED      0    82     0  too many errors
            scsi-35000c500db86cd45  ONLINE       0     0     0
          raidz2-3                  ONLINE       0     0     0
            scsi-35000c500db8ccc4e  ONLINE       0     0     0
            scsi-35000c500db915eb7  ONLINE       0     0     0
            scsi-35000c500dba3809b  ONLINE       0     2     0
            scsi-35000c500dba3cce2  ONLINE       0     0     0
            scsi-35000c500dba3ea10  ONLINE       0     0     0
            scsi-35000c500dba55931  ONLINE       0     1     1  (repairing)
These can be false positives, so you want to run SMART tests to confirm that there actually are errors.
- Remind yourself how the ids map to device names:
ls -la /dev/disk/by-id/
- See SMART status:
sudo smartctl /dev/sda -a
- Run SMART test:
sudo smartctl /dev/sda -t long
That'll run in the background and take >24h to complete. Come back and check the results with smartctl later.
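To see just the self-test results rather than the whole -a dump, you can pull the self-test log on its own:
sudo smartctl /dev/sda -l selftest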
Option 1: Not actually a problem
If there are no problems, it'll look like this:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%             1062
If so, you can re-enable the drive, and ZFS will resilver it if needed.
sudo zpool clear <POOL_NAME> <DEVICE_ID>
e.g.
sudo zpool clear P2P scsi-35000c500db7a4a5e
which should look like this:
>>> zpool status
  pool: P2P
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Dec 6 15:54:21 2023
        2.40T scanned at 6.39G/s, 1.30T issued at 3.46G/s, 305T total
        73.4G resilvered, 0.43% done, 1 days 00:57:12 to go
config:

        NAME                        STATE     READ WRITE CKSUM
        P2P                         ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            scsi-35000c500db4dca84  ONLINE       0     0     0  (resilvering)
            scsi-35000c500db76a540  ONLINE       0     0     0
            scsi-35000c500db76bea3  ONLINE       0     0     0
            scsi-35000c500db779f0a  ONLINE       0     0     0
            scsi-35000c500db77ecb8  ONLINE       0     2     0
            scsi-35000c500db786ba1  ONLINE       0     0     0
          raidz2-1                  ONLINE       0     0     0
            scsi-35000c500db79489a  ONLINE       0     0     0
            scsi-35000c500db7a4a5e  ONLINE       0     0     0  (resilvering)
            scsi-35000c500db7aa844  ONLINE       0     0     0
            scsi-35000c500db7b3ffd  ONLINE       0     0     0
            scsi-35000c500db7cb67e  ONLINE       0     0     0
            scsi-35000c500db7d3881  ONLINE       0     0     0
          raidz2-2                  ONLINE       0     0     0
            scsi-35000c500db84c49c  ONLINE       0     0     0
            scsi-35000c500db85463a  ONLINE       0     0     0  (resilvering)
            scsi-35000c500db8584e5  ONLINE       0     0     0
            scsi-35000c500db867df0  ONLINE       0     0     0
            scsi-35000c500db86b99d  ONLINE       0     0     0  (resilvering)
            scsi-35000c500db86cd45  ONLINE       0     0     0
          raidz2-3                  ONLINE       0     0     0
            scsi-35000c500db8ccc4e  ONLINE       0     0     0
            scsi-35000c500db915eb7  ONLINE       0     0     0
            scsi-35000c500dba3809b  ONLINE       0     2     0
            scsi-35000c500dba3cce2  ONLINE       0     0     0
            scsi-35000c500dba3ea10  ONLINE       0     0     0
            scsi-35000c500dba55931  ONLINE       0     1     1
Option 2: Replace the drive
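Nothing written up here yet. A minimal sketch of the usual ZFS flow, per the disk-replacement docs under Documentation, with a hypothetical new device ID:
# take the faulted disk out of service, physically swap it, then resilver onto the new one
sudo zpool offline P2P scsi-35000c500db7a4a5e
sudo zpool replace P2P scsi-35000c500db7a4a5e /dev/disk/by-id/scsi-NEW_DEVICE_ID   # NEW_DEVICE_ID is a placeholder
zpool status P2P   # watch the resilver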
Modifying Pools
Documentation
Really need to split this page into ZFS and iSCSI.
Storinator
- Initial Cable Setup - note that the IPMI cable is separate from the main internet cable.
- Setting up Remote Access
- Mounting Virtual Media
- Installing Ubuntu
ZFS
- https://openzfs.github.io/openzfs-docs/man/master/8/zpool-scrub.8.html
- RAID and RAIDZ - info on ZFS and RAIDZ
- https://raidz-calculator.com/raidz-types-reference.aspx
- https://wiki.archlinux.org/title/ZFS/Virtual_disks - Archwiki on ZFS
- https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/
- https://github.com/mrlesmithjr/ansible-zfs/blob/master/tasks/manage_zfs.yml Example of using ZFS ansible commands
- https://linuxhint.com/share-zfs-volumes-via-iscsi/ - actual guide on how to do it
- https://wiki.debian.org/SAN/iSCSI/open-iscsi
- https://dannyda.com/2020/08/23/how-to-check-change-modify-zfs-syncstandard-disabled-always/
- https://somedudesays.com/2021/08/the-basic-guide-to-working-with-zfs/
- Understanding sizes, block size, sector size - https://ibug.io/blog/2023/10/zfs-block-size/
- Mirrors vs raidz - https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/
Pool Faults
- https://docs.tritondatacenter.com/private-cloud/troubleshooting/disk-replacement
- https://docs.oracle.com/cd/E19253-01/819-5461/gamno/index.html
- https://www.thomas-krenn.com/en/wiki/SMART_tests_with_smartctl
- Clearing transient errors - https://docs.oracle.com/cd/E26505_01/html/E37384/gayrd.html#gazge
Performance
- https://forums.servethehome.com/index.php?threads/very-slow-zfs-raidz2-performance-on-truenas-12.33094/
- https://forums.servethehome.com/index.php?threads/finding-my-zfs-bottleneck.19315/
iSCSI
- https://www.monperrus.net/martin/performance+of+read-write+throughput+with+iscsi
- https://www.qsan.com/data/dl_files/QSAN_Best%20Practice%20Guide_iSCSI%20Performance%20Tuning_2001_(en).pdf
- https://utcc.utoronto.ca/~cks/space/blog/tech/UnderstandingiSCSIProtocol
- https://www.ibm.com/docs/en/flashsystem-9x00/8.4.x?topic=rp-iscsi-performance-analysis-tuning
Reference
Quotations
Daniel says that the 45Drives people said this when ordering:
I spoke with our Architect, and the Storinator Q30 configured with 2 vdevs of 15 HDDs in RAIDZ2 does have the capability to saturate a 10Gb network. I would recommend adding more resiliency by going with 3 vdevs of 10 HDDs in RAIDZ2. It will still be able to saturate a 10Gb network but will add more fault tolerance and faster resilvering times.
We shall figure out what that means...
Benchmarks
Not scientific, but just for the sake of jotting down how to measure...
Write speed
sudo dd if=/dev/zero of=/mnt/seedbank/p2p/dandi/test1.img bs=1G count=10 oflag=dsync
89.5 MB/s
Read speed
time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=8k
164 MB/s
time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=256k
10.2 GB/s
time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=1M
12.9 GB/s
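Those big read numbers at the larger block sizes are almost certainly the Linux page cache rather than iSCSI throughput, since the file was just written. For a more honest read test, drop the caches first (a sketch):
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=1M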