Seedbank



Or, more specifically, the Storinator Q30 Enhanced.

Also known as: seedbank. And boy is she a mysterious one.

Related Projects

Hardware

  • 30 × 16.38 TB hard drives (1 used for the OS, leaving 29 for storage)

Web Administration (IPMI)

There are two subsystems on the NAS, the IPMI system and the main operating system. The IPMI system can be used to configure the system before an OS is present and manage other administration tasks.

  • Log into the web console through its IP (currently 192.168.1.28), but check the DHCP server in case the lease has changed
  • The default creds are
    • Username: ADMIN
    • Password: (on side of server)

Install Debian

See: https://knowledgebase.45drives.com/kb/kb450289-ubuntu-20-04-redundant-os-installation/

  • Get yourself a copy of Debian
  • Open the IPMI control panel (see above)
  • Launch a virtual console with either the HTML5 or the Java plugin
    • To use Java you'll need OpenJDK, and since this kind of Java Web Start file has been deprecated you'll also need OpenWebStart
    • Then open the launch.jnlp file with OpenWebStart (not sure how to do it from the CLI; right click and "Open with..." works, see the sketch after this list)
    • It seems like HTML5 can do everything the Java version does without needing all that Java shit, so might as well use that?
  • Power to the "server" is controlled separately from the IPMI subsystem, so you might need to turn the server on from the Remote Control -> Power Control menu
  • Wait, this thing comes with Ubuntu preinstalled... never mind for now
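If you do want the CLI route for the Java console, a sketch that assumes OpenWebStart puts a javaws shim on your PATH (an assumption, not verified on this machine):

# assumes OpenWebStart's javaws replacement is on PATH; adjust the path to wherever launch.jnlp was saved
javaws ~/Downloads/launch.jnlp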

Config

See the ansible configuration for the seedbank host.
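For reference, a run against just this host would look roughly like this (the playbook and inventory filenames here are hypothetical, check the actual repo):

# hypothetical filenames; --limit restricts the run to the seedbank host
ansible-playbook -i inventory.yml site.yml --limit seedbank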

Security

  • Users
    • Root password changed
    • User password changed
    • Made user jonny that is in sudoers
  • SSH
    • Root access disabled
    • Password access disabled
  • Firewall
    • Disable all incoming connections, except LAN to port 22.
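A minimal sketch of that policy, assuming ufw is the firewall in use and the LAN is 192.168.1.0/24 (both assumptions; check the ansible role for what is actually applied):

sudo ufw default deny incoming
sudo ufw default allow outgoing
# allow SSH only from the LAN
sudo ufw allow from 192.168.1.0/24 to any port 22 proto tcp
sudo ufw enable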

ZFS

There is so much to know about ZFS; there are lots of docs at the bottom of the page. It is late and I am just trying to get down the minimal amount needed to recreate what we've gone and done.

The ansible config is as follows:

zfs_create_pools: true
zfs_create_volumes: true
zfs_enable_iscsi: false # handle iscsi separately
zfs_iscsitarget_iqn: iqn.2023-10.com.aharoni-lab.seed
zfs_iscsitarget_target_portals:
  - "192.168.1.0/24"
zfs_install_update: false

zfs_pools:
  - name: P2P
    action: create
    compression: lz4
    type: raidz2
    state: present
    devices:
      - 'scsi-35000c500db4dca84'
      - 'scsi-35000c500db76a540'
      - 'scsi-35000c500db76bea3'
      - 'scsi-35000c500db779f0a'
      - 'scsi-35000c500db77ecb8'
      - 'scsi-35000c500db786ba1'
  - name: P2P
    action: add
    compression: lz4
    type: raidz2
    state: present
    devices:
      - 'scsi-35000c500db79489a'
      - 'scsi-35000c500db7a4a5e'
      - 'scsi-35000c500db7aa844'
      - 'scsi-35000c500db7b3ffd'
      - 'scsi-35000c500db7cb67e'
      - 'scsi-35000c500db7d3881'
  - name: P2P
    action: add
    compression: lz4
    type: raidz2
    state: present
    devices:
      - 'scsi-35000c500db84c49c'
      - 'scsi-35000c500db85463a'
      - 'scsi-35000c500db8584e5'
      - 'scsi-35000c500db867df0'
      - 'scsi-35000c500db86b99d'
      - 'scsi-35000c500db86cd45'
  - name: P2P
    action: add
    compression: lz4
    type: raidz2
    state: present
    devices:
      - 'scsi-35000c500db8ccc4e'
      - 'scsi-35000c500db915eb7'
      - 'scsi-35000c500dba3809b'
      - 'scsi-35000c500dba3cce2'
      - 'scsi-35000c500dba3ea10'
      - 'scsi-35000c500dba55931'

zfs_volumes:
  - name: DANDI
    pool: P2P
    shareiscsi: on
    iscsi_name: "{{ zfs_iscsitarget_iqn }}:P2P.DANDI"
    volsize: 250T
    lun: 1
    state: present
    allow:
      - "192.168.1.0/24"

Which makes a

  • ZFS VOLUME (zvol) named DANDI, which lives within a...
  • ZFS POOL named P2P, which is composed of...
  • 4x VDEVS, each of which has...
  • 6x PHYSICAL DRIVES in a raidz2 configuration.
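For orientation, the manual equivalent of what that builds is roughly the following (a sketch, not the exact commands the ansible role runs; only the first two vdevs are written out, the rest are added the same way):

# create the pool with the first raidz2 vdev of 6 disks (IDs from the config above)
sudo zpool create -O compression=lz4 P2P raidz2 \
  /dev/disk/by-id/scsi-35000c500db4dca84 /dev/disk/by-id/scsi-35000c500db76a540 \
  /dev/disk/by-id/scsi-35000c500db76bea3 /dev/disk/by-id/scsi-35000c500db779f0a \
  /dev/disk/by-id/scsi-35000c500db77ecb8 /dev/disk/by-id/scsi-35000c500db786ba1
# each further group of 6 disks becomes another raidz2 vdev, e.g. the second one:
sudo zpool add P2P raidz2 \
  /dev/disk/by-id/scsi-35000c500db79489a /dev/disk/by-id/scsi-35000c500db7a4a5e \
  /dev/disk/by-id/scsi-35000c500db7aa844 /dev/disk/by-id/scsi-35000c500db7b3ffd \
  /dev/disk/by-id/scsi-35000c500db7cb67e /dev/disk/by-id/scsi-35000c500db7d3881
# make the 250T zvol that gets exported over iSCSI
# (-s = sparse; an assumption, since 250T is more than the raidz2 layout can reserve up front)
sudo zfs create -s -V 250T P2P/DANDI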

Useful commands for zfs include...

  • zpool status - show what pools exist and whether they're healthy!!!!
  • zfs get all - show all dataset and pool properties!!!!

iSCSI

Why is this so HARD for the LOVE OF GOD

  • Open port 3260 (the iSCSI port) in the firewall
  • MAKE the damn LOGICAL UNIT, don't ASK ME ABOUT THIS COMMAND:
sudo tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --backing-store /dev/P2P/DANDI

Which should look like THIS

>>> sudo tgtadm --lld iscsi --op show --mode target
Target 1: iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: IET     00010001
            SCSI SN: beaf11
            Size: 274877907 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: rdwr
            Backing store path: /dev/P2P/DANDI
            Backing store flags:
    Account information:
    ACL information:
        192.168.1.0/24
  • That actually should correspond to a config file in /etc/tgt/targets.conf:
default-driver iscsi

<target iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI>
        backing-store /dev/P2P/DANDI
        initiator-name 192.168.1.0/24
        incominguser earthseed SOME_PASSWORD
</target>
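If the target ever needs to be (re)created by hand rather than via targets.conf, a sketch with tgtadm using the same tid and IQN as above, plus the port-3260 rule from the first bullet (assuming ufw, like the SSH rule earlier):

# open the iSCSI port to the LAN (assumes ufw)
sudo ufw allow from 192.168.1.0/24 to any port 3260 proto tcp
# create target 1 with the IQN, then allow initiators from the LAN
sudo tgtadm --lld iscsi --op new --mode target --tid 1 \
  --targetname iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI
sudo tgtadm --lld iscsi --op bind --mode target --tid 1 --initiator-address 192.168.1.0/24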

Then from the CLIENT machine, using open-iscsi, discover the shared iscsi volumes on the SERVER:

>>> sudo iscsiadm --mode discovery --portal 192.168.1.30 --type sendtargets
192.168.1.30:3260,1 iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI

Then configure your CREDS in the /etc/iscsi/iscsid.conf file like

node.session.auth.authmethod = CHAP
node.session.auth.username = earthseed
node.session.auth.password = SOME_PASSWORD
discovery.sendtargets.auth.authmethod = CHAP
discovery.sendtargets.auth.username = earthseed
discovery.sendtargets.auth.password = SOME_PASSWORD
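The step that actually attaches the disk is logging in to the target (a sketch, using the IQN and portal from the discovery output above):

sudo iscsiadm --mode node \
  --targetname iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI \
  --portal 192.168.1.30:3260 --login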


And that should leave you with a new UNPARTITIONED DRIVE on the client (sdd below)!!!!

jonny@earthseed:~$ sudo lsblk -e7  -d
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda       8:0    0   9.1T  0 disk /mnt/dataata
sdb       8:16   0   9.1T  0 disk /mnt/databta
sdc       8:32   0   9.1T  0 disk /mnt/datacta
sdd       8:48   0   250T  0 disk
nvme0n1 259:0    0 931.5G  0 disk


Formatting the Partition

THEN once you have gone through ALL THAT SHIT you need to partition and format the disk you just attached.

First, make a GPT partition table with parted:

>>> sudo parted /dev/sdd

GNU Parted 3.5
Using /dev/sdd
Welcome to GNU Parted! Type 'help' to view a list of commands.

>>> (parted) mklabel gpt
>>> (parted) print
Model: IET VIRTUAL-DISK (scsi)
Disk /dev/sdd: 275TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start  End  Size  File system  Name  Flags

>>> (parted) mkpart primary ext4 0% 100%
>>> (parted) print
Model: IET VIRTUAL-DISK (scsi)
Disk /dev/sdd: 275TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End    Size   File system  Name     Flags
 1      1049kB  275TB  275TB  ext4         primary

>>> (parted) quit

and then FORMAT THAT BAD BOY

sudo mkfs.ext4 /dev/sdd1

Get the ID by doing ls -l /dev/disk/by-id and find the one POINTING TO YOUR PARTITION

>>> ls -l /dev/disk/by-id
lrwxrwxrwx 1 root root  9 Oct 23 22:06 scsi-360000000000000000e00000000010001 -> ../../sdd
lrwxrwxrwx 1 root root 10 Oct 23 22:06 scsi-360000000000000000e00000000010001-part1 -> ../../sdd1

and ADD IT TO /etc/fstab (note: fstab doesn't understand a bare ID= prefix, so use the full /dev/disk/by-id path):

/dev/disk/by-id/scsi-360000000000000000e00000000010001-part1 /mnt/seedbank/p2p/dandi ext4 _netdev 0 0
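Then make the mountpoint and mount it (mounting by mountpoint reads the fstab entry, so this also checks that the line parses):

sudo mkdir -p /mnt/seedbank/p2p/dandi
sudo mount /mnt/seedbank/p2p/dandi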

Maintenance

Monitoring

Interactively

Take 10 second samples of I/O to and from individual devices:

zpool iostat -v 10

Busted Drives

Check the status of your ZFS pool with zpool status, which might show you something like this:

jonny@seedbank:~$ zpool status
  pool: P2P
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
	repaired.
config:

	NAME                        STATE     READ WRITE CKSUM
	P2P                         DEGRADED     0     0     0
	  raidz2-0                  DEGRADED     0     0     0
	    scsi-35000c500db4dca84  FAULTED      0    22     0  too many errors
	    scsi-35000c500db76a540  ONLINE       0     0     0
	    scsi-35000c500db76bea3  ONLINE       0     0     0
	    scsi-35000c500db779f0a  ONLINE       0     0     0
	    scsi-35000c500db77ecb8  ONLINE       0     2     0
	    scsi-35000c500db786ba1  ONLINE       0     0     0
	  raidz2-1                  DEGRADED     0     0     0
	    scsi-35000c500db79489a  ONLINE       0     0     0
	    scsi-35000c500db7a4a5e  FAULTED      0    99     0  too many errors
	    scsi-35000c500db7aa844  ONLINE       0     0     0
	    scsi-35000c500db7b3ffd  ONLINE       0     0     0
	    scsi-35000c500db7cb67e  ONLINE       0     0     0
	    scsi-35000c500db7d3881  ONLINE       0     0     0
	  raidz2-2                  DEGRADED     0     0     0
	    scsi-35000c500db84c49c  ONLINE       0     0     0
	    scsi-35000c500db85463a  FAULTED      0    36     0  too many errors
	    scsi-35000c500db8584e5  ONLINE       0     0     0
	    scsi-35000c500db867df0  ONLINE       0     0     0
	    scsi-35000c500db86b99d  FAULTED      0    82     0  too many errors
	    scsi-35000c500db86cd45  ONLINE       0     0     0
	  raidz2-3                  ONLINE       0     0     0
	    scsi-35000c500db8ccc4e  ONLINE       0     0     0
	    scsi-35000c500db915eb7  ONLINE       0     0     0
	    scsi-35000c500dba3809b  ONLINE       0     2     0
	    scsi-35000c500dba3cce2  ONLINE       0     0     0
	    scsi-35000c500dba3ea10  ONLINE       0     0     0
	    scsi-35000c500dba55931  ONLINE       0     1     1  (repairing)

These faults can be false positives, so run SMART tests to confirm that the drives actually have errors.

  • Remind yourself how the ids map to device names: ls -la /dev/disk/by-id/
  • See SMART status: sudo smartctl /dev/sda -a
  • Run SMART test: sudo smartctl /dev/sda -t long

That'll run in the background and take >24h to complete. Come back and check the results with smartctl later.
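Checking on it later looks like this (the self-test log also shows up in the -a output):

sudo smartctl /dev/sda -l selftest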

Option 1: Not actually a problem

If there are no problems, it'll look like this:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1062

If so, you can re-enable the drive, and ZFS will resilver it if needed.

sudo zpool clear <POOL_NAME> <DEVICE_ID>

eg.

sudo zpool clear P2P scsi-35000c500db7a4a5e

which should look like this:

>>> zpool status
  pool: P2P
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Dec  6 15:54:21 2023
	2.40T scanned at 6.39G/s, 1.30T issued at 3.46G/s, 305T total
	73.4G resilvered, 0.43% done, 1 days 00:57:12 to go
config:

	NAME                        STATE     READ WRITE CKSUM
	P2P                         ONLINE       0     0     0
	  raidz2-0                  ONLINE       0     0     0
	    scsi-35000c500db4dca84  ONLINE       0     0     0  (resilvering)
	    scsi-35000c500db76a540  ONLINE       0     0     0
	    scsi-35000c500db76bea3  ONLINE       0     0     0
	    scsi-35000c500db779f0a  ONLINE       0     0     0
	    scsi-35000c500db77ecb8  ONLINE       0     2     0
	    scsi-35000c500db786ba1  ONLINE       0     0     0
	  raidz2-1                  ONLINE       0     0     0
	    scsi-35000c500db79489a  ONLINE       0     0     0
	    scsi-35000c500db7a4a5e  ONLINE       0     0     0  (resilvering)
	    scsi-35000c500db7aa844  ONLINE       0     0     0
	    scsi-35000c500db7b3ffd  ONLINE       0     0     0
	    scsi-35000c500db7cb67e  ONLINE       0     0     0
	    scsi-35000c500db7d3881  ONLINE       0     0     0
	  raidz2-2                  ONLINE       0     0     0
	    scsi-35000c500db84c49c  ONLINE       0     0     0
	    scsi-35000c500db85463a  ONLINE       0     0     0  (resilvering)
	    scsi-35000c500db8584e5  ONLINE       0     0     0
	    scsi-35000c500db867df0  ONLINE       0     0     0
	    scsi-35000c500db86b99d  ONLINE       0     0     0  (resilvering)
	    scsi-35000c500db86cd45  ONLINE       0     0     0
	  raidz2-3                  ONLINE       0     0     0
	    scsi-35000c500db8ccc4e  ONLINE       0     0     0
	    scsi-35000c500db915eb7  ONLINE       0     0     0
	    scsi-35000c500dba3809b  ONLINE       0     2     0
	    scsi-35000c500dba3cce2  ONLINE       0     0     0
	    scsi-35000c500dba3ea10  ONLINE       0     0     0
	    scsi-35000c500dba55931  ONLINE       0     1     1

Option 2: Replace the drive
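If the SMART test does show real errors, the usual flow is roughly the following (a sketch; the old ID is one of the faulted drives above, and the new ID is a placeholder for whatever the replacement shows up as under /dev/disk/by-id):

# take the bad disk out of the pool, physically swap it, then resilver onto the new one
sudo zpool offline P2P scsi-35000c500db85463a
sudo zpool replace P2P scsi-35000c500db85463a /dev/disk/by-id/scsi-NEW_DRIVE_ID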

Modifying Pools

Documentation

Really need to split this page into ZFS and iSCSI.

Storinator

ZFS

Pool Faults

Performance

iSCSI

Reference

Quotations

Daniel says that the 45Drives people said this when ordering:

I spoke with our Architect, and the Storinator Q30 configured with 2 vdevs of 15 HDDs in RAIDZ2 does have the capability to saturate a 10Gb network. I would recommend adding more resiliency by going with 3 vdevs of 10 HDDs in RAIDZ2. It will still be able to saturate a 10Gb network but will add more fault tolerance and faster resilvering times.

we shall figure out what that means...

Notes from Tobias

shared storage is, usually, a pain. i guess your best bet would be nfs

usually, you export the storage via something, depending on the use-case

if it is 'multiple systems need access to the same files' you usually end up with NFS on linux/unix

for 'shared home directories for windows workstations' your controllers will likely export cifs (samba)

if you want VMs on there, you probably get a SAN (storage area network) with fibrechannel (basically: SCSI/SAS over fibre) or do the cheap nas-y version with iSCSI (scsi over ip)

that is, if you want to give block devices to individual 'things'

at the MPI we have a large NFS, for example, and for things that need a lot of fast I/O, we have boxes with local NVMe


Benchmarks

Not scientific, but just for the sake of jotting down how to measure...

Write speed

sudo dd if=/dev/zero of=/mnt/seedbank/p2p/dandi/test1.img bs=1G count=10 oflag=dsync

89.5 MB/s

Read speed

time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=8k

164 MB/s

time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=256k

10.2 GB/s

time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=1M

12.9 GB/s
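Those larger numbers are almost certainly coming out of the local page cache rather than over iSCSI; rereading with iflag=direct (which bypasses the cache) would give a more honest figure:

sudo dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=1M iflag=direct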