Seedbank

    Or, more specifically, a 45Drives Storinator Q30 Enhanced.

    Also known as seedbank, and boy is she a mysterious one.

    Related Projects

    Hardware

    • 30 16.38 TB hard drives (1 used for the OS, so 29 for storage)

    Web Administration (IPMI)

    There are two subsystems on the NAS, the IPMI system and the main operating system. The IPMI system can be used to configure the system before an OS is present and manage other administration tasks.

    • Log into the web console through its IP (currently 192.168.1.28), but check the DHCP server in case the address has changed
    • The default credentials are:
      • Username: ADMIN
      • Password: (on side of server)
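
    If you'd rather poke at the BMC from a shell than the web console, ipmitool can do most of the same things over the LAN. A minimal sketch, assuming ipmitool is installed on your workstation and using the IP and credentials above:

    # query (or change) the server's power state through the BMC
    ipmitool -I lanplus -H 192.168.1.28 -U ADMIN -P 'PASSWORD_ON_CHASSIS' chassis power status
    # dump the hardware event log (failed fans, ECC errors, etc.)
    ipmitool -I lanplus -H 192.168.1.28 -U ADMIN -P 'PASSWORD_ON_CHASSIS' sel list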

    Install Debian

    See: https://knowledgebase.45drives.com/kb/kb450289-ubuntu-20-04-redundant-os-installation/

    • Get yourself a copy of Debian
    • Open the IPMI control panel (see above)
    • Launch a virtual console either with the HTML5 or Java plugin
      • To use Java, you'll need OpenJDK, and since Java Web Start has been deprecated you'll also need OpenWebStart
      • Then open the launch.jnlp file with OpenWebStart (not sure how to do this via CLI; right click and "open with...")
      • It seems like the HTML5 console can do everything the Java version does without all the Java hassle, so you might as well use that
    • The server's power state is separate from the IPMI subsystem, so you may need to turn it on from the Remote Control -> Power Control menu
    • Wait, this thing comes with Ubuntu preinstalled... never mind for now

    Config

    See the Ansible configuration for the seedbank host.

    Security

    • Users
      • Root password changed
      • User password changed
      • Made user jonny that is in sudoers
    • SSH
      • Root access disabled
      • Password access disabled
    • Firewall
      • All incoming connections blocked, except from the LAN to port 22 (roughly the rules sketched below).
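
    For reference, a minimal sketch of those rules using ufw (an assumption; the firewall may actually be managed by the ansible config or another tool):

    sudo ufw default deny incoming
    sudo ufw default allow outgoing
    sudo ufw allow from 192.168.1.0/24 to any port 22 proto tcp
    sudo ufw enable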

    ZFS

    There is a lot to know about ZFS; see the docs linked at the bottom of the page. This is just the minimal amount needed to recreate what we've done.

    The ansible config is as follows:

    zfs_create_pools: true
    zfs_create_volumes: true
    zfs_enable_iscsi: false # handle iscsi separately
    zfs_iscsitarget_iqn: iqn.2023-10.com.aharoni-lab.seed
    zfs_iscsitarget_target_portals:
      - "192.168.1.0/24"
    zfs_install_update: false
    
    zfs_pools:
      - name: P2P
        action: create
        compression: lz4
        type: raidz2
        state: present
        devices:
          - 'scsi-35000c500db4dca84'
          - 'scsi-35000c500db76a540'
          - 'scsi-35000c500db76bea3'
          - 'scsi-35000c500db779f0a'
          - 'scsi-35000c500db77ecb8'
          - 'scsi-35000c500db786ba1'
      - name: P2P
        action: add
        compression: lz4
        type: raidz2
        state: present
        devices:
          - 'scsi-35000c500db79489a'
          - 'scsi-35000c500db7a4a5e'
          - 'scsi-35000c500db7aa844'
          - 'scsi-35000c500db7b3ffd'
          - 'scsi-35000c500db7cb67e'
          - 'scsi-35000c500db7d3881'
      - name: P2P
        action: add
        compression: lz4
        type: raidz2
        state: present
        devices:
          - 'scsi-35000c500db84c49c'
          - 'scsi-35000c500db85463a'
          - 'scsi-35000c500db8584e5'
          - 'scsi-35000c500db867df0'
          - 'scsi-35000c500db86b99d'
          - 'scsi-35000c500db86cd45'
      - name: P2P
        action: add
        compression: lz4
        type: raidz2
        state: present
        devices:
          - 'scsi-35000c500db8ccc4e'
          - 'scsi-35000c500db915eb7'
          - 'scsi-35000c500dba3809b'
          - 'scsi-35000c500dba3cce2'
          - 'scsi-35000c500dba3ea10'
          - 'scsi-35000c500dba55931'
    
    zfs_volumes:
      - name: DANDI
        pool: P2P
        shareiscsi: on
        iscsi_name: "{{ zfs_iscsitarget_iqn }}:P2P.DANDI"
        volsize: 250T
        lun: 1
        state: present
        allow:
          - "192.168.1.0/24"
    

    Which makes:

    • a ZFS VOLUME (zvol) named DANDI, which lives within a...
    • ZFS POOL named P2P, which is composed of...
    • 4x VDEVS, each of which is made of...
    • 6x PHYSICAL DRIVES in a raidz2 configuration.

    Useful commands for ZFS include...

    • zpool status - show what pools exist and their health
    • zfs get all - show all dataset properties and configuration
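
    For reference, the Ansible role above boils down to roughly these manual commands. This is a sketch only (only the first vdev's device IDs are shown; the rest follow the same pattern) and obviously should not be re-run against the live pool:

    # create the pool from the first six disks as a raidz2 vdev, with lz4 compression as in the config
    sudo zpool create -O compression=lz4 P2P raidz2 \
        /dev/disk/by-id/scsi-35000c500db4dca84 /dev/disk/by-id/scsi-35000c500db76a540 \
        /dev/disk/by-id/scsi-35000c500db76bea3 /dev/disk/by-id/scsi-35000c500db779f0a \
        /dev/disk/by-id/scsi-35000c500db77ecb8 /dev/disk/by-id/scsi-35000c500db786ba1
    # repeat 'zpool add P2P raidz2 <next six disks>' for each of the three remaining vdevs
    # create the 250T zvol that gets exported over iSCSI (-s makes it sparse; whether the
    # role reserves the full size up front is an assumption here)
    sudo zfs create -s -V 250T P2P/DANDI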

    iSCSI

    Why is this so HARD for the LOVE OF GOD

    • Open port 3260 (the iSCSI port) on the server firewall
    • Make the LOGICAL UNIT (don't ask me about this command):
    sudo tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 --backing-store /dev/P2P/DANDI
    

    Which should look like THIS

    >>> sudo tgtadm --lld iscsi --op show --mode target
    Target 1: iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI
        System information:
            Driver: iscsi
            State: ready
        I_T nexus information:
        LUN information:
            LUN: 0
                Type: controller
                SCSI ID: IET     00010000
                SCSI SN: beaf10
                Size: 0 MB, Block size: 1
                Online: Yes
                Removable media: No
                Prevent removal: No
                Readonly: No
                SWP: No
                Thin-provisioning: No
                Backing store type: null
                Backing store path: None
                Backing store flags:
            LUN: 1
                Type: disk
                SCSI ID: IET     00010001
                SCSI SN: beaf11
                Size: 274877907 MB, Block size: 512
                Online: Yes
                Removable media: No
                Prevent removal: No
                Readonly: No
                SWP: No
                Thin-provisioning: No
                Backing store type: rdwr
                Backing store path: /dev/P2P/DANDI
                Backing store flags:
        Account information:
        ACL information:
            192.168.1.0/24
    
    • That should also be captured in the config file at /etc/tgt/targets.conf so it persists across restarts:
    default-driver iscsi
    
    <target iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI>
            backing-store /dev/P2P/DANDI
            initiator-address 192.168.1.0/24
            incominguser earthseed SOME_PASSWORD
    </target>
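
    Note that the LUN command above assumes the target itself already exists. If you're building it by hand with tgtadm (rather than letting the tgt service apply targets.conf), creating the target and its ACL binding looks roughly like this (a sketch using the IQN from the ansible config):

    # create the target on target ID 1
    sudo tgtadm --lld iscsi --op new --mode target --tid 1 \
        --targetname iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI
    # allow initiators from the LAN to log in
    sudo tgtadm --lld iscsi --op bind --mode target --tid 1 \
        --initiator-address 192.168.1.0/24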
    

    Then from the CLIENT machine, using open-iscsi, discover the shared iscsi volumes on the SERVER:

    >>> sudo iscsiadm --mode discovery --portal 192.168.1.30 --type sendtargets
    192.168.1.30:3260,1 iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI
    

    Then configure your CHAP credentials in the /etc/iscsi/iscsid.conf file like so:

    node.session.auth.authmethod = CHAP
    node.session.auth.username = earthseed
    node.session.auth.password = SOME_PASSWORD
    discovery.sendtargets.auth.authmethod = CHAP
    discovery.sendtargets.auth.username = earthseed
    discovery.sendtargets.auth.password = SOME_PASSWORD
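
    After discovery you still have to log in to the target before any block device shows up on the client; with open-iscsi that's something like (a sketch, using the portal and IQN above):

    # log in to the target (this is what creates the /dev/sdX device)
    sudo iscsiadm --mode node --targetname iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI \
        --portal 192.168.1.30:3260 --login
    # make the session come back automatically at boot
    sudo iscsiadm --mode node --targetname iqn.2023-10.com.aharoni-lab.seed:P2P.DANDI \
        --portal 192.168.1.30:3260 --op update -n node.startup -v automatic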
    


    And that should leave you with a new, UNLABELED block device on the client (sdd below):

    jonny@earthseed:~$ sudo lsblk -e7  -d
    NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
    sda       8:0    0   9.1T  0 disk /mnt/dataata
    sdb       8:16   0   9.1T  0 disk /mnt/databta
    sdc       8:32   0   9.1T  0 disk /mnt/datacta
    sdd       8:48   0   250T  0 disk
    nvme0n1 259:0    0 931.5G  0 disk
    


    Formatting the Partition

    Then, once you have gone through ALL THAT, you still need to partition and format the thing you just made.

    First, make a GPT partition table and a single partition with parted:

    >>> sudo parted /dev/sdd
    
    GNU Parted 3.5
    Using /dev/sdd
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    
    >>> (parted) mklabel gpt
    >>> (parted) print
    Model: IET VIRTUAL-DISK (scsi)
    Disk /dev/sdd: 275TB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:
    
    Number  Start  End  Size  File system  Name  Flags
    
    >>> (parted) mkpart primary ext4 0% 100%
    >>> (parted) print
    Model: IET VIRTUAL-DISK (scsi)
    Disk /dev/sdd: 275TB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:
    
    Number  Start   End    Size   File system  Name     Flags
     1      1049kB  275TB  275TB  ext4         primary
    
    (parted) quit
    

    and then FORMAT THAT BAD BOY

    sudo mkfs.ext4 /dev/sdd1
    

    Get the stable device ID with ls -l /dev/disk/by-id and find the symlink pointing to your new partition:

    >>> ls -l /dev/disk/by-id
    lrwxrwxrwx 1 root root  9 Oct 23 22:06 scsi-360000000000000000e00000000010001 -> ../../sdd
    lrwxrwxrwx 1 root root 10 Oct 23 22:06 scsi-360000000000000000e00000000010001-part1 -> ../../sdd1
    

    and add an entry for it to /etc/fstab (_netdev delays the mount until the network, and hence the iSCSI session, is up):

    /dev/disk/by-id/scsi-360000000000000000e00000000010001-part1 /mnt/seedbank/p2p/dandi ext4 _netdev 0 0
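
    Then create the mount point and check that the entry mounts cleanly:

    sudo mkdir -p /mnt/seedbank/p2p/dandi
    sudo mount -a        # mounts everything in fstab; errors here mean the entry is wrong
    df -h /mnt/seedbank/p2p/dandi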
    

    Maintenance

    Monitoring

    Interactively

    Take 10 second samples of I/O to and from individual devices:

    zpool iostat -v 10

    Busted Drives

    Check the status of your ZFS pool with zpool status, which might show you something like this:

    jonny@seedbank:~$ zpool status
      pool: P2P
     state: DEGRADED
    status: One or more devices are faulted in response to persistent errors.
    	Sufficient replicas exist for the pool to continue functioning in a
    	degraded state.
    action: Replace the faulted device, or use 'zpool clear' to mark the device
    	repaired.
    config:
    
    	NAME                        STATE     READ WRITE CKSUM
    	P2P                         DEGRADED     0     0     0
    	  raidz2-0                  DEGRADED     0     0     0
    	    scsi-35000c500db4dca84  FAULTED      0    22     0  too many errors
    	    scsi-35000c500db76a540  ONLINE       0     0     0
    	    scsi-35000c500db76bea3  ONLINE       0     0     0
    	    scsi-35000c500db779f0a  ONLINE       0     0     0
    	    scsi-35000c500db77ecb8  ONLINE       0     2     0
    	    scsi-35000c500db786ba1  ONLINE       0     0     0
    	  raidz2-1                  DEGRADED     0     0     0
    	    scsi-35000c500db79489a  ONLINE       0     0     0
    	    scsi-35000c500db7a4a5e  FAULTED      0    99     0  too many errors
    	    scsi-35000c500db7aa844  ONLINE       0     0     0
    	    scsi-35000c500db7b3ffd  ONLINE       0     0     0
    	    scsi-35000c500db7cb67e  ONLINE       0     0     0
    	    scsi-35000c500db7d3881  ONLINE       0     0     0
    	  raidz2-2                  DEGRADED     0     0     0
    	    scsi-35000c500db84c49c  ONLINE       0     0     0
    	    scsi-35000c500db85463a  FAULTED      0    36     0  too many errors
    	    scsi-35000c500db8584e5  ONLINE       0     0     0
    	    scsi-35000c500db867df0  ONLINE       0     0     0
    	    scsi-35000c500db86b99d  FAULTED      0    82     0  too many errors
    	    scsi-35000c500db86cd45  ONLINE       0     0     0
    	  raidz2-3                  ONLINE       0     0     0
    	    scsi-35000c500db8ccc4e  ONLINE       0     0     0
    	    scsi-35000c500db915eb7  ONLINE       0     0     0
    	    scsi-35000c500dba3809b  ONLINE       0     2     0
    	    scsi-35000c500dba3cce2  ONLINE       0     0     0
    	    scsi-35000c500dba3ea10  ONLINE       0     0     0
    	    scsi-35000c500dba55931  ONLINE       0     1     1  (repairing)
    

    These can be false positives (e.g. transient link or controller errors), so run SMART tests to confirm that the drive itself is actually failing.

    • Remind yourself how the ids map to device names: ls -la /dev/disk/by-id/
    • See SMART status: sudo smartctl /dev/sda -a
    • Run SMART test: sudo smartctl /dev/sda -t long

    That'll run in the background and take >24h to complete. Come back and check the results with smartctl later.
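
    The long test runs inside the drive itself; check on it later by pulling the self-test log:

    sudo smartctl -l selftest /dev/sda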

    Option 1: Not actually a problem

    If there are no problems, it'll look like this:

    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed without error       00%      1062
    

    If so, you can re-enable the drive, and ZFS will resilver it if needed.

    sudo zpool clear <POOL_NAME> <DEVICE_ID>

    e.g.

    sudo zpool clear P2P scsi-35000c500db7a4a5e

    which should look like this:

    >>> zpool status
      pool: P2P
     state: ONLINE
    status: One or more devices is currently being resilvered.  The pool will
    	continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Wed Dec  6 15:54:21 2023
    	2.40T scanned at 6.39G/s, 1.30T issued at 3.46G/s, 305T total
    	73.4G resilvered, 0.43% done, 1 days 00:57:12 to go
    config:
    
    	NAME                        STATE     READ WRITE CKSUM
    	P2P                         ONLINE       0     0     0
    	  raidz2-0                  ONLINE       0     0     0
    	    scsi-35000c500db4dca84  ONLINE       0     0     0  (resilvering)
    	    scsi-35000c500db76a540  ONLINE       0     0     0
    	    scsi-35000c500db76bea3  ONLINE       0     0     0
    	    scsi-35000c500db779f0a  ONLINE       0     0     0
    	    scsi-35000c500db77ecb8  ONLINE       0     2     0
    	    scsi-35000c500db786ba1  ONLINE       0     0     0
    	  raidz2-1                  ONLINE       0     0     0
    	    scsi-35000c500db79489a  ONLINE       0     0     0
    	    scsi-35000c500db7a4a5e  ONLINE       0     0     0  (resilvering)
    	    scsi-35000c500db7aa844  ONLINE       0     0     0
    	    scsi-35000c500db7b3ffd  ONLINE       0     0     0
    	    scsi-35000c500db7cb67e  ONLINE       0     0     0
    	    scsi-35000c500db7d3881  ONLINE       0     0     0
    	  raidz2-2                  ONLINE       0     0     0
    	    scsi-35000c500db84c49c  ONLINE       0     0     0
    	    scsi-35000c500db85463a  ONLINE       0     0     0  (resilvering)
    	    scsi-35000c500db8584e5  ONLINE       0     0     0
    	    scsi-35000c500db867df0  ONLINE       0     0     0
    	    scsi-35000c500db86b99d  ONLINE       0     0     0  (resilvering)
    	    scsi-35000c500db86cd45  ONLINE       0     0     0
    	  raidz2-3                  ONLINE       0     0     0
    	    scsi-35000c500db8ccc4e  ONLINE       0     0     0
    	    scsi-35000c500db915eb7  ONLINE       0     0     0
    	    scsi-35000c500dba3809b  ONLINE       0     2     0
    	    scsi-35000c500dba3cce2  ONLINE       0     0     0
    	    scsi-35000c500dba3ea10  ONLINE       0     0     0
    	    scsi-35000c500dba55931  ONLINE       0     1     1
    

    Option 2: Replace the drive
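
    If the SMART test does show real errors, the disk needs to be physically swapped and then resilvered onto the replacement. Roughly (a sketch; the new disk's by-id name is a placeholder):

    # take the failing disk out of service (it is probably already FAULTED)
    sudo zpool offline P2P scsi-35000c500db7a4a5e
    # physically swap the disk, then tell ZFS to rebuild onto the new one
    sudo zpool replace P2P scsi-35000c500db7a4a5e /dev/disk/by-id/scsi-NEW_DISK_ID
    # watch the resilver
    zpool status P2P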

    Modifying Pools

    Documentation

    (really need to split this page into ZFS and iSCSI)

    Storinator

    ZFS

    Pool Faults

    Performance

    iSCSI

    Reference

    Quotations

    Daniel says that the 45Drives people said this when ordering:

    I spoke with our Architect, and the Storinator Q30 configured with 2 vdevs of 15 HDDs in RAIDZ2 does have the capability to saturate a 10Gb network. I would recommend adding more resiliency by going with 3 vdevs of 10 HDDs in RAIDZ2. It will still be able to saturate a 10Gb network but will add more fault tolerance and faster resilvering times.

    we shall figure out what that means...

    Notes from Tobias

    shared storage is, usually, a pain. i guess your best bet would be nfs

    usually, you export the storage via something, depending on the use-case

    if it is 'multiple systems need access to the same files' you usually end up with NFS on linux/unix

    for 'shared home directories for windows workstations' your controllers will likely export cifs (samba)

    if you want VMs on there, you probably get a SAN (storage area network) with fibrechannel (basically: SCSI/SAS over fibre) or do the cheap nas-y version with iSCSI (scsi over ip)

    that is, if you want to give block devices to individual 'things'

    at the MPI we have a large NFS, for example, and for things that need a lot of fast I/O, we have boxes with local NVMe


    Benchmarks

    Not scientific, but just for the sake of jotting down how to measure...

    Write speed

    sudo dd if=/dev/zero of=/mnt/seedbank/p2p/dandi/test1.img bs=1G count=10 oflag=dsync
    

    89.5MB/s

    Read speed

    time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=8k
    

    164 MB/s

    time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=256k
    

    10.2 GB/s

    time dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=1M
    

    12.9 GB/s
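
    Note that anything much above ~1.2 GB/s cannot be coming over a 10 Gb link, so the larger-block-size read numbers are almost certainly the client's page cache (the test file was just written). For an uncached read, drop the caches first:

    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop page cache, dentries, and inodes
    dd if=/mnt/seedbank/p2p/dandi/test1.img of=/dev/null bs=1M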