Wednesday, 14 December 2016

Breaking a ZFS mirror - Solaris 11

So you've mirrored your system disk wrongly! Here's how to break the mirror and fix the problem before re-mirroring:
root@solaris11server:~# zpool status
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
  scan: resilvered 108G in 20m20s with 0 errors on Fri Dec  2 10:16:44 2016

config:

        NAME                         STATE     READ WRITE CKSUM
        rpool                        ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            c0t5000CCA0166ED0ECd0s0  ONLINE       0     0     0
            c0t5000CCA0166F8C50d0    ONLINE       0     0     0

errors: No known data errors
The mirror was added as a whole disk rather than a slice, which turned the disk into an EFI/GPT-labelled disk. Using format and selecting the disk gives this message:
selecting c0t5000CCA0166F8C50d0
[disk formatted]
/dev/dsk/c0t5000CCA0166F8C50d0s0 is part of active ZFS pool rpool. Please see zpool(1M).
Reading the primary EFI GPT label failed.  Using backup label.
Use the 'backup' command to restore the primary label.
Do we use the backup command? Not yet - we must break the mirror first, using zpool detach:
root@solaris11server:~# zpool detach rpool c0t5000CCA0166F8C50d0
root@solaris11server:~# zpool status
  pool: rpool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
        pool will no longer be accessible on older software versions.
  scan: resilvered 108G in 20m20s with 0 errors on Fri Dec  2 10:16:44 2016

config:

        NAME                       STATE     READ WRITE CKSUM
        rpool                      ONLINE       0     0     0
          c0t5000CCA0166ED0ECd0s0  ONLINE       0     0     0

errors: No known data errors
Now look at the partition map on our good disk:
root@solaris11server:~# prtvtoc /dev/rdsk/c0t5000CCA0166ED0ECd0s0
* /dev/rdsk/c0t5000CCA0166ED0ECd0s0 (volume "solaris") partition map
*
* Dimensions:
*     512 bytes/sector
*     625 sectors/track
*      20 tracks/cylinder
*   12500 sectors/cylinder
*   46875 cylinders
*   46873 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*           0     12500     12499
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00      12500 585900000 585912499
       2      5    01          0 585912500 585912499
And compare it with our bad disk:
root@solaris11server:~# prtvtoc /dev/rdsk/c0t5000CCA0166F8C50d0s0
* /dev/rdsk/c0t5000CCA0166F8C50d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
* 585937500 sectors
* 585937433 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*          34       222       255
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00        256 585920827 585921082
       8     11    00  585921083     16384 585937466
So let us label our bad disk with the proper SMI label - not the EFI one:
root@solaris11server:~# format -e /dev/rdsk/c0t5000CCA0166F8C50d0
selecting /dev/rdsk/c0t5000CCA0166F8C50d0
[disk formatted]


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        inquiry    - show disk ID
        scsi       - independent SCSI mode selects
        cache      - enable, disable or query SCSI disk cache
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> label
[0] SMI Label
[1] EFI Label
Specify Label type[1]: 0
Auto configuration via format.dat[no]?
Auto configuration via generic SCSI-2[no]?
format> p


PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        !<cmd> - execute <cmd>, then return
        quit
partition> p
Current partition table (default):
Total disk cylinders available: 46873 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0 -    20      128.17MB    (21/0/0)       262500
  1       swap    wu      21 -    41      128.17MB    (21/0/0)       262500
  2     backup    wu       0 - 46872      279.38GB    (46873/0/0) 585912500
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6        usr    wm      42 - 46872      279.13GB    (46831/0/0) 585387500
  7 unassigned    wm       0                0         (0/0/0)             0

partition> q
Now, let's copy the partition table of our good disk over to our bad one:
root@solaris11server:~# prtvtoc /dev/rdsk/c0t5000CCA0166ED0ECd0s0 | fmthard -s - /dev/rdsk/c0t5000CCA0166F8C50d0s2
fmthard:  New volume table of contents now in place.
root@solaris11server:~# prtvtoc /dev/rdsk/c0t5000CCA0166F8C50d0s0
* /dev/rdsk/c0t5000CCA0166F8C50d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*     625 sectors/track
*      20 tracks/cylinder
*   12500 sectors/cylinder
*   46875 cylinders
*   46873 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*           0     12500     12499
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00      12500 585900000 585912499
       2      5    01          0 585912500 585912499
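To double-check that the copy worked, you can diff the two partition maps. The sketch below stands in for the live prtvtoc output using the partition lines quoted above, so it can run anywhere:

```shell
# Simulated check: partition lines from the good and the freshly-copied
# disk (taken verbatim from the prtvtoc output above); on the live system
# you would diff the output of prtvtoc on both raw devices
cat > /tmp/good.map <<'EOF'
       0      2    00      12500 585900000 585912499
       2      5    01          0 585912500 585912499
EOF
cat > /tmp/bad.map <<'EOF'
       0      2    00      12500 585900000 585912499
       2      5    01          0 585912500 585912499
EOF
# No diff output means the copy succeeded
if diff -q /tmp/good.map /tmp/bad.map > /dev/null; then
    echo "Partition maps match"
fi
```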
All is now good, so let's attach (mirror) our system disks:
root@solaris11server:~# zpool attach rpool c0t5000CCA0166ED0ECd0s0 c0t5000CCA0166F8C50d0s0
Make sure to wait until resilver is done before rebooting.
root@solaris11server:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
        Run 'zpool status -v' to see device specific details.
  scan: resilver in progress since Wed Dec 14 10:13:36 2016
    1.52G scanned out of 108G at 58.9M/s, 30m52s to go
    1.52G resilvered, 1.41% done
config:

        NAME                         STATE     READ WRITE CKSUM
        rpool                        DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c0t5000CCA0166ED0ECd0s0  ONLINE       0     0     0
            c0t5000CCA0166F8C50d0s0  DEGRADED     0     0     0  (resilvering)

errors: No known data errors
After the resilvering has finished, our system is mirrored correctly! :)
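If you want to script the wait, the progress figure can be pulled out of the status text. This sketch parses a sample line from the zpool status output above; on the live system the input would come from `zpool status rpool`:

```shell
# Sample progress line from the zpool status output above; live:
#   status=$(zpool status rpool)
status='    1.52G resilvered, 1.41% done'
pct=$(echo "$status" | sed -n 's/.*, \([0-9.]*\)% done.*/\1/p')
echo "Resilver is ${pct}% done"
```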

Friday, 4 November 2016

GitLab CE - repo web page not updating

So you've just created a GitLab repo and pushed the contents of your git working directory to it, but the repo's web page doesn't change from the default instructions. To fix this, log on to the GitLab server and run the following command as root:
# gitlab-rake cache:clear
This is on the following GitLab CE stack:
GitLab 8.13.3
GitLab Shell 3.6.6
GitLab Workhorse 0.8.5
GitLab API v3
Git 2.7.4
Ruby 2.3.1p112
Rails 4.2.7.1
PostgreSQL 9.2.18

Tuesday, 1 November 2016

Dirty CoW kernel check - CentOS

So there's been a load of work due to the Dirty CoW vulnerability... you need to find out if there's been a kernel update so that you can reboot your CentOS physical or virtual machine... Here's a one-liner:
if [ "`rpm -q kernel --queryformat '%{installtime} %{version}-%{release}.%{arch}\n' | \
sort -n -k1 | tail -1 | cut -d ' ' -f 2`" = "`uname -r`" ]; \
then echo "You are running the latest kernel" && uname -r; \
else echo "There is a new kernel. You need a reboot" && echo "Current kernel: " && uname -r &&  \
echo "The latest kernel: " && rpm -q kernel --queryformat '%{version}-%{release}.%{arch}\n' | sort -n -k1 | tail -1;  fi
This is what it looks like on CentOS 6:
# if [ "`rpm -q kernel --queryformat '%{installtime} %{version}-%{release}.%{arch}\n' |sort -n -k1 | tail -1 | cut -d ' ' -f 2`" = "`uname -r`" ]; then echo "You are running the latest kernel" && uname -r; else echo "There is a new kernel. You need a reboot" && echo "Current kernel: " && uname -r && echo "The latest kernel: " && rpm -q kernel --queryformat '%{version}-%{release}.%{arch}\n' | sort -n -k1 | tail -1;  fi
There is a new kernel. You need a reboot
Current kernel:
2.6.32-642.4.2.el6.x86_64
The latest kernel:
2.6.32-642.6.2.el6.x86_64
The table below shows which kernel you should be running to fix the Dirty CoW vulnerability:
Distro      Kernel version
CentOS 5    2.6.32-642.3.1.el6.x86_64
CentOS 6    2.6.32-642.6.2.el6.x86_64
CentOS 7    3.10.0-327.36.3.el7.x86_64
Debian 7    3.2.82-1
Debian 8    3.16.36-1+deb8u2
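The comparison the one-liner performs boils down to a single string test. This sketch hard-codes the two kernel versions from the CentOS 6 run above; on a live box the values would come from uname -r and the newest-installed line of rpm -q kernel:

```shell
# Hard-coded example versions from the CentOS 6 run above; live values:
#   running=$(uname -r)
#   latest=$(rpm -q kernel --queryformat '%{installtime} %{version}-%{release}.%{arch}\n' \
#            | sort -n -k1 | tail -1 | cut -d ' ' -f 2)
running="2.6.32-642.4.2.el6.x86_64"
latest="2.6.32-642.6.2.el6.x86_64"
if [ "$running" = "$latest" ]; then
    echo "You are running the latest kernel"
else
    echo "There is a new kernel. You need a reboot"
fi
```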

Friday, 28 October 2016

Postfix MTA service not working! CentOS 6

I was having problems keeping the Postfix MTA configured and running with Puppet. Each time Puppet ran, it detected that Postfix wasn't running and attempted to start it, to no avail. The error when looking at the service was this:
# service postfix status
master dead but pid file exists
But removing the pid file didn't help:
# locate postfix|grep pid
/var/spool/postfix/pid
/var/spool/postfix/pid/master.pid
[root@webtest ~]# rm /var/spool/postfix/pid/master.pid
rm: remove regular file `/var/spool/postfix/pid/master.pid'? y
[root@webtest ~]# service postfix status
master dead but subsys locked
So looking at the logs this was seen:
# tail  /var/log/maillog
Oct 30 19:44:06 webtest postfix/master[8005]: fatal: bind 127.0.0.1 port 25: Address already in use
Oct 30 20:09:49 webtest postfix/postfix-script[10053]: starting the Postfix mail system
Oct 30 20:09:49 webtest postfix/master[10054]: fatal: bind 127.0.0.1 port 25: Address already in use
Oct 30 20:10:04 webtest postfix/postfix-script[10602]: starting the Postfix mail system
Oct 30 20:10:04 webtest postfix/master[10603]: fatal: bind 127.0.0.1 port 25: Address already in use
Oct 30 20:10:53 webtest postfix/postfix-script[11037]: starting the Postfix mail system
It looks like another MTA was running and hogging port 25. A quick ps for sendmail revealed nothing, but there's another MTA that can come with CentOS 6 - exim:
[root@webtest ~]# ps -ef|grep send
root     12448  9780  0 20:16 pts/0    00:00:00 grep send
[root@webtest ~]# ps -ef|grep exim
root     12109  9780  0 20:22 pts/0    00:00:00 grep exim
exim     57456     1  0 Jul07 ?        00:00:00 /usr/sbin/exim -bd -q1h
[root@webtest ~]# service exim stop
Shutting down exim:                                        [  OK  ]
[root@webtest ~]# chkconfig exim off
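A small aside: when grepping ps for another MTA, the bracket trick (`[e]xim`) stops grep from matching its own process, so no `grep -v grep` is needed. This sketch runs against the exim line from the ps output above rather than live ps output:

```shell
# Sample ps line (the exim process found above); the pattern [e]xim
# matches "exim" but cannot match the literal string "[e]xim" in the
# grep command line itself
ps_line='exim     57456     1  0 Jul07 ?        00:00:00 /usr/sbin/exim -bd -q1h'
owner=$(echo "$ps_line" | grep '[e]xim' | awk '{print $1, $2}')
echo "Port-25 suspect: $owner"
```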
Now a Puppet run should install and run Postfix without a problem:
# puppet agent -t
Notice: Local environment: 'production' doesn't match server specified node environment 'websites', switching agent to 'websites'.
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for website.domain.com
Info: Applying configuration version '1477858442'
Notice: /Stage[main]/postfixmta/Service[postfix]/ensure: ensure changed 'stopped' to 'running'
Info: /Stage[main]/postfixmta/Service[postfix]: Unscheduling refresh on Service[postfix]
Notice: Applied catalog in 1.67 seconds
# puppet agent -t
Notice: Local environment: 'production' doesn't match server specified node environment 'websites', switching agent to 'websites'.
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for website.domain.com
Info: Applying configuration version '1477858442'
Notice: Applied catalog in 1.46 seconds

Monday, 3 October 2016

Fix Analytics not displaying in OpsCenter for LDOM - Solaris 11

The scn-agent service on the LDOM had dropped into maintenance, which stops Analytics from displaying in OpsCenter:
root@solaris-ldom:~# svcs scn-agent
STATE          STIME    FMRI
maintenance    Sep_23   svc:/application/management/common-agent-container-1:scn-agent
root@solaris-ldom:~# svcs -xv
svc:/application/management/common-agent-container-1:scn-agent (Cacao, a common Java container for JDMK/JMX based management solution)
State: maintenance since Fri Sep 23 22:36:14 2016
Reason: Restarting too quickly.
  See: http://support.oracle.com/msg/SMF-8000-L5
  See: man -M /usr/share/man -s 1M cacaoadm
  See: man -M /usr/share/man -s 5 cacao
  See: /var/svc/log/application-management-common-agent-container-1:scn-agent.log
Impact: This service is not running.
root@solaris-ldom:~# cat /var/svc/log/application-management-common-agent-container-1:scn-agent.log
[ Mar 24 09:57:57 Disabled. ]
[ Mar 24 09:57:57 Rereading configuration. ]
[ Mar 24 09:58:01 Enabled. ]

-cut-

[ Sep 23 22:36:12 Stopping because all processes in service exited. ]
[ Sep 23 22:36:13 Executing stop method ("/usr/lib/cacao/lib/tools/scripts/cacao_smf stop scn-agent"). ]
[ Sep 23 22:36:14 Method "stop" exited with status 0. ]
[ Sep 23 22:36:14 Restarting too quickly, changing state to maintenance. ]
root@solaris-ldom:~# svcadm disable svc:/application/management/common-agent-container-1:scn-agent
root@solaris-ldom:~# svcs scn-agent
STATE          STIME    FMRI
disabled       11:55:48 svc:/application/management/common-agent-container-1:scn-agent
root@solaris-ldom:~# svcs -xv
root@solaris-ldom:~# svcadm enable svc:/application/management/common-agent-container-1:scn-agent
root@solaris-ldom:~# svcs -xv
svc:/application/management/common-agent-container-1:scn-agent (Cacao, a common Java container for JDMK/JMX based management solution)
State: offline* transitioning to online since Mon Sep 26 11:56:09 2016
Reason: Start method is running.
  See: http://support.oracle.com/msg/SMF-8000-C4
  See: man -M /usr/share/man -s 1M cacaoadm
  See: man -M /usr/share/man -s 5 cacao
  See: /var/svc/log/application-management-common-agent-container-1:scn-agent.log
Impact: This service is not running.
root@solaris-ldom:~# tail /var/svc/log/application-management-common-agent-container-1:scn-agent.log
[ Sep 23 22:31:50 Executing start method ("/usr/lib/cacao/lib/tools/scripts/cacao_smf start scn-agent"). ]
[ Sep 23 22:33:13 Method "start" exited with status 0. ]
[ Sep 23 22:36:12 Stopping because all processes in service exited. ]
[ Sep 23 22:36:13 Executing stop method ("/usr/lib/cacao/lib/tools/scripts/cacao_smf stop scn-agent"). ]
[ Sep 23 22:36:14 Method "stop" exited with status 0. ]
[ Sep 23 22:36:14 Restarting too quickly, changing state to maintenance. ]
[ Sep 26 11:55:48 Leaving maintenance because disable requested. ]
[ Sep 26 11:55:48 Disabled. ]
[ Sep 26 11:56:09 Enabled. ]
[ Sep 26 11:56:09 Executing start method ("/usr/lib/cacao/lib/tools/scripts/cacao_smf start scn-agent"). ]
root@solaris-ldom:/var/adm# svcs scn-agent
STATE          STIME    FMRI
online         11:57:12 svc:/application/management/common-agent-container-1:scn-agent
root@solaris-ldom:/var/adm#
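The fix boils down to a disable/enable cycle on the stuck FMRI. This sketch detects the maintenance state from a sample svcs line (the one from this post) and prints the commands it would run; on the LDOM you would feed it live `svcs -H` output instead:

```shell
# Sample svcs line from this post; live: svcs_line=$(svcs -H scn-agent)
svcs_line='maintenance    Sep_23   svc:/application/management/common-agent-container-1:scn-agent'
state=$(echo "$svcs_line" | awk '{print $1}')
fmri=$(echo "$svcs_line" | awk '{print $3}')
if [ "$state" = "maintenance" ]; then
    echo "Would run: svcadm disable $fmri && svcadm enable $fmri"
fi
```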

Monday, 5 September 2016

SELinux and sending mail via HTTPD - CentOS 7

Use this SELinux boolean to allow the Apache process to use sendmail (note the capital -P, which makes the change persistent across reboots):
# setsebool -P httpd_can_sendmail 1

Thursday, 1 September 2016

Monitor DNS lookup

To watch DNS lookups in real time, capture port 53 traffic with tcpdump:
# tcpdump -i eth0 port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
10:48:15.324300 IP client.63533 > dns-server.domain: 17904+ PTR? 1.0.0.127.in-addr.arpa. (40)
10:48:15.324848 IP dns-server.domain > client.63533: 17904* 1/1/2 PTR localhost. (121)
10:48:15.325137 IP client.50547 > dns-server.domain: 49520+ AAAA? localhost. (27)
10:48:15.325293 IP client.43181 > dns-server.domain: 25134+ PTR? xx.x.xxx.xxx.in-addr.arpa. (43)
10:48:15.325643 IP dns-server.domain > client.50547: 49520* 1/1/1 AAAA ::1 (85)
10:48:15.325903 IP dns-server.domain > client.43181: 25134* 1/3/6 PTR dns-server. (268)
10:48:19.565837 IP client.25663 > dns-server.domain: 43756+ AAAA? client. (42)
10:48:19.566389 IP dns-server.domain > client.25663: 43756 NXDomain* 0/1/0 (93)
10:48:19.566497 IP client.64053 > dns-server.domain: 55198+ AAAA? client. (40)
10:48:19.567026 IP dns-server.domain > client.64053: 55198* 0/1/0 (91)
10:48:19.567086 IP client.49399 > dns-server.domain: 1076+ AAAA? client. (37)
10:48:19.567600 IP dns-server.domain > client.49399: 1076 NXDomain* 0/1/0 (88)
10:48:19.567656 IP client.24922 > dns-server.domain: 50409+ AAAA? client. (26)
10:48:19.568080 IP dns-server.domain > client.24922: 50409 NXDomain 0/1/0 (101)
10:53:15.248429 IP client.17122 > dns-server.domain: 45962+ PTR? 1.0.0.127.in-addr.arpa. (40)
10:53:15.248968 IP dns-server.domain > client.17122: 45962* 1/1/2 PTR localhost. (121)
10:53:15.249332 IP client.17597 > dns-server.domain: 59594+ AAAA? localhost. (27)
10:53:15.249819 IP dns-server.domain > client.17597: 59594* 1/1/1 AAAA ::1 (85)
10:53:19.590980 IP client.52610 > dns-server.domain: 39707+ AAAA? client. (42)
10:53:19.591525 IP dns-server.domain > client.52610: 39707 NXDomain* 0/1/0 (93)
10:53:19.591683 IP client.32529 > dns-server.domain: 23733+ AAAA? client. (40)
10:53:19.592180 IP dns-server.domain > client.32529: 23733* 0/1/0 (91)
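To summarise what query types are flying past, the capture can be piped through awk. This sketch counts query types in a few sample lines from the trace above; live, you would pipe `tcpdump -l -i eth0 port 53` into the same awk:

```shell
# A few sample lines from the capture above stand in for live output of:
#   tcpdump -l -i eth0 port 53
cat > /tmp/dns.log <<'EOF'
10:48:15.324300 IP client.63533 > dns-server.domain: 17904+ PTR? 1.0.0.127.in-addr.arpa. (40)
10:48:15.325137 IP client.50547 > dns-server.domain: 49520+ AAAA? localhost. (27)
10:48:19.565837 IP client.25663 > dns-server.domain: 43756+ AAAA? client. (42)
EOF
# Count each query type (fields ending in '?': PTR?, AAAA?, A?, ...)
counts=$(awk '{for (i=1;i<=NF;i++) if ($i ~ /\?$/) t[$i]++}
              END {for (q in t) print q, t[q]}' /tmp/dns.log)
echo "$counts"
```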