How to Disable Keyring Prompts of GNOME3 in Linux VDA Sessions

Here are some Keyring prompts you might encounter in your RHEL 7.2 VDA or Ubuntu 16.04 VDA session:

[Screenshots of the GNOME Keyring unlock prompts]

You can disable Keyring prompts either per user identity or globally:

  • An example of disabling Keyring prompts based on user identity:

$ cat ~/.config/goa-1.0/accounts.conf

[Account account_1453086621]

Provider=kerberos

Identity=user1@XD.LOCAL

PresentationIdentity=XD.LOCAL

Realm=XD.LOCAL

SessionId=6be198ae5c5f0d47b85a0b73569c579d

IsTemporary=false

TicketingEnabled=true

You might see a lot of accounts in the above .conf file. That's the result of a Keyring bug. For more information, see https://access.redhat.com/solutions/1609773.

In this case, set the last account’s IsTemporary to false.
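Editing the file by hand is tedious when many stale accounts have accumulated. As a sketch (the helper name is mine, and it assumes the file layout shown above; back the file up first), the flag can be flipped on just the last `[Account ...]` section:

```shell
fix_last_account() {
    # $1 = path to accounts.conf; rewrites IsTemporary only from the
    # last [Account ...] section onward, leaving earlier sections alone
    conf=$1
    last=$(grep -n '^\[Account ' "$conf" | tail -n 1 | cut -d: -f1)
    [ -n "$last" ] || return 1   # no account sections found
    sed -i "${last},\$s/^IsTemporary=.*/IsTemporary=false/" "$conf"
}
# Typical use: fix_last_account ~/.config/goa-1.0/accounts.conf
```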

  • An example of disabling Keyring prompts globally:

According to the official GNOME documentation, to disable Keyring prompts globally, you need to stop the GNOME Keyring daemon by editing the autostart desktop files and D-Bus service files:

# grep -r gnome-keyring-daemon /etc/xdg/autostart

/etc/xdg/autostart/gnome-keyring-ssh.desktop:Exec=/usr/bin/gnome-keyring-daemon --start --components=ssh
/etc/xdg/autostart/gnome-keyring-gpg.desktop:#Exec=/usr/bin/gnome-keyring-daemon --start --components=gpg
/etc/xdg/autostart/gnome-keyring-pkcs11.desktop:#Exec=/usr/bin/gnome-keyring-daemon --start --components=pkcs11
/etc/xdg/autostart/gnome-keyring-secrets.desktop:#Exec=/usr/bin/gnome-keyring-daemon --start --components=secrets

# grep -r gnome-keyring-daemon /usr/share/dbus-1/services/

/usr/share/dbus-1/services/org.freedesktop.secrets.service:#Exec=/usr/bin/gnome-keyring-daemon --start --foreground --components=secrets
/usr/share/dbus-1/services/org.gnome.keyring.service:#Exec=/usr/bin/gnome-keyring-daemon --start --foreground --components=secrets

# grep -r gnome-software /etc/xdg/autostart/

/etc/xdg/autostart/gnome-software-service.desktop:#Exec=/usr/bin/gnome-software --gapplication-service
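Commenting out the Exec= lines can be scripted. This is only a sketch (the function name is mine; back the files up first, and note a package update may restore them):

```shell
comment_keyring_exec() {
    # Prefix any Exec= line that launches gnome-keyring-daemon with '#'
    for f in "$@"; do
        [ -f "$f" ] || continue
        cp "$f" "$f.bak"   # keep a backup alongside the original
        sed -i 's|^Exec=\(.*gnome-keyring-daemon.*\)|#Exec=\1|' "$f"
    done
}
# Typical invocation on the VDA (paths from the grep output above):
# comment_keyring_exec /etc/xdg/autostart/gnome-keyring-*.desktop \
#     /usr/share/dbus-1/services/org.freedesktop.secrets.service \
#     /usr/share/dbus-1/services/org.gnome.keyring.service
```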

# vi /etc/polkit-1/rules.d/02-allow-colord.rules

polkit.addRule(function(action, subject) {
    if ((action.id == "org.freedesktop.color-manager.create-device" ||
         action.id == "org.freedesktop.color-manager.create-profile" ||
         action.id == "org.freedesktop.color-manager.delete-device" ||
         action.id == "org.freedesktop.color-manager.delete-profile" ||
         action.id == "org.freedesktop.color-manager.modify-device" ||
         action.id == "org.freedesktop.color-manager.modify-profile") &&
        subject.isInGroup("domain users")) {
        return polkit.Result.YES;
    }
});

Set up the files on your RHEL 7.2 VDA or Ubuntu 16.04 VDA according to the examples above. The next time you log in to a Linux VDA session, there will be no GNOME Keyring prompt.


7022520: Troubleshooting boot issues (multipath with lvm).

1 – Insert SLES media that corresponds to the version currently running on the system. Then, set the BIOS to boot primarily from the inserted CD/ISO. On the menu, select "Rescue system" (or "more", then "Rescue system").

2 – By default, the rescue system activates the LVM volume group right from boot, which is not optimal when rescuing a system with multipath configured.

In order to deactivate the volume group, use the following command :

# vgchange -an

Now, start the multipath service with :

# modprobe dm_multipath

For SLES 12 :

# systemctl restart multipathd

For SLES 11 :

# service multipathd restart

Confirm that the paths are visible with the command :

# multipath -ll

3 – Make sure that the filter in /etc/lvm/lvm.conf is set to reach the devices over multipath by changing the "filter =" line as described below:

filter = [ "a|/dev/disk/by-id/dm-name-.*|", "r/.*/" ]
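The filter edit can also be scripted from the rescue shell. This is only a sketch (the helper name is hypothetical) that rewrites the active "filter =" line to the multipath-only value quoted above:

```shell
set_mp_filter() {
    # $1 = path to an lvm.conf file; replaces its uncommented "filter ="
    # line with the multipath-only filter (comments are left untouched)
    sed -i 's@^[[:space:]]*filter[[:space:]]*=.*@    filter = [ "a|/dev/disk/by-id/dm-name-.*|", "r/.*/" ]@' "$1"
}
# On the rescue system: set_mp_filter /etc/lvm/lvm.conf
```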

After that, restart lvmetad with command:

# systemctl restart lvm2-lvmetad (SLES 12 only)

# pvscan ; vgscan

If set correctly, the "pvscan" command shouldn't output "Found duplicate PV" messages.
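A quick way to check is to pipe the scan output through a small helper (the function name is mine):

```shell
no_duplicate_pvs() {
    # Reads scan output on stdin; succeeds only when no
    # "Found duplicate PV" warning is present
    ! grep -q 'Found duplicate PV'
}
# Typical use on the rescued system:
# pvscan 2>&1 | no_duplicate_pvs || echo "filter not effective, re-check lvm.conf"
```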

4 – Mount the root LV using the /dev/mapper and bind the /proc, /dev, /sys and /run to it:

# mount /dev/mapper/<rootvg>-<rootlv> /mnt

For SLES 12 :

# for i in dev proc sys run ; do mount -o bind /$i /mnt/$i ; done

For SLES 11 :

# for i in dev proc sys ; do mount -o bind /$i /mnt/$i ; done

Afterwards, change the root to /mnt:

# chroot /mnt

5 – Once in the chroot environment, make sure that the multipath is enabled again and that the paths are visible too :

For SLES12 :

# systemctl enable multipathd

# systemctl start multipathd

For SLES 11 :

# chkconfig multipathd on

# service multipathd start

# multipath -ll

6 – Make sure that local and multipathed devices are correctly listed on /etc/fstab.

For multipath devices, use "UUID=" or "/dev/disk/by-id/dm-name-*" instead of "/dev/sd*" or "/dev/disk/by-*", which should be used only for local disks.
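For illustration, a hypothetical /etc/fstab fragment following that rule (all device names and the UUID are invented examples, not values from this system):

```
# multipathed devices: stable identifiers that survive path changes
/dev/mapper/rootvg-rootlv                    /      ext3  defaults  1 1
UUID=0a1b2c3d-4e5f-6789-abcd-ef0123456789    /data  xfs   defaults  0 2
# local (non-multipath) disk: /dev/sd* is acceptable here
/dev/sda1                                    /boot  ext3  defaults  1 2
```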

7 – As we are still on the system activated by the chroot, change the /etc/lvm/lvm.conf and reload lvmetad if necessary, as described in step 3. Also make sure that there are no "Found duplicate PV" entries after running the "pvscan" command.

8 – Mount all volumes with the "mount -a" command.
NOTE: In case /usr is a separate filesystem, exit the chroot here and mount it manually to /mnt/usr:
# mount /dev/mapper/<rootvg>-<usrlv> /mnt/usr ; chroot /mnt ; mount -a

Make sure that all volumes were correctly mounted and are listed by the "mount" or "findmnt" command.

It is also a good idea to verify that the output of "cat /proc/mounts" matches "cat /etc/mtab". If there are any differences between the two, please do as below:

# cat /proc/mounts > /etc/mtab

Note: this is not necessary on SLES 12.
9 – Make sure that the file /etc/dracut.conf.d/10-mp.conf has the line force_drivers+="dm_multipath dm_service_time" in it.

This can be added using the following command :

# echo 'force_drivers+="dm_multipath dm_service_time"' >> /etc/dracut.conf.d/10-mp.conf

10 – Next, make a backup of the old initrd and create a new one using the following commands :

# cd /boot

# mkdir brokeninitrd

# cp initrd-<version> brokeninitrd

For SLES 12 :

# dracut --kver <kernelnumber>-default -f -a multipath

For SLES 11 :

# mkinitrd -f multipath -k vmlinuz-<kernelnumber>-default -i initrd-<kernelnumber>-default

And also a new grub config :

# yast bootloader

Once in the YaST interface, under the "Bootloader Options" tab, change the value (by increasing or decreasing) of "Timeout in Seconds".

This action forces the system to see the changes made and re-create the grub configuration. It is also a good idea to mark "Boot from Master Boot Record" on the "Boot Code Options" tab.

After the changes, leave the YaST interface by selecting "OK".

Alternatively, it is possible to manually make the changes as below :

# grub2-mkconfig -o /boot/grub2/grub.cfg

# grub2-install /dev/mapper/dm-name-...   # choose the multipath device here!

11 – Ensure that the dm-multipath and lvm2 modules are included on the initrd image created previously with :

# lsinitrd /boot/initrd-<version> | less

12 – Verify that the local devices are blacklisted in the /etc/multipath.conf file. If this file doesn't exist, create one and insert a "blacklist" section.

Example below :

blacklist {
    wwid "3600508b1001030343841423043300400"
}

13 – Reboot the system.


Installing VAAI 1.2.3 on ESX 6.0 could cause the ESX server to become unresponsive

Article Number: 504703 Article Version: 4 Article Type: Break Fix



Isilon VAAI,Isilon OneFS,VMware ESX Server

After installing the VAAI 1.2.3 plugin on an ESX 6.0 server, the log file /var/log/isi-nas-vib.log can grow unbounded, eventually filling up the /var partition, which can cause the ESX server to become unresponsive. The log file will continue to be recreated after it is removed.

The VAAI 1.2.3 plugin was built with debug enabled by default.

WORKAROUND

Uninstall VAAI 1.2.3 plugin until a fix is provided.

OR

Set up a cron job on the ESX 6.0 server to periodically remove the log file:

1. SSH into the ESX server and log in as the root user.

2. Edit /etc/rc.local.d/local.sh and add the following lines toward the end, before "exit 0":

/bin/echo '*/15 * * * * /bin/rm /var/log/isi-nas-vib.log' >> /var/spool/cron/crontabs/root

/bin/kill -HUP $(cat /var/run/crond.pid)

/usr/lib/vmware/busybox/bin/busybox crond

3. The reason for the above step is so that if the ESX server reboots, the workaround persists. At this point, however, the workaround has not yet taken effect on the running server.

4. Manually run the above three commands to implement the workaround in the current ESX server session.

5. Monitor /var/log/syslog.log and make sure you see an entry every 15 minutes such as:

2017-10-03T15:15:01Z crond[35429]: crond: USER root pid 38236 cmd /bin/rm /var/log/isi-nas-vib.log


7023524: apache2 does not start after in-place upgrade from SLES 11 SP4 to SLES 12 SP3

First issue:

webserver:/etc/apache2 # systemctl status apache2.service
? apache2.service - The Apache Webserver
   Loaded: loaded (/usr/lib/systemd/system/apache2.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-11-15 13:02:16 CET; 22s ago
  Process: 3990 ExecStop=/usr/sbin/start_apache2 -DSYSTEMD -DFOREGROUND -k graceful-stop (code=exited, status=1/FAILURE)
  Process: 3981 ExecStart=/usr/sbin/start_apache2 -DSYSTEMD -DFOREGROUND -k start (code=exited, status=1/FAILURE)
 Main PID: 3981 (code=exited, status=1/FAILURE)

Nov 15 13:02:16 webserver systemd[1]: Starting The Apache Webserver...
Nov 15 13:02:16 webserver start_apache2[3981]: httpd-prefork: Syntax error on line 204 of /etc/apache2/httpd.conf: Syntax error on line 106 of /etc/apache2/default-server.c...f required)
Nov 15 13:02:16 webserver systemd[1]: apache2.service: Main process exited, code=exited, status=1/FAILURE
Nov 15 13:02:16 webserver start_apache2[3990]: httpd-prefork: Syntax error on line 204 of /etc/apache2/httpd.conf: Syntax error on line 106 of /etc/apache2/default-server.c...f required)
Nov 15 13:02:16 webserver systemd[1]: apache2.service: Control process exited, code=exited status=1
Nov 15 13:02:16 webserver systemd[1]: Failed to start The Apache Webserver.
Nov 15 13:02:16 webserver systemd[1]: apache2.service: Unit entered failed state.
Nov 15 13:02:16 webserver systemd[1]: apache2.service: Failed with result 'exit-code'.

Hint: Some lines were ellipsized, use -l to show in full.

This can be solved by editing /etc/apache2/default-server.conf and changing the line saying:

Include /etc/apache2/conf.d/apache2-manual?conf

to:

IncludeOptional /etc/apache2/conf.d/apache2-manual?conf

See /etc/apache2/default-server.conf.rpmnew to compare with.


Second Issue:

When starting apache2 after the above change, the following issue may be seen:

webserver:/etc/apache2 # systemctl status apache2.service
? apache2.service - The Apache Webserver
   Loaded: loaded (/usr/lib/systemd/system/apache2.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-11-15 13:03:50 CET; 2s ago
  Process: 4017 ExecStop=/usr/sbin/start_apache2 -DSYSTEMD -DFOREGROUND -k graceful-stop (code=exited, status=1/FAILURE)
  Process: 4007 ExecStart=/usr/sbin/start_apache2 -DSYSTEMD -DFOREGROUND -k start (code=exited, status=1/FAILURE)
 Main PID: 4007 (code=exited, status=1/FAILURE)

Nov 15 13:03:50 webserver systemd[1]: Starting The Apache Webserver...
Nov 15 13:03:50 webserver start_apache2[4007]: AH00526: Syntax error on line 28 of /etc/apache2/default-server.conf:
Nov 15 13:03:50 webserver start_apache2[4007]: Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration
Nov 15 13:03:50 webserver systemd[1]: apache2.service: Main process exited, code=exited, status=1/FAILURE
Nov 15 13:03:50 webserver start_apache2[4017]: AH00526: Syntax error on line 28 of /etc/apache2/default-server.conf:
Nov 15 13:03:50 webserver start_apache2[4017]: Invalid command 'Order', perhaps misspelled or defined by a module not included in the server configuration
Nov 15 13:03:50 webserver systemd[1]: apache2.service: Control process exited, code=exited status=1
Nov 15 13:03:50 webserver systemd[1]: Failed to start The Apache Webserver.
Nov 15 13:03:50 webserver systemd[1]: apache2.service: Unit entered failed state.
Nov 15 13:03:50 webserver systemd[1]: apache2.service: Failed with result 'exit-code'.


The second issue is described in https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12/, which says:

8.3.4 Apache 2.4

With Apache 2.4, some changes have been introduced that affect Apache's access control scheme. Previously, with Apache 2.2, the directives "Allow", "Deny", and "Order" determined whether access to a resource was granted.
With 2.4, these directives have been replaced by the "Require" directive.
For backwards compatibility with 2.2 configurations, the SUSE Linux Enterprise Server 12 apache2 package understands both schemes, Deny/Allow (apache 2.2) and Require (apache 2.4).

For more information on how to easily switch between the two schemes, see the file /usr/share/doc/packages/apache2/README-access_compat.txt.

According to /usr/share/doc/packages/apache2/README-access_compat.txt, running "a2enmod access_compat" is the solution.

After running a2enmod access_compat, apache does start.

webserver:/etc/apache2 # systemctl status apache2.service
? apache2.service - The Apache Webserver
   Loaded: loaded (/usr/lib/systemd/system/apache2.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-11-15 13:04:21 CET; 1s ago
  Process: 4017 ExecStop=/usr/sbin/start_apache2 -DSYSTEMD -DFOREGROUND -k graceful-stop (code=exited, status=1/FAILURE)
 Main PID: 4041 (httpd-prefork)
   Status: "Processing requests..."
    Tasks: 6
   CGroup: /system.slice/apache2.service
           +-4041 /usr/sbin/httpd-prefork -DSYSCONFIG -C PidFile /var/run/httpd.pid -C Include /etc/apache2/sysconfig.d//loadmodule.conf -C Include /etc/apache2/sysconfig.d//global.conf -...
           +-4050 /usr/sbin/httpd-prefork -DSYSCONFIG -C PidFile /var/run/httpd.pid -C Include /etc/apache2/sysconfig.d//loadmodule.conf -C Include /etc/apache2/sysconfig.d//global.conf -...
           +-4051 /usr/sbin/httpd-prefork -DSYSCONFIG -C PidFile /var/run/httpd.pid -C Include /etc/apache2/sysconfig.d//loadmodule.conf -C Include /etc/apache2/sysconfig.d//global.conf -...
           +-4052 /usr/sbin/httpd-prefork -DSYSCONFIG -C PidFile /var/run/httpd.pid -C Include /etc/apache2/sysconfig.d//loadmodule.conf -C Include /etc/apache2/sysconfig.d//global.conf -...
           +-4053 /usr/sbin/httpd-prefork -DSYSCONFIG -C PidFile /var/run/httpd.pid -C Include /etc/apache2/sysconfig.d//loadmodule.conf -C Include /etc/apache2/sysconfig.d//global.conf -...
           +-4054 /usr/sbin/httpd-prefork -DSYSCONFIG -C PidFile /var/run/httpd.pid -C Include /etc/apache2/sysconfig.d//loadmodule.conf -C Include /etc/apache2/sysconfig.d//global.conf -...

Nov 15 13:04:21 webserver systemd[1]: Starting The Apache Webserver...
Nov 15 13:04:21 webserver systemd[1]: Started The Apache Webserver.


Isilon OneFS 8.0.0.2: Intermittent node unresponsive or node split from the Infiniband Network due to isi_phone_home

Article Number: 494612 Article Version: 6 Article Type: Break Fix



Isilon,Isilon OneFS,Isilon OneFS 7.2.1.0,Isilon OneFS 7.2.1.1,Isilon OneFS 7.2.1.2,Isilon OneFS 7.2.1.3,Isilon OneFS 7.2.1.4,Isilon OneFS 7.2.1.5,Isilon OneFS 8.0.0.0,Isilon OneFS 8.0.0.1,Isilon OneFS 8.0.0.2,Isilon OneFS 8.0.0.3

Node offline events are generated by the cluster, and after checking the uptime of the nodes in the cluster it is found that the nodes were split from the group.

You may notice a lot of intermittent group changes.

For example:

2017-01-05T01:36:37-06:00 <0.5> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_rtxn.c:1787](pid 72308="kt: gmp-merge")(tid=101118) gmp merge took 33s
2017-01-07T01:55:11-06:00 <0.4> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_info.c:1734](pid 56721="kt: rtxn_split")(tid=100805) group change: <3,23351>
[up: 19 nodes, down: 1 node] (node 13 changed to down)
2017-01-07T01:55:11-06:00 <0.4> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_info.c:1735](pid 56721="kt: rtxn_split")(tid=100805) new group: <3,23351>:
{ 1:0-14,18-32, 2:0-14,18-31,36, 3-4:0-14,18-32, 5:0-7,9-10,12-14,18-32,38-39, 6-7:0-14,18-32, 8:0-13,17-31,35, 9:0-10,12-14,18-32,36, 10-12,14-18:0-14,18-32, 19:0-3,5-14,18-32,36, 20:0-14,18-32, down: 13 }
2017-01-07T01:55:11-06:00 <0.5> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_rtxn.c:2020](pid 56721="kt: rtxn_split")(tid=100805) gmp split took 20s
2017-01-07T01:56:20-06:00 <0.5> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_rtxn.c:1598](pid 56826="kt: gmp-merge")(tid=100578) gmp txn <1,1301> error 85 from rtxn_exclusive_merge_lock_get after 44001ms
2017-01-07T01:56:48-06:00 <0.4> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_info.c:1734](pid 56980="kt: rtxn_split")(tid=103079) group change: <9,23352>
[up: 18 nodes, down: 2 nodes] (node 20 changed to down)
2017-01-07T01:56:48-06:00 <0.4> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_info.c:1735](pid 56980="kt: rtxn_split")(tid=103079) new group: <9,23352>:
{ 1:0-14,18-32, 2:0-14,18-31,36, 3-4:0-14,18-32, 5:0-7,9-10,12-14,18-32,38-39, 6-7:0-14,18-32, 8:0-13,17-31,35, 9:0-10,12-14,18-32,36, 10-12,14-18:0-14,18-32, 19:0-3,5-14,18-32,36, down: 13, 20 }
2017-01-07T01:56:48-06:00 <0.5> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_rtxn.c:2020](pid 56980="kt: rtxn_split")(tid=103079) gmp split took 25s
2017-01-07T02:00:42-06:00 <0.4> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_info.c:1734](pid 56826="kt: gmp-merge")(tid=100578) group change: <13,23353>
[up: 19 nodes, down: 1 node] (node 13 changed to up)
2017-01-07T02:00:42-06:00 <0.4> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_info.c:1735](pid 56826="kt: gmp-merge")(tid=100578) new group: <13,23353>:
{ 1:0-14,18-32, 2:0-14,18-31,36, 3-4:0-14,18-32, 5:0-7,9-10,12-14,18-32,38-39, 6-7:0-14,18-32, 8:0-13,17-31,35, 9:0-10,12-14,18-32,36, 10-18:0-14,18-32, 19:0-3,5-14,18-32,36, down: 20 }
2017-01-07T02:00:42-06:00 <0.5> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_rtxn.c:1787](pid 56826="kt: gmp-merge")(tid=100578) gmp merge took 232s
2017-01-07T02:01:22-06:00 <0.4> ISILON-7(id7) /boot/kernel.amd64/kernel: [gmp_info.c:1734](pid 57751="kt: gmp-merge")(tid=102358) group change: <1,23354>
[up: 20 nodes] (node 20 changed to up)

On the nodes that were split from the group, you may notice that the device worker threads are not progressing and the following error is seen:

For example:

2017-01-05T01:35:02-06:00 <0.5> ISILON-13(id13) /boot/kernel.amd64/kernel: [rbm_core.c:811](pid 43="swi4: clock sio")(tid=100036) ping timeout no dwt progress
2017-01-07T01:54:58-06:00 <0.5> ISILON-13(id13) /boot/kernel.amd64/kernel: [rbm_core.c:811](pid 43="swi4: clock sio")(tid=100036) ping timeout no dwt progress
2015-06-08T09:00:25-05:00 <0.5> ISILON-3(id3) /boot/kernel.amd64/kernel: [rbm_core.c:811](pid 42="swi4: clock sio")(tid=100035) ping timeout no dwt progress
2015-06-08T09:00:27-05:00 <0.5> ISILON-3(id3) /boot/kernel.amd64/kernel: [rbm_core.c:811](pid 42="swi4: clock sio")(tid=100035) ping timeout no dwt progress
2016-12-24T16:20:07-06:00 <0.5> ISILON-9(id9) /boot/kernel.amd64/kernel: [rbm_core.c:811](pid 43="swi4: clock sio")(tid=100036) ping timeout no dwt progress

Another behavior you may see are failed attempts to create kernel threads due to too many kernel threads. This will show messages like the following:

/boot/kernel.amd64/kernel: kt_new: backoff 1 sec
/boot/kernel.amd64/kernel: kt_new: backoff 1 sec
/boot/kernel.amd64/kernel: [ktp.c:148](pid 5460="kt: j_syncer")(tid=100378) kt_new: backoff 1 sec
/boot/kernel.amd64/kernel: [ktp.c:148](pid 5460="kt: j_syncer")(tid=100378) kt_new: backoff 1 sec
/boot/kernel.amd64/kernel: [ktp.c:148](pid 79932="kt: dxt3")(tid=107754) [ktp.c:148](pid 80002="kt: dxt0")(tid=107871) kt_new: backoff 1 sec
/boot/kernel.amd64/kernel: [ktp.c:148](pid 79812="kt: dxt2")(tid=107720) kt_new: backoff 1 sec
/boot/kernel.amd64/kernel: kt_new: backoff 1 sec
/boot/kernel.amd64/kernel: [ktp.c:148](pid 80005="kt: dxt1")(tid=107802) kt_new: backoff 1 sec


On the nodes that were split, you will see processes that were created by the cron execution of the isi_phone_home script, which collects information periodically based on cron.

For example:

root 228 0.0 0.0 26908 3348 ?? I 11:25AM 0:00.00 cron: running job (cron)
root 230 0.0 0.0 7144 1388 ?? Is 11:25AM 0:00.01 /bin/sh -c isi_ropc -s python /usr/local/isi_phone_home/isi_phone_home --script-file cluster_stats.py
root 232 0.0 0.0 100532 13628 ?? I 11:25AM 0:00.11 python /usr/bin/isi_ropc -s python /usr/local/isi_phone_home/isi_phone_home --script-file cluster_stats.py
root 234 0.0 0.0 7144 1388 ?? I 11:25AM 0:00.01 sh -c python /usr/local/isi_phone_home/isi_phone_home --script-file cluster_stats.py
root 235 0.0 0.0 127608 17076 ?? I 11:25AM 0:00.16 python /usr/local/isi_phone_home/isi_phone_home --script-file cluster_stats.py

The phone_home processes pile up because each tries to take an exclusive lock on a file that is already locked by another phone_home instance. When phone home runs, it attempts to acquire a lock on /ifs/.ifsvar/run/isi_phone_home.lock and then writes its process ID and node number to the isi_phone_home.pid file.

ls -l /ifs/.ifsvar/run



-rw------- 1 root wheel 0 Jul 7 2016 phone_home.lock

-rw-r--r-- 1 root wheel 8 Feb 4 11:45 phone_home.pid

By default the lock is taken with the blocking flag: if a new instance of phone home attempts to get the lock on that file (which is already held by the first instance), the second instance blocks and stays stuck until the first instance ends.
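The pile-up can be reproduced in miniature with the util-linux flock(1) command; the lock path below is a throwaway stand-in for /ifs/.ifsvar/run/isi_phone_home.lock, not the real file:

```shell
# First "instance": takes an exclusive lock on the file and holds it.
LOCK=$(mktemp)        # stand-in for the real isi_phone_home.lock
(
    flock -x 9        # blocks until the exclusive lock is granted
    sleep 2           # hold the lock while "collecting stats"
) 9>"$LOCK" &

sleep 0.5             # give the first instance time to take the lock

# Second "instance": with the default blocking behaviour it would hang
# here until the first exits; -n makes it fail fast instead of piling up.
if ! flock -n -x "$LOCK" true; then
    echo "lock busy, skipping this run"
fi
wait
rm -f "$LOCK"
```

Without `-n`, the second flock simply waits for the holder to exit, which is exactly how stuck phone_home instances accumulate.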

A permanent fix is already addressed in Halfpipe 8.0.1.0.

A permanent fix for the 7.2 branch is targeted for 7.2.1.6.

A permanent fix for 8.0.0.x is targeted for 8.0.0.5.

To work around this issue, disable the script as shown here, and comment out the crontab entries on the cluster. This will only impact the automatic sending of periodic phone home logs via this feature. It will not impact dial home or any other cluster functions.

To disable the phone home script:

# isi_phone_home -d

or

# isi_phone_home --disable

This places a .disable file in the run directory mentioned earlier, and the script will check for this file before performing most of its operations. However, to completely disable the script, comment out these crontab entries in /etc/mcp/templates/crontab on any node in the cluster (the changes will be propagated to /etc/crontab on each node automatically):

# --- Start section of phone home
#min hour mday month wday who command
*/5 * * * * root isi_ropc -s python /usr/local/isi_phone_home/isi_phone_home --script-file cluster_stats.py
*/31 * * * * root isi_ropc -s python /usr/local/isi_phone_home/isi_phone_home --script-file cluster_full_disk.py
47 11 * * * root isi_ropc -s python /usr/local/isi_phone_home/isi_phone_home --script-file nfs_zones.py
13 00 * * * root isi_ropc -s python /usr/local/isi_phone_home/isi_phone_home --list-file telemetry.list
0 * * * * root isi_ropc -s python /usr/local/isi_phone_home/isi_phone_home --list-file everyhour.list
3 1 */7 * * root isi_ropc -s python /usr/local/isi_phone_home/isi_phone_home --delete-data
17 1 */7 * * root isi_ropc -s python /usr/local/isi_phone_home/isi_phone_home --create-package
4 2 */7 * * root isi_ropc -s python /usr/local/isi_phone_home/utils/send_data.py
# --- End section of phone home

Note: If there are already nodes in a locked state, it may be necessary to delete the .lock and .pid files and kill the processes locking these files. If this happens, please contact Technical Support for assistance.

Note: In some versions of OneFS 7.2, these cron entries may be in /etc/mcp/override/crontab.smbtime. If this is the case, comment out the entries there.


OneFS: Process httpd of service isi_webui has failed to restart after multiple attempts

Article Number: 499006 Article Version: 8 Article Type: Break Fix



Isilon OneFS

> The OneFS WebUI will be inaccessible if the webui_httpd.conf file under /var/run/apache2/ becomes stale on the problematic node.

isilon-1# ps axuww | grep httpd | grep -v grep

root 43633 0.0 0.5 170644 10188 - Ss 2:49PM 0:00.06 /usr/local/apache2/bin/httpd -k start
daemon 43634 0.0 0.5 170332 9840 - I 2:49PM 0:00.00 /usr/local/apache2/bin/httpd -k start
daemon 43638 0.0 0.6 174360 11652 - I 2:49PM 0:00.00 /usr/local/apache2/bin/httpd -k start

<<< The output is missing the "/usr/local/apache2/bin/httpd -f /usr/local/apache2/conf/webui_httpd.conf -k start" processes. There should be 3 or more on each node.

> Attempting to manually start httpd of isi_webui service would result in the following message:

isilon-1# /usr/local/sbin/webui_httpd_ctl start

httpd: Could not reliably determine the server's fully qualified domain name, using 27.27.27.102 for ServerName

httpd (pid 72923) already running

isilon-1# cat /var/apache2/run/webui_httpd.pid

72923

If a node is restarted unexpectedly, or if the webui process is killed in certain ways, the pid files are left behind. When webui is restarted, either via MCP or via a node reboot, it may not start correctly, emitting the error:

httpd (pid 72923) already running

(where 72923 was the pid of the previously-running webui daemon)
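The stale-pid check and removal described in this article can be sketched as a small helper (the function name is hypothetical; the path is the one used in this article):

```shell
remove_stale_pidfile() {
    # $1 = pid file; remove it only when the recorded pid is no longer alive
    pidfile=$1
    [ -f "$pidfile" ] || return 0
    pid=$(cat "$pidfile")
    # kill -0 probes for existence without sending a signal
    kill -0 "$pid" 2>/dev/null || rm -f "$pidfile"
}
# On the affected node:
# remove_stale_pidfile /var/apache2/run/webui_httpd.pid
```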

> SSH to the problematic node with “root” account.

> Verify if webui_httpd.pid exists:

# ls -l /var/apache2/run/webui_httpd.pid

– If it does, remove that .pid file with the below command:

# cd /var/apache2/run/; rm -f webui_httpd.pid

Attempt to start httpd again with

# /usr/local/sbin/webui_httpd_ctl start

> After following the steps above, the output should be similar to the below where there should be 3 or more httpd of webui threads.

jv8004-1# ps axuww | grep “webui_httpd.conf” | grep -v grep

root 3135 0.0 0.6 261704 13280 – Ss 4Apr18 0:18.62 /usr/local/apache2/bin/httpd -f /usr/local/apache2/conf/webui_httpd.conf -k start

daemon 3140 0.0 0.6 261668 13268 – I@ 4Apr18 0:00.23 /usr/local/apache2/bin/httpd -f /usr/local/apache2/conf/webui_httpd.conf -k start

daemon 3145 0.0 1.5 307568 30924 – I 4Apr18 0:10.58 /usr/local/apache2/bin/httpd -f /usr/local/apache2/conf/webui_httpd.conf -k start

– Try logging in to the web administration interface on the problematic node after the above steps are taken.


ECS: xDoctor: RAP059: Detected rsyslogd is not running on a node

Article Number: 500620 Article Version: 7 Article Type: Break Fix



ECS Appliance,ECS Appliance Hardware,Elastic Cloud Storage

If this process is not running logging within the docker container will not work correctly.

Timestamp = 2017-12-22_224028 Category = PROCESS Source = PS Severity = ERROR Node = 169.254.1.1 Message = Detected rsyslogd is not running on a node Extra = 169.254.1.1, 169.254.1.2, 169.254.1.3, 169.254.1.4 RAP = RAP059 Solution = 500620 

As you can see below rsyslogd is not running on node 1:

# viprexec -i "ps -ef |grep rsyslog |grep -v grep"
Output from host : 192.168.219.1
root 2440 1 0 Apr07 ? 00:04:31 /usr/sbin/rsyslogd -n
Output from host : 192.168.219.2
root 2467 1 0 2016 ? 00:13:46 /usr/sbin/rsyslogd -n
root 109133 109015 0 Feb23 ? 00:10:06 rsyslogd
Output from host : 192.168.219.3
root 2560 1 0 2016 ? 00:14:14 /usr/sbin/rsyslogd -n
root 98908 98874 0 Feb23 ? 00:10:20 rsyslogd
Output from host : 192.168.219.4
root 2417 1 0 2016 ? 00:13:59 /usr/sbin/rsyslogd -n
root 52765 52641 0 Feb23 ? 00:10:19 rsyslogd

System is not tolerant to rsyslogd stopping unexpectedly.

Start rsyslogd on all nodes:

# viprexec -c "/usr/sbin/rsyslogd" 

Now verify that process is running again as below

# viprexec "ps -ef |grep rsyslogd |grep -v grep" 

Example output:

ecs-:~ # viprexec 'sudo docker exec -i object-main /usr/sbin/rsyslogd'
Output from host : 192.168.219.3
Output from host : 192.168.219.4
Output from host : 192.168.219.2
Already running. If you want to run multiple instances, you need to specify different pid files (use -i option)
Output from host : 192.168.219.5
Already running. If you want to run multiple instances, you need to specify different pid files (use -i option)
Output from host : 192.168.219.1
Already running. If you want to run multiple instances, you need to specify different pid files (use -i option)

ecs-:~ # viprexec -i "ps -ef |grep rsyslog |grep -v grep"
Output from host : 192.168.219.1
root 2073 1 0 19:25 ? 00:00:01 /usr/sbin/rsyslogd -n
root 115316 8473 0 22:45 ? 00:00:00 rsyslogd
Output from host : 192.168.219.2
root 2083 1 0 20:39 ? 00:00:00 /usr/sbin/rsyslogd -n
root 89721 6664 0 22:45 ? 00:00:00 rsyslogd
Output from host : 192.168.219.3
root 2124 1 0 19:53 ? 00:00:00 /usr/sbin/rsyslogd -n
root 43925 6620 0 22:46 ? 00:00:00 /usr/sbin/rsyslogd
Output from host : 192.168.219.4
root 2148 1 0 20:13 ? 00:00:00 /usr/sbin/rsyslogd -n
root 115677 6846 0 22:46 ? 00:00:00 /usr/sbin/rsyslogd
Output from host : 192.168.219.5
root 1880 1 0 Dec07 ? 00:03:12 /usr/sbin/rsyslogd -n
root 28735 28657 0 Dec12 ? 00:02:24 rsyslogd


7017137: How to obtain systemd service core dumps

Temporary configuration:

echo '|/usr/lib/systemd/systemd-coredump %p %u %g %s %t %e' > /proc/sys/kernel/core_pattern

or

Persistent configuration:

1. echo 'kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %p %u %g %s %t %e' > /etc/sysctl.d/50-coredump.conf

2. reboot

Run sysctl -a | grep 'kernel.core_pattern' to confirm the correct core_pattern command.

UPDATE NOTE: The command "systemd-coredumpctl" has changed in SLES12-SP2 and above. In SLES12-SP2/SP3 the command is "coredumpctl".

In order to see coredumps, run:

sles12sp1:~ # systemd-coredumpctl list

TIME PID UID GID SIG EXE
Fri 2016-01-08 15:01:10 MST 1088 0 0 6 /usr/sbin/sshd
Fri 2016-01-08 15:34:04 MST 1655 0 0 6 /usr/sbin/sshd
Fri 2016-01-08 15:34:05 MST 21491 0 0 6 /usr/sbin/sshd
Fri 2016-01-08 15:35:27 MST 1361 0 0 6 /usr/sbin/cron
Fri 2016-01-08 15:35:36 MST 21501 0 0 6 /usr/sbin/sshd
Fri 2016-01-08 15:35:39 MST 21530 0 0 6 /usr/sbin/cron

To get the most recent sshd coredump, run:

sles12sp1:~ # systemd-coredumpctl -o core.sshd dump /usr/sbin/sshd

TIME PID UID GID SIG EXE
Fri 2016-01-08 15:35:36 MST 21501 0 0 6 /usr/sbin/sshd
More than one entry matches, ignoring rest.

sles12sp1:~ # ls -l core.sshd
-rw-r--r-- 1 root root 954368 Jan 8 15:49 core.sshd

To get other coredumps, by PID# run:

coredumpctl -o FileName dump PID#

coredumpctl -o core.sshd dump 21491



NOTE:
You must run systemd-coredumpctl dump to extract any core dumps you want out of the systemd journal before rebooting the server. The core dumps stored in the systemd journal will not persist after a server reboot. See systemd-coredumpctl(1) for more options.

Application core can be stored in /var/lib/systemd/coredump depending on the storage setting in /etc/systemd/coredump.conf, see the coredump.conf(5) man page for options.
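As a sketch (assuming root access and a systemd release that reads coredump.conf.d drop-ins; the option values are examples only, see coredump.conf(5) for the authoritative list), persistent on-disk storage could be configured like this:

```shell
write_coredump_dropin() {
    # $1 = target directory (the real one is /etc/systemd/coredump.conf.d)
    mkdir -p "$1"
    cat > "$1/50-storage.conf" <<'EOF'
[Coredump]
Storage=external
Compress=yes
EOF
}
# On the server: write_coredump_dropin /etc/systemd/coredump.conf.d
```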

Run getappcore /path/to/core.sshd to prepare the core dump to upload to SUSE.

sles12sp1:~ # getappcore core.sshd

####################################################################
Get Application Core Tool, v1.28
Date: 01/08/16, 15:50:51
Server: sles12sp1
OS: SUSE Linux Enterprise Server 12 - SP1
Kernel: 3.12.49-11-default (x86_64)
Corefile: core.sshd
####################################################################

Binary file not provided, trying to determine source binary using gdb... Done (/usr/sbin/sshd)
Checking Source Binary with chkbin... Done
Building list of required libraries with gdb... Done
Building list of required RPMs... Done
Building list of debuginfo RPMs...

glibc-2.19-31.9.x86_64.rpm --> glibc-debuginfo-2.19-31.9.x86_64.rpm
krb5-1.12.1-19.1.x86_64.rpm --> krb5-debuginfo-1.12.1-19.1.x86_64.rpm
libaudit1-2.3.6-3.103.x86_64.rpm --> audit-debuginfo-2.3.6-3.103.x86_64.rpm
libcom_err2-1.42.11-7.1.x86_64.rpm --> e2fsprogs-debuginfo-1.42.11-7.1.x86_64.rpm
libkeyutils1-1.5.9-3.29.x86_64.rpm --> keyutils-debuginfo-1.5.9-3.29.x86_64.rpm
libopenssl1_0_0-1.0.1i-34.1.x86_64.rpm --> openssl-debuginfo-1.0.1i-34.1.x86_64.rpm
libpcre1-8.33-3.314.x86_64.rpm --> pcre-debuginfo-8.33-3.314.x86_64.rpm
libselinux1-2.3-2.75.x86_64.rpm --> libselinux-debuginfo-2.3-2.75.x86_64.rpm
libwrap0-7.6-886.3.x86_64.rpm --> tcpd-debuginfo-7.6-886.3.x86_64.rpm
libz1-1.2.8-5.1.x86_64.rpm --> zlib-debuginfo-1.2.8-5.1.x86_64.rpm
openssh-6.6p1-29.1.x86_64.rpm --> openssh-debuginfo-6.6p1-29.1.x86_64.rpm
pam-1.1.8-14.1.x86_64.rpm --> pam-debuginfo-1.1.8-14.1.x86_64.rpm

... Done
Setting gdb environment variables... Done
Creating gdb startup files... Done
Creating core archive... Done
Created archive as: /var/log/nts_sles12sp1_sshd_160108_1550_appcore.tbz
Removing required files and directories ... Done

Finished!

Related:

3078409: Handling ndsd (eDirectory) core files on Linux and Solaris

Sometimes ndsd crashes because of memory corruption. If this is the case, it is necessary to add variable settings to the ndsd environment that put the memory manager into a debug state. This helps ensure that ndsd generates a core at the moment the corruption occurs, so the module that caused the corruption can be identified more easily in the core.

If ndsd cores due to stack corruption, Novell Technical Support will request that you add the appropriate memory manager setting and wait for another core to re-submit.

Linux

To set the necessary memory checking variable on Linux:

Systemd – SLES 12 / Red Hat 7 or later: Modify the "env" file located in the /etc/opt/novell/eDirectory/conf directory, then restart the eDirectory instance. (See the second bullet under "Please refer to the following notes:" for details.)

MALLOC_CHECK_=3



SysVinit – SLES 11 / Red Hat 6 or earlier: Add the following at the very top of the pre_ndsd_start script, then restart the eDirectory instance.

MALLOC_CHECK_=3

export MALLOC_CHECK_

Please refer to the following notes:

  • The contents of the pre_ndsd_start script are sourced into ndsd when ndsd loads. Any settings placed directly in the ndsd script will be overwritten the next time an eDirectory patch is applied, while the pre_ndsd_start script is left untouched. For this reason, changes should not be made to the 'ndsd' script itself; the pre/post_ndsd_start scripts exist for exactly this purpose.

  • eDirectory on SLES 12 or RHEL 7: You must add all environment variables required for the eDirectory service in the env file located in the /etc/opt/novell/eDirectory/conf directory.

  • MALLOC_CHECK_=3 should NOT be left set permanently. Once the cores have been gathered, remove this setting from the modified script and restart ndsd. This environment variable can have a performance impact on some systems because of the increased memory checking. In eDirectory 8.8, it also causes ndsd to revert to using malloc instead of tcmalloc_minimal, which was added to enhance performance.

    Another side effect of using MALLOC_CHECK_=3 is the possibility of increased coring: malloc causes ndsd to core whenever a memory violation is detected, whether or not it would have crashed ndsd under normal running conditions.

    To verify that this ndsd environment variable is set while ndsd is running, run the following as the user running the eDirectory instance (usually 'root'):

    strings /proc/`pgrep ndsd`/environ | grep -i MALLOC_CHECK_

    The command above will not work on a server with multiple eDirectory instances (multiple ndsd processes). To check a particular instance, find that instance's PID and use it directly. For PID 12345 the command would be:

    strings /proc/12345/environ | grep -i MALLOC_CHECK_

    After ndsd has cored, to verify the core file had the ndsd environment variable set, do the following:

    strings core.#### | grep -i MALLOC_CHECK_

    Bundle the core with MALLOC_CHECK_=3 set as in step 2.

    For more information on MALLOC_CHECK_ see: TID 3113982 – Diagnosing Memory Heap Corruption in glibc with MALLOC_CHECK_

  • As of eDirectory 8.8.5 FTF2 (patch 2), the pre_ndsd_start script has moved from /etc/init.d to /opt/novell/eDirectory/sbin/.
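The per-instance environment check described above can be wrapped in a small helper. This is a sketch (the function name is made up); it reads /proc/PID/environ directly with tr instead of strings, since the entries are NUL-separated:

```shell
# check_malloc_check PID — print the MALLOC_CHECK_ entry from the given
# process's environment, or "not set" if it is absent. Useful on servers
# running multiple ndsd instances: call it once per ndsd PID.
check_malloc_check() {
    tr '\0' '\n' < "/proc/$1/environ" | grep '^MALLOC_CHECK_' || echo "not set"
}

# Hypothetical usage, checking every running ndsd process:
#   for pid in $(pgrep ndsd); do echo "$pid: $(check_malloc_check "$pid")"; done
```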

Solaris

In current code, eDirectory uses libumem as the memory manager.

To configure libumem for debugging, add the following at the top of the pre_ndsd_start script and restart ndsd:

UMEM_DEBUG=default

UMEM_LOGGING=transaction

export UMEM_DEBUG UMEM_LOGGING

Submit a new core with these settings in place.

Changing the location where cores files are generated

In certain situations it may be desirable to change the location where core files are generated. By default, ndsd core files are placed in the dib directory. If space in this directory is limited or another location is desired, do the following:

mkdir /tmp/cores

chmod 777 /tmp/cores

echo "/tmp/cores/core" > /proc/sys/kernel/core_pattern

With this setting, core files are generated as /tmp/cores/core.<pid>

To revert to placing cores in the default location:

echo core > /proc/sys/kernel/core_pattern
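Note that writing to /proc/sys/kernel/core_pattern lasts only until the next reboot. As a hedged sketch (the helper and drop-in file name are made up for illustration), the change can be made persistent with a sysctl drop-in; on a real host the directory would be /etc/sysctl.d and the file would be applied with sysctl --system or at boot:

```shell
# write_core_pattern_dropin DIR PATTERN — write a sysctl drop-in that sets
# kernel.core_pattern, so the custom core location survives reboots.
write_core_pattern_dropin() {
    printf 'kernel.core_pattern = %s\n' "$2" > "$1/50-core-pattern.conf"
}

# Hypothetical usage on a real system (as root):
#   write_core_pattern_dropin /etc/sysctl.d /tmp/cores/core
#   sysctl --system
```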

Symbol builds of ndsd libraries



In some cases, a core file generated while running libraries that include symbols may be necessary in order to analyze the core.

This is particularly true when analyzing cores generated by the 64-bit version of ndsd, since function parameters are not stored at fixed locations.

The symbol versions of the libraries can be obtained from Novell eDirectory backline support.

Related:

7023178: yast2 kdump error when updating grub (only on PowerPC configuring fadump)

This document (7023178) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12 Service Pack 3 (SLES 12 SP3)

Situation

The issue affects only PowerPC (ppc64le) systems using 'fadump=on'.
Using 'yast2 kdump' to edit the settings causes an 'Internal error' on 'OK' and save:
YaST2 - kdump @ linux: Initializing kdump configuration
(Reading the config file... Reading kernel boot options... Calculating memory limits...)

Error
Execution of command "[["/usr/sbin/grub2-install", "--target=powerpc-ieee1275"...
Exit code: 1
Error output: Installing for powerpc-ieee1275 platform.
/usr/sbin/grub2-install: error: the chosen partition is not a PReP partition.
[OK]

or

Error
Internal error. Please report a bug report with logs.
Run save_y2logs to get complete logs.
Details: Invalid loader device /dev/disk/by-id/dm-uuid-part1-mpath-1IBM_IPR-10...
Caller: /usr/share/YaST2/lib/bootloader/mbr_update.rb:200:in `partition_to_ac...
[OK]

Notice the "Invalid loader device /dev/disk/by-id/dm-uuid-part1-mpath-1IBM_IPR-10_58C7ED0000000040" in the second error output.
The device is read from /etc/default/grub_installdevice, but YaST/libstorage does not know about the device.

Resolution

The kdump update kdump-0.8.16-7.8.1, released in July 2018, includes the patch that avoids rewriting /etc/default/grub_installdevice when using 'yast2 kdump'.
The yast2-bootloader-3.2.27.1-2.6.4 and libstorage-2.26.12.2-3.3.2 updates, released in August 2018, include the fix correcting how udev IDs with dm-uuid names are handled.
Until the updates can be installed, the workaround below (editing /etc/default/grub_installdevice) can be used.
Workaround:
After installing the kdump update and before using 'yast2 kdump', review the content of /etc/default/grub_installdevice.
If /etc/default/grub_installdevice was already overwritten with the incorrect symlink/udev ID, that /dev/disk/by-id/<symlink> needs to be changed.
# cat /etc/default/grub_installdevice
If it contains a symlink looking like this:
/dev/disk/by-id/*-partN-mpath-*
the symlink needs to be changed, moving -partN to the end and preferably using the scsi- or wwn- ID, so it looks like this:
/dev/disk/by-id/scsi-*-partN or /dev/disk/by-id/wwn-*-partN
For example:
# cat /etc/default/grub_installdevice
/dev/disk/by-id/dm-uuid-part1-mpath-1IBM_IPR-10_58C7ED0000000040
activate
This is incorrect and the symlink needs to be changed to either the scsi- or wwn- symlink and look like this:
# cat /etc/default/grub_installdevice
/dev/disk/by-id/scsi-1IBM_IPR-10_58C7ED0000000040-part1
activate
or when using the wwn-<ID>-part1 for that device like this:
# cat /etc/default/grub_installdevice
/dev/disk/by-id/wwn-0xIBM_IPR-10_58C7ED0000000040-part1
activate
The udev IDs are listed in the /dev/disk/by-id directory.
# ls -l /dev/disk/by-id/
will show all names for the mapped device dm-X (the device grub is installed on). Search that listing for the specific ID already set in /etc/default/grub_installdevice.
The contents of /dev/disk/by-id then show the other symlinks for the same LUN.
Either the scsi- or the wwn- symlink for that ID can be selected to update the /dev/disk/by-id/ entry in /etc/default/grub_installdevice.
The symlink can be changed directly in the file with a text editor, or by writing to it from a terminal:
# cat >/etc/default/grub_installdevice <<EOD
> /dev/disk/by-id/scsi-1IBM_IPR-10_58C7ED0000000040-part1
> activate
> EOD

Once the /dev/disk/by-id/ entry has been changed accordingly, "yast2 kdump" can be used to configure fadump.
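Picking the replacement symlink can also be scripted. This is a sketch (the helper name is made up): given the dm-uuid-* name found in grub_installdevice, list every other /dev/disk/by-id name that resolves to the same device, then choose the scsi- or wwn- one:

```shell
# find_aliases LINK DIR — print every symlink in DIR that resolves to the
# same device node as LINK. On a real host, LINK is the dm-uuid-* entry
# from /etc/default/grub_installdevice and DIR is /dev/disk/by-id.
find_aliases() {
    target=$(readlink -f "$1")
    for link in "$2"/*; do
        [ "$(readlink -f "$link")" = "$target" ] && echo "$link"
    done
}

# Hypothetical usage:
#   find_aliases /dev/disk/by-id/dm-uuid-part1-mpath-1IBM_IPR-10_58C7ED0000000040 \
#       /dev/disk/by-id | grep -E '/(scsi|wwn)-'
```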

Cause

If 'fadump=on' is set, kdump calls perl-Bootloader to update and rewrite the configuration files, including /etc/default/grub_installdevice.
perl-Bootloader then takes the first by-id symlink in an unspecified order, which can result in a symlink of the unexpected dm-uuid-part1-mpath-* form, which is wrong in this case.

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented “AS IS” WITHOUT WARRANTY OF ANY KIND.

Related: