Tuesday, March 31, 2009

Solaris/Linux: find port number for a program and vice-versa


#== Find port number for a program
- lsof tool(Platform independent)

$lsof -nc sshd | grep TCP

sshd 1962 root 3u IPv6 6137 TCP *:ssh (LISTEN)
sshd 2104 root 3u IPv6 7425 TCP 172.16.31.3:ssh->172.16.31.2:cs-services (ESTABLISHED)
- Linux
$netstat -anp |grep sshd

tcp 0 0 :::22 :::* LISTEN 1962/sshd
- Solaris
$ pfiles 16976
...
sockname: AF_INET 172.18.126.148 port: 22
..


#==Find program name for port number
- lsof tool(Platform independent)
$lsof -i TCP:22
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
sshd 1962 root 3u IPv6 6137 TCP *:ssh (LISTEN)
sshd 2104 root 3u IPv6 7425 TCP 172.16.31.3:ssh->172.16.31.2:cs-services (ESTABLISHED)
- Linux
$netstat -anp | grep 22
tcp 0 0 :::22 :::* LISTEN 1962/sshd
- Solaris
List open files for all processes, then search the file for "port: 22"

$ ps -e -o pid | xargs pfiles > /tmp/pfiles.log 
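
Searching the log by hand works, but the PID has to be read from the header line above each match. A minimal sketch of automating that, assuming pfiles prints each process as a "PID:<tab>command" header followed by indented descriptor lines (the function name find_port_owner is hypothetical, not part of any tool):

```shell
#!/bin/sh
# find_port_owner PORT LOGFILE
# Print the pfiles header line (PID and command) of every process that has a
# socket bound to PORT. Assumes each process section starts with "PID:<tab>cmd"
# and socket lines look like "sockname: AF_INET 172.18.126.148 port: 22".
find_port_owner() {
    awk -v port="$1" '
        /^[0-9]+:/ { header = $0; seen = 0; next }           # start of a new process section
        /sockname:/ && $NF == port && $(NF-1) == "port:" {   # exact port, so 22 does not match 2200
            if (!seen) { print header; seen = 1 }            # report each process once
        }
    ' "$2"
}

# Usage (on Solaris, as root):
#   ps -e -o pid= | xargs pfiles > /tmp/pfiles.log 2>/dev/null
#   find_port_owner 22 /tmp/pfiles.log
```

Matching on the last field avoids the false hits that a plain search for "port: 22" gives for ports such as 2200 or 2222.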



RHCE NOTES - SElinux

Quick SELinux notes for the impatient; read the full document at

http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/selinux-guide/


SELinux has two levels of access control:
1) File context: a daemon can only access files with a particular file context
2) Boolean value: enables/disables a feature
For example, by default SELinux does not allow FTP users to log in and read their home directories; turn it on with "setsebool -P ftp_home_dir 1"

#==Confined and Unconfined Processes
A confined process enters a particular domain after it starts; only that domain has access to files of a particular TYPE
SELinux has no effect on unconfined processes (apps that don't support SELinux)

==Example
$ ls -Z /usr/sbin/httpd
-rwxr-xr-x root root system_u:object_r:httpd_exec_t /usr/sbin/httpd #httpd is confined by default
$chcon -Rt unconfined_exec_t /usr/sbin/httpd #change httpd to unconfined_exec_t; it will enter the unconfined domain, so it can access any file as long as OS-level file permissions allow it
$ restorecon -Rv /usr/sbin/httpd #restore default type

#== SELinux: File context
user:role:type:sensitivity:category
For example: system_u:object_r:httpd_sys_content_t:s0:c0
Not all systems will display s0:c0
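
Because the context is a colon-separated string, individual fields can be picked out with standard tools. A small sketch (context_field is a hypothetical helper name; the sample context is the one shown above):

```shell
#!/bin/sh
# context_field CONTEXT N
# Print the N-th colon-separated field of an SELinux context
# (1=user, 2=role, 3=type, 4=sensitivity, 5=category).
context_field() {
    printf '%s\n' "$1" | cut -d: -f"$2"
}

ctx="system_u:object_r:httpd_sys_content_t:s0:c0"
echo "user=$(context_field "$ctx" 1) role=$(context_field "$ctx" 2) type=$(context_field "$ctx" 3)"
```

The type field (the third one) is the one the targeted policy mostly cares about, which is why the examples in these notes change only the type.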

==example
# ls -aZ /var/www/html/
drwxr-xr-x root root system_u:object_r:httpd_sys_content_t .
drwxr-xr-x root root system_u:object_r:httpd_sys_content_t ..
# ls -aZd /home
drwxr-xr-x root root system_u:object_r:home_root_t /home
A process started from httpd_exec_t can access httpd_sys_content_t but not home_root_t

#==SELinux management
SELINUX=permissive #in /etc/selinux/config; if it is changed from disabled, a reboot is needed to label files
getenforce or sestatus #get current status
setenforce 0 #set to permissive mode
setenforce 1 #set to enforcing mode
getsebool -a #list booleans and their values, no description
setsebool httpd_can_network_connect_db on #change the running value of a boolean (lost at reboot)
setsebool -P httpd_can_network_connect_db on #change the boolean permanently with -P

- Temporary context change
chcon -R -t httpd_sys_content_t /web/ #change the context type of a dir/file
# it will survive a reboot, but not a relabel. To relabel: touch /.autorelabel; reboot

- Persistent Changes: semanage fcontext
/etc/selinux/targeted/contexts/files/file_contexts #stores the original (default) contexts
/etc/selinux/targeted/contexts/files/file_contexts.local #stores new user-defined contexts
semanage fcontext -a -t samba_share_t /etc/file1 #-a adds a new context; the file doesn't need to exist
restorecon -Rv /etc/file1 #read the new customized context and apply it

- Restore default context
semanage fcontext -d /etc/file1 #remove the context; the file doesn't need to exist
restorecon -RFv /etc/file1 #apply the change; -F is needed to restore from customized to default

#==Troubleshooting
/var/log/messages.X
/var/log/audit/audit.log #enable auditd daemon first
chkconfig --levels 345 setroubleshoot on #enable troubleshoot daemon
sealert -a /var/log/messages #analyse log
sealert -l \* #show all alert
grep "SELinux is preventing" /var/log/messages
grep "denied" /var/log/audit/audit.log
- Port numbers #services are only allowed to run on defined ports
/usr/sbin/semanage port -l | grep http_port_t
http_port_t tcp 80, 443, 488, 8008, 8009, 8443
semanage port -a -t http_port_t -p tcp 9876 #add the new port to the allowed list

#==== document
selinux-policy-2.4.6-137.el5 #man pages for ftpd_selinux, samba_selinux, etc.

Friday, March 27, 2009

OpenNMS monitor disk space usage by SNMP

#==Overview
This post demonstrates two ways to monitor disk space usage, using two different SNMP MIBs:
1) .iso.org.dod.internet.private.enterprises.ucdavis.dskTable
2) .iso.org.dod.internet.mgmt.mib-2.host.hrStorage


What is the difference? Option #1 requires the disk path to be hardcoded in snmpd.conf on the target system, but Option #2 can monitor all partitions by default. Even if only specific partitions need to be monitored, the filter is set in OpenNMS, not on the target system, so Option #2 is more flexible.

The alarm is triggered by a threshold on the collected SNMP data, so you don't need to set up monitors to trigger it. First, please make sure you have basic SNMP working; refer to my post Set up Net-snmp on CentOS.

#==ENV
OpenNMS 1.6.2 + Centos 5.2 +net-snmp 5.3.1


#==(1) Monitor disk space usage by dskTable MIB
Add the partition to be monitored to snmpd.conf

$vi /etc/snmp/snmpd.conf
disk /opt2

#====Test by snmpwalk first


$snmpwalk -v2c 172.16.31.3 -c public .iso.org.dod.internet.private.enterprises.ucdavis.dskTable
UCD-SNMP-MIB::dskPath.1 = STRING: /opt2 

OpenNMS 1.6.2 ships with a default threshold, so you don't need to configure anything in OpenNMS. Double-check the threshold at
GUI->Admin->Manage Thresholds->netsnmp

#====Sample alarm appeared
High threshold exceeded for SNMP datasource ns-dskPercent on interface 172.16.31.3, parms: ds="ns-dskPercent" value="100.0" threshold="90.0" trigger="2" rearm="75.0" label="/opt2" ifIndex="2"

#== (2) Monitor disk space usage by hrStorage MIB
No need to add the disk path to snmpd.conf, but OpenNMS needs to be customized.

#===Test by snmpwalk first


$snmpwalk -v2c 172.16.31.3 -c public .iso.org.dod.internet.mgmt.mib-2.host.hrStorage


#==== Find systemOID
This OID is the same for all Net-SNMP agents


$snmptranslate .iso.org.dod.internet.private.enterprises.netSnmp.netSnmpEnumerations.netSnmpAgentOIDs -On

.1.3.6.1.4.1.8072.3.2

#==== Include the sysOID in ./etc/datacollection-config.xml
By default, mib2-host-resources-storage is not included for Net-SNMP
The value in sysoidMask should cover your systemOID
For example, sysoidMask .1.3.6.1.4.1.8072.3. includes .1.3.6.1.4.1.8072.3.2


<systemDef name="Net-SNMP">
<sysoidMask>.1.3.6.1.4.1.8072.3.</sysoidMask>
<collect>
...
<includeGroup>mib2-host-resources-storage</includeGroup>
</collect>
</systemDef>
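
Whether a mask covers an agent can be checked before editing the file. The sketch below assumes, as the example above suggests, that sysoidMask is a plain prefix comparison against the node's system OID; mask_matches is a hypothetical helper name.

```shell
#!/bin/sh
# mask_matches MASK OID
# Succeed when OID starts with MASK (prefix comparison).
mask_matches() {
    case "$2" in
        "$1"*) return 0 ;;
        *)     return 1 ;;
    esac
}

if mask_matches ".1.3.6.1.4.1.8072.3." ".1.3.6.1.4.1.8072.3.2"; then
    echo "sysoidMask covers this agent"
fi
```

The node's own system OID can be read with an snmpwalk of the system group, as in the net-snmp setup post.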

#====Include your sysOID in ./etc/threshd-configuration.xml



<package name="hrstorage">
<filter>IPADDR != '0.0.0.0' &amp; (nodeSysOID LIKE '.1.3.6.1.4.1.8072.3.2.%' | nodeSysOID LIKE '.1.3.6.1.4.1.311.%' | nodeSysOID LIKE '.1.3.6.1.4.1.2.3.1.2.1.1.3.%')</filter>
..
</package>


OpenNMS 1.6.2 ships with a default threshold; double-check it at
GUI->Admin->Manage Thresholds->hrStorage



By default, it monitors all partitions. You can create a filter to monitor specific partitions only.

#===Sample alarm appeared
High threshold exceeded for SNMP datasource hrStorageUsed / hrStorageSize * 100.0 on interface 172.16.31.3, parms: ds="hrStorageUsed / hrStorageSize * 100.0" value="94.88754412506304" threshold="90.0" trigger="2" rearm="75.0" label="/opt2" ifIndex="2"
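
The datasource in this alarm is an expression rather than a single OID: hrStorageUsed / hrStorageSize * 100.0. Both values are reported in the same allocation units, so the ratio needs no unit conversion. A rough sketch of the same arithmetic, with made-up sample numbers and hypothetical function names:

```shell
#!/bin/sh
# storage_pct USED SIZE
# Evaluate hrStorageUsed / hrStorageSize * 100.0, printed with one decimal.
storage_pct() {
    awk -v used="$1" -v size="$2" 'BEGIN { printf "%.1f\n", used / size * 100.0 }'
}

# over_threshold VALUE THRESHOLD
# Succeed when VALUE is greater than THRESHOLD.
over_threshold() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'
}

pct=$(storage_pct 948875 1000000)   # sample numbers, not taken from a real host
if over_threshold "$pct" 90.0; then
    echo "High threshold exceeded: $pct"
fi
```

This only mirrors the comparison against threshold="90.0"; the trigger and rearm values shown in the alarm are handled by OpenNMS itself and are not modelled here.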

Wednesday, March 25, 2009

Setup net-snmp on Linux (CentOS 5.2)

The default net-snmp configuration is very secure: it allows public access to the system OID only. If you try to access any other OID, it gives the error "No Such Object available on this agent at this OID". This article shows how to set up a basic net-snmp with access control.

$snmpwalk -v 2c localhost -c public system
SNMPv2-MIB::sysDescr.0 = STRING: Linux centos-ks 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686

$ snmpwalk -v 2c localhost -c public interfaces
IF-MIB::interfaces = No Such Object available on this agent at this OID



#==Env
NET-SNMP version 5.3.1 Centos 5.2

#=== sample /etc/snmp/snmpd.conf
- It is important to comment out any default statements above these lines, because the access decision is based on the first match.


## sec.name source community
com2sec mynetwork 127.0.0.1 public
com2sec mynetwork 172.16.31.0/24 public

## group.name sec.model sec.name
group MyROGroup v1 mynetwork
group MyROGroup v2c mynetwork

## incl/excl subtree mask
view all included .1

## context sec.model sec.level prefix read write notif
access MyROGroup "" any noauth exact all none none

**Updated:  28 March 2011

The above statements can be simplified as:
rocommunity  public  127.0.0.1  .1
rocommunity  public  172.16.31.0/24  .1

NOTE: rocommunity can't restrict the SNMP version; it allows both v1 and v2c

#== Troubleshooting
- snmpd still starts despite syntax errors, which makes troubleshooting difficult. But if you start it with debug logging, it will warn you of any errors
/usr/sbin/snmpd -LE 7 -p /var/run/snmpd.pid -a


- By default, snmpd looks for MIB modules in /usr/share/snmp/mibs. The following command shows the modules loaded
snmpd -Dmib_init

- If you don't know the OID of an object, snmptranslate can help. The following demonstrates how to find an object name and its OID

$ snmptranslate -Ts | grep interface
.iso.org.dod.internet.mgmt.mib-2.interfaces
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifNumber
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifIndex
.iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifDescr


$ snmpget -v 1 localhost -c public interfaces.ifTable.ifEntry.ifDescr.2
IF-MIB::ifDescr.2 = STRING: eth0

$ snmptranslate .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifDescr.2 -On
.1.3.6.1.2.1.2.2.1.2.2

$ snmpget -v 1 localhost -c public .1.3.6.1.2.1.2.2.1.2.2
IF-MIB::ifDescr.2 = STRING: eth0

Tuesday, March 24, 2009

How to display content of files along with file names?


Sometimes it is useful to display the content of files along with their file names. egrep or pr can do the trick.


$ cat 1.txt
0
1

$ cat 2.txt
2
3

#==cat can't display file name

$ cat *.txt
0
1
2
3

#==display all with wildcard filter *

$ egrep \* *.txt
1.txt:0
1.txt:1
2.txt:2
2.txt:3

#==The header of pr displays the filename; sed is used to remove blank lines

$ pr *.txt | sed '/^$/d'
2009-03-25 03:17 1.txt Page 1
0
1
2009-03-25 03:17 2.txt Page 1
2
3
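
A third option, sketched here as an alternative to the two above, is awk, whose built-in FILENAME variable holds the name of the file being read. The function name show_with_names is only illustrative.

```shell
#!/bin/sh
# show_with_names FILE...
# Print every line of each file prefixed with "filename:", in the same
# format as the egrep output above.
show_with_names() {
    awk '{ print FILENAME ":" $0 }' "$@"
}

# Usage:
#   show_with_names *.txt
```

Unlike grep, which omits the filename when given a single file, this prints the prefix regardless of how many files are passed.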

When some processes stop Solaris Zone from being shutdown

If a Solaris Zone takes a long time to shut down, you may need to examine the processes of services marked with '*' in the STATE column.

$ svcs -a | grep sendmail
*online Mar_09 svc:/network/smtp:sendmail


#==find the process id of the offending process
$svcs -p sendmail
STATE STIME FMRI
*online Mar_09 svc:/network/smtp:sendmail
Mar_09 309 sendmail
Mar_09 310 sendmail


#==Then kill it with the kill command



#==If it happens quite often, you may find the following script handy.


#!/bin/ksh
# Wait for the given SMF services to stop and kill any leftover process.
SVCNAMES=$@
DELAY=5    # seconds between checks
#
# getpid SVC: set PID to the last process listed for the service.
# PID=0: service not found; PID=1: no process left; PID>1: still running
getpid () {
    SVC=$1
    /usr/bin/svcs $SVC >/dev/null
    if [ $? -ne 0 ];then
        PID=0
        return 1
    fi
    PID=`/usr/bin/svcs -Hp $SVC | tail +2 | awk '{print $2}' | tail -1`
    if [ -z "$PID" ];then
        PID=1
    fi
    return 0
}

if [ $# -eq 0 ];then
    echo "Usage $0 svcname1 [svcname2] .."
    exit 1
fi

for SVCNAME in $SVCNAMES
do
    CNT=0    # reset the retry counter for each service
    getpid $SVCNAME

    if [ $PID -lt 1 ];then
        echo "No pid found for $SVCNAME"
        exit 1
    fi

    while [ $PID -gt 1 ]
    do
        echo "Delay for " $DELAY " secs"
        sleep $DELAY
        getpid $SVCNAME
        CNT=`expr $CNT + 1`
        if [ $CNT -le 7 ] && [ $PID -gt 1 ];then
            # first 7 rounds: ask the process to terminate
            echo "Service $SVCNAME is still running after " `expr $CNT \* $DELAY` "secs, gracefully kill it: kill $PID"
            kill $PID
        elif [ $CNT -gt 7 ] && [ $PID -gt 1 ];then
            # after that: force it
            echo "Service $SVCNAME is still running after " `expr $CNT \* $DELAY` "secs, forcefully kill it: kill -9 $PID"
            kill -9 $PID
        fi
    done
    sleep 2
    /usr/bin/svcs $SVCNAME | grep disabled

    if [ $? -eq 0 ]; then
        echo "Service is stopped"
    else
        echo "Service is still running, please kill it manually"
        exit 1
    fi
done

Friday, March 20, 2009

Integrating Nagios plugin with OpenNMS


OpenNMS is a highly scalable enterprise-level management system. I like its versatile built-in monitors, auto-discovery and graphing ability. It can also work with Nagios plugins to use any customized monitor.

Install OpenNMS

http://www.opennms.org/documentation/InstallStable.html

Setup NRPE on client

yum install nrpe nagios-plugins-nrpe nagios-plugins

#==create a test script
$vi /usr/lib/nagios/plugins/check_test.sh

#!/bin/sh
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
echo "check test"
exit $STATE_OK


#==make sure the nagios user has rx permission for the script
chmod +rx /usr/lib/nagios/plugins/check_test.sh
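
The test script always returns OK. A real check usually compares a measured value against warning and critical thresholds and returns the matching exit code. A minimal sketch of that pattern (check_value and its thresholds are illustrative, not part of nagios-plugins):

```shell
#!/bin/sh
# Sketch of a threshold-based check in the style of check_test.sh.
# check_value VALUE WARN CRIT prints a status line and returns the
# matching Nagios exit code.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

check_value() {
    value=$1 warn=$2 crit=$3
    case "$value" in
        ''|*[!0-9]*)
            # empty or non-numeric input cannot be compared
            echo "UNKNOWN - value '$value' is not an integer"
            return $STATE_UNKNOWN ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value=$value"
        return $STATE_CRITICAL
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value=$value"
        return $STATE_WARNING
    fi
    echo "OK - value=$value"
    return $STATE_OK
}

# In a plugin, the last lines would be something like:
#   check_value "$measured" 80 90
#   exit $?
```

NRPE passes both the first line of output and the exit code back to the caller, so the status text and the return value should always agree.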


#vi /etc/nagios/nrpe.cfg
allowed_hosts=127.0.0.1,IP of OpenNMS
command[check_test]=/usr/lib/nagios/plugins/check_test.sh

#==start nrpe
service nrpe start

#==Now test it manually. It is important to run the check as user nagios, not root

sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -n -H localhost -c check_test
sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -H localhost -c check_test

-n = Do not use SSL. If nrpe doesn't support both modes, it is important to set the usessl value in the OpenNMS config file accordingly.

Setup NRPE on OpenNMS system:

yum install nagios-plugins-nrpe nagios-plugins
#==no configuration needed here, first run a manual test
sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -n -H remote-host -c check_test
sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -H remote-host -c check_test
-n = Do not use SSL. If nrpe doesn't support both modes, it is important to set the usessl value in the OpenNMS config file accordingly.


OpenNMS configuration

Two configuration files need to be modified for the newly added service.
/opt/opennms/etc/capsd-configuration.xml /* service definition for initial scan */
/opt/opennms/etc/poller-configuration.xml /* service definition for constant polling */


/opt/opennms/etc/capsd-configuration.xml
<protocol-plugin protocol="NRPE-test" class-name="org.opennms.netmgt.capsd.plugins.NrpePlugin" scan="on">
<property key="banner" value="*"/>
<property key="port" value="5666"/>
<property key="timeout" value="3000"/>
<property key="retry" value="2"/>
<property key="usessl" value="true"/>
<property key="command" value="check_test"/>
</protocol-plugin>


- Important Notes:
Set the usessl value depending on whether your nrpe supports SSL
The command is what gets polled; it is better to set it to your customized script (or the built-in command _NRPE_CHECK)

/opt/opennms/etc/poller-configuration.xml

<service name="NRPE-test" interval="300000" user-defined="true" status="on">
<parameter key="retry" value="3"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="5666"/>
<parameter key="command" value="check_test"/>
<parameter key="usessl" value="true"/>
<parameter key="padding" value="2"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="ds-name" value="nrpe-test"/>
</service>

- Important Notes:

The name attribute of the service in poller-configuration.xml needs to match the protocol attribute of the protocol-plugin in capsd-configuration.xml.
The ds-name parameter also needs to be unique for each service, or you'll find the response time from one service overwriting the response time from another.
You'll also need a line to map the new service to a monitor class (see the end of the file):

<monitor service="NRPE-test" class-name="org.opennms.netmgt.poller.monitors.NrpeMonitor"/>
Restart OpenNMS for the changes to take effect.
The new service should be discovered by a re-scan.

Troubleshooting
If the newly added service can't be discovered, turn on debug for the discovery process, capsd

/opt/opennms/etc/log4j.properties

# Capsd
log4j.category.OpenNMS.Capsd=DEBUG, CAPSD
log4j.appender.CAPSD.File=/opt/opennms/logs/daemon/capsd.log


If the newly added service is discovered but polling has issues, turn on debug for the polling process, poller
/opt/opennms/etc/log4j.properties

# Pollers
log4j.category.OpenNMS.Poller=DEBUG, POLLERS

Wednesday, March 18, 2009

RHCE TIPS - Sitting the test



SECTION I: TROUBLESHOOTING AND SYSTEM MAINTENANCE [Morning Session]
This section is easy and the proctor will let you know the result immediately.

- Compulsory Section I: It tests system maintenance. It counts for 80, which is enough for RHCE, so if you have got 80 here, you don’t need to take the non-compulsory question.
- Non-compulsory Section I: It tests system booting issues. It counts for 20. If you didn’t complete the compulsory section, or are just after a perfect score of 100, take the non-compulsory question. The proctor will re-image your PC to introduce the booting issue, so you can’t go back once you have made the decision. You should be safe once you have mastered all the scenarios in my previous post.

SECTION II: INSTALLATION AND CONFIGURATION [Afternoon Session]
If you have breezed through SECTION I, don’t be too joyful; the hardest part is here.
It is hard because the time is limited and there are many tasks to complete; if you get stuck on one, time quickly runs out. Secondly, no one will verify the result, so you have to check it yourself. It is quite tricky: you may misinterpret the requirement, your check method may be wrong, or a service may stop working after a reboot.

A few tips during the test:
- TIME:
Manage your time well and don’t get stuck on one question for too long; you don’t need a full score to pass RHCE.
- CHECK:

Check the result immediately after completing a task. Don’t expect to check everything at the end. You can log in to the remote Linux machine to verify your result.

#User and file permission:
‘su - username’ to check permissions for creating/listing files
#HTTP:
curl http://ServerName-or-IPAddress or elinks http://server
#Squid
curl -x proxyip:port url
#Send mail:
echo "test" | mail -s "subject" user
# Send mail with specific sender:
telnet server 25 \n; mail from: user@x \n; rcpt to: user@y \n;data \n; subject: "subject x" \n; "text body "\n; .
#Receive local email:
mail or mail -f mailbox-filename
#Receive remote email:
mutt -f pop://server or mutt -f imap://server
#samba
smbclient //ip/share -U username /* because you may not be able to access the share even if smbclient -L IP -U username shows the share */
#test firewall
nc -z IP 1-200 /* scan a remote host's open ports */
nc -v IP 25 or telnet IP 25 /* check availability of one port */

- Security tasks:
RHCE tasks are about restricting access to services. There are many options to achieve the result, and it is up to you which one to use: the service’s native support, PAM, tcp-wrapper or iptables. Be careful using iptables; you should use an open firewall, which means accepting everything except what is specifically denied. You don’t want your firewall to deny the services you completed earlier.

- Need help?
Unfortunately, you don’t have internet access during the test. You can only rely on the local man pages and documents. So while preparing for the test, avoid looking up answers on the internet straight away; try the local man pages and docs first. For example, if you forget the format of ifcfg-ethX, the syntax is documented here:
/usr/share/doc/initscripts-XXX-/sysconfig.txt

- Lastly:
In the last minutes, you should reboot your PC and check that the services are still running after the reboot. As a precaution, always begin a task with this command: chkconfig svcname on




Authenticate Linux Clients with Active Directory

Great article explaining how to authenticate Linux clients with Active Directory using three authentication strategies:

Using LDAP Authentication
Using LDAP and Kerberos
Using Winbind

http://technet.microsoft.com/en-us/magazine/2008.12.linux.aspx

Tuesday, March 17, 2009

Learned one critical rule of Openldap's slapd.conf format

One critical rule of OpenLDAP's slapd.conf format: no leading space.

OpenLDAP is easy to configure; you just need to customize three parameters: suffix, rootdn and rootpw
# /etc/openldap/slapd.conf

database bdb
suffix "dc=example,dc=com"
rootdn "cn=root,dc=example,dc=com"
 rootpw {SSHA}Ok/uoTJYELAj346giEh2mdvmiE5etgcg
The above is my initial config; the rootpw is generated by slappasswd

# slappasswd  -s pass123
{SSHA}sKFAA5OKE6oi+XCXQAJDj/69+g/K9irH


I started the ldap service and it was fine, but when I ran ldapsearch I got an "ldap_bind: Invalid credentials (49)" error

# ldapsearch -x -h 127.0.0.1 -D "cn=root,dc=example,dc=com" -w pass123
ldap_bind: Invalid credentials (49)


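The rootdn and rootpw are definitely correct, so why? Did you notice the space before rootpw? It is the culprit: slapd.conf treats a line that starts with whitespace as a continuation of the previous line, so rootpw was never read as a separate directive. The same search returned OK after I deleted the leading space.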
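
A leading space is easy to miss by eye, so it can help to list such lines before restarting slapd. A minimal sketch (leading_ws_lines is a hypothetical helper; it simply reports every non-blank line that begins with a space or tab):

```shell
#!/bin/sh
# leading_ws_lines FILE
# Print "lineno: text" for every non-blank line that starts with a space
# or tab, since such a line is not read as a directive of its own.
leading_ws_lines() {
    awk '/^[ \t]+[^ \t]/ { print NR ": " $0 }' "$1"
}

# Usage:
#   leading_ws_lines /etc/openldap/slapd.conf
```

Intentional continuation lines, such as long access rules wrapped over several lines, are reported too, so the output is a list of lines to review rather than a list of errors.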

Another common error, "ldap_sasl_interactive_bind_s: No such attribute (16)", will appear if you omit -x (simple authentication)

# ldapsearch   -h 127.0.0.1 -D "cn=root,dc=example,dc=com" -w pass123
ldap_sasl_interactive_bind_s: No such attribute (16)


The OpenLDAP version tested is slapd 2.3.27

Sunday, March 15, 2009

RHCE TIPS - Preparation


    Reference book:
    RHCE Red Hat Certified Engineer Linux Study Guide (Exam RH302) 5th edition by Michael Jang.
    If something is not clear in the book, read the official Red Hat Enterprise Linux documentation



    Lab Setup:
    Install CentOS on VirtualBox

    VirtualBox is a free open-source virtualization alternative to VMware. You need 2 CentOS instances to prepare for the RHCE lab. Networking in VirtualBox is very different from VMware.


    -Virtualbox Networking Type:
    --NAT: your guest OS can access the outside network through NAT provided by VirtualBox, but your host OS can’t access the guest OS
    --Host interface networking: host and guest can communicate with each other, but the guest can’t access the outside network unless you set up NAT manually on the host OS
    --Internal network: guest OSes can communicate with each other within the SAME network name (something like a VLAN ID), but not with the host OS.


    -Centos ServerA network setup
    1*NAT adapter for internet access to do yum.
    1*Host network adapter for your host to ssh to ServerA
    1*Internal Network adapter to communicate with ServerB


    -Centos ServerB network setup
    1* Internal Network adapter to communicate with ServerA (join the SAME network name of ServerA )


    How can ServerB access the outside network? Point its default gateway to ServerA, and turn on IP forwarding on ServerA.
    How can my host OS access ServerB?
    1. ssh to ServerA first, then jump from ServerA to ServerB
    2. set up port forwarding or a 1-to-1 static mapping on ServerA
    --Forwarding port 200 to ssh of ServerB

    iptables -t nat -A PREROUTING -p tcp -d ServerA-Host-NIC-IP --dport 200 -j DNAT --to-destination ServerB-IP:22 
    --Static 1 to 1 mapping
    Assign a secondary IP to ServerA’s host NIC, then
    iptables -t nat -A PREROUTING -p tcp -d ServerA-SEC-NIC-IP -j DNAT --to-destination ServerB-IP
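
The routing steps above can be collected into a small dry-run script that only prints the commands, so they can be reviewed before being run as root on ServerA. This is a sketch under assumptions: the interface name eth0 and all addresses are placeholders, and the MASQUERADE rule is my assumption that ServerB's traffic has to be source-NATed on ServerA before it leaves through the NAT adapter.

```shell
#!/bin/sh
# Dry-run sketch: print the commands instead of running them.
# All addresses and interface names below are placeholders.

# forwarding_cmds NAT_IF: commands that let ServerA route for ServerB.
forwarding_cmds() {
    nat_if=$1   # ServerA interface attached to the VirtualBox NAT adapter
    echo "sysctl -w net.ipv4.ip_forward=1"
    echo "iptables -t nat -A POSTROUTING -o $nat_if -j MASQUERADE"
}

# dnat_cmd LISTEN_IP LISTEN_PORT TARGET_IP TARGET_PORT: port forwarding rule.
dnat_cmd() {
    echo "iptables -t nat -A PREROUTING -p tcp -d $1 --dport $2 -j DNAT --to-destination $3:$4"
}

forwarding_cmds eth0
dnat_cmd 192.168.56.10 200 10.0.0.2 22
```

Piping the output to sh runs the commands; after that, ssh -p 200 to the host-side address of ServerA should reach ServerB.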

    Last but not least, read through each chapter and practice it in the lab; you never know if it works until you really do it! The RHCE exam is all about security, hence I suggest jumping to the security chapter before reading about networking services. Then apply your security knowledge (pam/tcp-wrapper/iptables/selinux) to each network service you read about later.

    Saturday, March 14, 2009

    RHCE Notes - Troubleshooting booting issue

    The booting issue is an optional question in Section I. The proctor will re-image your PC to introduce the booting issue, and you will be given a rescue CD to fix it.

    It is easy to learn to troubleshoot Linux boot issues if you break each step intentionally, observe the symptom and find the fix.

    #==Linux boot order
    The BIOS ->MBR->Boot Loader->Kernel->/sbin/init->
    /etc/inittab->
    /etc/rc.d/rc.sysinit->
    /etc/rc.d/rcX.d/ #where X is run level in /etc/inittab
    run script with K then script with S

    #==Linux rescue env
    boot from the first Linux CD, then type linux rescue
    TIP:
    linux rescue will try to mount all partitions. However, if there is an error and only some partitions are mounted, running chroot /mnt/sysimage at that point will lose the /dev and /proc mounts. Here is how to carry these mounts over:
    mount -o bind /dev /mnt/sysimage/dev
    mount -o bind /proc /mnt/sysimage/proc

    The Linux rescue env supports both software RAID and LVM. Normal LVM commands, e.g. vgdisplay, are not available, but they can be accessed through the LVM "master" command, e.g. "lvm vgdisplay"

    #== Grub boot manager
    = go to the grub command prompt by pressing c at the boot menu
    = find the root partition, 2 methods
    grub> root
    (hd0,0) Filesystem type is ext2fs, partition type 0x83
    grub> find /grub/stage1
    (hd0,0)
    = list files/dirs in the current drive
    cat / #type cat SPACE / TAB, it will list all files/dirs just like ls
    = display the contents of a file
    cat /grub/grub.conf
    = now you can boot interactively by typing the kernel and initrd commands from grub.conf


    #==Restore missed file from RPM
    #cd /tmp
    #rpm2cpio initscripts-7.93.11.EL-1.i386.rpm | cpio -icumvd ./etc/inittab
    or
    #rpm2cpio initscripts-7.93.11.EL-1.i386.rpm >init.cpio /* the file is ./etc/inittab, not /etc/inittab */

    List contents: cpio -tv <init.cpio
    or

    install the package to an alternative location, then copy the file
    rpm -ivh --nodeps --root /var/tmp/a X.rpm

    #== MBR corrupted.
    The MBR has 512 bytes in total
    440 Executable code section (together with the next two fields, this makes up the first 446 bytes)
    4 Optional disk signature
    2 Usually nulls
    64 Partition table #if this is overwritten, there is no way to recover unless you backed up the partition table or re-partition using the exact same layout
    2 MBR signature
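
Since a lost partition table cannot be rebuilt from the disk itself, it is worth saving a copy before experimenting. The following is a sketch only: save_mbr is a hypothetical helper, and it relies on the partition table starting at byte offset 446, right after the boot code area.

```shell
#!/bin/sh
# 446-byte boot code area + 64-byte partition table + 2-byte signature
# must add up to one 512-byte sector.
echo $((446 + 64 + 2))

# save_mbr DISK OUTDIR
# Copy the whole MBR and, separately, the 64-byte partition table
# (offset 446) from DISK, which may be a device or an image file.
save_mbr() {
    dd if="$1" of="$2/mbr.bin" bs=512 count=1 2>/dev/null &&
    dd if="$1" of="$2/ptable.bin" bs=1 skip=446 count=64 2>/dev/null
}

# Usage (as root):
#   save_mbr /dev/hda /root
```

Writing mbr.bin back with dd restores boot code and partition table together, so keep the copy somewhere other than the disk being tested.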

    Corrupt the MBR intentionally: dd if=/dev/zero of=/dev/hda bs=446 count=1 #the MBR is at the start of the whole disk (hda, not partition hda1); it has 512 bytes, and the first 446 bytes are boot code. DON'T overwrite the whole 512 bytes because they include the partition table.
    ERR: "no bootable media found", "Missing operating system" or "Operating System Not Found"
    boot from CD and run "linux rescue", let it mount the Linux partitions automatically,
    chroot /mnt/sysimage then grub-install /dev/hda
    boot from CD and run "linux rescue"; if the Linux partitions failed to mount,
    mount them manually. Use sfdisk -l and e2label to find the boot partition
    mkdir /a; mount /dev/hda1 /a; ln -s /usr/sbin/grub /sbin/grub; grub-install --root-directory=/a /dev/hda #it is hda, not hda1



    #= root (/) was not mounted
    mount couldn't find file system /dev/root
    switchroot mount failed...
    Error 2 mounting none;exec of init ((null)) failed!!!
    kernel /vmlinuz-test ro root=LABEL=/
    /* root=LABEL=/ mounts using the label, or root=/dev/sda3 mounts by device name directly */


    #= not loading initrd image
    VFS: Cannot open root device "Label=/1" or unknown-block(0,0)
    Please append a correct "root=" boot option
    Kernel panic: VFS: Unable to mount root fs on unknown-block(0,0)
    1) The kernel doesn't support the file system: compile the kernel with the FS support built in, NOT as a module
    2) initrd was not loaded: add the initrd line to grub.conf
    linux rescue, then chroot /mnt/sysimage and create the initrd file
    mkinitrd /boot/initrd-filename `uname -r` #make the initrd file manually


    #==/sbin/init problem.
    Switching to new root
    kernel panic - not syncing: Attempted to kill init
    switching to new root
    /bin/sh: ro : no such file or directory
    /* boot to rescue, check /sbin/init, restore it from the rpm package */



    #== /etc/inittab not found
    At the "enter run level" prompt, enter s; or at the grub menu append s, init=/bin/sh or emergency to the kernel line; then restore inittab from the source RPM


    Passed RHCE

    I passed RHCE today. I will be writing up some tips and notes.

    Here is my score report.

    SECTION I: TROUBLESHOOTING AND SYSTEM MAINTENANCE
    RHCE requirements: completion of compulsory items (50 points)
    overall section score of 80 or higher
    RHCT requirements: completion of compulsory items (50 points)

    Compulsory Section I score: 50.0
    Non-compulsory Section I score: 50.0
    Overall Section I score: 100

    SECTION II: INSTALLATION AND CONFIGURATION
    RHCE requirements: score of 70 or higher on RHCT components (100 points)
    score of 70 or higher on RHCE components (100 points)

    RHCT requirement: score of 70 or higher on RHCT components (100 points)

    RHCT components score: 92.6
    RHCE components score: 86.7

    RHCE Certification: PASS