[Comm] Buggy kernel in ALM2.2????

Dmitry Kovalsky dikov at imbg.org.ua
Thu Oct 2 18:41:37 MSD 2003


Hi,

I have two dual AMD 2000+ machines with a gigabit link between them.
I run parallel computations on them.
Everything used to run on Mandrake, and then I decided to install ALM2.2 on them.


Now a machine hangs every day, and this is what shows up in the logs:


Oct  2 06:50:00 node1 crond[164]: (dikov) CMD 
(/home/dikov/bin/md_parallel_p9v)
Oct  2 07:00:00 node1 crond[546]: (dikov) CMD 
(/home/dikov/bin/md_parallel_p9v)
Oct  2 07:01:01 node1 crond[655]: (root) CMD (run-parts /etc/cron.hourly)
Oct  2 07:06:38 node1 kernel: Unable to handle kernel paging request at 
virtual address 3747ab64
Oct  2 07:06:38 node1 kernel:  printing eip:
Oct  2 07:06:38 node1 kernel: c01338e5
Oct  2 07:06:38 node1 kernel: *pde = 00000000
Oct  2 07:06:38 node1 kernel: Oops: 0002 2.4.20-alt5-smp #1 SMP Sun Feb 16 
16:07:02 MSK 2003
Oct  2 07:06:38 node1 kernel: CPU:    0
Oct  2 07:06:38 node1 kernel: EIP:    0010:[kmem_cache_alloc_batch+101/208]    
Not tainted
Oct  2 07:06:38 node1 kernel: EIP:    0010:[<c01338e5>]    Not tainted
Oct  2 07:06:38 node1 kernel: EFLAGS: 00010056
Oct  2 07:06:38 node1 kernel: eax: df8d6bd8   ebx: c9b39800   ecx: c547a2c0   
edx: 3747ab60
Oct  2 07:06:38 node1 kernel: esi: df8d6bd0   edi: 00000016   ebp: dffc2c60   
esp: cc271c94
Oct  2 07:06:38 node1 kernel: ds: 0018   es: 0018   ss: 0018
Oct  2 07:06:38 node1 kernel: Process mdrun_mpi (pid: 9934, 
stackpage=cc271000)
Oct  2 07:06:38 node1 lamd[1566]: died: caught child death; trying to detach
Oct  2 07:06:38 node1 lamd[1566]: died: detaching table entry 10
Oct  2 07:06:38 node1 lamd[1566]: died: finished
Oct  2 07:06:38 node1 kernel: Stack: df8d6bd0 00000206 df8d6bd0 00000246 
c0133c33 df8d6bd0 dffc2c60 000001f0
Oct  2 07:06:38 node1 kernel:        cc270000 cde0a060 c5406900 cde0a1a0 
000027ec c42bc8e0 00000206 000001f0
Oct  2 07:06:38 node1 kernel:        0000d8b8 c01b768f 0000071c 000001f0 
c576e6c0 00000000 cde0a1a0 c01d95d4
Oct  2 07:06:38 node1 kernel: Call Trace:    [kmalloc+163/384] 
[alloc_skb+239/448] [tcp_sendmsg+580/4656] [nf_hook_slow+266/384] 
[do_page_fault+0/1307]
Oct  2 07:06:38 node1 kernel: Call Trace:    [<c0133c33>] [<c01b768f>] 
[<c01d95d4>] [<c01c29aa>] [<c0116680>]
Oct  2 07:06:38 node1 kernel:   [error_code+52/64] 
[__generic_copy_to_user+48/64] [memcpy_toiovec+57/96] 
[skb_copy_datagram_iovec+77/592] [kfree+131/144] [inet_sendmsg+53/64]
Oct  2 07:06:38 node1 kernel:   [<c0108da4>] [<c0200b50>] [<c01b9269>] 
[<c01b98ed>] [<c0133e13>] [<c01f73b5>]
Oct  2 07:06:38 node1 kernel:   [sock_sendmsg+108/144] 
[sock_readv_writev+148/160] [sock_writev+59/80] [do_readv_writev+428/736] 
[poll_freewait+68/80] [do_select+566/592]
Oct  2 07:06:38 node1 kernel:   [<c01b414c>] [<c01b4464>] [<c01b44fb>] 
[<c013e02c>] [<c014e134>] [<c014e4f6>]
Oct  2 07:06:38 node1 kernel:   [sys_select+1138/1152] [do_fcntl+379/656] 
[sys_writev+67/96] [system_call+51/64]
Oct  2 07:06:38 node1 kernel:   [<c014e9b2>] [<c014cd8b>] [<c013e203>] 
[<c0108c93>]
Oct  2 07:06:38 node1 kernel: Code: 89 42 04 89 10 c7 01 00 00 00 00 c7 41 04 
00 00 00 00 8b 06
Oct  2 07:06:38 node1 lamd[1566]: kenyad: did not start this process with the 
flatd; detaching now
Oct  2 07:06:38 node1 lamd[1566]: kenyad: in pdetachindex
Oct  2 07:06:38 node1 lamd[1566]: kenyad: detatching, checking for RTF_FLAT 
(0x800) in flags: 0x159691
Oct  2 07:06:38 node1 lamd[1566]: kouter: kqdetach detached process pid=9933
Oct  2 07:06:38 node1 lamd[1566]: kouter: kqdetach calling kio_close
Oct  2 07:06:38 node1 lamd[1566]: kouter: kqdetach calling knuke
Oct  2 07:06:39 node1 lamd[1566]: kouter: kqdetach detached process pid=9932
Oct  2 07:06:39 node1 lamd[1566]: kouter: kqdetach calling kio_close
Oct  2 07:06:39 node1 lamd[1566]: kouter: kqdetach calling knuke
Oct  2 07:10:00 node1 crond[938]: (dikov) CMD 
(/home/dikov/bin/md_parallel_p9v)
Oct  2 07:10:10 node1 lamd[1566]: kouter: attached process pid=1041, pri=1095
Oct  2 07:10:10 node1 lamd[1566]: flatd: flqload - successfully created file 
/tmp/lam-dikov@node1.entry.kiev.ua/lam-flatd2
Oct  2 07:10:10 node1 lamd[1566]: flatd: flqload - file descriptor 13
Oct  2 07:10:10 node1 lamd[1566]: kenyad: pqcreating with rtf 0x159290
Oct  2 07:10:10 node1 lamd[1566]: kenyad: looking for executable 
"/usr/local/bin/mdrun_mpi" in directory "/home/gromacs/dikov/P9V"
Oct  2 07:10:10 node1 lamd[1566]: kenyad: found "/usr/local/bin/mdrun_mpi"
Oct  2 07:10:10 node1 lamd[1566]: kenyad: creating new user process...
Oct  2 07:10:10 node1 lamd[1566]: kenyad: attempting to receive stdout/stderr 
file descriptors
Oct  2 07:10:10 node1 lamd[1566]: kenyad: recv_stdio_fds: happiness
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting environment variables to 
pass to new process
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSFD
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSRTF
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMJOBID
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMKENYAPID
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMWORLD
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMPARENT
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMRANK
Oct  2 07:10:10 node1 lamd[1566]: kenyad: checking for working directory flag
Oct  2 07:10:10 node1 lamd[1566]: kenyad: working directory set explicitly
Oct  2 07:10:10 node1 lamd[1566]: kenyad: running in directory 
/home/gromacs/dikov/P9V
Oct  2 07:10:10 node1 lamd[1566]: kenyad: fork/exec succeeded, pid 1042, index 
10, rtf 0x159292
Oct  2 07:10:10 node1 lamd[1566]: kenyad: create succeeded, process running
Oct  2 07:10:10 node1 lamd[1566]: flatd: flqload - successfully created file 
/tmp/lam-dikov@node1.entry.kiev.ua/lam-flatd3
Oct  2 07:10:10 node1 lamd[1566]: flatd: flqload - file descriptor 13
Oct  2 07:10:10 node1 lamd[1566]: kouter: attached process pid=1042, pri=0
Oct  2 07:10:10 node1 lamd[1566]: kenyad: pqcreating with rtf 0x159290
Oct  2 07:10:10 node1 lamd[1566]: kenyad: looking for executable 
"/usr/local/bin/mdrun_mpi" in directory "/home/gromacs/dikov/P9V"
Oct  2 07:10:10 node1 lamd[1566]: kenyad: found "/usr/local/bin/mdrun_mpi"
Oct  2 07:10:10 node1 lamd[1566]: kenyad: creating new user process...
Oct  2 07:10:10 node1 lamd[1566]: kenyad: attempting to receive stdout/stderr 
file descriptors
Oct  2 07:10:10 node1 lamd[1566]: kenyad: recv_stdio_fds: happiness
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting environment variables to 
pass to new process
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSFD
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSRTF
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMJOBID
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMKENYAPID
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMWORLD
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMPARENT
Oct  2 07:10:10 node1 lamd[1566]: kenyad: setting LAMRANK
Oct  2 07:10:10 node1 lamd[1566]: kenyad: checking for working directory flag
Oct  2 07:10:10 node1 lamd[1566]: kenyad: working directory set explicitly
Oct  2 07:10:10 node1 lamd[1566]: kenyad: running in directory 
/home/gromacs/dikov/P9V
Oct  2 07:10:10 node1 lamd[1566]: kenyad: fork/exec succeeded, pid 1043, index 
11, rtf 0x159292
Oct  2 07:10:10 node1 lamd[1566]: kenyad: create succeeded, process running
Oct  2 07:10:10 node1 lamd[1566]: kouter: attached process pid=1043, pri=0
Oct  2 07:10:14 node1 pam_tcb[1044]: sshd: Authentication failed for dikov 
from (uid=0)
Oct  2 07:10:14 node1 sshd[1049]: input_userauth_request: illegal user 
sysadmin
Oct  2 07:10:14 node1 sshd[1049]: Failed none for UNKNOWN USER from 10.1.1.1 
port 49371 ssh2
Oct  2 07:10:14 node1 sshd[1049]: Failed password for UNKNOWN USER from 
10.1.1.1 port 49371 ssh2
Oct  2 07:10:14 node1 sshd[1049]: Connection closed by 10.1.1.1
Oct  2 07:10:14 node1 pam_tcb[1051]: sshd: Authentication failed for dikov 
from (uid=0)
Oct  2 07:10:14 node1 sshd[1056]: input_userauth_request: illegal user 
sysadmin
Oct  2 07:10:14 node1 sshd[1056]: Failed none for UNKNOWN USER from 10.1.1.1 
port 50883 ssh2
Oct  2 07:10:14 node1 sshd[1056]: Failed password for UNKNOWN USER from 
10.1.1.1 port 50883 ssh2
Oct  2 07:10:14 node1 sshd[1056]: Connection closed by 10.1.1.1
Oct  2 07:10:14 node1 pam_tcb[1057]: sshd: Authentication failed for dikov 
from (uid=0)
Oct  2 07:10:14 node1 pam_tcb[1061]: sshd: Authentication failed for root from 
(uid=0)
Oct  2 07:10:14 node1 pam_tcb[1065]: sshd: Authentication failed for dikov 
from (uid=0)
Oct  2 07:10:14 node1 pam_tcb[1069]: sshd: Authentication failed for root from 
(uid=0)
Oct  2 07:10:14 node1 sshd[1075]: input_userauth_request: illegal user 
sysadmin
Oct  2 07:10:14 node1 sshd[1075]: Failed none for UNKNOWN USER from 10.1.1.1 
port 57066 ssh2
Oct  2 07:10:14 node1 sshd[1075]: Failed password for UNKNOWN USER from 
10.1.1.1 port 57066 ssh2
Oct  2 07:10:14 node1 last message repeated 2 times
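
For anyone who wants to pull the key facts out of a log like the one above, here is a minimal sketch in Python. It assumes the syslog layout shown in this message (klogd already resolved the addresses to symbols, and long lines are wrapped by the mail client). The names summarize_oops and decode_i386_error_code are made up for this illustration, and the bit meanings apply to the i386 page-fault error code only.

```python
import re

# Patterns for the klogd lines seen in the log above (assumed format).
OOPS_RE = re.compile(r"kernel: Oops: (\w+) (\S+)")
EIP_RE = re.compile(r"kernel: EIP:\s+\w+:\[([A-Za-z_]\w*)\+\d+/\d+\]")
PROC_RE = re.compile(r"kernel: Process (\S+) \(pid: (\d+)")
SYM_RE = re.compile(r"\[([A-Za-z_]\w*)\+\d+/\d+\]")


def decode_i386_error_code(code_hex):
    """Decode the low three bits of an i386 page-fault error code."""
    code = int(code_hex, 16)
    return {
        "page_present": bool(code & 1),  # False: the page was not found
        "write": bool(code & 2),         # True: the fault was a write access
        "user_mode": bool(code & 4),     # False: the fault happened in kernel mode
    }


def summarize_oops(lines):
    """Collect the headline facts of the first kernel Oops in the given lines."""
    summary = {"code": None, "kernel": None, "function": None,
               "process": None, "pid": None, "trace": []}
    in_oops = False
    for line in lines:
        m = OOPS_RE.search(line)
        if m:
            summary["code"], summary["kernel"] = m.group(1), m.group(2)
            in_oops = True
            continue
        if not in_oops:
            continue
        m = EIP_RE.search(line)
        if m:
            summary["function"] = m.group(1)  # function that faulted
            continue
        m = PROC_RE.search(line)
        if m:
            summary["process"], summary["pid"] = m.group(1), int(m.group(2))
            continue
        if "kernel: Code:" in line:
            break  # the Code: line closes the Oops report
        # Symbolic call-trace entries, including wrapped continuation lines.
        summary["trace"].extend(SYM_RE.findall(line))
    return summary


if __name__ == "__main__":
    import sys
    info = summarize_oops(sys.stdin)
    print(info)
    if info["code"] is not None:
        print(decode_i386_error_code(info["code"]))
```

It can be fed the log on standard input, for example by piping a saved copy of the messages file into the script (the script file name is up to you). For the Oops above it should report the faulting function kmem_cache_alloc_batch, the process mdrun_mpi, and an error code that decodes to a kernel-mode write to a page that was not present.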

As far as I understand, this is a problem in the kernel itself. Am I right?

Dima

-- 
Sincerely yours,

Ph.D. Student Dmytro Kovalskyy
Institute of Molecular Biology & Genetics
150 Akad. Zabolotnogo Street,
Kiev-143, 03143
UKRAINE

E-mail: dikov at imbg.org.ua
Fax:  +380 (44) 266-0759
Tel.: +380 (44) 266-5589





