[Comm] Buggy kernel in ALM 2.2?
Dmitry Kovalsky
dikov at imbg.org.ua
Thu Oct 2 18:41:37 MSD 2003
Hi,
I have 2 machines, each a dual AMD 2000+, connected by Gigabit Ethernet.
I run parallel jobs on them.
Everything used to run on Mandrake, and then I decided to install ALM 2.2 on them.
Now a machine hangs every day. The logs contain this:
Oct 2 06:50:00 node1 crond[164]: (dikov) CMD (/home/dikov/bin/md_parallel_p9v)
Oct 2 07:00:00 node1 crond[546]: (dikov) CMD (/home/dikov/bin/md_parallel_p9v)
Oct 2 07:01:01 node1 crond[655]: (root) CMD (run-parts /etc/cron.hourly)
Oct 2 07:06:38 node1 kernel: Unable to handle kernel paging request at virtual address 3747ab64
Oct 2 07:06:38 node1 kernel: printing eip:
Oct 2 07:06:38 node1 kernel: c01338e5
Oct 2 07:06:38 node1 kernel: *pde = 00000000
Oct 2 07:06:38 node1 kernel: Oops: 0002 2.4.20-alt5-smp #1 SMP Sun Feb 16 16:07:02 MSK 2003
Oct 2 07:06:38 node1 kernel: CPU: 0
Oct 2 07:06:38 node1 kernel: EIP: 0010:[kmem_cache_alloc_batch+101/208] Not tainted
Oct 2 07:06:38 node1 kernel: EIP: 0010:[<c01338e5>] Not tainted
Oct 2 07:06:38 node1 kernel: EFLAGS: 00010056
Oct 2 07:06:38 node1 kernel: eax: df8d6bd8 ebx: c9b39800 ecx: c547a2c0 edx: 3747ab60
Oct 2 07:06:38 node1 kernel: esi: df8d6bd0 edi: 00000016 ebp: dffc2c60 esp: cc271c94
Oct 2 07:06:38 node1 kernel: ds: 0018 es: 0018 ss: 0018
Oct 2 07:06:38 node1 kernel: Process mdrun_mpi (pid: 9934, stackpage=cc271000)
Oct 2 07:06:38 node1 lamd[1566]: died: caught child death; trying to detach
Oct 2 07:06:38 node1 lamd[1566]: died: detaching table entry 10
Oct 2 07:06:38 node1 lamd[1566]: died: finished
Oct 2 07:06:38 node1 kernel: Stack: df8d6bd0 00000206 df8d6bd0 00000246 c0133c33 df8d6bd0 dffc2c60 000001f0
Oct 2 07:06:38 node1 kernel: cc270000 cde0a060 c5406900 cde0a1a0 000027ec c42bc8e0 00000206 000001f0
Oct 2 07:06:38 node1 kernel: 0000d8b8 c01b768f 0000071c 000001f0 c576e6c0 00000000 cde0a1a0 c01d95d4
Oct 2 07:06:38 node1 kernel: Call Trace: [kmalloc+163/384] [alloc_skb+239/448] [tcp_sendmsg+580/4656] [nf_hook_slow+266/384] [do_page_fault+0/1307]
Oct 2 07:06:38 node1 kernel: Call Trace: [<c0133c33>] [<c01b768f>] [<c01d95d4>] [<c01c29aa>] [<c0116680>]
Oct 2 07:06:38 node1 kernel: [error_code+52/64] [__generic_copy_to_user+48/64] [memcpy_toiovec+57/96] [skb_copy_datagram_iovec+77/592] [kfree+131/144] [inet_sendmsg+53/64]
Oct 2 07:06:38 node1 kernel: [<c0108da4>] [<c0200b50>] [<c01b9269>] [<c01b98ed>] [<c0133e13>] [<c01f73b5>]
Oct 2 07:06:38 node1 kernel: [sock_sendmsg+108/144] [sock_readv_writev+148/160] [sock_writev+59/80] [do_readv_writev+428/736] [poll_freewait+68/80] [do_select+566/592]
Oct 2 07:06:38 node1 kernel: [<c01b414c>] [<c01b4464>] [<c01b44fb>] [<c013e02c>] [<c014e134>] [<c014e4f6>]
Oct 2 07:06:38 node1 kernel: [sys_select+1138/1152] [do_fcntl+379/656] [sys_writev+67/96] [system_call+51/64]
Oct 2 07:06:38 node1 kernel: [<c014e9b2>] [<c014cd8b>] [<c013e203>] [<c0108c93>]
Oct 2 07:06:38 node1 kernel: Code: 89 42 04 89 10 c7 01 00 00 00 00 c7 41 04 00 00 00 00 8b 06
Oct 2 07:06:38 node1 lamd[1566]: kenyad: did not start this process with the flatd; detaching now
Oct 2 07:06:38 node1 lamd[1566]: kenyad: in pdetachindex
Oct 2 07:06:38 node1 lamd[1566]: kenyad: detatching, checking for RTF_FLAT (0x800) in flags: 0x159691
Oct 2 07:06:38 node1 lamd[1566]: kouter: kqdetach detached process pid=9933
Oct 2 07:06:38 node1 lamd[1566]: kouter: kqdetach calling kio_close
Oct 2 07:06:38 node1 lamd[1566]: kouter: kqdetach calling knuke
Oct 2 07:06:39 node1 lamd[1566]: kouter: kqdetach detached process pid=9932
Oct 2 07:06:39 node1 lamd[1566]: kouter: kqdetach calling kio_close
Oct 2 07:06:39 node1 lamd[1566]: kouter: kqdetach calling knuke
Oct 2 07:10:00 node1 crond[938]: (dikov) CMD (/home/dikov/bin/md_parallel_p9v)
Oct 2 07:10:10 node1 lamd[1566]: kouter: attached process pid=1041, pri=1095
Oct 2 07:10:10 node1 lamd[1566]: flatd: flqload - successfully created file /tmp/lam-dikov@node1.entry.kiev.ua/lam-flatd2
Oct 2 07:10:10 node1 lamd[1566]: flatd: flqload - file descriptor 13
Oct 2 07:10:10 node1 lamd[1566]: kenyad: pqcreating with rtf 0x159290
Oct 2 07:10:10 node1 lamd[1566]: kenyad: looking for executable "/usr/local/bin/mdrun_mpi" in directory "/home/gromacs/dikov/P9V"
Oct 2 07:10:10 node1 lamd[1566]: kenyad: found "/usr/local/bin/mdrun_mpi"
Oct 2 07:10:10 node1 lamd[1566]: kenyad: creating new user process...
Oct 2 07:10:10 node1 lamd[1566]: kenyad: attempting to receive stdout/stderr file descriptors
Oct 2 07:10:10 node1 lamd[1566]: kenyad: recv_stdio_fds: happiness
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting environment variables to pass to new process
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSFD
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSRTF
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMJOBID
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMKENYAPID
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMWORLD
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMPARENT
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMRANK
Oct 2 07:10:10 node1 lamd[1566]: kenyad: checking for working directory flag
Oct 2 07:10:10 node1 lamd[1566]: kenyad: working directory set explicitly
Oct 2 07:10:10 node1 lamd[1566]: kenyad: running in directory /home/gromacs/dikov/P9V
Oct 2 07:10:10 node1 lamd[1566]: kenyad: fork/exec succeeded, pid 1042, index 10, rtf 0x159292
Oct 2 07:10:10 node1 lamd[1566]: kenyad: create succeeded, process running
Oct 2 07:10:10 node1 lamd[1566]: flatd: flqload - successfully created file /tmp/lam-dikov@node1.entry.kiev.ua/lam-flatd3
Oct 2 07:10:10 node1 lamd[1566]: flatd: flqload - file descriptor 13
Oct 2 07:10:10 node1 lamd[1566]: kouter: attached process pid=1042, pri=0
Oct 2 07:10:10 node1 lamd[1566]: kenyad: pqcreating with rtf 0x159290
Oct 2 07:10:10 node1 lamd[1566]: kenyad: looking for executable "/usr/local/bin/mdrun_mpi" in directory "/home/gromacs/dikov/P9V"
Oct 2 07:10:10 node1 lamd[1566]: kenyad: found "/usr/local/bin/mdrun_mpi"
Oct 2 07:10:10 node1 lamd[1566]: kenyad: creating new user process...
Oct 2 07:10:10 node1 lamd[1566]: kenyad: attempting to receive stdout/stderr file descriptors
Oct 2 07:10:10 node1 lamd[1566]: kenyad: recv_stdio_fds: happiness
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting environment variables to pass to new process
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSFD
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting TROLLIUSRTF
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMJOBID
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMKENYAPID
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMWORLD
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMPARENT
Oct 2 07:10:10 node1 lamd[1566]: kenyad: setting LAMRANK
Oct 2 07:10:10 node1 lamd[1566]: kenyad: checking for working directory flag
Oct 2 07:10:10 node1 lamd[1566]: kenyad: working directory set explicitly
Oct 2 07:10:10 node1 lamd[1566]: kenyad: running in directory /home/gromacs/dikov/P9V
Oct 2 07:10:10 node1 lamd[1566]: kenyad: fork/exec succeeded, pid 1043, index 11, rtf 0x159292
Oct 2 07:10:10 node1 lamd[1566]: kenyad: create succeeded, process running
Oct 2 07:10:10 node1 lamd[1566]: kouter: attached process pid=1043, pri=0
Oct 2 07:10:14 node1 pam_tcb[1044]: sshd: Authentication failed for dikov from (uid=0)
Oct 2 07:10:14 node1 sshd[1049]: input_userauth_request: illegal user sysadmin
Oct 2 07:10:14 node1 sshd[1049]: Failed none for UNKNOWN USER from 10.1.1.1 port 49371 ssh2
Oct 2 07:10:14 node1 sshd[1049]: Failed password for UNKNOWN USER from 10.1.1.1 port 49371 ssh2
Oct 2 07:10:14 node1 sshd[1049]: Connection closed by 10.1.1.1
Oct 2 07:10:14 node1 pam_tcb[1051]: sshd: Authentication failed for dikov from (uid=0)
Oct 2 07:10:14 node1 sshd[1056]: input_userauth_request: illegal user sysadmin
Oct 2 07:10:14 node1 sshd[1056]: Failed none for UNKNOWN USER from 10.1.1.1 port 50883 ssh2
Oct 2 07:10:14 node1 sshd[1056]: Failed password for UNKNOWN USER from 10.1.1.1 port 50883 ssh2
Oct 2 07:10:14 node1 sshd[1056]: Connection closed by 10.1.1.1
Oct 2 07:10:14 node1 pam_tcb[1057]: sshd: Authentication failed for dikov from (uid=0)
Oct 2 07:10:14 node1 pam_tcb[1061]: sshd: Authentication failed for root from (uid=0)
Oct 2 07:10:14 node1 pam_tcb[1065]: sshd: Authentication failed for dikov from (uid=0)
Oct 2 07:10:14 node1 pam_tcb[1069]: sshd: Authentication failed for root from (uid=0)
Oct 2 07:10:14 node1 sshd[1075]: input_userauth_request: illegal user sysadmin
Oct 2 07:10:14 node1 sshd[1075]: Failed none for UNKNOWN USER from 10.1.1.1 port 57066 ssh2
Oct 2 07:10:14 node1 sshd[1075]: Failed password for UNKNOWN USER from 10.1.1.1 port 57066 ssh2
Oct 2 07:10:14 node1 last message repeated 2 times
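The first lines of the oops can be decoded by hand. The Python sketch below is an editorial illustration, not part of the original message. It assumes the standard i386 page-fault error-code layout (bit 0: protection violation vs. non-present page, bit 1: write, bit 2: user mode) and decodes only the first instruction of the "Code:" line (89 42 04). With the values copied from the log, it shows a kernel-mode write to a non-present page at exactly edx + 4, i.e. kmem_cache_alloc_batch stored through a bad pointer held in edx. That locates the crash, but it does not by itself show whether the pointer was corrupted by a kernel bug or by something else, such as faulty hardware.

```python
# Editorial sketch: decode the "Oops: 0002" error code and the first
# instruction of the "Code:" line from the log above (i386, kernel 2.4).

# Register order used by the x86 ModRM reg and r/m fields.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_pf_error(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # bit 0: 0 = page not present, 1 = protection violation
        "write": bool(code & 0x2),    # bit 1: 0 = read access, 1 = write access
        "user": bool(code & 0x4),     # bit 2: 0 = kernel mode, 1 = user mode
    }


# Values copied from the log.
OOPS_CODE = 0x0002
EDX = 0x3747AB60
FAULT_ADDR = 0x3747AB64

# First bytes of "Code:" are 89 42 04.
# 0x89 is MOV r/m32, r32; ModRM 0x42 splits into mod/reg/rm fields.
MODRM = 0x42
mod, reg, rm = MODRM >> 6, (MODRM >> 3) & 0x7, MODRM & 0x7
assert mod == 0b01  # mod=01: memory operand [r/m register + 8-bit displacement]
DISP8 = 0x04        # the displacement byte that follows ModRM

if __name__ == "__main__":
    print(decode_pf_error(OOPS_CODE))
    print(f"mov %{REGS[reg]},0x{DISP8:x}(%{REGS[rm]})")  # AT&T syntax
    print(hex(EDX + DISP8), EDX + DISP8 == FAULT_ADDR)
```

Running it prints `{'present': False, 'write': True, 'user': False}`, then `mov %eax,0x4(%edx)`, then `0x3747ab64 True`.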
As far as I understand, this is a problem in the kernel itself. Am I right?
Dima
--
Sincerely yours,
Ph.D. Student Dmytro Kovalskyy
Institute of Molecular Biology & Genetics
150 Akad. Zabolotnogo Street,
Kiev-143, 03143
UKRAINE
E-mail: dikov at imbg.org.ua
Fax: +380 (44) 266-0759
Tel.: +380 (44) 266-5589
More information about the community mailing list