udhcpc taking 100% CPU in 1.9.2.7-9

**sesamebike** · 31-03-2008, 15:46

Hi,

After upgrading to 1.9.2.7-9 I started to have problem with the udhcpc process taking all CPU. It happends after a about a day or two. Sofar I've restarted the router when it happends after which the udhcpc process behaves correctly for another day or two until it goes havoc again and drains the CPU.

Has anyone else had similar problems with udhcppc in 1.9.2.7-9?

I didn't have this problem with 1.9.2.7-8. I'm running the wl500g which unfortunately isn't supported in the latest 1.9.2.7-10 release..

**wiz** · 31-03-2008, 16:01

Originally Posted by sesamebike

Hi,

Has anyone else had similar problems with udhcppc in 1.9.2.7-9?

I have seen udhcpd not renewing my ip address, but frankly I did not look at if it would have hogged the cpu or not. All I saw was that I was offline every four ours (lease time is 14400) so I am back at 1.9.2.7-8. A manual renew did the trick though, but to manually renew it is kinda silly.

I also have a wl500g.

**Oleg** · 31-03-2008, 18:18

Well, ok.

wiz, could you please check 1.9.2.7-9 once again? Use top to check CPU usage.

I could prepare test version for the original wl500g with all udhcpc changes rolled back and then add changes one-by-one. Are you ready to perform such tests?

**Oleg** · 31-03-2008, 18:37

BTW, could you please type this:

nvram show|grep ^wanx_

and post results?

Also, I need a process list, type

/bin/ps w

**sesamebike** · 31-03-2008, 19:05

Hi Oleg,

I'd be happy to help debug this. Note however that I takes about a day before the problem manifest itself for me.

In the meantime, while waiting for it to happen again, here's the info you requested:

Code:

[admin@wl500g root]$ nvram show|grep ^wanx_
wanx_gateway=80.252.185.1
size: 10613 bytes (22155 left)

Code:

[admin@wl500g root]$ /bin/ps w
  PID  Uid     VmSize Stat Command
    1 admin       632 S   /sbin/init
    2 admin           SW  [keventd]
    3 admin           RWN [ksoftirqd_CPU0]
    4 admin           SW  [kswapd]
    5 admin           SW  [bdflush]
    6 admin           SW  [kupdated]
    7 admin           SW  [mtdblockd]
   45 admin       380 S   httpd eth1
   46 admin       552 S   nas /tmp/nas.lan.conf /tmp/nas.lan.pid lan
   50 admin       408 S   syslogd -m 0 -O /tmp/syslog.log -S -l 7
   51 admin       360 S   klogd
   53 nobody      484 S   [dnsmasq]
   54 admin           SW  [khubd]
   70 admin           SW  [usb-storage-0]
   71 admin           SW  [scsi_eh_0]
  109 admin       468 S   watchdog
  112 admin       344 S   ntp
  128 admin       400 S   dropbear -s
  129 admin       436 S   crond
  143 admin           SW  [usb-storage-1]
  144 admin           SW  [scsi_eh_1]
  168 admin       436 S   udhcpc -i eth1 -p /var/run/udhcpc0.pid -s /tmp/udhcpc
  174 admin           SW  [kjournald]
  175 admin           SW  [kjournald]
  203 admin       412 S   /opt/sbin/httpd -c /opt/etc/httpd.conf -p 8080 -h /opt/web
  270 admin       692 R   dropbear -s
  271 admin       548 S   -sh
  290 admin       392 R   /bin/ps w

I double checked my iptables settings and realised that this might be the culprit:

Code:

Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 logdrop    all  --  *      *       0.0.0.0/0            0.0.0.0/0          state INVALID
 3416  399K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0          state RELATED,ESTABLISHED
  240 14400 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0          state NEW
  170 13877 ACCEPT     all  --  br0    *       0.0.0.0/0            0.0.0.0/0          state NEW
   68 23238 SECURITY   all  --  eth1   *       0.0.0.0/0            0.0.0.0/0          state NEW
    0     0 DROP       udp  --  eth1   *       0.0.0.0/0            255.255.255.255    udp spt:67 dpt:68
   37 13712 DROP       udp  --  eth1   *       0.0.0.0/0            255.255.255.255    udp spt:68 dpt:67

I'll remove the UDP rules for port 67/68 in order to eliminate it as a probable source of the problem.

**wiz** · 31-03-2008, 19:22

Originally Posted by Oleg

Well, ok.

wiz, could you please check 1.9.2.7-9 once again? Use top to check CPU usage.

I could prepare test version for the original wl500g with all udhcpc changes rolled back and then add changes one-by-one. Are you ready to perform such tests?

Sure, but give me time for that as I currently will be using the internet connection to work from home. Nearer the weekend I will have opportunity to have a closer look at this to see if we can trace this odd behaviour.

And off course I am willing to test and put different stuff on my wl500 to see if we can get this working stable.

cheers,

Wiz

**Oleg** · 31-03-2008, 19:47

Originally Posted by sesamebike

Code:

    0     0 DROP       udp  --  eth1   *       0.0.0.0/0            255.255.255.255    udp spt:67 dpt:68
   37 13712 DROP       udp  --  eth1   *       0.0.0.0/0            255.255.255.255    udp spt:68 dpt:67

I'll remove the UDP rules for port 67/68 in order to eliminate it as a probable source of the problem.

Hm. Do you've something in the post-firewall? Could you please post

cat /tmp/filter_rules ?

**sesamebike** · 01-04-2008, 12:57

Hi Oleg,

Yes I did put the following lines in my post-firewall

Code:

iptables -I INPUT 8 -i "$1" -d 255.255.255.255 -p udp --sport 68 --dport 67 -j DROP
iptables -I INPUT 8 -i "$1" -d 255.255.255.255 -p udp --sport 67 --dport 68 -j DROP

I remember why I did this. The first line is filtering out broadcasts from other dhcp clients on the network. The other line is actually pointless as there's no broadcasting to port 68 in the dhcp protocol. Anyway I put these lines in there deliberately since I wanted to remove them from the default logdrop rules which were flooding the syslog. (I send my iptables logs to dshield, http://isc.sans.org/howto.html every hour)

Anyway I'ts now happened again:

Code:

Mem: 13304K used, 660K free, 0K shrd, 1240K buff, 5364K cached
CPU:  97% usr   2% sys   0% nice   0% idle   0% io   0% irq   0% softirq
Load average: 1.65 1.53 1.25
  PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
  166     1 admin    R     2712  19%  98% udhcpc -i eth1 -p /var/run/udhcpc0.pi
 2253  2238 admin    R     1792  13%   2% top
   50     1 admin    S     2700  19%   0% syslogd -m 0 -O /tmp/syslog.log -S -l
 2237   131 admin    S     1824  13%   0% dropbear -s
 2238  2237 admin    S     2736  20%   0% -sh
  132     1 admin    S     2716  19%   0% crond
   51     1 admin    S     2696  19%   0% klogd
  110     1 admin    S     2380  17%   0% watchdog
    1     0 admin    S     2380  17%   0% /sbin/init
  112   110 admin    S     2372  17%   0% ntp
   45     1 admin    S     1932  14%   0% httpd eth1
   46     1 admin    S     1900  14%   0% nas /tmp/nas.lan.conf /tmp/nas.lan.pi
  197     1 admin    S     1788  13%   0% /opt/sbin/httpd -c /opt/etc/httpd.con
  131     1 admin    S     1752  13%   0% dropbear -s
   53     1 nobody   S      960   7%   0% [dnsmasq]
   70     1 admin    SW       0   0%   0% [usb-storage-0]
    3     1 admin    RWN      0   0%   0% [ksoftirqd_CPU0]
    7     1 admin    SW       0   0%   0% [mtdblockd]
  171     1 admin    SW       0   0%   0% [kjournald]
    4     1 admin    SW       0   0%   0% [kswapd]

There's nothing unusal in the /tmp/syslog.log. I'll keep the system running like this for a while in case you need me to extract any additional data.

**sesamebike** · 01-04-2008, 14:10

Sorry, I needed to free CPU resources and killed the udhcpc process. When starting it manually I get:

Code:

[admin@wl500g /]$ udhcpc -i eth1 -p /var/run/udhcpc0.pid -s /tmp/udhcpc
udhcpc (v0.9.9-pre) started
eth1: No such process
Sending discover...
Sending select for 80.252.185.xxx...
Lease of 80.252.185.xxx obtained, lease time 43200

(full ipaddress censored out by me)

Is the output "eth1: No such process" anything to worry about?

**sesamebike** · 01-04-2008, 18:06

Hi again,

Altough it did compain about eth1 it seems to be working ok. I can force a renewal by sending it SIGUSR1 as described in the source.

Code:

[admin@wl500g root]$ kill -SIGUSR1 `pidof udhcpc`

Code:

[admin@wl500g root]$ tail /tmp/syslog.log
Apr  1 18:38:21 udhcpc[2588]: Performing a DHCP renew
Apr  1 18:38:21 udhcpc[2588]: Lease of 80.252.185.xxx obtained, lease time 43200
Apr  1 18:38:21 dnsmasq[53]: read /etc/hosts - 11 addresses
Apr  1 18:38:21 dnsmasq[53]: read /etc/ethers - 6 addresses
Apr  1 18:38:21 dnsmasq[53]: reading /tmp/resolv.conf
Apr  1 18:38:21 dnsmasq[53]: using nameserver 80.252.160.164#53
Apr  1 18:38:21 dnsmasq[53]: using nameserver 80.252.160.162#53
Apr  1 18:38:22 iptables: Running post-firewall script
Apr  1 18:38:24 dhcp client: bound IP : 80.252.185.xxx from 80.252.185.1

Note that there seems to be a bug here somewhere, why is the firewall reconfigured when the DHCP client renewed the IP-adress? Intuitively it only needs to done when udhcpc issues deconfig, bound or nak. I would prefer if the firewall wasn't reconfigured as a result of udhcpc renew but that's maybe more of an opinion about function than an actual bug.

Bug or not, I haven't found a way to trigger the problem other than waiting for it to happen.

**sesamebike** · 02-04-2008, 10:13

Small update:

I've found that if the process is in the havoc mode and runs with 100% cpu, then sending a SIGUSR1 to it thus forcing it to renew the address, actually solves it.

I have now a workaround with a cronjob that checks regulary if the udhcpc process is consuming all resources, if so it send SIGUSR1 to it.

**Oleg** · 02-04-2008, 10:22

I need

cat /tmp/filter_rules

output.

There should be dhcp related rule which is missing in the iptables output. Also, disable your post-firewall completely for testing.

**sesamebike** · 02-04-2008, 13:26

Hi Oleg,

Thanks again for your time.

Code:

[admin@wl500g root]$ cat /tmp/filter_rules
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:MACS - [0:0]
:SECURITY - [0:0]
:logaccept - [0:0]
:logdrop - [0:0]
-A SECURITY -p tcp --syn -m limit --limit 1/s -j RETURN
-A SECURITY -p tcp --tcp-flags SYN,ACK,FIN,RST RST -m limit --limit 1/s -j RETUR                                                                                                                                                              N
-A SECURITY -p udp -m limit --limit 5/s -j RETURN
-A SECURITY -p icmp -m limit --limit 5/s -j RETURN
-A SECURITY -j logdrop
-A INPUT -m state --state INVALID -j logdrop
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -i lo -m state --state NEW -j ACCEPT
-A INPUT -i br0 -m state --state NEW -j ACCEPT
-A INPUT -i eth1 -m state --state NEW -j SECURITY
-A INPUT -p udp --sport 67 --dport 68 -j ACCEPT
-A INPUT -j logdrop
-A FORWARD -i br0 -o br0 -j ACCEPT
-A FORWARD -m state --state INVALID -j logdrop
-A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
-A FORWARD -o eth1 ! -i br0 -j logdrop
-A FORWARD ! -i br0 -m state --state NEW -j SECURITY
-A FORWARD -m conntrack --ctstate DNAT -j ACCEPT
-A FORWARD -o br0 -j logdrop
-A logaccept -m state --state NEW -j LOG --log-prefix "ACCEPT " --log-tcp-sequen                                                                                                                                                              ce --log-tcp-options --log-ip-options
-A logaccept -j ACCEPT
-A logdrop -m state --state NEW -j LOG --log-prefix "DROP " --log-tcp-sequence -                                                                                                                                                              -log-tcp-options --log-ip-options
-A logdrop -j DROP
COMMIT

Code:

[admin@wl500g root]$ cat /usr/local/sbin/post-firewall
#!/bin/sh
logger -t iptables -p user.notice "Running post-firewall script"
iptables -I INPUT 6 -i "$1" -s xxxxxx-d "$2" -p tcp --syn --dport 22 -j logaccept
iptables -R INPUT 7 -i "$1" -d "$2" -p tcp --syn --dport 21 -j logaccept
iptables -t nat -A PREROUTING -i "$1" -p tcp --dport 8080 -j DROP
iptables -t nat -A VSERVER -p tcp --dport 80 -j REDIRECT --to-ports 8080
iptables -I INPUT 8 -i "$1" -d "$2" -p tcp --syn --dport 8080 -j ACCEPT
iptables -I INPUT 8 -i "$1" -d 255.255.255.255 -p udp --sport 68 --dport 67 -j DROP
iptables -I INPUT 8 -i "$1" -d 255.255.255.255 -p udp --sport 67 --dport 68 -j DROP
iptables -I INPUT 6 -i "$1" -s yyyyyy -d "$2" -p tcp --syn --dport 22 -j logaccept
iptables -I SECURITY -i "$1" -d "$2" -p tcp --syn --dport 8080 -j RETURN
if [ -f /etc/blacklist ];  then
  awk '{ print "iptables -I SECURITY -s " $1 " -j logdrop" }' /etc/blacklist | /bin/sh
fi
iptables -P INPUT DROP

(some adresses censored by me)

The /etc/blacklist blocks out certain ip ranges which has has been known for trying to doing bruteforce attacks against my machine:

Code:

[admin@wl500g root]$ cat /etc/blacklist
60.0.0.0/8
61.0.0.0/8
66.150.0.0/15
81.29.229.1
85.255.112.0/14
133.0.0.0/8
155.158.0.0/16
193.86.0.0/16
202.0.0.0/8
203.0.0.0/8
204.16.208.0/22
218.0.0.0/8
219.0.0.0/8
220.0.0.0/8
221.0.0.0/8
222.0.0.0/8

Yes, you're line seems to be there in /tmp/filter_rules. However I've accidently overwrite it in the post-firewall script. My mistake.

However, doesn't the fact that a SIGUSR signal seems to kick the udhcpc process back to normal indicate that it's not a firewall problem?

Anyway, I'll update/disable my post-firewall too see if it solves the problem.

Hopefully this was all due to my bad post-firewall settings. I'll post an update to verify this later when it has run for a while.

**sesamebike** · 03-04-2008, 12:26

Nope, today the same thing happened.

Code:

Mem: 13536K used, 428K free, 0K shrd, 1372K buff, 6200K cached
CPU:  99% usr   0% sys   0% nice   0% idle   0% io   0% irq   0% softirq
Load average: 1.14 1.04 1.01
  PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
 2588     1 admin    R     2712  19%  98% udhcpc -i eth1 -p /var/run/udhcpc0.pi
10007 10003 admin    R     1792  13%   1% top

Code:

[admin@wl500g root]$ iptables -L INPUT -v -n
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 logdrop    all  --  *      *       0.0.0.0/0            0.0.0.0/0          state INVALID
63948 7810K ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0          state RELATED,ESTABLISHED
 5216  313K ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0          state NEW
  256 21076 ACCEPT     all  --  br0    *       0.0.0.0/0            0.0.0.0/0          state NEW
21615 2044K SECURITY   all  --  eth1   *       0.0.0.0/0            0.0.0.0/0          state NEW
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0          udp spt:67 dpt:68
    0     0 logaccept  tcp  --  eth1   *       x.x.x.x              80.252.185.xxx     tcp dpt:22 flags:0x16/0x02
   74  4216 ACCEPT     tcp  --  eth1   *       0.0.0.0/0            80.252.185.xxx     tcp dpt:8080 flags:0x16/0x02
    1    48 logaccept  tcp  --  eth1   *       y.y.y.y              80.252.185.xxx     tcp dpt:22 flags:0x16/0x02
20287 1900K logdrop    all  --  *      *       0.0.0.0/0            0.0.0.0/0

**Oleg** · 03-04-2008, 13:06

Try this one: http://oleg.wl500g.info/wlhdd/WL500g-1.9.2.7-9.2.1.trx

Thread: udhcpc taking 100% CPU in 1.9.2.7-9

Thread Tools

Rate This Thread

Display

udhcpc taking 100% CPU in 1.9.2.7-9 [solved]

Similar Threads

Âëàäåëüöàì WL-520gc è WL-520gu

Firmware 1.9.2.7 CR9

init çàíèìàåò áîëüøå 95% CPU

1.9.2.7 sources?

Posting Permissions