View the full version : wan port dies - or not???



piri
10-12-2004, 15:42
hey!

I'm not sure if it's the famous "wan dies" issue but I have the following problem:

- Wl-500g running in Homegateway mode connected to Cable Modem with fixed IP and fixed DNS
- Running the busybox_httpd via firewall rule 8080 -> 80 from outside (in post-boot and post-firewall scripts)
- Running samba (in post-boot script)
- Running on Auto speed (don't know what speed my cable modem is running on)
- upgraded from 1.8.1.7-2a to 1.8.1.7-3 (must admit that the first upgrade failed so I did it a 2nd time and it succeeded then)

After that upgrade my WAN connection doesn't come up at boot time anymore. The connection is only established after pressing the "disconnect/connect" buttons in the web console one after the other (although the link status field already says "connected"). After a short time of web browsing (no huge data transfers) the WAN connection is lost again, and only the buttons in the web console bring it back up.

Also tried

et -i eth1 down/up (thought it's similar to the buttons...)
as well as

et -i eth1 speed 10half/10full/100half/100full
but with no effect, only buttons work.

Tried also the following with no effect:
- installing 1.8.1.7-3 in recovery mode
- resetting to factory defaults via web console
- downgrading to 1.8.1.7-2a
- installing 1.8.2.4
- setting number of max connections to 4096

Also typing


et -i eth1 dump

only brings up a "?" or an empty line, I don't know what that means...

The only thing I noticed is that all the trouble begins when flashfs comes into play, i.e. after upgrading I set up the scripts and use flashfs (save, commit, enable, reboot) and after that the WAN connection is lost.
Then the post-firewall script is not executed anymore and samba does not work (many smbd processes open up, but isn't this a different issue?...I also have two unknown "ping" processes, but I think I'm getting paranoid :))

Sadly


flashfs disable

and rebooting doesn't do the trick either....

Could it be that I messed up flashfs during the failed upgrade? Can I recover it in any way or do a total cleanup? Is there a way to monitor when the WAN dies?

By now I don't know where to look anymore or what to try next, nothing I tried has brought me any further.... :confused:

cheers, thomas.

piri
10-12-2004, 16:12
I just found out that it also must have something to do with DNS resolution:

- After rebooting only websites that I surfed before reboot seem to work (must be somewhere in a cache), all others don't.
- But pinging them in the wl500g telnet console works i.e WAN connection is indeed established, only the DNS lookup failed.
- Typing the corresponding IP address in the address field of the browser also worked...

... but: after a certain time the WAN finally died, i.e. even IP addresses were not pingable via telnet anymore

Now I'm totally confused.... :confused:
cheers, thomas.
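
The two symptoms described above (names not resolving while IP addresses still answer, versus nothing answering at all) can be told apart with a small check run from the router's telnet shell. The following is only a minimal sketch, not a tested tool for this firmware: the function names `classify` and `check_once` are made up for illustration, and it assumes that `ping -c` and `nslookup` return a non-zero exit status on failure, which may not hold for every BusyBox build.

```shell
# classify IP_STATUS DNS_STATUS
# Turn two exit statuses (0 = success) into a one-word diagnosis.
classify() {
    ip_ok=$1    # result of pinging a known IP address
    dns_ok=$2   # result of a name lookup
    if [ "$ip_ok" -ne 0 ]; then
        echo "wan-down"     # not even a plain IP address answers
    elif [ "$dns_ok" -ne 0 ]; then
        echo "dns-only"     # link works, only name resolution fails
    else
        echo "ok"
    fi
}

# check_once IP NAME
# Run both probes once and print a timestamped diagnosis line.
check_once() {
    if ping -c 1 "$1" >/dev/null 2>&1; then ip_ok=0; else ip_ok=1; fi
    if nslookup "$2" >/dev/null 2>&1; then dns_ok=0; else dns_ok=1; fi
    echo "$(date) $(classify "$ip_ok" "$dns_ok")"
}
```

To get a rough record of when the WAN dies, it could be run in a loop, for example `while true; do check_once 217.67.235.46 www.chupa.nl >> /tmp/wan-monitor.log; sleep 60; done` (the log path is a placeholder; the address and name are the ones that appear later in this thread).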

Oleg
10-12-2004, 16:38
Well, flashfs itself does not cause any problems, your scripts could cause problems.
type "flashfs disable" and reboot your router. Check status then.
If it still does not work, post system log and ps ax output here.

Styno
10-12-2004, 17:50
I praise your efforts to find a solution for your problem yourself.

I don't think you have the 'WAN dies' problem, certainly not if you can still ping an IP address (without resolving) while you have the problem. You might try to set up a different DNS server for your WL-500g, or try to set it on your Windows box.

piri
10-12-2004, 18:08
Thanks for your quick replies guys,

In the meantime I tried the following:

- Resetting the router to defaults, rebooting
- Making a "clean" install of 1.8.1.7-3 via recovery tool
- Resetting and rebooting
- Editing my settings in web console
- Rebooting

Didn't touch the flashfs, still not working, I could only bring it to life with the connect/disconnect buttons...ping isn't working either....

Here's the dmesg output after a reboot:



Dec 31 13:00:01 kernel: klogd started: BusyBox v1.00 (2004.11.14-18:33+0000)
Dec 31 13:00:01 kernel: CPU revision is: 00024000
Dec 31 13:00:01 kernel: Loading BCM4710 MMU routines.
Dec 31 13:00:01 kernel: Primary instruction cache 8kb, linesize 16 bytes (2 ways)
Dec 31 13:00:01 kernel: Primary data cache 4kb, linesize 16 bytes (2 ways)
Dec 31 13:00:01 kernel: Linux version 2.4.20 (root@omnibook) (gcc version 3.2.3 with Broadcom modifications) #3 Wed Nov 10 22:17:25 MSK 2004
Dec 31 13:00:01 kernel: Determined physical RAM map:
Dec 31 13:00:01 kernel: memory: 01000000 @ 00000000 (usable)
Dec 31 13:00:01 kernel: On node 0 totalpages: 4096
Dec 31 13:00:01 kernel: zone(0): 4096 pages.
Dec 31 13:00:01 kernel: zone(1): 0 pages.
Dec 31 13:00:01 kernel: zone(2): 0 pages.
Dec 31 13:00:01 kernel: Kernel command line: root=/dev/mtdblock2 noinitrd init=/linuxrc console=ttyS0,115200
Dec 31 13:00:01 kernel: CPU: BCM4710 rev 0 at 125 MHz
Dec 31 13:00:01 kernel: !unable to setup serial console!
Dec 31 13:00:01 kernel: Calibrating delay loop... 82.94 BogoMIPS
Dec 31 13:00:01 kernel: Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Dec 31 13:00:01 kernel: Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Dec 31 13:00:01 kernel: Page-cache hash table entries: 4096 (order: 2, 16384 bytes)
Dec 31 13:00:01 kernel: Checking for 'wait' instruction... unavailable.
Dec 31 13:00:01 kernel: POSIX conformance testing by UNIFIX
Dec 31 13:00:01 kernel: PCI: Fixing up bus 0
Dec 31 13:00:01 kernel: PCI: Fixing up bridge
Dec 31 13:00:01 kernel: PCI: Fixing up bus 1
Dec 31 13:00:01 kernel: Initializing RT netlink socket
Dec 31 13:00:01 kernel: Starting kswapd
Dec 31 13:00:01 kernel: NTFS driver v1.1.22 [Flags: R/O]
Dec 31 13:00:01 kernel: pty: 256 Unix98 ptys configured
Dec 31 13:00:01 kernel: Amd/Fujitsu Extended Query Table v1.1 at 0x0040
Dec 31 13:00:01 kernel: Physically mapped flash: Swapping erase regions for broken CFI table.
Dec 31 13:00:01 kernel: number of CFI chips: 1
Dec 31 13:00:01 kernel: Flash device: 0x400000 at 0x1fc00000
Dec 31 13:00:01 kernel: Physically mapped flash: squashfs filesystem found at block 1101
Dec 31 13:00:01 kernel: Creating 5 MTD partitions on "Physically mapped flash":
Dec 31 13:00:01 kernel: 0x00000000-0x00040000 : "pmon"
Dec 31 13:00:01 kernel: 0x00040000-0x003e0000 : "linux"
Dec 31 13:00:01 kernel: 0x001137f4-0x003e0000 : "rootfs"
Dec 31 13:00:01 kernel: 0x003f0000-0x00400000 : "nvram"
Dec 31 13:00:01 kernel: 0x003e0000-0x003f0000 : "config"
Dec 31 13:00:01 kernel: sflash: chipcommon not found
Dec 31 13:00:01 kernel: ip_conntrack version 2.1 (128 buckets, 1024 max) - 344 bytes per conntrack
Dec 31 13:00:01 kernel: ip_conntrack_pptp version 1.9 loaded
Dec 31 13:00:01 kernel: ip_nat_pptp version 1.5 loaded
Dec 31 13:00:01 kernel: ip_tables: (C) 2000-2002 Netfilter core team
Dec 31 13:00:01 kernel: ipt_time loading
Dec 31 13:00:01 kernel: 802.1Q VLAN Support v1.7 Ben Greear <greearb@candelatech.com>
Dec 31 13:00:01 kernel: All bugs added by David S. Miller <davem@redhat.com>
Dec 31 13:00:01 kernel: FAT: bogus logical sector size 33024
Dec 31 13:00:01 kernel: FAT: bogus logical sector size 33024
Dec 31 13:00:01 kernel: NTFS: Unable to set blocksize 512.
Dec 31 13:00:01 kernel: VFS: Mounted root (squashfs filesystem) readonly.
Dec 31 13:00:01 kernel: Warning: unable to open an initial console.
Dec 31 13:00:01 kernel: Algorithmics/MIPS FPU Emulator v1.5
Dec 31 13:00:01 kernel: eth0: Broadcom BCM47xx 10/100 Mbps Ethernet Controller 1.3.2.0
Dec 31 13:00:01 kernel: default: 1a60
Dec 31 13:00:01 kernel: eth1: Broadcom BCM47xx 10/100 Mbps Ethernet Controller 1.3.2.0
Dec 31 13:00:01 kernel: PCI: Enabling device 01:02.0 (0004 -> 0006)
Dec 31 13:00:01 kernel: eth2: Broadcom BCM4320 802.11 Wireless Controller 1.3.2.0
Dec 31 13:00:02 kernel: usb.c: USB device 2 (vend/prod 0x781/0x7104) is not claimed by any active driver.
Dec 31 13:00:04 kernel: Vendor: Generic Model: STORAGE DEVICE Rev: 1033
Dec 31 13:00:04 kernel: Type: Direct-Access ANSI SCSI revision: 02
Dec 31 13:00:04 kernel: Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
Dec 31 13:00:04 kernel: SCSI device sda: 512000 512-byte hdwr sectors (262 MB)
Dec 31 13:00:04 kernel: sda: Write Protect is off
Dec 31 13:00:05 kernel: default: 3a60
Dec 31 13:00:05 kernel: lp driver: get device ID
Dec 31 13:00:05 kernel: neg fail
Dec 31 13:00:07 kernel: Adpt 2 c000
Dec 31 13:00:08 kernel: Adpt 1 c010
Dec 10 06:54:46 kernel: Adpt 0 c010
Dec 10 06:54:47 kernel: Link up c010 8110
Dec 10 06:54:47 kernel: Adpt ffffffff 8110
Dec 10 06:54:48 kernel: lp driver: get device ID
Dec 10 06:54:48 kernel: neg fail
Dec 10 17:54:48 kernel: neg fail
Dec 10 17:54:52 ntp client: time is synchronized to time.nist.gov
Dec 10 17:54:52 kernel: VFS: Can't find ext3 filesystem on dev sd(8,1).
Dec 10 17:54:52 kernel: MSDOS FS: Using codepage 950
Dec 10 17:54:52 kernel: MSDOS FS: IO charset cp950
Dec 10 17:54:53 USB storage: vfat fs mounted to /tmp/harddisk
Dec 10 06:54:53 kernel: ext3: No journal on filesystem on sd(8,2)
Dec 10 06:54:53 kernel: MSDOS FS: Using codepage 950
Dec 10 06:54:53 kernel: MSDOS FS: IO charset cp950
Dec 10 06:54:53 kernel: FAT: bogus logical sector size 0
Dec 10 06:54:53 kernel: VFS: Can't find a valid FAT filesystem on dev 08:02.
Dec 10 06:54:53 kernel: FAT: freeing iocharset=cp950
Dec 10 06:54:53 kernel: FAT: bogus logical sector size 0
Dec 10 06:54:53 kernel: VFS: Can't find a valid FAT filesystem on dev 08:02.


And here the ps ax output


PID Uid VmSize Stat Command
1 piri 576 S /sbin/init
2 piri SW [keventd]
3 piri RWN [ksoftirqd_CPU0]
4 piri SW [kswapd]
5 piri SW [bdflush]
6 piri SW [kupdated]
8 piri SW [mtdblockd]
37 piri 296 S telnetd
43 piri 376 S httpd eth1
44 piri 488 S nas /tmp/nas.lan.conf /tmp/nas.lan.pid lan
46 piri 356 S syslogd -m 0 -O /tmp/syslog.log -S -l 6
49 piri 312 S klogd
50 nobody 472 S dnsmasq
51 piri SW [khubd]
61 piri 276 S lpd
65 piri 248 S p9100d -f /dev/usb/lp0 0
66 piri 248 S p9101d -f /dev/printers/0 1
70 piri SW [usb-storage-0]
71 piri SW [scsi_eh_0]
79 piri 316 S infosvr br0
80 piri 460 S watchdog
82 piri 336 S ntp
86 piri 504 S -sh
91 piri 400 S stupid-ftpd
96 piri 364 R ps ax

@Styno, the DNS Servers are those provided by my Cable ISP, I also have them entered in the IP Config Settings of the router....when connecting WAN directly to my Powerbook with those settings everything works fine....

thanks a lot &
cheers, thomas

Antiloop
10-12-2004, 18:50
how do you configure your router?

by putting back your saved configuration file?

Oleg
10-12-2004, 19:32
Looks like you have a static IP address for the connection?
wl500g is working fine, at least it's able to synchronize clocks


Dec 10 17:54:52 ntp client: time is synchronized to time.nist.gov


Do you have any firewall on your PC?

brubber
10-12-2004, 22:49
Supposing your computer is not running DHCP or DNS server try this:

On your modem:

Enable DHCP
Enable DNS server

On your router:
WAN IP: Get automatically
Get DNS server automatically:Yes
If you have a Speedtouch modem the DNS server will probably be 10.0.0.138 if you need to set it manually

LAN DHCP: Enable
DNS and WINS server: leave blank or use 192.168.1.1 for DNS and leave other fields blank

WLAN DHCP disable Wireless firewall if you have it in your firmware.

On your workstation: Get IP automatically, preferred DNS servers either leave blank or enter 192.168.1.1, renew IP address

If this doesn't work check your firewall config.

piri
11-12-2004, 08:45
First of all thanks a lot guys for all your help!!!


how do you configure your router?

by putting back your saved configuration file?

No, to be on the safe side I entered all settings manually...


Do you have any firewall on your PC?
Only the one from WL-500g, which is enabled by default...


On your modem:

Enable DHCP
Enable DNS server


I have a Cable modem with a static IP which only can be remotely configured by my ISP (chello), who also provides me with DNS servers, which I enter in the WL-500g IP Config along with my static IP address and gateway.

I noticed that a short time after rebooting everything works fine (Maybe when a certain module during bootup hasnt been loaded yet...), then this DNS problem occurs and a while after (say ca. 10 mins) WAN connection finally dies...

cheers, thomas

piri
11-12-2004, 08:58
Now I got it: DNS requests from WAN to LAN/WLAN are not processed anymore, i.e. www.chupa.nl is pingable on the router, but not on the client machines (hostname not found).
I tried to restart dnsmasq (because of all the running processes it seemed the most likely to be responsible for DNS resolving, though I really don't know...), but with no effect. Only the disconnect/reconnect buttons help.

When I enter one WAN DNS server in the 1st field of the DNS Servers in the DHCP Section of IP Config in the Web Console, everything works fine, except the name of WL-500g (i.e. my.router) is not resolved anymore.


I also have two unknown "ping" processes, but I think I'm getting paranoid


Also my router has now three open ping processes to 140.113.1.1 (ns1.NCTU.edu.tw, National Chiao Tung University), could it be that I have been hacked?????

I'm really disappointed :(
cheers, thomas.

piri
12-12-2004, 09:57
Oleg, can you please explain which services are responsible for DNS and DHCP for LAN/WLAN, so that I can take a closer look at them.

It is still the case that after a router reboot a WAN DNS lookup is not possible on the LAN/WLAN clients through the DNS server 192.168.1.1. Only after a disconnect/connect sequence via the web console (I wonder what those buttons do, I haven't found a working command line equivalent so far :)) it works fine...

Also, after pressing those buttons those magic ping processes to 140.113.1.1 come up (I really wonder if that has something to do with the above issue), I wish I could find the config file where this IP address is located :)

thanks a lot for your help so far,
cheers, thomas.

Oleg
13-12-2004, 18:06
Please type


cat /etc/dnsmasq.conf

and post results here.
Both DHCP and DNS stuff are handled by dnsmasq.

piri
14-12-2004, 18:00
thanks oleg, i suspected that....

here is my dnsmasq.conf:


user=nobody
interface=br0
dhcp-leasefile=/tmp/dnsmasq.log
dhcp-range=lan,192.168.1.2,192.168.1.254,86400
read-ethers
dhcp-authoritative


also took a glimpse into /tmp/dnsmasq.log, where both my external nameservers reside, so no anomalies to the dnsmasq specification, which i found @ www.thekelleys.co.uk

i wonder why dnsmasq is unable to route LAN dns requests from 192.168.1.1 to my external nameservers after a reboot and why the connect/disconnect buttons eliminate that problem...

thanks again for your patience,
cheers, thomas.

Oleg
14-12-2004, 19:04
Also, I need this:


cat /etc/resolv.conf
cat /etc/ethers
cat /etc/hosts
iptables -L -v -n


You may try running nslookup on the router:



nslookup www.chupa.nl 127.0.0.1
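
The requested outputs could also be gathered in one go so they can be pasted as a single block. This is just a sketch under assumptions: `dump_files` and `collect_diag` are hypothetical helper names, and `/etc/dnsmasq.conf` is added to the list only because it was asked for earlier in the thread.

```shell
# dump_files FILE...
# Print every file under a header line; unreadable files are noted, not fatal.
dump_files() {
    for f in "$@"; do
        echo "=== $f ==="
        if [ -r "$f" ]; then
            cat "$f"
        else
            echo "(not readable)"
        fi
    done
}

# collect_diag
# Gather the files and command outputs requested above into one stream.
collect_diag() {
    dump_files /etc/resolv.conf /etc/ethers /etc/hosts /etc/dnsmasq.conf
    echo "=== iptables -L -v -n ==="
    iptables -L -v -n 2>&1 || echo "(iptables failed)"
    echo "=== nslookup www.chupa.nl 127.0.0.1 ==="
    nslookup www.chupa.nl 127.0.0.1 2>&1 || echo "(nslookup failed)"
}
```

Running `collect_diag > /tmp/diag.txt` (the output path is a placeholder) would leave everything in one file.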

piri
15-12-2004, 10:54
i think i found the reason....

found out that the problem of DNS entries not resolving between WAN and LAN only occurs when I disable the Starcraft(Battle.Net) option on the DMZ properties page of the web console.

as far as I understand, and from comparing the iptables output with the option enabled and disabled, all that an enabled battle.net setting does is add the following entries to the iptables chains:



FORWARD
0 0 ACCEPT udp -- * * 0.0.0.0/0 0.0.0.0/0 udp dpt:6112

PREROUTING
0 0 NETMAP udp -- * * 0.0.0.0/0 84.113.4.16 udp spt:6112 192.168.1.0/24

POSTROUTING
0 0 NETMAP udp -- * * 192.168.1.0/24 0.0.0.0/0 udp dpt:6112 84.113.4.16/32


All other chains remain unchanged.
For my setup that means that without these entries in the iptables chains, external DNS requests between WAN and LAN are not resolved....very strange....:confused:

The files you requested show no anomalies to me though...

/etc/resolv.conf contains the two entries of my nameservers in the form

nameserver <dns ip address1>
nameserver <dns ip address2>


/etc/ethers contains my statically mapped mac to ip addresses in the form


<macaddr1> 192.168.1.2
<macaddr2> 192.168.1.3
<macaddr3> 192.168.1.4
<macaddr4> 192.168.1.5


/etc/hosts says

127.0.0.1 localhost.localdomain localhost
192.168.1.1 batarillofred my.router my.WL500g



Do you have any explanation for this behaviour?
cheers, thomas.
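
The enabled/disabled comparison described in this post can be repeated mechanically on saved listings. What follows is a rough sketch meant for a desktop machine rather than the router (it relies on `mktemp` and on `grep -F -x -v -f`, which a small BusyBox may lack); `strip_counters` and `only_in_first` are illustrative names. The packet and byte counters at the start of each rule line are stripped first, because they change between runs even when the rules do not.

```shell
# strip_counters
# Remove the leading "pkts bytes" columns from rule lines read on stdin.
strip_counters() {
    sed 's/^ *[0-9][0-9KMG]*  *[0-9][0-9KMG]*  *//'
}

# only_in_first LISTING_A LISTING_B
# Print the rule lines that appear in LISTING_A but not in LISTING_B.
only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    strip_counters < "$1" > "$a"
    strip_counters < "$2" > "$b"
    # -F fixed strings, -x whole line, -v invert, -f patterns from file
    grep -F -x -v -f "$b" "$a" || true
    rm -f "$a" "$b"
}
```

The listings would be saved once with the option enabled and once with it disabled, e.g. `iptables -L -v -n > enabled.txt`. Since PREROUTING and POSTROUTING live in the nat table, `iptables -t nat -L -v -n` has to be saved and compared as well.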

Oleg
15-12-2004, 11:14
Yes, someone has already reported this problem with starcraft - but this never happened to me - probably due to a different way for internet access...
Could you please try to


rmmod ip_nat_starcraft
rmmod ipt_NETMAP

when you have it disabled?

piri
15-12-2004, 11:42
disabled starcraft and issued both commands, but no change...

but lsmod doesnt show these modules anyway....

when enabled though both modules show up, the ip_nat_starcraft module in unused state:


usb-storage 62144 1
sd_mod 13268 2
scsi_mod 70176 2 [usb-storage sd_mod]
videodev 8304 0
printer 11900 0 (unused)
lp 8628 0
parport_splink 2956 1
parport 25664 1 [lp parport_splink]
usb-ohci 21764 0 (unused)
usbcore 77224 1 [usb-storage printer usb-ohci]
ipt_NETMAP 992 2
ip_nat_starcraft 2224 0 (unused)
wl 431504 0 (unused)
et 23224 2


removed ip_nat_starcraft in enabled state, but no effect, and I was not able to remove ipt_NETMAP....

hope that helps
cheers, thomas.

Oleg
15-12-2004, 11:48
noop.
please try to figure out the problem with iptables... Check the input chain and output chains.
you could enable everything in the input by


iptables -I INPUT -j ACCEPT

piri
15-12-2004, 12:10
noop.
please try to figure out the problem with iptables... Check the input chain and output chains.
you could enable everything in the input by


iptables -I INPUT -j ACCEPT


no effect with above command...
also iptables output for INPUT and OUTPUT chains does not differ for both options (starcraft enabled/disabled)

the answer might be found in those suspect "connect/disconnect" buttons which solve the problem when starcraft is disabled...

cheers, thomas

Oleg
15-12-2004, 12:12
have you tried
nslookup www.chupa.nl 127.0.0.1
on the wl500g?

piri
15-12-2004, 12:34
on the router

$ nslookup www.chupa.nl 127.0.0.1
Server: localhost.localdomain
Address: 127.0.0.1

Name: www.chupa.nl
Address: 217.67.235.46


on the client (i.e. 192.168.1.2)


$ nslookup www.chupa.nl 192.168.1.1
Server: 192.168.1.1
Address: 192.168.1.1#53

** server can't find www.chupa.nl: REFUSED
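
The difference between the two answers above (a resolved address on the router, REFUSED on the client) can be reduced to one word per lookup, which makes it easier to repeat the test after each change. A minimal sketch, assuming the output looks like the two samples in this post; `lookup_status` is a made-up helper name, and other nslookup versions may word their messages differently.

```shell
# lookup_status
# Read nslookup output on stdin and print "refused", "resolved" or "unknown".
lookup_status() {
    out=$(cat)
    case "$out" in
        *REFUSED*) echo "refused" ;;    # server answered but declined the query
        *Name:*)   echo "resolved" ;;   # a Name:/Address: answer block is present
        *)         echo "unknown" ;;
    esac
}
```

On the client this would be used as `nslookup www.chupa.nl 192.168.1.1 2>&1 | lookup_status`, and on the router with 127.0.0.1 as the server argument.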

brubber
15-12-2004, 20:26
No offense, but have you tried: :confused: :confused:

LAN DHCP: Enable
DNS and WINS server: leave blank or use 192.168.1.1 for DNS and leave other fields blank

WLAN DHCP disable Wireless firewall if you have it in your firmware.

On your workstation: Get IP automatically, preferred DNS servers either leave blank or enter 192.168.1.1, renew IP address

:confused: :confused:

Oleg
15-12-2004, 20:56
please type


killall -1 dnsmasq

this should bring it back to life

piri
16-12-2004, 15:57
No offense, but have you tried: :confused: :confused:

LAN DHCP: Enable
DNS and WINS server: leave blank or use 192.168.1.1 for DNS and leave other fields blank

WLAN DHCP disable Wireless firewall if you have it in your firmware.

On your workstation: Get IP automatically, preferred DNS servers either leave blank or enter 192.168.1.1, renew IP address

:confused: :confused:

yep thanks, have tried this as well, unfortunately with no effect....


please type


killall -1 dnsmasq

this should bring it back to life

noop, didnt...

I don't know what it is, but as long as I've got starcraft enabled everything is working as it should (except for those nasty wan dies problems that occur every now and then...)

thanks guys for your support, this forum really kicks a**