I have a problem that has kept me sleepless for the last few nights.
I inherited some servers from the guy who retired, and I noticed that one of the interfaces sometimes gets frozen on one of the servers, making people unable to connect to it.
Now, the details:
We have 3 servers. One is the Database server, which hosts only the DB and is connected to the Access Server. The Access Server also runs an IIS-based web server, which provides .aspx-based controls for the customers who visit our third server, on which we host the website. I think the following picture will describe it better:
Network graph
The problem is that the Access Server is configured with 2 default gateways (sic!), but after all, according to route print, it uses the first ISP gateway:
Active Routes:
Network Destination  Netmask          Gateway         Interface       Metric
0.0.0.0              0.0.0.0          bb.bbb.bbb.241  bb.bbb.bbb.243  10
0.0.0.0              0.0.0.0          aaa.aa.aaa.129  aaa.aa.aaa.130  10
bb.bbb.bbb.240       255.255.255.248  bb.bbb.bbb.243  bb.bbb.bbb.243  10
bb.bbb.bbb.243       255.255.255.255  127.0.0.1       127.0.0.1       10
bb.255.255.255       255.255.255.255  bb.bbb.bbb.243  bb.bbb.bbb.243  10
127.0.0.0            255.0.0.0        127.0.0.1       127.0.0.1       1
192.168.0.0          255.255.255.0    192.168.0.97    192.168.0.97    10
192.168.0.97         255.255.255.255  127.0.0.1       127.0.0.1       10
192.168.0.255        255.255.255.255  192.168.0.97    192.168.0.97    10
aaa.aa.aaa.128       255.255.255.248  aaa.aa.aaa.130  aaa.aa.aaa.130  10
aaa.aa.aaa.130       255.255.255.255  127.0.0.1       127.0.0.1       10
aaa.aa.aaa.255       255.255.255.255  aaa.aa.aaa.130  aaa.aa.aaa.130  10
224.0.0.0            240.0.0.0        bb.bbb.bbb.243  bb.bbb.bbb.243  10
224.0.0.0            240.0.0.0        192.168.0.97    192.168.0.97    10
224.0.0.0            240.0.0.0        aaa.aa.aaa.130  aaa.aa.aaa.130  10
255.255.255.255      255.255.255.255  bb.bbb.bbb.243  bb.bbb.bbb.243  1
255.255.255.255      255.255.255.255  192.168.0.97    192.168.0.97    1
255.255.255.255      255.255.255.255  aaa.aa.aaa.130  aaa.aa.aaa.130  1
Default Gateway: aaa.aa.aaa.129
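Note that both 0.0.0.0 rows carry the same metric (10), so the table itself expresses no preference between the two gateways. To make that tie explicit, here is a rough sketch in Python (my own choice of language; the function names are made up for illustration, and it assumes the route print output has been saved as plain text with whitespace-separated columns):

```python
def default_routes(route_print_text):
    """Return (gateway, interface, metric) for every 0.0.0.0/0.0.0.0 row."""
    routes = []
    for line in route_print_text.splitlines():
        fields = line.split()
        # A route row has exactly five whitespace-separated fields;
        # the header and the "Default Gateway:" line are skipped here.
        if len(fields) != 5 or fields[0] != "0.0.0.0" or fields[1] != "0.0.0.0":
            continue
        _destination, _netmask, gateway, interface, metric = fields
        routes.append((gateway, interface, int(metric)))
    return routes


def tied_default_routes(routes):
    """Return the default routes that share the lowest metric."""
    if not routes:
        return []
    best = min(metric for _, _, metric in routes)
    return [route for route in routes if route[2] == best]


# Three rows copied from the table above (addresses are the masked ones).
SAMPLE = """\
0.0.0.0 0.0.0.0 bb.bbb.bbb.241 bb.bbb.bbb.243 10
0.0.0.0 0.0.0.0 aaa.aa.aaa.129 aaa.aa.aaa.130 10
192.168.0.0 255.255.255.0 192.168.0.97 192.168.0.97 10
"""

if __name__ == "__main__":
    for gateway, interface, metric in tied_default_routes(default_routes(SAMPLE)):
        print(gateway, "via", interface, "metric", metric)
```

If more than one route comes back from tied_default_routes, the metrics alone do not decide which gateway is used.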
So here is the question: how can I make it work “normally”, or at least as close to normal as possible, so that I won’t have to log in over RDP and restart the .129 adapter every day?
At first, I thought of a simple watchdog script that would ping the freezing NIC from the second source address, but it failed miserably, because in WS2003 the ping command accepts the -S (source address) parameter only for IPv6. There seems to be no way around that, not even a 3rd-party solution, and copying ping.exe straight from Windows 7 doesn’t work (really, I tried that too!).
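For illustration only, a watchdog check of that kind could probe over TCP instead of ICMP, which sidesteps ping entirely. The sketch below is untested on WS2003 and rests on assumptions: that a Python interpreter supporting the source_address argument of socket.create_connection can be installed on the box, and that the target listens on some TCP port (80 for IIS, say). The function name reachable_from is hypothetical.

```python
import socket


def reachable_from(source_ip, target_ip, target_port, timeout=5.0):
    """Try a TCP connect to the target from a socket bound to source_ip.

    Returns True if the connection is established and False on any
    socket error (timeout, connection refused, unreachable, bind failure).
    """
    try:
        conn = socket.create_connection(
            (target_ip, target_port),
            timeout=timeout,
            source_address=(source_ip, 0),  # port 0: let the OS pick a port
        )
    except socket.error:
        # socket.timeout is a subclass of socket.error, so it is covered too.
        return False
    conn.close()
    return True
```

A scheduled task could call something like reachable_from("192.168.0.97", target, 80), with the real addresses substituted, and restart the adapter when it returns False several times in a row. One caveat: binding the source address does not by itself guarantee which adapter the packets leave through; that still depends on the routing table.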
Then I thought about buying a dual-WAN router, plugging it in between the server and the ISPs’ routers, and forwarding the specified ports to one LAN connection. That should work as long as we assume the current problems are caused by the NIC in that server. That assumption is pretty easy to check, because we have the WWW server on the same connection, and its uptime has been almost 100% for the last year (excluding failures not related to the company). Even if I set up failover on that interface, though, the clients’ application still connects by IP address, so it would still look unresponsive to the client; but we would have the main problem fixed.
The third option is doing some magic with the Load Balancing options, using some other software, but AFAIK it has never worked well on Windows (and freezing adapters are quite common for me).
There is also a fourth option, which is nuking everything and getting a new server and/or system, but the biggest problem there would be licensing: we are hosting a SQL Server DB behind it, plus a server backend that communicates with that DB and returns tables for customers on the web AND for our employees (200+), who use our company program to communicate with the databases. This obviously needs separate CALs, and W2003 was the last version whose Web Edition didn’t require them.
Can anyone point me in the right direction?