System Status
No scheduled downtime
This month we are upgrading the labs to Debian v5. With this move we are also changing the home directories to avoid problems with current user profiles. We'll send a more detailed email describing the whole thing, but be prepared to re-do some of your profile work after the upgrade.
Due to yesterday's storms we went offline in the afternoon. The network was available but the primary file server suffered disk failures that kept it from coming online automatically. We have now cleared the errors and don't expect any further issues.
We are in the process of changing the IP addresses used when contacting the CS systems from the Internet. As a result you will likely get SSH warnings when connecting to our servers. Make sure you understand what the warnings mean, and compare the fingerprint SSH shows you against the ones published here before accepting it.
cn01's fingerprint is: ac:bd:cb:c7:9d:ea:41:8e:17:78:aa:9c:9e:b5:64:6a
cn02's fingerprint is: 56:9c:eb:19:60:79:19:9a:a0:fa:a7:f1:e9:2e:9b:a1
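For those who prefer to check a fingerprint programmatically, here is a minimal sketch in Python. It assumes the published values are the colon-separated MD5 fingerprints that OpenSSH traditionally displays, and that you already have the server's public key line (for example from your known_hosts file). The function names are hypothetical and only for illustration.

```python
import base64
import hashlib


def md5_fingerprint(pubkey_line: str) -> str:
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line.

    A public key line has the form "<key type> <base64 blob> [comment]".
    The fingerprint is the MD5 digest of the decoded blob.
    """
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    # Group the 32 hex characters into 16 colon-separated byte pairs.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_published(pubkey_line: str, published: str) -> bool:
    """Compare a key's fingerprint with a published one, ignoring case."""
    return md5_fingerprint(pubkey_line) == published.strip().lower()
```

If `matches_published` returns False for the key your client was offered, do not accept the key; contact us instead.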
There may also be a couple of periods during which you will not be able to access our servers. We'll try to keep each of these periods under 10 minutes. We can't provide specific schedules, but they will fall within this 6-day period.
We had a switch go bad that affected the ACL lab. All thin clients were connected to it. We have now replaced the unit and will be making network changes to minimize the impact of a single switch failure.
We had a file system problem that affected the user home directories tree. As a consequence, logging into the CS and Bioinformatics labs was not working properly. As a precaution we took all file systems offline and tested them. Everything is checking out clean as of 14:45 CST. If you run into any issues or need data restored from a backup set please let us know.
Parts of the campus network had intermittent issues during these two days. We don't have all the details, but we've been told that all access has been restored as of Tuesday morning. The CS network was unreachable at times due to these problems.
12:10AM We are having problems with cn01. We are investigating the problem but don't have an estimate on when it will be fixed. For now please log in using cn02.
2:15AM We have resolved the issues and the server is back online. We had to power cycle the system and all sessions will need to be restarted.
Recently, we have been maintaining the web servers manually, applying selective security updates. Due to the unexpected delays in getting new servers online, and with the increased workload on our sysadmins, we decided to bring the web servers fully up to the current Debian distribution "Etch". Most of the important packages were already current, so we don't expect many problems. However, please test any software that you were running out of the main www.cs... site and let us know if you are having any problems.
We experienced hardware problems at the physical plant which resulted in the CS network getting disconnected from the rest of the university/internet. The problem was resolved within 3 hours and we were back online by 1:30PM.
We are upgrading our base OS to Ubuntu 8.04. As of today, cn01 and all servers handling the thin clients in the CS and Bioinformatics labs have been upgraded. cn02 will be handled later in the semester as we plan on moving some of the services it is currently providing to the new virtual infrastructure we are working on.
The whole network will be offline on Thursday May 1st between 3PM and 7PM. We are replacing faulty hardware. The outage will probably last 1 hour. Upgrade finished.
The server is being restaged for use within the database class. This server will no longer be available to normal computing jobs for the rest of the semester.
Due to unexpected failures within the memory modules in the file server we decided to replace them. The system went down due to ECC errors at 3AM, which caused most systems to be unavailable this morning.
Due to unexpected hardware and configuration problems today's scheduled upgrades finished at 20:00 instead of 18:00 as planned. Only the web server missed the target as it was the last server to be upgraded. We apologize for any inconvenience the extended outage might have caused.
Today we are doing the planned OS updates on the database servers. This affects all software that uses MySQL and PostgreSQL backends, including some of the web applications. Updates are scheduled to finish by 21:00 CST although some services may be available sooner.
20:45 Update: All upgrades have been completed.
The AC that cools the main infrastructure for the CS department failed early on December 24th. Equipment reached its over-temperature limits around 1PM, which caused a complete shutdown of all networking equipment. Due to the holiday schedule we are not sure when proper repairs will be finalized. As of today, a temporary solution is in place that will provide limited access to our infrastructure. In particular, remote services via SSH and all web access have been restored, but no physical access to the WOB lab will be authorized until further notice. We regret the inconvenience that this may cause.
18:00 Update: The AC has been fixed and we are not expecting any complications with the repairs. Therefore all access has been restored and authorized usage is back to normal with the current access policies.
The archive server is having problems with some of its hardware. We are working on replacing the failing hardware, but are waiting on parts from the vendor. In the meantime, expect some periods during which the archive might not operate at full speed or be reachable.
The backbone fiber connecting the CS network to the campus network is being replaced this Saturday. All systems will be unreachable during this time. Local access via the ACL lab will continue to be available.
The server s26037 had a hard crash early this morning. The server had to be power cycled and everything is back online. For several hours this affected the main CS website and slowed down remote SSH access.
After the recovery of the lab servers, the CS profile was not getting executed properly when the thin clients were used. The problem has now been fixed and all SEU-specific commands and applications should be working again.
Although the test server didn't have any problems, we couldn't reliably run the new version of Debian on all the servers. After getting only 2 of the 5 servers online, we decided to go back to the old version.
In the process, the Bioinformatics lab has been changed to different thin clients. The original server is now being reserved to work out some of the issues found while upgrading the system.
We are in the process of upgrading all servers to the latest version of Debian. Today we are starting the upgrade on all lab servers. This includes the ACL lab and the Bioinformatics lab. If you find that any of your applications no longer work, please let us know.
We have upgraded the OS in the main file server. This should fix the spurious errors we have seen with XFS and NFS.
We hit an XFS bug that caused the main file system to halt. The problem lasted from 13:42 to 16:24, when the server was finally brought back online. The only data that could have been lost is whatever changes were made to files and not flushed to disk prior to the crash. If you had any files being modified around 13:42, we encourage you to check them for correctness and let us know if you find any issues. We do have backups from earlier that day that we can use to check for data loss on a per-file basis.
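If you are not sure which of your files were being written around that time, a small script can list candidates by modification time. The following is a minimal sketch in Python, not an official tool: the function name is hypothetical, and the window you pass in (date and minutes before the crash) is your own choice, since this notice does not fix one.

```python
import os
from datetime import datetime


def files_modified_between(root, start, end):
    """Yield paths under root whose modification time lies in [start, end].

    start and end are datetime objects in local time.
    """
    lo, hi = start.timestamp(), end.timestamp()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # Skip files that vanished or cannot be read.
                continue
            if lo <= mtime <= hi:
                yield path
```

For example, calling `files_modified_between(os.path.expanduser("~"), start, end)` with `start` and `end` set to a few minutes on either side of 13:42 on the day of the crash lists the files in your home directory worth inspecting. Modification times only narrow the search; you still need to open each file and check its contents.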
We have upgraded from PHP4 to PHP5 on all of our web servers.
We had a glitch in the XFS kernel code that caused the file server to be unavailable for about 10 minutes at 3:45 PM. The problem has been resolved and we don't expect any further issues.
We have completed the equipment move for all the hardware that is onsite. There is still a UPS that needs to be purchased and some servers will be affected once we move them to it.
On June 27th, we had a problem with the main router that connects the CS network with the university. As a result we lost connectivity between 11:20 AM and 2:00 PM.
On May 10th, we are moving our network equipment out of Fleck Hall into WOB. During the move the whole CS network will be affected and unreachable. Right now the move is scheduled for 12 PM - 5 PM CST.
Update: the network is back online. Some configuration changes are pending and will be done within the next 48 hours. Plan for some small interruptions lasting less than 2 minutes each.
It just came to our attention that some of the servers didn't have access to all of the network file systems. We are still investigating when the problem started. It most likely began during the power failures caused by the thunderstorms.
All affected servers have now been remapped and no further problems are expected.
Yesterday's system recovery involved an outdated authentication database. We have restored a more recent version from backup. If you are having problems logging into the system, try your previous password. If you still have problems, let a faculty member know so that they can reset your password.
Due to a prolonged power failure caused by the storm, the UPS batteries ran out. Unfortunately the servers didn't properly restart the NFS services. The situation has now been resolved.
We will be performing infrastructure maintenance on December 15, 2005. This will effectively bring everything down with the exception of the archive server. The work is expected to start around 4 PM CST and will last for several hours.
The hardware replacements are now in place. The system is 100% operational.
While moving files across systems the file server hit a kernel bug in the NFS code. This caused the server to become unavailable for about 15 minutes around 15:20.
The disk upgrade to the archive server has been finished.
We are currently adding more space to the archive server. Once the upgrades are done we'll have almost 2 TB of RAID space available. In the meantime the archive files will be offline.
The authentication server became unavailable between 19:00 and 19:20. Users starting sessions were not able to log in. This problem is now resolved and we don't expect any further issues.
Sunday, we experienced a problem with our uplink network cabling. Due to restricted access to the wiring closets, central IT couldn't fix the problem until Monday morning.
Servers s26038 - s26041 will have their software upgraded; expect some issues with libraries. s26038 will need to be brought offline. s26039 - s26041 will remain online during the upgrade and only performance degradation and compiling issues are expected. These problems will go away once the upgrade is completed.
Due to changes on the NFS server holding the home directories, the server had to be rebooted. No users seemed to be logged in at the time. However, any processes running on the computing nodes may have had a 5-minute period during which they couldn't access files. If you had running processes you may want to check them.
Classes start on January 18 and the CS labs will officially open. As a reminder, the ACL has the same hours of operation as the library. The schedule can be found on the library pages.
Server s26038 will be re-staged on January 13. Any files stored on the local file systems will be lost. Also, no computing jobs should be started that won't be finished by then.
Ever wanted to learn more about Emacs? This is your chance. A workshop will take place at 16:45 at the ACL. Learn some of the basic commands, moving around buffers and some advanced techniques that will help you do homework faster.
The maintenance on the main file server was completed. As a result the file system containing all users' home directories has been increased by 41%.
The CS Unix systems will be offline starting 2004/11/03 @ 00:15 to do maintenance on the main file server.
While the ACL has technically existed for a few months, it officially opens on August 23 with the start of the Fall Semester.