
Archive for the ‘Networking’ Category

Going down the rabbit hole of 802.11r

If there’s one thing that will get under my skin and irritate me to no end, it’s a problem that I can’t figure out or fix, yet it seems I should be able to. Case in point is a situation I was in recently at work. Ever since implementing 802.1x authentication on our Wi-Fi this year, staff and students have been able to sign into the Wi-Fi without needing to know a network key, sign into captive portals or use any other workarounds. WPA Enterprise was made for exactly that sort of situation, even though it does take quite a bit of work to get set up.

One staff member, however, could not connect to the Wi-Fi no matter what we tried. We knew it wasn’t a username and password issue, but his Samsung Galaxy J1 Ace simply refused to connect. After entering credentials, the phone would simply sit on the Wi-Fi screen saying “Secured, Saved.” For the record, the J1 Ace is not that old a phone, but it shipped with Android 4.4.4 and has never been updated here in South Africa. Security updates also ceased quite a while ago. There is a ROM I’ve seen out there for Android 5, but since I didn’t own the phone, I didn’t want to take any chances flashing firmware via Odin and mucking about with something that clearly was never supported locally.

My first thought was that the phone had a buggy 802.1x implementation and couldn’t support PEAP or MSCHAPv2 properly. Connecting the phone to another SSID that didn’t use 802.1x worked fine, which indicated that the hardware was working at least. I eventually gave up and told the staff member that the phone was just too old to connect. He accepted this with good grace thankfully, but it gnawed away at me, wondering what the issue was.

A few weeks ago, a student came to me wanting to connect to the Wi-Fi. Lo and behold, she had the exact same phone and we had the same situation as before. This time, though, I got frustrated and was determined to find out what the problem was. Most of my internet searches came back with useless info, but somewhere, somehow, I came across an article about how Fast BSS Transition had issues with certain phones and hardware. Fast BSS Transition is technically known as 802.11r and, in a nutshell, it helps devices roam better on a network, especially in a corporate environment where you might be using a VoIP app that is particularly sensitive to latency and delays while the device roams between access points. It’s been a standardised add-on to the Wi-Fi standards for a good few years, so a device from 2015 should have been just fine with supporting it!

Borrowing the staff member’s J1 Ace, I disabled the 802.11r and 802.11k options on the network. His phone connected faster than a speeding bullet! That adrenaline rush was quite pleasant, as I now finally knew what the issue was. I re-enabled 802.11k and the phone still behaved, which meant the culprit was 802.11r. The moment that option was enabled, the phone dropped off the network.

My solution was to clone the network in our Ruckus ZoneDirector, hide the SSID so that it’s not immediately visible and disable 802.11r for this specific SSID. Once that was done, the teacher was connected and has been incredibly happy that he too can now enjoy the Wi-Fi again.
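The change itself is done through the ZoneDirector web interface in our case, but to illustrate the idea in more generic terms, here is a minimal hostapd-style sketch of the two SSIDs – the SSID names and mobility domain value are made up, and the RADIUS and FT key-holder settings are omitted:

# Main SSID: WPA2-Enterprise with 802.11r (Fast BSS Transition) enabled
interface=wlan0
ssid=School-Staff
wpa=2
ieee8021x=1
wpa_key_mgmt=WPA-EAP FT-EAP
mobility_domain=4f57
ft_over_ds=1

# Legacy clone (hidden): same 802.1x setup, but no FT-EAP key management and
# no mobility_domain, so 802.11r is effectively off for problem devices
# ssid=School-Staff-Legacy
# ignore_broadcast_ssid=1
# wpa_key_mgmt=WPA-EAP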

My theory is that some combination of wireless chip, chip drivers, Android version and potentially the KRACK fix on our network caused the J1 Ace to be unable to connect. It could be that while the phone nominally supports 802.11r, it was never patched after KRACK was disclosed, so once the KRACK fixes were applied on the network side it could no longer handle the wireless frames correctly. Since the phone is never going to get any more support, the only answer is to run a network without 802.11r for these kinds of devices.

It makes me angry that this kind of thing happens with older devices. Part of this is also down to Android itself, but that is a topic for another post entirely.


The road to DMARC’s p=reject

DMARC is sadly one of the more underused tools out there on the internet right now. Built to work on top of the DKIM and SPF standards, DMARC can go a very long way to stopping phishing emails stone cold dead. While SPF tells servers whether mail has been sent from a server you control or have authorised, and DKIM signs the email using keys only you should have, DMARC tells servers what to do when a mail fails DKIM, SPF or both checks. Mail can be let through, quarantined to the Spam/Junk folder or outright rejected by the recipient server.

Since moving our school’s email over to Office 365 a year ago, I have had a DMARC record in place, set to p=none so that I could monitor the results over the course of time. I use a free account at DMARC Analyzer to check the results and have been keeping an eye on things over the last year. Confident that all mail flow is now working properly from our domain, I recently modified our DMARC record to read “p=reject;pct=5”. Now, if a destination server checks mail coming from our domain and that mail fails the SPF and DKIM checks, it will be rejected 5% of the time. 5% is a good low starting point, since according to DMARC Analyzer, I have not had any mails completely fail DKIM or SPF checks in a long time. Some mail is being modified by some servers, which does alter the alignment of the headers somewhat, but overall it still passes SPF and DKIM checks.
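For reference, the DMARC policy itself is nothing more than a DNS TXT record on the _dmarc subdomain. A record along the lines of ours would look something like this (the domain and reporting address are placeholders, not our real values):

_dmarc.example.com.  IN  TXT  "v=DMARC1; p=reject; pct=5; rua=mailto:dmarc-reports@example.com"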

My next goal is to ramp that 5% up to 20% or 30%, before finally removing the pct tag completely and simply leaving the policy as p=reject. Not only will I be stopping any potential phishing attempt that spoofs our school’s domain, I am also being a good net citizen in the fight against spammers.

Of course, this doesn’t help if a PC on my network gets infected and starts sending mail out via Office 365, as that mail will pass SPF and DKIM checks and will have to be caught by the normal methods such as Bayesian filtering, anti-malware scans and so on. That is the downside of SPF, DKIM and DMARC: they can’t prevent spam from being sent from inside a domain, so domains still need to be vigilant for malware infections, bots and the like. At least with the policies in place, one avenue for spammers gets shut down. As more and more domains come on board, spammers will continue to get squeezed, which is always a good thing.


Flashing all the Firmwares

November 26, 2017

In the not so distant past, updating an electronic device’s firmware was either impossible or carried a great many risks. In the slightly slower paced world back then, we didn’t complain too much, perhaps because devices by and large shipped with stable firmware that had spent lots of time in development and ended up being quite polished. In today’s breakneck-paced world, nothing is ever finished and devices are often shipped as quickly as possible, with the promise to update the firmware and improve matters as time goes on.

For the manufacturers who adhere to this promise and regularly put out updated firmware, well done! You deserve big kudos for doing so. Sadly, this state of affairs is more the exception than the rule. Far too often a device is shipped to market with initial firmware that gets updated maybe once or twice, only to be abandoned by the manufacturer, who has moved on to the next bright and shiny gadget. The most obvious example of this is the mess that most Android based phones have gotten themselves into.

Sometimes firmware just operates low-level hardware, like the control board on a DVD burner. Other times it’s both that and a user interface/operating system all rolled into one – think of the web interface you use to control a home router. Sometimes the update just fixes bugs and adds stability, other times it does that and adds new features or updates the user interface – think updated PC UEFI or 3rd party router firmware.

I promise there’s a point to all this rambling. The recent school holidays afforded me a chance to update firmware on a whole range of devices in my school: network switches, ADSL routers, CCTV cameras and the attached NVR, as well as a few other odds and ends. HP deserves a special shout out here for the lengthy firmware life of their older model switches. Whilst they had no reason to do so, HP did keep updating the firmware of certain switches for a good number of years, which at least extended the useful life of these devices.

Sadly, on the old 2610 series HP didn’t remove the Java based web interface, but the last available firmware did at least sign the binaries, so there’s one less warning when you connect. For the 2620 series, HP back ported the new UI from their modern Aruba switches, which has led to a nice consistent interface across the 3 different switch generations we own. If you’ve ever used HP’s legacy interface on the 2620 and other similar generation models, you’ll know how ugly and painful that interface was to use.

The Dahua CCTV system we use, though, was another story. For one thing, the fixed bullet camera we were sold appears to have been replaced very quickly by Dahua, so there’s no new firmware beyond 2015. The fixed dome cameras did better, with a firmware from only a few months ago. The NVR also had a much later firmware available. I flashed the NVR first, only for all hell to break loose after the reboot. A large portion of the cameras refused to connect to the device after the update. Whilst most of the settings seemed to have been preserved during the update, too many little things had been disturbed. The next thing I did was to update the dome cameras one by one. When that still didn’t help, I deleted and re-added all the cameras to the NVR. To my relief, this sorted the problem and we were able to go back to using the system.

That being said, there have been some cases of the cameras displaying corrupted green screens, though that hasn’t lasted long and only seemed to affect one or two cameras. Those devices might just need to be flashed again for proper stability, but it’s still not how it’s supposed to be. Alternatively, I will check for the next available update and flash that to the cameras with issues, hoping it solves the problem.

I still have my main server’s firmware to flash, which I plan to do in the next school holidays. Intel has discontinued the S2600GZ system, but at least they still make firmware available. That system is unlikely to get any more updates in the future, but at least it had a decent lifespan.

My suggestion when it comes to firmware updating is to flash everything you have with the latest available firmware, unless it is a completely critical core device where you cannot risk any downtime or potential problems. Rather safe than sorry, and sometimes an update is the only way to fix things. There’s also the option of 3rd party firmware on some devices, but that’s a whole different post.

DHCP Relay: the basics

November 26, 2017

If you run a small flat network, DHCP just magically works once it is set up. Devices get their addresses, devices communicate, everything works and everyone is happy. The moment you partition the network with VLANs however, things change. Devices in the additional segment(s) no longer receive DHCP packets. There are three options available to rectify this issue:

  1. Manually configure static IP addresses. Painful, but it will work.
  2. Set up a DHCP server per additional VLAN. Lots of duplicated work, and if you aren’t careful, DHCP packets can end up crossing VLANs, causing havoc with devices.
  3. Use DHCP relay to centralise IP address issuing from one central server.

I’ve just recently configured DHCP relay at my school and it’s working well. Getting it set up is a tad tricky, but once you understand how it works, it’s quite straightforward. Here is a guide on how to do it on a network that runs Aruba switches and a Windows 2012 R2 DHCP server.

It should be noted that in order for this to work, you need a core switch that is capable of IP routing. Layer 3 switches will do this, as will some higher end Layer 2 switches from Aruba – the 2530 and 2540 models spring to mind. If you don’t have a routing capable switch in your network, you are going to need a router connected to each VLAN to do the job instead. Your VLANs must also be set up correctly with untagged and tagged ports for this to work.

Firstly, decide on the IP ranges you want for your additional VLANs. Try to ensure you have enough space so that you don’t need to redo the scope later on.

Next, create these scopes with all the necessary extra bits in the Windows DHCP management console, but do not activate them when asked at the end of the wizard. Leave them deactivated for the time being.
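If you prefer PowerShell over the wizard, the same scopes can be created with the DhcpServer module that ships with Server 2012 R2. A rough sketch only, with the scope name, ranges and option values as placeholders for your own:

# Create the scope but leave it deactivated for now
Add-DhcpServerv4Scope -Name "VLAN 20 - Students" -StartRange 172.16.0.10 -EndRange 172.16.1.200 -SubnetMask 255.255.254.0 -State InActive
# Set the router (default gateway) and DNS server options for the scope
Set-DhcpServerv4OptionValue -ScopeId 172.16.0.0 -Router 172.16.0.1 -DnsServer 192.168.0.10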

On your core Aruba switch, assign an IP address to every VLAN that you want to use DHCP relay on. Make sure that this IP falls within the subnet of the matching DHCP scope, but doesn’t conflict with any address in the range you plan to hand out.
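On the switch’s command line, that looks something like this (the VLAN ID, address and mask are examples to be substituted with your own):

conf t
vlan 20
   ip address 172.16.0.1 255.255.254.0
   exit
wr mem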

Next, enable IP routing on the core switch:

conf t
ip routing
wr mem

Next, add the IP helper address to each VLAN you want to use DHCP on. On the switch’s command line, type the following:

conf t (if starting from scratch, not needed if you are still carrying on from the above step)
vlan 20 ip helper-address 192.168.0.10
wr mem


Substitute VLAN 20 for each additional VLAN ID and 192.168.0.10 for your DHCP server.

On each of your edge switches, do not give the switch an IP in any VLAN except your main or management VLAN that the core switch also resides in. Point each edge switch’s IP default gateway address to the core switch’s IP address.
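On a ProCurve/Aruba style edge switch, that boils down to something like the following (the address is an example to be substituted with your core switch’s IP):

conf t
ip default-gateway 192.168.0.75
wr mem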

On your Windows DHCP server, you will need to add some static routes to the server unless its default gateway is pointed to the core switch. Odds are that the server isn’t pointed to the core switch but rather to a firewall for internet access, so the routes will need to be added manually. Open up a command prompt and type the following:

route -p add 172.16.0.0 mask 255.255.254.0 192.168.0.75

Repeat the above command for each VLAN you want DHCP on. Substitute 172.16.0.0 with your own network, mask 255.255.254.0 with the correct subnet mask and 192.168.0.75 with your own core switch IP.

Lastly, activate the scope(s) in the Windows DHCP console. You can test things out by using a client PC in each VLAN and releasing and renewing the IP address. You should be obtaining an address that is correct for each VLAN and there should be no spill over between the VLANs that will cause network chaos. You should be able to see the clients appearing in the Address Leases section of each DHCP scope.
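If you prefer the command line, the activation and the client-side test look something like this (the scope ID is an example to match whatever scopes you created):

Set-DhcpServerv4Scope -ScopeId 172.16.0.0 -State Active (in PowerShell on the DHCP server)

ipconfig /release (in a command prompt on a client PC in the VLAN)
ipconfig /renew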

UEFI booting observations

It’s exam time at my school, which means that things quieten down a bit. This leaves me with some free time during the day to experiment and learn new things, or attempt to do things I have long wanted to do but have not had the time for. I’ve used the time during this last week to play around with deploying Windows 7 and 10 to PCs for the purpose of testing their UEFI capabilities. While Windows 7 can and does UEFI boot, it really doesn’t gain any benefits over BIOS booting, unlike Windows 8 and above. I was more interested in testing out the capabilities of these motherboards, so I could get a clearer idea of the hardware issues we may have when we move to Windows 10.

Our network comprises only Intel-based PCs, so all my experiences so far are based on that particular platform. What I’ve found so far boils down to this:

  • Older generation Intel 5 & 6 chipset series motherboards from Intel themselves are UEFI based, but present interfaces that look very much like the traditional console type BIOS. The only real clue is that under the Boot section, there is an option to turn on UEFI booting.
  • These older motherboards don’t support Secure Boot or the ability to toggle the Compatibility Support Module (CSM) on and off – the UEFI version on these boards predates these functions.
  • I have been unable to UEFI PXE network boot the 6 series motherboards, and haven’t yet tried the 5 series boards. While I can UEFI boot the 6 series from a flash drive/DVD/hard drive, I cannot do so over the network. Selecting the network boot option boots the PC into what is essentially BIOS compatibility mode.
  • The Intel DB75EN motherboard has a graphical UEFI, supports Secure Boot and can toggle the CSM on and off. Interestingly enough though, when the CSM is on, you cannot UEFI PXE boot – the system boots into BIOS compatibility mode. You can only UEFI PXE boot when the CSM is off. This is easy to tell as the network booting interface looks quite different between CSM and UEFI modes.
  • Windows 7 needs CSM mode turned on for the DB75EN motherboards if you deploy in UEFI mode, at least from what I’ve found using PXE boot. If you don’t turn the CSM on, the boot will either hang at the Windows logo or moan about being unable to access the correct boot files. I have yet to try installing Windows 7 on these boards from a flash drive in UEFI mode to see what happens in that particular scenario.
  • I haven’t yet had a chance to play with the few Gigabyte branded Intel 8 series motherboards we have. These use Realtek network cards instead of Intel NICs. I’m not a huge fan of Gigabyte’s graphical UEFI, as I find it cluttered and there’s a lot of mouse lag. I haven’t tested a very modern Gigabyte board though, so perhaps they’ve improved their UEFI by now.

UEFI Secure Boot requires that all hardware supports pure UEFI mode and that the CSM be turned off. I can do this with the boards where I’m using the built-in Intel graphics, as these fully support both CSM mode and pure UEFI. Other PCs with GeForce 610 adapters in them don’t support pure UEFI boot, so I am unable to use Secure Boot on them, which is somewhat annoying, as Secure Boot is good for security purposes. I am probably going to need to start making use of low end GeForce 700 series cards, as these support full UEFI mode and so will support Secure Boot as well.

It’s been a while since we bought brand new computers, but I will have to be more picky when choosing the motherboards. Intel is out of the motherboard game and I am not a fan of Realtek network cards either – this does narrow my choices quite a bit, especially as I also have to be budget conscious. At least I know that future boards will be a lot better behaved with their UEFI, as all vendors have had many years now to adjust to the new and modern way of doing things.

The long hunt for a cure

At the end of March 2014, our school took ownership of a new Intel S2600GZ server to replace our previous HP ML350 G5, which was the heart of our network. The HP had done a fantastic job over the years, but was rapidly starting to age and wasn’t officially supported by Windows Server 2012 R2. Our new server has 32GB of RAM, dual Xeon processors, dual power supplies, 4 network ports and a dedicated remote management card. Although a little pricier than what I had originally budgeted for, it matched what the HP had and would earn its keep over the next 5-7 years’ worth of service.

After racking and powering up the server, I installed firmware updates and then Server 2012 R2. Install was quicker than any other server I’ve done in the past, thanks to the SSD boot volume. After going through all the driver installs, Windows Updates and so on, the server was almost ready to start serving. One of the last actions I did was to bond all 4 network ports together to create a network team. My thinking was that having a 4Gb/s team would prevent any bottlenecks to the server when under heavy load, as well as provide redundancy should a cable or switch port go faulty. Good idea in theory, but in reality I’ve never had a cable or port in the server room go bad in 6+ years.
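For what it’s worth, an LACP team like that can be built with a single PowerShell cmdlet on Server 2012 R2. A sketch only, with a placeholder team name and adapter names:

# Bond four NICs into one LACP team with dynamic load balancing
New-NetLbfoTeam -Name "ServerTeam" -TeamMembers "NIC1","NIC2","NIC3","NIC4" -TeamingMode Lacp -LoadBalancingAlgorithm Dynamic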

Looking back now, I’m not sure exactly why I bothered creating a team. While the server is heavily used as a domain controller, DHCP, DNS and file server, it never comes close to saturating 1Gb/s, let alone 4. Almost every computer in the school is still connected at 100Mb/s, so the server itself never really comes under too much strain.

Either way, once everything was set up, I proceeded to copy all the files across from the old HP to the new Intel server. I used Robocopy to bulk move files, and in some cases needed to let the process finish overnight since there were so many files, especially lots of small files. Data deduplication was turned on, shares were shared and everything looked good to go.
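The copy jobs themselves were nothing fancy; something along these lines per share, with the server, share and log paths here being placeholders rather than our actual ones:

robocopy \\oldserver\Staff D:\Shares\Staff /E /COPYALL /R:1 /W:1 /LOG:C:\Logs\staff-copy.log

/E copies the full directory tree, /COPYALL preserves permissions and attributes, and the low retry/wait values stop a single locked file from stalling the whole job.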

When school resumed after the holidays, the biggest problem came to light right on the first morning: users being unable to simultaneously access Office files. We have a PowerPoint slideshow that is run every morning in the register period that has all the daily notices for meetings, events, reminders, detention etc. Prior to the move, this system worked without fault for many years. After the move, the moment the 2nd or 3rd teacher tried to access the slideshow, they would get this result:

[Photo: Explorer showing the “Downloading” dialog when opening the shared file]
The green bar of doom would crawl across the navigation pane while this odd Downloading box appeared, took forever to do anything and tended to lock Explorer up. Complaints naturally came in thick and fast, and the worst part is that I couldn’t pinpoint what the issue was, aside from my suspicion that the new SMB3 protocol was to blame. I had hoped that the big Update 1 release that shipped for Windows 8.1 and Server 2012 R2 would help, but it didn’t. Disabling SMB signing didn’t help either. At one point, my colleague and I even installed Windows 8.1 and Office 2013 on some test machines to try and rule out that possibility, but they ended up doing the same thing. As a stop gap measure, I made a dedicated Notices drive on the old HP, which was still running Server 2008 and ran fine with concurrent access to the same file. Online forums weren’t any real help and none of the other admins in Cape Town I spoke to had encountered the problem either.

In the last school holidays just gone by, we finally had a decent gap between other jobs to experiment on the new server and see if we could correct the problem. I broke the network team, unplugged 3 of the 4 cables and disabled the LACP protocol on the switch. After reassigning the correct IP to the now single network port, we did some tests on opening up files on 2 and then 3 computers at the same time. We opened up 15MB Word documents, 5MB complicated Excel files, 200MB video files and more. The downloading box never showed up once. Unfortunately, without heavier real world testing by the staff, I don’t know if the problem has been resolved once and for all. I am intending to move the Notices drive during the next school holiday and we will see what happens after that.

Chalk one up for strange issues that are almost impossible to hunt down.

Bandwidth buffet

For any person now in their late 20s to early 30s, the sound of a dial up modem should be a familiar, yet fading memory. Connecting to the internet more often than not meant listening to those squawking devices and hoping that the connection was clean and free of noise that would slow things down. As time went on, ADSL arrived, 3G arrived, HSDPA arrived, and now VDSL and fibre have arrived too. Slowly but surely, the trickle of data coming down the pipes has become a raging torrent, at least for those that can afford the higher speed options.

Driven by the laws of supply and demand, websites have evolved to become a lot more feature rich and complex than in years gone by. Images have gone from highly compressed GIF and JPEG files to higher resolution JPEG and PNG files. Internet video, once an incredibly frustrating experience involving downloadable clips and QuickTime, Real or Windows Media Player, has largely evolved into slick, easy to use web based players. Of course, video has also crept up from thumbnail size resolutions all the way up to the current 4K. It’s become really simple: the bigger your pipe, the more you are going to drink from the fountain by consuming rich media, online streaming and more. Creating and uploading content is also more viable than ever before, which means symmetrical connections are becoming far more important to the end user than they ever were.

Take the picture below, generated from my school’s firewall logs for 1 January – 30 April 2015:

[Screenshot: firewall traffic statistics, 1 January – 30 April 2015]

That’s a total of 970GB of traffic on 1 ADSL line, the speed of which has fluctuated during the course of the year due to stability issues. We have another ADSL line which is only used by a few people, some phones and tablets, but I don’t have usage stats for that line. However, taken all together, our school has definitely used over 1TB of data in 4 months. At this rate, we may end up pushing close to 3TB by the year’s end. Also keep in mind that these stats are without any wide-scale Wi-Fi available to students. I shudder to think what the numbers will be once we have Wi-Fi going, or even if we get a faster upload so that things like Dropbox, OneDrive and so on become viable.

Here’s a second interesting picture as well:

[Screenshot: firewall web cache statistics]

Of all the web traffic going through the firewall, 82.5% was unique or uncacheable due to its dynamic nature. In earlier days, caching statistics were higher since websites were less dynamic, with far more static HTML and fewer scripts. That being said, the cache did at least manage to serve about 15% of the total web traffic. Every little bit helps when you are on a slow connection.

In the end, it all goes to show that the more bandwidth you have, the more applications and users are going to end up making use of it. Thankfully, bandwidth prices are much lower than they have ever been, though on some connections the speed is throttled to make sure that the end user doesn’t gorge themselves to the detriment of other users.
