Introduction
This document describes the flash corruption problems reported on IOS-based Cisco Access Points (APs).
Requirements
Cisco recommends that you have basic knowledge of:
- AireOS WLCs
- Lightweight APs
Components Used
- Cisco Aironet 1040, 1140, 1250, 1260, 1600, 1700, 2600, 2700, 3500, 3600, 3700, 700, AP801, and AP802 Series indoor access points
- Cisco Aironet 1520 (1522, 1524), 1530, 1550 (1552), 1570, and Industrial Wireless 3700 Series outdoor and industrial wireless access points
Note: This problem is present in the AireOS 8.0.x to 8.5.x release trains. Because of the flash hardware they use, it is much more prevalent on Wave 1 AP models such as the 1700/2700/3700 and 2600/3600 than on other AP types.
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Background Information
Wave 1 APs can report that the flash is inaccessible or that the file system is corrupted, especially during a WLC upgrade.
The corruption can leave the AP in any of the following states:
- AP unable to save its configuration
- AP unable to perform an upgrade
- AP loses its configuration
- AP stuck in a boot loop
- AP in ROMMON; console access to the AP is needed to recover it
Note: The issue is prevalent in upgrade scenarios, where APs get hung or stuck, but it is not necessarily limited to WLC upgrades. An AP in this problem state can keep working normally and servicing clients, so the problem is not easily detectable.
Symptoms of the problem
Flash Inaccessible
Command: show file system
The Size and Free columns for flash show “-” instead of the actual values.
Problem output:
==============================
Size(b) Free(b) Type Flags Prefixes
* - - flash rw flash:
==============================
Normal output:
================================
Size(b) Free(b) Type Flags Prefixes
* 40900608 25282560 flash rw flash:
===============================
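The check above is easy to automate once the command output has been collected. The following is a minimal Python sketch, assuming you already have the `show file system` output as a string (collecting it, for example over SSH, is out of scope). The function name `flash_inaccessible` is hypothetical and is not part of any Cisco tool; it only looks for the “-” placeholders in the flash: row shown in the problem output.

```python
def flash_inaccessible(show_fs_output: str) -> bool:
    """Return True if the flash: row of `show file system` shows "-" for size or free space.

    Hypothetical helper: it relies only on the row layout shown in this document.
    """
    for line in show_fs_output.splitlines():
        # Drop the "*" default-file-system marker, then split on whitespace.
        fields = line.replace("*", " ").split()
        # Expected row layout: Size(b) Free(b) Type Flags Prefixes
        if len(fields) == 5 and fields[2] == "flash" and fields[4] == "flash:":
            size, free = fields[0], fields[1]
            return size == "-" or free == "-"
    # No flash: row at all: report it instead of guessing.
    raise ValueError("no flash: row found in show file system output")
```

For the problem output the function returns True; for the normal output (size 40900608, free 25282560) it returns False.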
Command: show flash
The AP cannot display its files:
Directory of flash:/
%Error opening flash:/ (Invalid argument)
Command: show logging
The AP can display the following messages:
*Mar 1 00:01:04.159: %LWAPP-3-CLIENTERRORLOG: Save LWAPP Config: error saving config file
.....
*Mar 1 00:01:04.159: %LWAPP-3-CLIENTERRORLOG: Load nvram:/lwapp_ap.cfg config failed, trying backup...
*Mar 1 00:01:04.159: %LWAPP-3-CLIENTERRORLOG: Load nvram:/lwapp_ap.cfg.bak config failed...
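The `show flash` and `show logging` symptoms can be searched for in the same way. This Python sketch assumes the CLI output is already available as text; the signature strings are copied from the example outputs in this section, while the dictionary keys and the function name `find_flash_symptoms` are hypothetical labels chosen for illustration.

```python
# Error strings copied from the show flash / show logging examples in this document.
# The keys are hypothetical labels, not names used by any Cisco tool.
SIGNATURES = {
    "show_flash_error": "%Error opening flash:/",
    "config_save_error": "Save LWAPP Config: error saving config file",
    "config_load_error": "%LWAPP-3-CLIENTERRORLOG: Load nvram:/lwapp_ap.cfg",
}


def find_flash_symptoms(cli_output: str) -> set:
    """Return the labels of every known symptom string present in the CLI output."""
    return {name for name, pattern in SIGNATURES.items() if pattern in cli_output}
```

A plain substring match is used so that the check still works when a long log line has been wrapped after the file name.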
Command: fsck flash:
The AP cannot verify its file system:
AP# fsck flash:
Fsck operation may take a while. Continue? [confirm]
%Error fscking flash: (Device or resource busy)
Note: This error can occur in two scenarios: (1) the flash is inaccessible, or (2) the flash is still accessible but a file descriptor has leaked (refer to CSCvf28459). The only workaround is to reload the AP.
Caution: When the AP is in the flash-inaccessible state, the outcome of the reload is not always predictable. The AP may come back fine, or it may end up in ROMMON or in a reboot loop, at which point a console-based recovery is required. Be prepared with console access to the site when you perform this operation.
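If you script the fsck check, the result can be classified from the command output. In this minimal Python sketch, only the "Device or resource busy" string comes from this document; treating any other line that starts with "%Error" as a generic failure, and the absence of an error line as success, are assumptions, since the successful fsck output is not shown here. The names `FsckResult` and `classify_fsck` are hypothetical.

```python
from enum import Enum


class FsckResult(Enum):
    OK = "ok"          # assumption: no %Error line means fsck succeeded
    BUSY = "busy"      # "%Error fscking flash: (Device or resource busy)"
    FAILED = "failed"  # assumption: any other %Error line


def classify_fsck(output: str) -> FsckResult:
    """Classify the text returned by `fsck flash:`."""
    for line in output.splitlines():
        if line.startswith("%Error"):
            if "Device or resource busy" in line:
                return FsckResult.BUSY
            return FsckResult.FAILED
    return FsckResult.OK
```

A BUSY result corresponds to the two scenarios in the note above, for which the only workaround is a reload, so keep the caution about console access in mind before acting on it.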
Corrupted Files
The AP is stuck in the following loop:
Loading "flash:/ap1g1-k9w8-mx.v153_80mr_esc.201603111020/ap1g1-k9w8-mx.v153_80mr_esc.201603111020"
...###############################################################################################
##################################################################################################
##################################################################################################
##################################################################################################
##################################################################################################
####################bad mzip file, unknown zip method?
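When you capture the console during this loop, each failed boot attempt shows up as a "Loading" line followed by the "bad mzip file" error. The Python sketch below, which assumes the console capture is saved as text, pairs each error with the image that was being loaded. The regular expression is based only on the single example above, and the function name `failed_image_loads` is hypothetical.

```python
import re

# Pattern inferred from the single example in this document (an assumption).
LOAD_RE = re.compile(r'Loading "(?P<image>flash:/[^"]+)"')
BAD_MZIP = "bad mzip file, unknown zip method?"


def failed_image_loads(console_log: str) -> list:
    """Return the image path of every load attempt that ended with the bad mzip error."""
    failures = []
    current = None
    for line in console_log.splitlines():
        match = LOAD_RE.search(line)
        if match:
            current = match.group("image")
        if BAD_MZIP in line and current is not None:
            failures.append(current)
            current = None
    return failures
```

Two or more entries for the same image in one capture indicate that the AP keeps retrying the corrupted file rather than failing once.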
Note: If you have a WLC that runs a different code version than the AP, forcing the AP to join that WLC can fix the problem, because the version mismatch makes the AP download a new image from that WLC.
Solution
Upgrade the WLC as described in the TAC Recommended AireOS Builds document.
Caution: Before you upgrade, read this document in full.
Before Upgrade
To identify affected APs on the network and fix them before an upgrade, follow these steps:
- Download the wlanpoller tool (a Python script).
- This is a Wireless Escalation tool, written by Federico Lovison (@flovison).
- Install the tool. For instructions, follow the “README.md” file downloaded with the tool.
- Run the script. For instructions, follow the “README.md” file downloaded with the tool.
Tool Description
Every time the script runs, it verifies whether each AP’s flash is accessible.
- If the flash is accessible, the script runs the command fsck flash:. If the check passes, it moves on to the next AP; otherwise it repeats the command, for up to four attempts.
- If the flash is inaccessible, or fsck flash: still fails after four attempts, the script flags the AP in its final report so that it can be recovered on the third run.
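The per-AP logic described above can be sketched in Python as follows. This is not the wlanpoller source; it is an illustration under stated assumptions: `run_fsck` is a hypothetical callable that runs fsck flash: on the AP and returns True on success, and the result keys simply reuse column names from the _ap_fs.csv report described under Tool Output.

```python
from typing import Callable

MAX_FSCK_ATTEMPTS = 4  # the tool tries fsck flash: up to four times


def check_ap_flash(flash_accessible: bool, run_fsck: Callable[[], bool]) -> dict:
    """Sketch of the per-AP check; run_fsck is a hypothetical callable."""
    result = {"flash_issue": False, "fsck_fail": False,
              "fsck_recovered": False, "fsck_attempts": 0}
    if not flash_accessible:
        # Inaccessible flash: flag the AP without running fsck.
        result["flash_issue"] = True
        return result
    for attempt in range(1, MAX_FSCK_ATTEMPTS + 1):
        result["fsck_attempts"] = attempt
        if run_fsck():
            # Success after at least one failed attempt counts as recovered.
            result["fsck_recovered"] = attempt > 1
            return result
    # Every attempt failed: flag the AP for recovery on the third run.
    result["fsck_fail"] = True
    result["flash_issue"] = True
    return result
```

For example, an AP whose first fsck fails and second succeeds ends with fsck_attempts equal to 2 and fsck_recovered set to True, and it is not flagged.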
The script needs to be run three times:
- First run: the script builds an MD5 database from the MD5 checksum of every file on each AP. For each file, the reference value is the MD5 checksum reported most often across APs of the same family on the WLC.
- Second run: the script compares each file’s MD5 checksum against the database. If the values match, the file is OK; if not, the AP is flagged for recovery on the third run.
- Third run: enable the recovery flag in the configuration file. The script then triggers the command test capwap image capwap, but only on APs whose flash is accessible and where errors were found: fsck flash: failed after four attempts, an MD5 mismatch was detected, or both.
Note: This recovery method causes the AP to reload once the image is downloaded and installed. Make sure you run it in a maintenance window.
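The majority-vote idea behind the first and second runs can be illustrated with the Python sketch below. It is not the wlanpoller implementation: using the AP model string (ap_type) as the "AP family" key, the tuple layout of `samples`, and the function names are all assumptions. When two checksums are equally common, `Counter.most_common` returns the one seen first, so ties are broken by input order.

```python
from collections import Counter, defaultdict


def build_md5_db(samples):
    """First run (sketch): most common MD5 per (AP family, file name).

    samples: iterable of (ap_type, filename, md5_hash) tuples; using ap_type as
    the family key is an assumption.
    """
    counts = defaultdict(Counter)
    for ap_type, filename, md5_hash in samples:
        counts[(ap_type, filename)][md5_hash] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


def md5_mismatches(db, ap_type, files):
    """Second run (sketch): files on one AP whose MD5 differs from the database.

    files: dict mapping filename -> md5_hash for that AP. Files with no
    database entry are not flagged.
    """
    mismatched = []
    for name, md5_hash in files.items():
        expected = db.get((ap_type, name))
        if expected is not None and expected != md5_hash:
            mismatched.append(name)
    return mismatched
```

Voting per family only works when most APs of that family still hold a healthy copy of each file, which is why the database is built from all APs on the WLC before any comparison is made.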
Tool Output
File: <timestamp>_ap_fs.csv – Summary of the checks executed on APs and their results
Column descriptions:
- ap_name: Name of the AP.
- ap_type: AP model.
- ap_uptime: Uptime for the AP (days).
- ap_ios_ver: IOS version.
- fs_free_bytes: Number of free bytes in flash file system.
- flash_issue: True if any flash corruption has been observed.
- fs_zero_size: True when a flash hang has been detected (the file system shows “-” for size and free space).
- fsck_fail: True if the file system check has failed.
- fsck_busy: True if fsck flash: returned “Device or resource busy”.
- fsck_recovered: True when fsck reported an error that was fixed by a subsequent fsck.
- fsck_attempts: Number of fsck attempts made to recover the AP (maximum 4).
- md5_fail: True when the MD5 of at least one file differs from the value stored in the database.
- rcv_trigger: True when the AP was triggered to download its image from the WLC because an issue was detected and recovery was enabled.
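To turn the _ap_fs.csv report into a work list, you can filter it with the Python csv module. This sketch assumes that boolean columns are written as the strings "True" and "False" (check your own report), and `flagged_aps` is a hypothetical helper name; the column names are the ones listed above.

```python
import csv


def flagged_aps(report_lines):
    """Return (ap_name, reasons) for every row whose flash_issue column is "True".

    report_lines: an open file or any iterable of CSV lines from an _ap_fs.csv report.
    Assumes booleans are stored as the strings "True"/"False".
    """
    reason_columns = ["fs_zero_size", "fsck_fail", "fsck_busy", "md5_fail"]
    flagged = []
    for row in csv.DictReader(report_lines):
        if row.get("flash_issue") == "True":
            reasons = [col for col in reason_columns if row.get(col) == "True"]
            flagged.append((row["ap_name"], reasons))
    return flagged
```

Usage: open the report with `open(path, newline="")` and pass the file object to `flagged_aps`; each returned reason tells you which check tripped for that AP.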
File: <timestamp>_ap_md5.csv – Details of the MD5 checksum values of all files (on all APs)
Column descriptions:
- ap_name: Name of the AP.
- ap_type: AP model.
- ap_uptime: Uptime for the AP (days).
- filename: IOS image file name.
- md5_hash: md5 value for filename.
- is_good: True if the MD5 value matches the value stored in the database; False if an MD5 mismatch was observed for this file.
- is_zero_bytes: True when the MD5 checksum shows that the file has 0 bytes, so the file is invalid.
- md5_error: Error message returned when the MD5 value for the file could not be retrieved.
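Similarly, the _ap_md5.csv report can be grouped by AP to see exactly which files need attention. In this Python sketch a file counts as bad when is_good is "False", is_zero_bytes is "True", or md5_error is non-empty; the string encoding of the booleans is an assumption, and `bad_files_by_ap` is a hypothetical name.

```python
import csv
from collections import defaultdict


def bad_files_by_ap(report_lines):
    """Group the files that failed any MD5-related check by AP name.

    report_lines: an open file or any iterable of CSV lines from an _ap_md5.csv report.
    Assumes booleans are stored as the strings "True"/"False".
    """
    bad = defaultdict(list)
    for row in csv.DictReader(report_lines):
        if (row.get("is_good") == "False"
                or row.get("is_zero_bytes") == "True"
                or row.get("md5_error")):
            bad[row["ap_name"]].append(row["filename"])
    return dict(bad)
```

APs that appear here but were not recovered on the third run are candidates for the manual recovery described in the next note.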
Note: In some scenarios, the wlanpoller recovery script cannot recover certain APs, and those APs remain flagged as failed in the report. In those cases, manual recovery through the AP CLI (Telnet, SSH, or console) is recommended. Open a TAC SR if you need assistance with this process.
Reference Bug IDs
Bug ID | Headline | Symptoms | Status | Fixed releases |
---|---|---|---|---|
CSCuz47559 | Error saving configuration file happens on Cisco Wave1 APs | Flash hung | Fixed | 8.0MR5, 8.2MR6, 8.3MR3, 8.4+ |
CSCvd07423 | AP firmware corrupt after power cycle bad mzip file, unknown zip method reboot loop | Crash loop | Fixed | 8.3MR2, 8.5+ |
CSCvf16302 | Flash on lightweight IOS APs gets corrupted | AP stuck on rommon | Fixed | 8.5MR1, 8.3MR3 (via CSCvg98786) |
CSCvf28459 | Write of the Private File nvram:/lwapp_ap.cfg Failed on compare RCA needed (try = 1) | Flash accessible but fsck not working | Fixed | 8.3MR4, 8.2MR7, 8.5MR1 |
CSCvg98786 | IOS AP dtls flap issue seen in pre commit sanity | Collateral | Fixed | 8.5MR1, 8.3MR3 |
Source: Understanding Various AP-IOS Flash Corruption Issues