Understanding Various AP-IOS Flash Corruption Issues

 Introduction

This document describes flash corruption problems reported on IOS based Cisco Access Points(AP).

Requirements

Cisco recommends that you have basic knowledge of:

  • AireOS WLCs
  • Lightweight APs

 

Components Used

  • Cisco Aironet 1040, 1140, 1250, 1260, 1600, 1700, 2600, 2700, 3500, 3600, 3700, 700, AP801, and AP802 Series indoor access points
  • Cisco Aironet 1520 (1522, 1524), 1530, 1550 (1552), 1570, and Industrial Wireless 3700 Series outdoor and industrial wireless access points

Note: Problem is present AireOS 8.0.x to 8.5.x release trains. There is a much higher prevalence in Wave1 AP models like 1700/2700/3700  and 2600/3600  on this issue vs other AP types due to flash HW type.

The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.

Background Information

Wave 1 APs can report no flash access or file system corruption, especially when upgrading the WLC.

The corruption can cause the following status on the AP

  • AP unable to save the configuration
  • AP unable to perform an upgrade
  • AP lose configuration
  • AP stuck in a booting loop
  • AP in ROMMON status –  console to the AP is need to recover the AP

Note: The issue is prelevant during upgrade scenarios where AP’s get hung/stuck, but not neccesarily is limited to WLC upgrade scenarios. AP’s may be working fine, servicing clients, etc, while on this problem state which is not easily detectable.

Symptoms of the problem

Flash Inaccessible

Command: show file system

The size and free space on flash would show as “-” instead of the actual value

Problem Output:

==============================
       Size(b)       Free(b)      Type  Flags  Prefixes				
*            -             -     flash     rw   flash:	
==============================

Normal output:

================================
       Size(b)       Free(b)      Type    Flags  Prefixes				
*     40900608       25282560     flash     rw     flash:
===============================

Command:  show flash

AP cannot display its files

Directory of flash:/
 %Error opening flash:/ (Invalid argument)

Command: show logging

AP can display de following messages.

*Mar  1 00:01:04.159: %LWAPP-3-CLIENTERRORLOG: Save LWAPP Config: error saving config file

.....


*Mar  1 00:01:04.159: %LWAPP-3-CLIENTERRORLOG: Load nvram:/lwapp_ap.cfg config
failed, trying backup...
*Mar  1 00:01:04.159: %LWAPP-3-CLIENTERRORLOG: Load nvram:/lwapp_ap.cfg.bak
config failed...

Command: fsck flash:

AP cannot verify its file system.

AP# fsck flash:
Fsck operation may take a while. Continue? [confirm]

%Error fscking flash: (Device or resource busy)

Note: This can occur in two scenarios: 1. When flash is inaccessible or 2. When the flash is still accessible, yet a file descriptor is leaked. Refer CSCvf28459 . The only workaround is to reload the AP.

 

Caution: When AP is in flash inaccessible state, the outcome of the reload is not always predictable. The AP may come back just fine or it may end up in the ROMMON or reboot loop at which point a console based recovery would be required. Please be prepared with console access to the site when performing this operation.

 

Corrupted Files

AP stuck in the following loop.

Loading "flash:/ap1g1-k9w8-mx.v153_80mr_esc.201603111020/ap1g1-k9w8-mx.v153_80mr_esc.201603111020"
...###############################################################################################
##################################################################################################
##################################################################################################
##################################################################################################
##################################################################################################
####################bad mzip file, unknown zip method?

Note: If You have a WLC with a different code than AP runs, forcing AP to join that WLC can fix the problem

 

Solution

Upgrade WLC as per the following document TAC Recommended AireOS Builds.

Caution: Before Upgrade, complete reading this document :

 

Before Upgrade

In order to identify affected AP on the network and fix them before an upgrade. Follow the next steps:

  • Download wlanpoller tool. (Python script.)
    • This is the Wireless Escalation tool (made by Federico Lovison @flovison)
  • Install the tool. For instructions look follow “README.md” file downloaded with the tool
  • Run script. For instructions look follow “README.md” file downloaded with the tool.

Tool Description

Every time the script is run it verifies whether an AP’s flash is accessible or not.

If it is accessible, it runs the command fsck flash:

  • If all is OK, move on to the next AP
  • else repeat the command up to 4 times

if it is inaccessible or command fsck flash is not successful after four times.

  • the script will flag AP on its final report in order to recover on the third run.

The script needs to be run three times.

  1. Run
    • The script will build MD5 database looking at MD5 checksum value for every file on the AP. The final MD5 value for a specific file is the one that has the more hits across same AP family on WLC.
  2. Run
    • The script starts comparing MD5 checksum values vs its database. If value matches then files is ok, if not then AP is flaged in order to recover on the third run
  3. Run
    • Enable recover flag on the configuration file.
    • The script triggers command test capwap image capwap only on the APs where the flash is accessible but some errors were found either fsck flash command failed  after 4 times and/or MD5 mismatch

Note: This recovery method causes the AP to reload once the image is downloaded and installed. Make sure you run it in a maintenance window.

 Tool Output

File: <timestamp>_ap_fs.csv – Summary of the checks executed on APs and their results

Columns description

  • ap_name: Name of the AP.
  • ap_type: AP model.
  • ap_uptime: Uptime for the AP (days).
  • ap_ios_ver: IOS version.
  • fs_free_bytes: Number of free bytes in flash file system.
  • flash_issue: True if any flash corruption has been observed.
  • fs_zero_size: True when flash hung has been detected file system showing “-“
  • fsck_fail: True if file system check has failed.
  • fsck_busy: True device or resource busy when doing flash fsck.
  • fsck_recovered: True when an error occurred on fsck but it is fixed in next fsck.
  • fsck_attempts: Number of attemps of fsck to recover the AP (max 4)
  • md5_fail: True when md5 at least one file is different from the stored in database.
  • rcv_trigger: True when AP tried to download image from WLC when issue has been detected and recovery has been enabled.

File: <timestamp>_ap_md5.csv   Details of the MD5 checksum values of all files (on all APs)

Columns description

  • ap_name: Name of the AP.
  • ap_type: AP model.
  • ap_uptime: Uptime for the AP (days).
  • filename: IOS image file name.
  • md5_hash: md5 value for filename.
  • is_good: True md5 value matches with value stored in db. False md5 mismatch observed for this file.
  • is_zero_bytes: True when filename has 0 bytes based on md5checksum so file is incorrect.
  • md5_error: Error message retriving md5 value if it was not possible to get md5 for filename.

Note: There could be scenarios where the WLANPOLLER recovery script is unable to recover certain AP’s and those AP remains flagged as failed in the report. In those scenarios, manual AP recovery by telnet/SSH/console into AP CLI is recommended. Please open TAC SR if you needed assistance on this process.

 

Reference Bug IDs

Bug ID Headline Symptoms Status Fixed releases
CSCuz47559  Error saving configuration file happens on Cisco Wave1 APs Flash hung Fixed 8.0MR5, 8.2MR6, 8.3MR3, 8.4+
CSCvd07423 AP firmware corrupt after power cycle bad mzip file, unknown zip method reboot loop Crash loop Fixed 8.3MR2, 8.5+
CSCvf16302 Flash on lightweight IOS APs gets corrupted AP stuck on rommon Fixed 8.5MR1, 8.3MR3(via CSCvg98786 )
CSCvf28459 Write of the Private File nvram:/lwapp_ap.cfg Failed on compare RCA needed (try = 1) Flash accessible but fsck not working Fixed  8.3MR4,8.2MR7,8.5MR1
CSCvg98786 IOS AP dtls flap issue seen in pre commit sanity Collateral Fixed 8.5MR1, 8.3MR3

 

source : Understanding Various AP-IOS Flash Corruption Issues

Updated:May 9, 2018
Document ID:213317

Geef een reactie