Extract text from Excel cell AFTER an instance of a character

If you need to extract a certain part of text, but the unique item is not consistent and possibly repeats itself you need to make a few adjustments to the formula.

So in the list of paths below, we want to extract the last part, which occurs after the LAST ‘\’. The problem is that there is no Excel formula that looks for the last time Excel sees something within a cell.

_My Documents \ Other \ Corporate Audit
_My Documents \ Other
_My Documents \ Bank \ Bank \ ABC \ DEF \ Report Items

The steps to follow would be as follows:

How many ‘\’ are there?

First, in each cell we need to work out how many ‘\’ there are if they are inconsistent (if there is exactly the same number you can skip this step).

Again, Excel doesn’t have a formula that does this but you can make use of a combination of the LEN and SUBSTITUTE functions.

The mechanics we use are

  • we determine how many characters are in the cell as is, and
  • then substitute the character we want (‘\’ in this case) with nothing and
  • count how many characters are now in the cell.

The difference between the two numbers is the number of ‘\’s there are.

So if the text

_My Documents \ Other \ Corporate Audit

is in cell A4 then the formula would be

=LEN(A4)-LEN(SUBSTITUTE(A4,”\”,””))

giving an answer of 2.

Make the key character unique

Now that you know how many key characters there are we need to tell Excel to extract from that last key character onwards. The problem is that this is difficult as if you search for the ‘\’ it will find the first one and you aren’t sure how many there may be.

So we need to convert the key character (the second ‘\’) into something unique, perhaps a ‘|’.

The SUBSTITUTE function is useful here because it is one of the few formula that allow you to specify an instance of a character to work on. So if we use this formula

=SUBSTITUTE(A4,”\”,”|”,B4)

we will see the following in the cell (note the second ‘\’ is changed to the unique ‘|’)

_My Documents \ Other | Corporate Audit .

Now extract everything after the unique character

From here it is relatively easy.

Use the FIND function to find which cell contains the unique character ‘|’ i.e.

=FIND(“|”,C4)

and it will give the answer of 23 (the | appears at character 23)

Then use the MID function to pull out everything after this character.

=MID(A4,D4+1,100)

Note that we have included a +1 as we don’t want to see the ‘\’. The 100 is just a big number to take everything afterwards but you can play with the formula to make this more precise.

 

To find out more about these type of issues and to get Example Files, look at out Data Cleanup course