bash string manipulation

Abstract

#!/bin/bash
 
# substring extraction and replacement
str="2023-10-12"
echo "${str:5:2}" # 10
echo "${str::4}" # 2023
echo "2022-${str:5}" # 2022-10-12
 
str="backup.sql"  
echo "original${str:(-4)}" # original.sql
 
str="obin-linux_x64_bin"
echo "${str/x64/armhf}" # obin-linux_armhf_bin
echo "${str/bin/dist}" # odist-linux_x64_bin
echo "${str//bin/dist}" # odist-linux_x64_dist
 
str="db_config_backup.zip"
echo "${str/%.zip/.conf}" # db_config_backup.conf
echo "${str/#db/settings}" # settings_config_backup.zip
 
str="db_config_backup.zip"
echo "${str/%.*/.bak}" # db_config_backup.bak
echo "${str/#*_/new}" # newbackup.zip
 
# regex
str="db_backup_2003.zip"  
if [[ $str =~ 200[0-5]+ ]]; then  
    echo "regex_matched"  
fi
[[ $str =~ 200[0-5]+ ]] && echo "regex_matched"
 
str="db_backup_2003.zip"
if [[ $str =~ (200[0-5])(.*)$ ]]; then
    echo "${BASH_REMATCH[0]}" # 2003.zip
    echo "${BASH_REMATCH[1]}" # 2003
    echo "${BASH_REMATCH[2]}" # .zip
fi
 
str="db_backup_2003.zip"
re="200[0-3].zip"
echo "${str/$re/new}.bak" # db_backup_new.bak
 
# substring removal
str="ver5.02-2224.e2"
ver="${str#ver}"
echo "$ver" # 5.02-2224.e2
maj="${ver/.*}"
echo "$maj" # 5
 
str="ver5.02-2224_release"
ver="${str//[a-z_]}"
echo "$ver" # 5.02-2224
 
# case conversion
str="Hello Bash!"  
lower="${str,,}"  
upper="${str^^}"  
echo "$lower" # hello bash!
echo "$upper" # HELLO BASH!
 
ver1="V2.0-release"
ver2="v4.0-release"
echo "${ver1,}" # v2.0-release
echo "${ver2^}" # V4.0-release
 
declare -l ver1
declare -u ver2
ver1="V4.02.2"
ver2="v2.22.1"
echo "$ver1" # v4.02.2
echo "$ver2" # V2.22.1
 
# string to array
str="C,C++,JavaScript,Python,Bash"
IFS=',' read -ra arr <<< "$str"
echo "${#arr[@]}" # 5
echo "${arr[0]}" # C
echo "${arr[4]}" # Bash

Reviews

2024-07-11

Lots of tips on String manipulation using bash. I may only be using one or two of them, especially converting a file extension, e.g. ${filename/.json/.yaml}. It’s quite powerful!

Bash is the default shell on most GNU/Linux distributions and is available on nearly every Unix-like operating system, which has made it the de facto automation language there. System administrators, DevOps engineers, and programmers commonly use Bash to write shell scripts that automate repetitive command sequences. Bash scripts mostly consist of commands that run other programs, but we often also have to process data and build a logical flow within the script, so we need conditional and text manipulation statements.

Traditional shell scripts, and scripts written for older Bash versions, typically rely on awk, sed, tr, and cut for text manipulation. These are separate programs. They offer good features, but every call spawns a new process, and that startup cost adds up when a script calls them repeatedly (inside a loop, for example). Modern Bash versions offer inbuilt text processing features via parameter expansion.
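As a rough illustration of that trade-off, the following sketch does the same two jobs with external programs and with parameter expansion. The variable names are made up for this example, and it assumes cut and sed are installed:

```shell
#!/bin/bash

str="2023-10-12"

# External programs: each pipeline starts at least one extra process.
year_ext="$(echo "$str" | cut -c1-4)"
slashes_ext="$(echo "$str" | sed 's|-|/|g')"

# Parameter expansion: evaluated inside the current shell process.
year_pe="${str::4}"
slashes_pe="${str//-//}"

echo "$year_ext $year_pe"       # 2023 2023
echo "$slashes_ext $slashes_pe" # 2023/10/12 2023/10/12
```

Both approaches give the same result. The difference only becomes noticeable when the operation runs many times.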

In this story, I’ll explain some inbuilt string manipulation syntaxes you can use to process text productively in your Bash scripts.

Substring Extraction and Replacement

A substring is a contiguous segment of a string. In various scripting scenarios, we need to extract substrings from strings. For example, you may need to get only the name part of a filename that includes an extension. You may also need to replace a substring with another string (e.g., changing the file extension of a filename).

Substring extraction is easy: provide a zero-based character offset and, optionally, a length:

#!/bin/bash
 
str="2023-10-12"
 
echo "${str:5:2}" # 10
echo "${str::4}" # 2023
echo "2022-${str:5}" # 2022-10-12
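The same offset and length syntax can split a fixed-width value into its parts. Here is a small sketch (the variable names are only illustrative) that also uses ${#var} to read the string length:

```shell
#!/bin/bash

date_str="2023-10-12"

# Offsets are zero-based: ${var:offset:length}
year="${date_str:0:4}"
month="${date_str:5:2}"
day="${date_str:8:2}"

echo "${#date_str}"      # 10
echo "$day/$month/$year" # 12/10/2023
```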

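You can even extract substrings counted from the right side by using a negative offset. Wrap the negative number in parentheses (or put a space before it) so that Bash doesn’t read :- as the default-value expansion: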

#!/bin/bash
 
str="backup.sql"
 
echo "original${str:(-4)}" # original.sql
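The following sketch compares the negative-offset forms with the look-alike default-value expansion. It assumes Bash 4.2 or later for the negative length, and the variable names are only illustrative:

```shell
#!/bin/bash

file="backup.sql"

ext="${file: -3}"     # space before the minus sign: last 3 characters
base="${file:0:-4}"   # negative length (Bash 4.2+): drop the last 4 characters
oops="${file:-3}"     # NOT a substring: ":-" means "use 3 if file is unset or empty"

echo "$ext"  # sql
echo "$base" # backup
echo "$oops" # backup.sql
```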

Bash also offers a productive inbuilt syntax for substring replacements:

#!/bin/bash
 
str="obin-linux_x64_bin"
 
echo "${str/x64/armhf}" # obin-linux_armhf_bin
echo "${str/bin/dist}" # odist-linux_x64_bin
echo "${str//bin/dist}" # odist-linux_x64_dist
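A common use of the replace-all form is cleaning up names. This sketch uses made-up values and also shows that a / inside the pattern has to be escaped, because / separates the pattern from the replacement:

```shell
#!/bin/bash

name="my report final.txt"
path="/usr/local/bin"

first_only="${name/ /_}"   # single slash: first match only
all_spaces="${name// /_}"  # double slash: every match
flat_path="${path//\//:}"  # \/ is a literal slash in the pattern

echo "$first_only" # my_report final.txt
echo "$all_spaces" # my_report_final.txt
echo "$flat_path"  # :usr:local:bin
```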

When you work with strings such as filenames and paths, you may have to replace prefixes and suffixes. Replacing a file extension with another extension is a good example. In the following example, the % prefix anchors the pattern to the end of the string and the # prefix anchors it to the start:

#!/bin/bash
 
str="db_config_backup.zip"
 
echo "${str/%.zip/.conf}" # db_config_backup.conf
echo "${str/#db/settings}" # settings_config_backup.zip
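To see why the end anchor matters, here is a sketch that maps a list of hypothetical filenames from .jpeg to .jpg. Only names that actually end in .jpeg change:

```shell
#!/bin/bash

files=("photo1.jpeg" "photo2.jpeg" "my.jpeg.notes.txt")
renamed=()

for f in "${files[@]}"; do
    # % anchors the match to the end, so ".jpeg" in the middle is left alone
    renamed+=("${f/%.jpeg/.jpg}")
done

printf '%s\n' "${renamed[@]}"
# photo1.jpg
# photo2.jpg
# my.jpeg.notes.txt
```

Without the %, the third name would become my.jpg.notes.txt.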

In the above substring replacement examples, we matched exact substrings, but the pattern can also contain wildcards. For example, the * wildcard matches any sequence of characters:

#!/bin/bash
 
str="db_config_backup.zip"
 
echo "${str/%.*/.bak}" # db_config_backup.bak
echo "${str/#*_/new}" # newbackup.zip

The above approach is helpful if you don’t know the exact substring to search for.
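Two details are worth keeping in mind with wildcards: ? matches exactly one character, and a * in a replacement pattern takes the longest possible match. A small sketch with a made-up filename:

```shell
#!/bin/bash

str="report_2023_q4_final.csv"

year_masked="${str/20??/YEAR}" # ? matches exactly one character
greedy="${str/_*_/-}"          # * is greedy: first "_" through the last "_"

echo "$year_masked" # report_YEAR_q4_final.csv
echo "$greedy"      # report-final.csv
```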

Regex Matches, Extractions, and Replacements

As many Unix or GNU/Linux users already know, it’s possible to use grep and sed for regular expression-based text searching, and sed also does regex replacements. For simple cases, you can use Bash’s inbuilt regex matching instead and avoid spawning these external binaries.

You can perform a regex match with an if-condition and the =~ operator, which takes a POSIX extended regular expression on its right-hand side, as shown in the following code snippet:

#!/bin/bash
 
str="db_backup_2003.zip"
 
if [[ $str =~ 200[0-5]+ ]]; then
    echo "regex_matched"
fi

You can also replace the if-statement with an inline conditional if you want:

[[ $str =~ 200[0-5]+ ]] && echo "regex_matched"
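For longer expressions, it often helps to keep the regex in a variable and leave it unquoted on the right-hand side of =~, because quoting any part of the pattern makes Bash match that part as a literal string. The function name is_iso_date below is hypothetical, and the check only validates the shape of the value, not whether the date exists:

```shell
#!/bin/bash

# Returns success if the argument looks like YYYY-MM-DD.
is_iso_date() {
    local re='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
    [[ $1 =~ $re ]]
}

is_iso_date "2023-10-12" && echo "valid"   # valid
is_iso_date "12/10/2023" || echo "invalid" # invalid

# Quoting the pattern turns it into a literal string:
[[ "abc" =~ a.c ]] && echo "regex: . matches any character"
[[ "abc" =~ "a.c" ]] || echo "quoted: . must be a literal dot"
```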

Once the Bash interpreter performs a successful regex match, it stores the results in the BASH_REMATCH array variable (Bash versions before 5.1 make it read-only). Index 0 holds the text matched by the entire expression. If you use parenthesized sub-patterns, Bash stores the text matched by each one in the following indexes, in order:

#!/bin/bash
 
str="db_backup_2003.zip"
 
if [[ $str =~ (200[0-5])(.*)$ ]]; then
    echo "${BASH_REMATCH[0]}" # 2003.zip
    echo "${BASH_REMATCH[1]}" # 2003
    echo "${BASH_REMATCH[2]}" # .zip
fi
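Capture groups make it straightforward to pull several fields out of one string. As a sketch, assuming a tag in the form vMAJOR.MINOR.PATCH (the tag value and variable names are made up for this example):

```shell
#!/bin/bash

tag="v5.12.3"
re='^v?([0-9]+)\.([0-9]+)\.([0-9]+)$'

if [[ $tag =~ $re ]]; then
    major="${BASH_REMATCH[1]}"
    minor="${BASH_REMATCH[2]}"
    patch="${BASH_REMATCH[3]}"
    echo "$major $minor $patch" # 5 12 3
else
    echo "not a version tag: $tag" >&2
fi
```

Copy the values out of BASH_REMATCH right away, because the next =~ match overwrites the array.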

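Remember we used wildcards with the previous substring replacements? Parameter expansion patterns are glob (pathname-style) patterns, not regular expressions, but they do support bracket expressions such as [0-3], and you can keep a pattern in a variable. In the following example, the variable is named re, but Bash treats its content as a glob, so the . is a literal dot: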

#!/bin/bash
 
str="db_backup_2003.zip"
re="200[0-3].zip"
 
echo "${str/$re/new}.bak" # db_backup_new.bak
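Because these patterns are globs, regex quantifiers such as + have no special meaning in them. If you need "one or more" inside a parameter expansion, the extglob shell option offers +(...) as a rough equivalent. A sketch, assuming extglob is acceptable in your script:

```shell
#!/bin/bash

shopt -s extglob # enables +(...), *(...), ?(...), @(...), !(...)

str="db_backup_2003.zip"

# As a glob, [0-9]+ means "one digit followed by a literal +", so nothing matches.
regex_style="${str/[0-9]+/N}"

# With extglob, +([0-9]) means "one or more digits".
extglob_style="${str/+([0-9])/N}"

echo "$regex_style"   # db_backup_2003.zip
echo "$extglob_style" # db_backup_N.zip
```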

Substring Removal Techniques

We often need to pre-process text by removing unwanted substrings. For example, if you extract a version string that has a ver prefix and some build numbers and want to find the major version number, you’ll have to remove some substrings. To strip a prefix, use the ${var#pattern} form. To remove a match anywhere in the string, use the substring replacement syntax and omit the replacement string:

#!/bin/bash  
 
str="ver5.02-2224.e2"
 
ver="${str#ver}"
echo "$ver" # 5.02-2224.e2
 
maj="${ver/.*}"
echo "$maj" # 5
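The # operator used above has three siblings: ## removes the longest matching prefix, % removes the shortest matching suffix, and %% removes the longest matching suffix. A sketch that takes a made-up path apart with them:

```shell
#!/bin/bash

path="/var/log/nginx/access.log.gz"

file="${path##*/}"    # longest prefix ending in "/" removed
dir="${path%/*}"      # shortest suffix starting with "/" removed
name="${file%%.*}"    # longest suffix starting with "." removed
ext="${file##*.}"     # longest prefix ending in "." removed
all_ext="${file#*.}"  # shortest prefix ending in "." removed

echo "$file"    # access.log.gz
echo "$dir"     # /var/log/nginx
echo "$name"    # access
echo "$ext"     # gz
echo "$all_ext" # log.gz
```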

In the above example, we used an exact substring and a wildcard for substring removal, but you can also use bracket expressions (glob character classes, not regular expressions). Check how to extract a clean version number without extra characters:

#!/bin/bash
 
str="ver5.02-2224_release"
 
ver="${str//[a-z_]}"
echo "$ver" # 5.02-2224
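Bracket expressions can also be negated or built from POSIX character classes, which avoids depending on how the current locale orders a range like a-z. A sketch with made-up input values:

```shell
#!/bin/bash

phone="+1 (555) 010-9999"
str="ver5.02-2224_release"

digits="${phone//[^0-9]}"     # remove every character that is not a digit
ver="${str//[[:alpha:]_]}"    # remove letters (any case) and underscores

echo "$digits" # 15550109999
echo "$ver"    # 5.02-2224
```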

Case Conversions and Case-Based Variables

Even the standard C library offers functions to convert the case of a character. Almost all modern programming languages provide inbuilt functions for case conversions. As a command language, Bash doesn’t offer functions for case conversions, but Bash 4.0 and later give us case conversion features via parameter expansion and variable declaration.

Look at the following example that converts letter cases:

#!/bin/bash  
  
str="Hello Bash!"  
  
lower="${str,,}"  
upper="${str^^}"  
  
echo "$lower" # hello bash!
echo "$upper" # HELLO BASH!
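A typical use for the lowercase expansion is comparing user input without caring about case. The helper name is_yes is hypothetical, and the sketch assumes Bash 4.0 or later:

```shell
#!/bin/bash

# Returns success for y / yes in any letter case.
is_yes() {
    local answer="${1,,}"
    [[ $answer == "y" || $answer == "yes" ]]
}

is_yes "YES" && echo "confirmed" # confirmed
is_yes "No" || echo "declined"   # declined
```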

You can also lowercase or uppercase only the first character of a string as follows:

#!/bin/bash
 
ver1="V2.0-release"
ver2="v4.0-release"
 
echo "${ver1,}" # v2.0-release
echo "${ver2^}" # V4.0-release
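These operators also accept an optional pattern that selects which characters to convert. A short sketch with illustrative values:

```shell
#!/bin/bash

name="alice"
word="hello"

greeting="Hello, ${name^}!"     # capitalize only the first character
vowels_up="${word^^[aeiou]}"    # uppercase only the characters matching the pattern

echo "$greeting"  # Hello, Alice!
echo "$vowels_up" # hEllO
```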

If you need to make a specific variable strictly uppercase or lowercase, you don’t need to run a case conversion function all the time. Instead, you can add case attributes to a particular variable with the inbuilt declare command, as shown in the following example:

#!/bin/bash  
 
declare -l ver1
declare -u ver2
 
ver1="V4.02.2"
ver2="v2.22.1"
 
echo "$ver1" # v4.02.2
echo "$ver2" # V2.22.1

The above ver1 and ver2 variables receive a case attribute during the declaration, so whenever you assign a value to one of them, Bash converts the text case based on the variable’s attribute.
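The same attributes work on function-local variables, because local accepts the options that declare accepts. The function name normalize_env and the values below are only illustrative:

```shell
#!/bin/bash

# Prints its argument in lowercase using a local variable with the -l attribute.
normalize_env() {
    local -l env_name="$1"
    echo "$env_name"
}

normalize_env "PRODUCTION" # production

declare -u level="warn"
echo "$level" # WARN
level="error" # later assignments are converted as well
echo "$level" # ERROR
```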

Splitting Strings (String-to-Array Conversion)

Bash lets you define indexed and associative arrays with the declare built-in. Most general-purpose programming languages offer a split method on the string object or via a standard library function (e.g., Go’s strings.Split function). Bash has no single split function, but you can split a string into an array in several ways. For example, we can set IFS to the required delimiter and use the read built-in, use the tr command with a loop to construct the array, or rely on inbuilt parameter expansion.

Using IFS with read is one of the simplest and least error-prone ways to split a string:

#!/bin/bash  
  
str="C,C++,JavaScript,Python,Bash"
 
IFS=',' read -ra arr <<< "$str"
 
echo "${#arr[@]}" # 5
echo "${arr[0]}" # C
echo "${arr[4]}" # Bash

The above code snippet uses , as the split delimiter and uses the read inbuilt command to create an array based on IFS.
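Two properties of this approach are easy to check with a sketch (the city list is made up): elements may contain spaces, because only the comma is a delimiter, and the IFS assignment applies to the read command alone, so the shell’s own IFS is left untouched:

```shell
#!/bin/bash

str="New York,Los Angeles,Rio de Janeiro"
old_ifs="$IFS"

IFS=',' read -ra cities <<< "$str"

echo "${#cities[@]}" # 3
for city in "${cities[@]}"; do
    echo "[$city]"
done
# [New York]
# [Los Angeles]
# [Rio de Janeiro]

[[ $IFS == "$old_ifs" ]] && echo "IFS unchanged" # IFS unchanged
```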

Even though there are simpler-looking ways to split a string without read, make sure that they have no hidden issues. For example, the following split implementation looks simple, but the unquoted expansion undergoes both word splitting and pathname expansion, so it breaks when an element contains a space or is a glob character such as * (which expands to the current directory’s content):

#!/bin/bash
 
# WARNING: This code has several hidden issues.
 
str="C,Bash,*"
 
arr=(${str//,/ })
 
echo "${#arr[@]}" # count depends on the current directory: * expands to its file names
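For comparison, here is a sketch that splits the same input with IFS and read. The read built-in performs no pathname expansion on the data it reads, so the * stays a plain element:

```shell
#!/bin/bash

str="C,Bash,*"

IFS=',' read -ra arr <<< "$str"

echo "${#arr[@]}" # 3
echo "${arr[2]}"  # *
```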