Lots of tips on String manipulation using bash. I may only be using one or two of them, especially converting a file extension, e.g. ${filename/.json/.yaml}. It’s quite powerful!
Bash is the default automation language on most Unix-like and Unix-based operating systems. System administrators, DevOps engineers, and programmers commonly use it to write shell scripts that run repetitive command sequences. Bash scripts typically contain commands that run other programs. In most scenarios, we also have to process data and create a logical flow within the script, so we often add conditional and text manipulation statements.
Traditional Bash scripts, and programmers who used older Bash versions, typically relied on the awk, sed, tr, and cut commands for text manipulation. These are separate programs. Although they offer good features, they slow down your Bash script because every invocation pays the cost of spawning a new process. Modern Bash versions offer inbuilt text processing features via the well-known parameter expansion feature.
In this story, I’ll explain some inbuilt string manipulation syntaxes you can use to process text productively in your Bash scripts.
Substring Extraction and Replacement
A substring is a contiguous segment of a particular string. In various scripting scenarios, we need to extract substrings from strings. For example, you may need to get only the name segment from a complete filename that includes an extension. You may also need to replace substrings with other string segments (e.g., changing the file extension of a filename).
To extract a substring, provide the zero-based character offset and the length:
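Here is a minimal sketch of the `${variable:offset:length}` syntax. The date string and the variable names are assumptions made up for illustration:

```bash
#!/bin/bash
str="2023-10-12"

year="${str:0:4}"      # 4 characters starting at offset 0
month="${str:5:2}"     # 2 characters starting at offset 5
day="${str:8}"         # no length: everything from offset 8 to the end
last_two="${str: -2}"  # negative offset counts from the end (the space is required)

echo "$year $month $day $last_two"
```

Offsets start at zero, and the space before a negative offset keeps Bash from reading it as the `${var:-default}` expansion.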
When you work with some strings, such as filenames, paths, etc., you may have to replace string prefixes and suffixes. Replacing a file extension with another extension is a good example. Look at the following example:
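The following sketch shows the `${variable/pattern/replacement}` family applied to prefixes and suffixes. The filenames are hypothetical and chosen only for illustration:

```bash
#!/bin/bash
filename="data.json"
photo="img_0001.png"
date_str="2023-10-12"

# Replace the first occurrence of .json anywhere in the string
first_match="${filename/.json/.yaml}"

# % anchors the pattern to the end of the string (suffix replacement)
yaml_name="${filename/%.json/.yaml}"

# # anchors the pattern to the start of the string (prefix replacement)
renamed="${photo/#img_/photo_}"

# A double slash replaces every occurrence, not just the first
underscored="${date_str//-/_}"

echo "$first_match $yaml_name $renamed $underscored"
```

The anchored forms are the safer choice for extensions, since they won't touch a `.json` that happens to appear in the middle of a name.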
In the above substring replacement examples, we used the exact substring segment for matching, but you can also use a part of the substring by using the * wildcard character as follows:
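A small sketch of wildcard-based replacement, assuming two made-up filenames:

```bash
#!/bin/bash
image="photo.jpeg"
archive="backup_2023.tar"

# .* anchored with % matches the dot and whatever extension follows it
png_name="${image/%.*/.png}"

# _*. matches the underscore, any characters after it, and the dot
latest="${archive/_*./_latest.}"

echo "$png_name $latest"
```

Bash replaces the longest match of the pattern, so a name with several dots (such as `a.b.c`) would lose everything after the first dot in the first expansion.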
The above approach is helpful if you don’t know the exact substring to search.
Regex Matches, Extractions, and Replacements
As many Unix and GNU/Linux users already know, grep and sed support regular expression-based text searching, and sed can also do regex replacements. You can use inbuilt Bash regex features to handle text processing faster than these external binaries.
You can perform a regex match with an if-condition and the =~ operator, as shown in the following code snippet:
#!/bin/bash
str="db_backup_2003.zip"
if [[ $str =~ 200[0-5]+ ]]; then
    echo "regex_matched"
fi
You can also replace the if-statement with an inline conditional if you want:
[[ $str =~ 200[0-5]+ ]] && echo "regex_matched"
Once the Bash interpreter performs a regex match, it stores the results in the BASH_REMATCH shell variable. This variable is an array (read-only in older Bash versions) that holds the entire matched text at index 0. If you use sub-patterns (parenthesized groups), Bash stores those matches at the following indexes, in order:
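As a rough sketch, here is how the capture groups land in BASH_REMATCH. The regex and the `name`, `year`, and `ext` variables are assumptions added for illustration; keeping the regex in a variable avoids quoting surprises inside `[[ ]]`:

```bash
#!/bin/bash
str="db_backup_2003.zip"
re='^(.+)_(200[0-5])\.([a-z]+)$'

if [[ $str =~ $re ]]; then
    whole="${BASH_REMATCH[0]}"  # the entire matched text
    name="${BASH_REMATCH[1]}"   # first group: everything before the year
    year="${BASH_REMATCH[2]}"   # second group: the year
    ext="${BASH_REMATCH[3]}"    # third group: the extension
    echo "$whole | $name | $year | $ext"
fi
```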
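Remember we used wildcards with previous substring matching? Similarly, parameter expansions accept richer patterns than a bare *, such as bracket character classes. Keep in mind that these are shell glob patterns rather than true regular expressions, even though they look similar, as shown in the following example: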
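A minimal sketch using bracket character classes inside a parameter expansion. The replacement values are arbitrary:

```bash
#!/bin/bash
str="db_backup_2003.zip"

# 200[0-5] matches 2000 through 2005; only the first match is replaced
updated="${str/200[0-5]/2024}"

# [0-9] with a double slash replaces every digit
masked="${str//[0-9]/X}"

echo "$updated $masked"
```

If you need repetition operators, Bash's extglob shell option (`shopt -s extglob`) adds patterns such as `+([0-9])` for one or more digits.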
We often need to pre-process text segments by removing unwanted substrings in many text processing requirements. For example, if you extract a version number with the v prefix and some build numbers and want to find the major version number, you’ll have to remove some substrings. You can use the same substring replacement syntax but omit the replacement string parameter for string removals as follows:
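The sketch below removes substrings by leaving out the replacement part. The version string is a made-up example:

```bash
#!/bin/bash
ver="v1.4.2-build45"

no_prefix="${ver/v}"      # exact substring: drop the first "v"
no_build="${ver/-*}"      # wildcard: drop the dash and everything after it
major="${no_prefix/.*}"   # drop the first dot and everything after it

echo "$no_prefix $no_build $major"
```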
In the above example, we used the exact substring and a wildcard for substring removal, but you can also use regex-like character classes (strictly speaking, these are glob patterns). Check how to extract a clean version number without excess characters:
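One possible way to do this is with a negated character class, which deletes every character that is not a digit or a dot. The input string is an assumption for illustration:

```bash
#!/bin/bash
raw="Version: v1.4.2"

# [!0-9.] matches any character that is neither a digit nor a dot
clean="${raw//[!0-9.]}"

echo "$clean"
```

This only works cleanly when the unwanted text contains no digits or dots of its own. A string like `v1.4.2-build45` would need the build suffix stripped first.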
Even the standard C language offers a function to convert the case of a character. Almost all modern programming languages provide inbuilt functions for case conversions. As a command language, Bash doesn’t offer functions for case conversions, but it gives us case conversion features via parameter expansion and variable declaration.
Look at the following example that converts letter cases:
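Here is a short sketch of the case-modification expansions, which need Bash 4.0 or newer. The sample string is arbitrary:

```bash
#!/bin/bash
str="hello Bash"

upper="${str^^}"          # every letter to uppercase
lower="${str,,}"          # every letter to lowercase
capitalized="${str^}"     # only the first character to uppercase
first_lower="${upper,}"   # only the first character to lowercase

echo "$upper | $lower | $capitalized | $first_lower"
```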
If you need to make a specific variable strictly uppercase or lowercase, you don’t need to run a case conversion function all the time. Instead, you can add case attributes to a particular variable with the inbuilt declare command, as shown in the following example:
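A minimal sketch of case attributes with declare, again assuming Bash 4.0 or newer. The assigned values are invented for illustration:

```bash
#!/bin/bash
declare -l ver1   # -l: values are converted to lowercase on assignment
declare -u ver2   # -u: values are converted to uppercase on assignment

ver1="CURRENT_VERSION_1.0.0"
ver2="Next_Version_2.0.0"

echo "$ver1"
echo "$ver2"
```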
The above ver1 and ver2 variables receive a case attribute during the declaration, so whenever you assign a value for a specific variable, Bash converts the text case based on variable attributes.
Splitting Strings (String-to-Array Conversion)
Bash lets you define indexed and associative arrays with the declare built-in. Most general-purpose programming languages offer a split method in the string object or via a standard library function (Go’s strings.Split function). You can split a string and create an array with several approaches in Bash. For example, we can change IFS to the required delimiter and use the read built-in. Or, we can use the tr command with a loop and construct the array. Or, using inbuilt parameter expansion is another approach. There are so many string-splitting approaches in Bash.
Using the IFS and read is one of the simplest and error-free ways to split a string:
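A sketch of this approach, with a sample comma-separated string that deliberately includes * as an element:

```bash
#!/bin/bash
str="C,C++,Go,Bash,*"

# IFS=',' applies only to this read command; -r keeps backslashes literal,
# and -a stores the resulting fields in the indexed array arr
IFS=',' read -r -a arr <<< "$str"

echo "${#arr[@]}"   # number of elements
echo "${arr[3]}"    # fourth element
echo "${arr[4]}"    # the * stays literal because read does not expand filenames
```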
The above code snippet uses , as the split delimiter and uses the read inbuilt command to create an array based on IFS.
Even though there are simpler-looking ways to handle splitting without read, make sure they have no hidden issues. For example, the following split implementation is very simple, but it breaks when an element is * (which expands to the current directory’s contents) or when an element contains a space, because the unquoted expansion undergoes word splitting and filename expansion:
#!/bin/bash
# WARNING: This code has several hidden issues.
str="C,Bash,*"
arr=(${str//,/ })
echo "${#arr[@]}" # contains current directory content