bash string manipulation
Abstract
Reviews
Lots of tips on string manipulation using Bash. I may only be using one or two of them, especially converting a file extension, e.g. ${filename/.json/.yaml}. It’s quite powerful!
Bash has become the default automation language on most Unix-like and Unix-based operating systems. System administrators, DevOps engineers, and programmers routinely use Bash to write shell scripts that automate repetitive command sequences. Bash scripts mostly consist of commands that run other programs, but in most scenarios we also have to process data and create a logical flow within the script, so we often need conditional and text manipulation statements.
Traditional Bash scripts, and programmers who worked with older Bash versions, typically relied on the awk, sed, tr, and cut commands for text manipulation. These are separate programs. They offer good features, but every invocation spawns a new process, and that startup cost can slow a script down, especially when the command runs many times. Modern Bash versions offer inbuilt text processing features via parameter expansion.
In this story, I’ll explain some inbuilt string manipulation syntaxes you can use to process text productively in your Bash scripts.
Substring Extraction and Replacement
A substring is a contiguous segment of a string. In various scripting scenarios, we need to extract substrings from longer strings. For example, you may need to get only the name part of a filename that includes an extension. You may also need to replace a substring with another string (e.g., changing the file extension of a filename).
Substring extraction is easy once you provide the starting character position and the length:
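Here is a minimal sketch of the ${variable:offset:length} form. The sample date string and the variable names are hypothetical, chosen only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sample value, used only for illustration
str="2023-10-12"

year="${str:0:4}"    # 4 characters starting at offset 0
month="${str:5:2}"   # 2 characters starting at offset 5
day="${str:8}"       # everything from offset 8 to the end

echo "$year"    # 2023
echo "$month"   # 10
echo "$day"     # 12
```

Offsets are zero-based, and omitting the length takes the rest of the string.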
You can even count positions from the right side of the string, as follows:
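A small sketch of right-side extraction with negative offsets, reusing a hypothetical date string. Note the space before the minus sign: without it, Bash would read :- as the default-value expansion. The negative length form assumes Bash 4.2 or later.

```shell
#!/usr/bin/env bash
# Hypothetical sample value, used only for illustration
str="2023-10-12"

day="${str: -2}"       # last 2 characters (mind the space before -)
month="${str: -5:2}"   # start 5 characters from the end, take 2
year="${str:0:-6}"     # negative length: stop 6 characters before the end

echo "$day"     # 12
echo "$month"   # 10
echo "$year"    # 2023
```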
Bash also offers a productive inbuilt syntax for substring replacements:
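As a rough illustration, the ${variable/pattern/replacement} form replaces the first match, while doubling the first slash replaces every match. The sample string below is made up for this sketch:

```shell
#!/usr/bin/env bash
# Hypothetical sample value, used only for illustration
str="bin-linux_x64_bin"

first="${str/bin/dist}"    # replace only the first "bin"
all="${str//bin/dist}"     # replace every "bin"

echo "$first"   # dist-linux_x64_bin
echo "$all"     # dist-linux_x64_dist
```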
When you work with some strings, such as filenames, paths, etc., you may have to replace string prefixes and suffixes. Replacing a file extension with another extension is a good example. Look at the following example:
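A minimal sketch of anchored replacements, assuming a hypothetical archive filename: a # after the first slash anchors the pattern to the start of the string, and a % anchors it to the end.

```shell
#!/usr/bin/env bash
# Hypothetical sample filename, used only for illustration
str="db_config_backup.zip"

new_ext="${str/%.zip/.conf}"       # replace the suffix .zip
new_prefix="${str/#db/settings}"   # replace the prefix db

echo "$new_ext"      # db_config_backup.conf
echo "$new_prefix"   # settings_config_backup.zip
```

Because the pattern is anchored, a .zip that appears in the middle of the name would be left alone.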
In the above substring replacement examples, we matched the exact substring, but you can also match only part of it by using the * wildcard character, as follows:
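The following sketch shows the * wildcard inside replacement patterns. The filename is hypothetical, and the comments assume Bash's rule that a pattern substitution takes the longest possible match:

```shell
#!/usr/bin/env bash
# Hypothetical sample filename, used only for illustration
str="db_config_backup.zip"

any_ext="${str/%.*/.bak}"       # whatever follows the dot, anchored at the end
any_prefix="${str/#*_/new_}"    # everything up to the last underscore
middle="${str/_*_/_}"           # the segment between the underscores

echo "$any_ext"      # db_config_backup.bak
echo "$any_prefix"   # new_backup.zip
echo "$middle"       # db_backup.zip
```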
The above approach is helpful if you don’t know the exact substring to search.
Regex Matches, Extractions, and Replacements
As many Unix or GNU/Linux users already know, it’s possible to use grep and sed for regular expression-based text searching, and sed also helps us do regex replacements. You can use inbuilt Bash regex features to handle text processing faster than these external binaries.
You can perform a regex match with an if-condition and the =~ operator, as shown in the following code snippet:
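A minimal sketch of a regex test inside [[ ]], assuming a hypothetical backup filename. Keeping the expression in a variable (here re, an illustrative name) and leaving it unquoted on the right-hand side avoids quoting surprises:

```shell
#!/usr/bin/env bash
# Hypothetical sample filename and pattern, used only for illustration
str="db_backup_2003.zip"
re='_20[0-9]{2}\.zip$'   # POSIX extended regular expression

if [[ $str =~ $re ]]; then
    result="matched"
else
    result="not matched"
fi

echo "$result"   # matched
```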
You can also replace the if-statement with an inline conditional if you want:
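One way to write the inline form is with && and ||. This is a sketch with a hypothetical filename; keep in mind that this chain behaves like if/else only as long as the command after && succeeds:

```shell
#!/usr/bin/env bash
# Hypothetical sample filename and pattern, used only for illustration
str="db_backup_2003.zip"
re='_20[0-9]{2}\.zip$'

# The assignment after && always succeeds, so the || branch runs
# only when the regex test itself fails.
[[ $str =~ $re ]] && result="matched" || result="not matched"

echo "$result"   # matched
```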
Once the Bash interpreter performs a successful regex match, it stores the results in the BASH_REMATCH shell variable. This variable is an array (read-only in Bash versions before 5.1) that holds the entire matched text at index 0. If you use sub-patterns (capture groups), Bash stores those matches at indexes 1, 2, and so on, in the order of their opening parentheses:
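The sketch below reads capture groups from BASH_REMATCH. The filename, the pattern, and the variable names (name, kind, year) are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sample filename and pattern, used only for illustration
str="db_backup_2003.zip"
re='^(.+)_(.+)_([0-9]{4})\.zip$'

if [[ $str =~ $re ]]; then
    full="${BASH_REMATCH[0]}"   # the whole match
    name="${BASH_REMATCH[1]}"   # first capture group
    kind="${BASH_REMATCH[2]}"   # second capture group
    year="${BASH_REMATCH[3]}"   # third capture group
fi

echo "$full"   # db_backup_2003.zip
echo "$name"   # db
echo "$kind"   # backup
echo "$year"   # 2003
```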
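Remember we used wildcards with previous substring matching? Similarly, it’s possible to use regex-like definitions inside parameter expansions. Strictly speaking, these are shell patterns rather than full regular expressions: bracket expressions such as [0-9] work out of the box, and the extglob shell option adds repetition operators. See the following example: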
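Here is a sketch of regex-like matching inside a parameter expansion. It uses a bracket expression and, as an assumption about the setup, turns on the extglob option so that +(...) can mean "one or more". The filename is hypothetical:

```shell
#!/usr/bin/env bash
shopt -s extglob   # enables +(...), *(...), ?(...), @(...), and !(...)

# Hypothetical sample filename, used only for illustration
str="db_backup_2003.zip"

masked="${str//[0-9]/X}"          # replace every single digit
labeled="${str/+([0-9])/YEAR}"    # replace the first run of digits

echo "$masked"    # db_backup_XXXX.zip
echo "$labeled"   # db_backup_YEAR.zip
```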
Substring Removal Techniques
In many text processing tasks, we need to pre-process text by removing unwanted substrings. For example, if you extract a version number that carries a v prefix and a build number, and you want to find the major version number, you’ll have to remove some substrings. You can use the same substring replacement syntax but omit the replacement string to remove the match, as follows:
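A minimal sketch of removal by omitting the replacement string. The version string and variable names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sample version string, used only for illustration
version="v2.14.3-build118"

no_prefix="${version/v}"        # exact substring: drop the first "v"
no_build="${version/-build*}"   # wildcard: drop "-build" and what follows
major="${no_prefix/.*}"         # drop everything from the first dot onward

echo "$no_prefix"   # 2.14.3-build118
echo "$no_build"    # v2.14.3
echo "$major"       # 2
```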
In the above example, we used an exact substring and a wildcard for substring removal, but you can also use regex-like character classes. Check how to extract a clean version number without extra characters:
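As a sketch, a bracket expression can strip every letter and hyphen in one pass. The version string is hypothetical, and the approach assumes the unwanted parts contain no digits:

```shell
#!/usr/bin/env bash
# Hypothetical sample version string, used only for illustration
version="v2.14.3-beta"

# Remove every alphabetic character and every hyphen
clean="${version//[[:alpha:]-]}"

echo "$clean"   # 2.14.3
```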
Case Conversions and Case-Based Variables
Even the standard C language offers functions to convert the case of a character. Almost all modern programming languages provide inbuilt functions for case conversions. As a command language, Bash doesn’t offer functions for case conversions, but since version 4.0 it gives us case conversion features via parameter expansion and variable declaration.
Look at the following example that converts letter cases:
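A minimal sketch of whole-string case conversion, assuming Bash 4.0 or later (the Bash 3.2 that ships with macOS does not support it). The sample text is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sample value, used only for illustration
str="Hello Bash!"

lower="${str,,}"   # convert every character to lowercase
upper="${str^^}"   # convert every character to uppercase

echo "$lower"   # hello bash!
echo "$upper"   # HELLO BASH!
```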
You can also uppercase or lowercase only the first character of a string, as follows:
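A short sketch of first-character conversion with a single ^ or a single comma, again assuming Bash 4.0 or later and hypothetical sample strings:

```shell
#!/usr/bin/env bash
# Hypothetical sample values, used only for illustration
greeting="hello Bash!"
title="Bash Scripting"

capitalized="${greeting^}"   # uppercase the first character only
decapitalized="${title,}"    # lowercase the first character only

echo "$capitalized"     # Hello Bash!
echo "$decapitalized"   # bash Scripting
```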
If you need to keep a specific variable strictly uppercase or lowercase, you don’t need to run a case conversion every time. Instead, you can add a case attribute to the variable with the inbuilt declare command, as shown in the following example:
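A minimal sketch of case attributes, assuming Bash 4.0 or later. The variable names ver1 and ver2 follow the text; the assigned values are hypothetical:

```shell
#!/usr/bin/env bash
declare -l ver1   # every value assigned to ver1 is lowercased
declare -u ver2   # every value assigned to ver2 is uppercased

# Hypothetical sample values, used only for illustration
ver1="V4.02.2-BETA"
ver2="v4.02.2-beta"

echo "$ver1"   # v4.02.2-beta
echo "$ver2"   # V4.02.2-BETA
```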
The above ver1 and ver2 variables receive a case attribute during declaration, so whenever you assign a value to one of them, Bash converts the text case based on the variable’s attribute.
Splitting Strings (String-to-Array Conversion)
Bash lets you define indexed and associative arrays with the declare built-in. Most general-purpose programming languages offer a split method on the string object or via a standard library function (e.g., Go’s strings.Split function). In Bash, you can split a string into an array in several ways. For example, we can change IFS to the required delimiter and use the read built-in. We can use the tr command with a loop to construct the array. Or we can use inbuilt parameter expansion.
Using IFS with read is one of the simplest and least error-prone ways to split a string:
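Here is a sketch of the IFS and read approach with a hypothetical comma-separated list. Setting IFS on the same line as read limits the change to that one command, so the rest of the script keeps the default value:

```shell
#!/usr/bin/env bash
# Hypothetical sample value, used only for illustration
str="C,C++,JavaScript,Python,Bash"

# -r keeps backslashes literal, -a stores the fields in the array arr
IFS=',' read -ra arr <<< "$str"

echo "${#arr[@]}"   # 5
echo "${arr[0]}"    # C
echo "${arr[2]}"    # JavaScript
```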
The above code snippet uses , as the split delimiter and uses the read inbuilt command to create an array based on IFS.
Even though there are simpler ways to handle splitting without read, make sure that they have no hidden issues. For example, the following split implementation is very simple, but it breaks when the string uses a space as the delimiter and includes * as an element, because Bash expands the * to the contents of the current directory:
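The sketch below reproduces the problem in a throwaway directory so that the result is predictable. The file names a.txt and b.txt and the sample string are hypothetical, and the unquoted expansion is intentional:

```shell
#!/usr/bin/env bash
# Work in a temporary directory with two known files
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
touch a.txt b.txt

# Hypothetical sample value, used only for illustration
str="C * Bash"

unsafe=($str)                # unquoted: word splitting AND filename expansion
read -ra safe <<< "$str"     # read splits on IFS without expanding *

echo "${unsafe[@]}"   # C a.txt b.txt Bash
echo "${safe[@]}"     # C * Bash

# Clean up the temporary directory
cd - > /dev/null || exit 1
rm -rf "$workdir"
```

The unquoted assignment produces four elements instead of three, while the read version keeps the * as a literal element.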