Thursday, May 08, 2008

Using expr

http://www.faqs.org/docs/abs/HTML/moreadv.html

Example 12-6. Using expr

    #!/bin/bash

    # Demonstrating some of the uses of 'expr'
    # =======================================

    echo

    # Arithmetic Operators
    # ---------- ---------

    echo "Arithmetic Operators"
    echo
    a=`expr 5 + 3`
    echo "5 + 3 = $a"

    a=`expr $a + 1`
    echo
    echo "a + 1 = $a"
    echo "(incrementing a variable)"

    a=`expr 5 % 3`
    # modulo
    echo
    echo "5 mod 3 = $a"

    echo
    echo

    # Logical Operators
    # ------- ---------

    #  Prints 1 if true, 0 if false,
    #+ opposite of normal Bash convention.

    echo "Logical Operators"
    echo

    x=24
    y=25
    b=`expr $x = $y`         # Test equality.
    echo "b = $b"            # 0  ( $x -ne $y )
    echo

    a=3
    b=`expr $a \> 10`
    echo 'b=`expr $a \> 10`, therefore...'
    echo "If a > 10, b = 0 (false)"
    echo "b = $b"            # 0  ( 3 ! -gt 10 )
    echo

    b=`expr $a \< 10`
    echo "If a < 10, b = 1 (true)"
    echo "b = $b"            # 1  ( 3 -lt 10 )
    echo
    # Note escaping of operators.

    b=`expr $a \<= 3`
    echo "If a <= 3, b = 1 (true)"
    echo "b = $b"            # 1  ( 3 -le 3 )
    # There is also a "\>=" operator (greater than or equal to).


    echo
    echo

    # Comparison Operators
    # ---------- ---------

    echo "Comparison Operators"
    echo
    a=zipper
    echo "a is $a"
    if [ `expr $a = snap` -eq 0 ]
    #  'expr' prints 0 here, since "zipper" does not equal "snap".
    #+ A bare [ `expr ...` ] would always succeed,
    #+ because both "0" and "1" are non-empty strings.
    then
       echo "a is not snap"
    fi

    echo
    echo



    # String Operators
    # ------ ---------

    echo "String Operators"
    echo

    a=1234zipper43231
    echo "The string being operated upon is \"$a\"."

    # length: length of string
    b=`expr length $a`
    echo "Length of \"$a\" is $b."

    #  index: position of the first character in the string
    #+ that matches any character in the given set
    b=`expr index $a 23`
    echo "Numerical position of first \"2\" in \"$a\" is \"$b\"."

    # substr: extract substring, starting position & length specified
    b=`expr substr $a 2 6`
    echo "Substring of \"$a\", starting at position 2, and 6 chars long is \"$b\"."


    #  The default behavior of the 'match' operations is to
    #+ search for the specified match at the ***beginning*** of the string.
    #
    #        uses Regular Expressions
    b=`expr match "$a" '[0-9]*'`        #  Numerical count.
    echo Number of digits at the beginning of \"$a\" is $b.
    b=`expr match "$a" '\([0-9]*\)'`    #  Note that escaped parentheses
    #                   ==      ==      #+ trigger substring match.
    echo "The digits at the beginning of \"$a\" are \"$b\"."

    echo

    exit 0
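
The note in the listing about 1 meaning true refers to what expr prints. Its exit status follows the usual shell convention instead: 0 when the result is neither null nor zero, 1 otherwise. The following is a minimal sketch of that difference; the variable names status_true and status_false are made up for illustration.

```shell
#!/bin/bash
# Sketch: 'expr' prints 1 or 0 for a comparison, while its exit status
# follows the usual shell convention (0 = success).

a=3

status_true=0
b=$(expr "$a" \< 10) || status_true=$?     # comparison holds

status_false=0
c=$(expr "$a" \> 10) || status_false=$?    # comparison fails

echo "3 < 10: output=$b, exit status=$status_true"
echo "3 > 10: output=$c, exit status=$status_false"

# The exit status lets 'expr' be used directly as a condition.
if expr "$a" \< 10 > /dev/null
then
  echo "$a is less than 10"
fi
```

Testing the exit status this way avoids comparing the printed 0 or 1 by hand.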

Important

The : operator can substitute for match. For example, b=`expr $a : [0-9]*` is the exact equivalent of b=`expr match $a [0-9]*` in the above listing.

    #!/bin/bash

    echo
    echo "String operations using \"expr \$string : \" construct"
    echo "==================================================="
    echo

    a=1234zipper5FLIPPER43231

    echo "The string being operated upon is \"`expr "$a" : '\(.*\)'`\"."
    #     Escaped parentheses grouping operator.            ==  ==

    #       ***************************
    #+          Escaped parentheses
    #+           match a substring
    #       ***************************


    #  If no escaped parentheses...
    #+ then 'expr' prints the number of characters matched.

    echo "Length of \"$a\" is `expr "$a" : '.*'`."   # Length of string

    echo "Number of digits at the beginning of \"$a\" is `expr "$a" : '[0-9]*'`."

    # ------------------------------------------------------------------------- #

    echo

    echo "The digits at the beginning of \"$a\" are `expr "$a" : '\([0-9]*\)'`."
    #                                                             ==      ==
    echo "The first 7 characters of \"$a\" are `expr "$a" : '\(.......\)'`."
    #         =====                                          ==       ==
    # Again, escaped parentheses force a substring match.
    #
    echo "The last 7 characters of \"$a\" are `expr "$a" : '.*\(.......\)'`."
    #         ====                  end of string operator  ^^
    #  (actually means skip over zero or more of any characters until specified
    #+ substring)

    echo

    exit 0
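
As a compact recap of the listing above, this sketch shows the two result modes of the : operator (a count without escaped parentheses, a substring with them) and the implicit anchoring at the start of the string. The variable names are illustrative only.

```shell
#!/bin/bash
# Sketch of 'expr STRING : REGEX' behavior; the match always starts
# at the beginning of the string.

a=1234zipper5FLIPPER43231

count=$(expr "$a" : '[0-9]*')              # number of characters matched
digits=$(expr "$a" : '\([0-9]*\)')         # text captured by \( \)
nomatch=$(expr "$a" : 'zipper') || true    # no match at the start: prints 0, exit status 1
upto=$(expr "$a" : '.*zipper')             # a leading .* lets the match reach "zipper"
letters=$(expr "$a" : '[0-9]*\([a-z]*\)')  # skip the digits, capture the lowercase run

echo "count=$count digits=$digits nomatch=$nomatch upto=$upto letters=$letters"
# Prints: count=4 digits=1234 nomatch=0 upto=10 letters=zipper
```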

regular expression

An expression is a string of characters. Those characters that have an interpretation above and beyond their literal meaning are called metacharacters. A quote symbol, for example, may denote speech by a person, ditto, or a meta-meaning for the symbols that follow. Regular Expressions are sets of characters and/or metacharacters that UNIX endows with special features. [1]

The main uses for Regular Expressions (REs) are text searches and string manipulation. An RE matches a single character or a set of characters (a substring or an entire string).
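
The two uses can be sketched with grep (text search) and sed (string manipulation). The sample text and variable names below are made up for illustration.

```shell
#!/bin/bash
# Sketch: one RE used for searching, another for rewriting text.

sample='alpha 113
beta 1133
gamma 42'

# Text search: count the lines that contain the substring "113".
matches=$(printf '%s\n' "$sample" | grep -c '113')

# String manipulation: replace the first run of digits on each line with "N".
masked=$(printf '%s\n' "$sample" | sed 's/[0-9][0-9]*/N/')

echo "Lines containing 113: $matches"
echo "$masked"
```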

  • The asterisk -- * -- matches any number of repeats of the character string or RE preceding it, including zero.

    "1133*" matches 11 + one or more 3's + possibly other characters: 113, 1133, 111312, and so forth.

  • The dot -- . -- matches any one character, except a newline. [2]

    "13." matches 13 followed by any one character (including a space): 1133, 11333, but not 13 (additional character missing).

  • The caret -- ^ -- matches the beginning of a line, but sometimes, depending on context, negates the meaning of a set of characters in an RE.

  • The dollar sign -- $ -- at the end of an RE matches the end of a line.

    "^$" matches blank lines.

    Note

    The ^ and $ are known as anchors, since they indicate or anchor a position within an RE.

  • Brackets -- [...] -- enclose a set of characters to match in a single RE.

    "[xyz]" matches the characters x, y, or z.

    "[c-n]" matches any of the characters in the range c to n.

    "[B-Pk-y]" matches any of the characters in the ranges B to P and k to y.

    "[a-z0-9]" matches any lowercase letter or any digit.

    "[^b-d]" matches all characters except those in the range b to d. This is an instance of ^ negating or inverting the meaning of the following RE (taking on a role similar to ! in a different context).

    Combined sequences of bracketed characters match common word patterns. "[Yy][Ee][Ss]" matches yes, Yes, YES, yEs, and so forth. "[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]" matches any Social Security number.

  • The backslash -- \ -- escapes a special character, which means that character gets interpreted literally.

    A "\$" reverts to its literal meaning of "$", rather than its RE meaning of end-of-line. Likewise a "\\" has the literal meaning of "\".

  • Escaped "angle brackets" -- \<...\> -- mark word boundaries.

    The angle brackets must be escaped, since otherwise they have only their literal character meaning.

    "\<the\>" matches the word "the", but not the words "them", "there", "other", etc.

    bash$ cat textfile
    This is line 1, of which there is only one instance.
    This is the only instance of line 2.
    This is line 3, another line.
    This is line 4.



    bash$ grep 'the' textfile
    This is line 1, of which there is only one instance.
    This is the only instance of line 2.
    This is line 3, another line.



    bash$ grep '\<the\>' textfile
    This is the only instance of line 2.
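
The basic REs described so far can be checked against a small set of lines with grep -c, which counts matching lines. This is only a sketch: the sample lines and the count helper are hypothetical, and the \< \> word boundaries assume GNU grep.

```shell
#!/bin/bash
# Sketch: count how many sample lines each basic RE matches.

lines='113
1133
13
13x
xyz

the end
other'

# Hypothetical helper: print the number of lines matching the RE in $1.
count() { printf '%s\n' "$lines" | grep -c -- "$1" || true; }

star=$(count '1133*')      # 113 followed by zero or more 3s: 113, 1133
dot=$(count '13.')         # 13 plus any one character: 1133, 13x
blank=$(count '^$')        # blank lines only
set=$(count '[xyz]')       # any line containing x, y, or z: 13x, xyz
negated=$(count '^[^0-9]') # lines starting with a non-digit: xyz, the end, other
word=$(count '\<the\>')    # "the" as a whole word: matches "the end", not "other"
dollar=$(printf '%s\n' 'cost: $5' | grep -c '\$5')   # escaped $ is literal

echo "star=$star dot=$dot blank=$blank set=$set negated=$negated word=$word dollar=$dollar"
# Prints: star=2 dot=2 blank=1 set=2 negated=3 word=1 dollar=1
```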

  • Extended REs. Additional metacharacters, used in egrep, awk, and Perl.

  • The question mark -- ? -- matches zero or one of the previous RE. It is generally used for matching single characters.

  • The plus -- + -- matches one or more of the previous RE. It serves a role similar to the *, but does not match zero occurrences.

    # GNU versions of sed and grep can use "+" in a basic RE,
    # but it needs to be escaped.
    # gawk uses extended REs, so no escape is needed there.

    echo a111b | sed -ne '/a1\+b/p'
    echo a111b | grep 'a1\+b'
    echo a111b | gawk '/a1+b/'
    # All of the above are equivalent.

    # Thanks, S.C.

  • Escaped "curly brackets" -- \{ \} -- indicate the number of occurrences of a preceding RE to match.

    In a basic RE it is necessary to escape the curly brackets, since otherwise they have only their literal character meaning. Extended REs use them unescaped.

    "[0-9]\{5\}" matches exactly five digits (characters in the range of 0 to 9).

    Note

    Curly brackets are not available as an RE in the "classic" (non-POSIX compliant) version of awk. However, gawk has the --re-interval option that permits them (without being escaped).

     bash$ echo 2222 | gawk --re-interval '/2{3}/'
    2222

    Perl and some egrep versions do not require escaping the curly brackets.

  • Parentheses -- ( ) -- enclose groups of REs. They are useful with the following "|" operator and in substring extraction using expr.

  • The -- | -- "or" RE operator matches any of a set of alternate characters.

     bash$ egrep 're(a|e)d' misc.txt
    People who read seem to be better informed than those who do not.
    The clarinet produces sound by the vibration of its reed.
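
The extended operators above can be tried with grep -E, the current spelling of egrep. This is a minimal sketch: the sample lines and the count_ere helper are made up for illustration.

```shell
#!/bin/bash
# Sketch: count how many sample lines each extended RE matches.

lines='ad
abd
abbd
read
reed
red
12345'

# Hypothetical helper: print the number of lines matching the ERE in $1.
count_ere() { printf '%s\n' "$lines" | grep -E -c -- "$1" || true; }

optional=$(count_ere '^ab?d$')      # zero or one b: ad, abd
plus=$(count_ere '^ab+d$')          # one or more b: abd, abbd
either=$(count_ere 're(a|e)d')      # alternation inside a group: read, reed
five=$(count_ere '^[0-9]{5}$')      # exactly five digits, unescaped braces

# The same interval in a basic RE needs escaped braces.
five_bre=$(printf '%s\n' "$lines" | grep -c '^[0-9]\{5\}$')

echo "optional=$optional plus=$plus either=$either five=$five five_bre=$five_bre"
# Prints: optional=2 plus=2 either=2 five=1 five_bre=1
```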
