How to Read Data From a Text File in Bash Scripting

sort

File sort utility, often used as a filter in a pipe. This command sorts a text stream or file forwards or backwards, or according to various keys or character positions. Using the -m option, it merges presorted input files. The info page lists its many capabilities and options. See Example 11-10, Example 11-11, and Example A-8.
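A minimal sketch of typical usage (the filenames here are hypothetical):

sort -n -k2 scores.txt             # Sort numerically on the second field.
sort -r names.txt                  # Sort in reverse (descending) order.
sort -m list1.sorted list2.sorted  # Merge two already-sorted files.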

tsort

Topological sort, reading in pairs of whitespace-separated strings and sorting according to input patterns. The original purpose of tsort was to sort a list of dependencies for an obsolete version of the ld linker in an "ancient" version of UNIX.

The results of a tsort will usually differ markedly from those of the standard sort command, above.
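A minimal sketch: each pair of input strings means "the first item must precede the second," and tsort prints an ordering consistent with all the pairs.

bash$ echo "a b  b c" | tsort
a
b
c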

uniq

This filter removes duplicate lines from a sorted file. It is often seen in a pipe coupled with sort.

cat list-1 list-2 list-3 | sort | uniq > final.list
# Concatenates the list files,
# sorts them,
# removes duplicate lines,
# and finally writes the result to an output file.

The useful -c option prefixes each line of the input file with its number of occurrences.

bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.

bash$ uniq -c testfile
      1 This line occurs only once.
      2 This line occurs twice.
      3 This line occurs three times.

bash$ sort testfile | uniq -c | sort -nr
      3 This line occurs three times.
      2 This line occurs twice.
      1 This line occurs only once.

The sort INPUTFILE | uniq -c | sort -nr command string produces a frequency of occurrence listing on the INPUTFILE file (the -nr options to sort cause a reverse numerical sort). This template finds use in analysis of log files and dictionary lists, and wherever the lexical structure of a document needs to be examined.

Example 16-12. Word Frequency Analysis

#!/bin/bash
# wf.sh: Crude word frequency analysis on a text file.
# This is a more efficient version of the "wf2.sh" script.


# Check for input file on command-line.
ARGS=1
E_BADARGS=85
E_NOFILE=86

if [ $# -ne "$ARGS" ]  # Correct number of arguments passed to script?
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

if [ ! -f "$1" ]       # Check if file exists.
then
  echo "File \"$1\" does not exist."
  exit $E_NOFILE
fi



########################################################
# main ()
sed -e 's/\.//g'  -e 's/\,//g' -e 's/ /\
/g' "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
#                           =========================
#                            Frequency of occurrence

#  Filter out periods and commas, and
#+ change space between words to linefeed,
#+ then shift characters to lowercase, and
#+ finally prefix occurrence count and sort numerically.

#  Arun Giridhar suggests modifying the above to:
#  . . . | sort | uniq -c | sort +1 [-f] | sort +0 -nr
#  This adds a secondary sort key, so instances of
#+ equal occurrence are sorted alphabetically.
#  As he explains it:
#  "This is effectively a radix sort, first on the
#+ least significant column
#+ (word or string, optionally case-insensitive)
#+ and last on the most significant column (frequency)."
#
#  As Frank Wang explains, the above is equivalent to
#+       . . . | sort | uniq -c | sort +0 -nr
#+ and the following also works:
#+       . . . | sort | uniq -c | sort -k1nr -k
########################################################

exit 0

# Exercises:
# ---------
# 1) Add 'sed' commands to filter out other punctuation,
#+   such as semicolons.
# 2) Modify the script to also filter out multiple spaces and
#+   other whitespace.

bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.

bash$ ./wf.sh testfile
      6 this
      6 occurs
      6 line
      3 times
      3 three
      2 twice
      1 only
      1 once

expand, unexpand

The expand filter converts tabs to spaces. It is often used in a pipe.

The unexpand filter converts spaces to tabs. This reverses the effect of expand.
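A minimal sketch (the filenames are hypothetical):

expand -t 4 source.txt > spaced.txt   # Convert each tab to 4 spaces.
unexpand -a spaced.txt > tabbed.txt   # Convert runs of spaces back to tabs.
                                      # (-a acts on all spaces, not just leading ones.)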

cut

A tool for extracting fields from files. It is similar to the print $N command set in awk, but more limited. It may be simpler to use cut in a script than awk. Particularly important are the -d (delimiter) and -f (field specifier) options.

Using cut to obtain a listing of the mounted filesystems:

cut -d ' ' -f1,2 /etc/mtab

Using cut to list the OS and kernel version:

uname -a | cut -d" " -f1,3,11,12

Using cut to extract message headers from an e-mail folder:

bash$ grep '^Subject:' read-messages | cut -c10-80
Re: Linux suitable for mission-critical apps?
MAKE MILLIONS WORKING AT HOME!!!
Spam complaint
Re: Spam complaint

Using cut to parse a file:

# List all the users in /etc/passwd.

FILENAME=/etc/passwd

for user in $(cut -d: -f1 $FILENAME)
do
  echo $user
done

# Thanks, Oleg Philon for suggesting this.

cut -d ' ' -f2,3 filename is equivalent to awk -F'[ ]' '{ print $2, $3 }' filename

Note

It is even possible to specify a linefeed as a delimiter. The trick is to actually embed a linefeed (RETURN) in the command sequence.

bash$ cut -d'
' -f3,7,19 testfile
This is line 3 of testfile.
This is line 7 of testfile.
This is line 19 of testfile.

Thank you, Jaka Kranjc, for pointing this out.

See also Example 16-48.

paste

Tool for merging together different files into a single, multi-column file. In combination with cut, useful for creating system log files.

bash$ cat items
alphabet blocks
building blocks
cables

bash$ cat prices
$1.00/dozen
$2.50 ea.
$3.75

bash$ paste items prices
alphabet blocks $1.00/dozen
building blocks $2.50 ea.
cables  $3.75

join

Consider this a special-purpose cousin of paste. This powerful utility allows merging two files in a meaningful fashion, which essentially creates a simple version of a relational database.

The join command operates on exactly two files, but pastes together only those lines with a common tagged field (usually a numerical label), and writes the result to stdout. The files to be joined should be sorted according to the tagged field for the matchups to work properly.

File: 1.data

100 Shoes
200 Laces
300 Socks

File: 2.data

100 $40.00
200 $1.00
300 $2.00

bash$ join 1.data 2.data
File: 1.data 2.data

100 Shoes $40.00
200 Laces $1.00
300 Socks $2.00

Note

The tagged field appears only once in the output.

head

lists the beginning of a file to stdout. The default is 10 lines, but a different number can be specified. The command has a number of interesting options.

Example 16-13. Which files are scripts?

#!/bin/bash
# script-detector.sh: Detects scripts within a directory.

TESTCHARS=2    # Test first 2 characters.
SHABANG='#!'   # Scripts begin with a "sha-bang."

for file in *  # Traverse all the files in current directory.
do
  if [[ `head -c$TESTCHARS "$file"` = "$SHABANG" ]]
  #      head -c2                      #!
  #  The '-c' option to "head" outputs a specified
  #+ number of characters, rather than lines (the default).
  then
    echo "File \"$file\" is a script."
  else
    echo "File \"$file\" is *not* a script."
  fi
done

exit 0

#  Exercises:
#  ---------
#  1) Modify this script to take as an optional argument
#+    the directory to scan for scripts
#+    (rather than just the current working directory).
#
#  2) As it stands, this script gives "false positives" for
#+    Perl, awk, and other scripting language scripts.
#     Correct this.

Example 16-14. Generating 10-digit random numbers

#!/bin/bash
# rnd.sh: Outputs a 10-digit random number

# Script by Stephane Chazelas.

head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'


# =================================================================== #

# Analysis
# --------

# head:
# -c4 option takes first 4 bytes.

# od:
# -N4 option limits output to 4 bytes.
# -tu4 option selects unsigned decimal format for output.

# sed:
# -n option, in combination with "p" flag to the "s" command,
# outputs only matched lines.


# The author of this script explains the action of 'sed', as follows.

# head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
# ----------------------------------> |

# Assume output up to "sed" --------> |
# is 0000000 1198195154\n

#  sed begins reading characters: 0000000 1198195154\n.
#  Here it finds a newline character,
#+ so it is ready to process the first line (0000000 1198195154).
#  It looks at its <range><action>s. The first and only one is

#   range     action
#   1         s/.* //p

#  The line number is in the range, so it executes the action:
#+ tries to substitute the longest string ending with a space in the line
#  ("0000000 ") with nothing (//), and if it succeeds, prints the result
#  ("p" is a flag to the "s" command here, this is different
#+ from the "p" command).

#  sed is now ready to continue reading its input. (Note that before
#+ continuing, if -n option had not been passed, sed would have printed
#+ the line once again).

#  Now, sed reads the remainder of the characters, and finds the
#+ end of the file.
#  It is now ready to process its 2nd line (which is also numbered '$' as
#+ it's the last one).
#  It sees it is not matched by any <range>, so its job is done.

#  In a few words, this sed command means:
#  "On the first line only, remove any character up to the right-most space,
#+ then print it."

# A better way to do this would have been:
#           sed -e 's/.* //;q'

# Here, two <range><action>s (could have been written
#           sed -e 's/.* //' -e q):

#   range                    action
#   nothing (matches line)   s/.* //
#   nothing (matches line)   q (quit)

#  Here, sed only reads its first line of input.
#  It performs both actions, and prints the line (substituted) before
#+ quitting (because of the "q" action) since the "-n" option is not passed.

# =================================================================== #

# An even simpler alternative to the above one-line script would be:
#           head -c4 /dev/urandom| od -An -tu4

exit

See also Example 16-39.
tail

lists the (tail) end of a file to stdout. The default is 10 lines, but this can be changed with the -n option. Commonly used to keep track of changes to a system logfile, using the -f option, which outputs lines appended to the file.

Example 16-15. Using tail to monitor the system log

#!/bin/bash

filename=sys.log

cat /dev/null > $filename; echo "Creating / cleaning out file."
#  Creates the file if it does not already exist,
#+ and truncates it to zero length if it does.
#  : > filename   and   > filename also work.

tail /var/log/messages > $filename
# /var/log/messages must have world read permission for this to work.

echo "$filename contains tail end of system log."

exit 0

Tip

To list a specific line of a text file, pipe the output of head to tail -n 1. For example head -n 8 database.txt | tail -n 1 lists the 8th line of the file database.txt.

To set a variable to a given block of a text file:

var=$(head -n $m $filename | tail -n $n)

# filename = name of file
# m = from beginning of file, number of lines to end of block
# n = number of lines to set variable to (trim from end of block)

Note

Newer implementations of tail deprecate the older tail -$LINES filename usage. The standard tail -n $LINES filename is correct.

See also Example 16-5, Example 16-39 and Example 32-6.

grep

A multi-purpose file search tool that uses Regular Expressions. It was originally a command/filter in the venerable ed line editor: g/re/p -- global - regular expression - print.

grep pattern [file...]

Search the target file(s) for occurrences of pattern, where pattern may be literal text or a Regular Expression.

bash$ grep '[rst]ystem.$' osinfo.txt
The GPL governs the distribution of the Linux operating system.

If no target file(s) specified, grep works as a filter on stdin, as in a pipe.

bash$ ps ax | grep clock
765 tty1     S      0:00 xclock
901 pts/1    S      0:00 grep clock

The -i option causes a case-insensitive search.

The -w option matches only whole words.

The -l option lists only the files in which matches were found, but not the matching lines.

The -r (recursive) option searches files in the current working directory and all subdirectories below it.

The -n option lists the matching lines, together with line numbers.

bash$ grep -n Linux osinfo.txt
2:This is a file containing information about Linux.
6:The GPL governs the distribution of the Linux operating system.

The -v (or --invert-match) option filters out matches.

grep pattern1 *.txt | grep -v pattern2

# Matches all lines in "*.txt" files containing "pattern1",
# but ***not*** "pattern2".

The -c (--count) option gives a numerical count of matches, rather than actually listing the matches.

grep -c txt *.sgml   # (number of occurrences of "txt" in "*.sgml" files)


#   grep -cz .
#            ^ dot
# means count (-c) zero-separated (-z) items matching "."
# that is, non-empty ones (containing at least 1 character).
#
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz .     # 3
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '$'   # 5
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '^'   # 5
#
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -c '$'    # 9
# By default, newline chars (\n) separate items to match.

# Note that the -z option is GNU "grep" specific.


# Thanks, S.C.

The --color (or --colour) option marks the matching string in color (on the console or in an xterm window). Since grep prints out each entire line containing the matching pattern, this lets you see exactly what is being matched. See also the -o option, which shows only the matching portion of the line(s).

Example 16-16. Printing out the From lines in stored e-mail messages

#!/bin/bash
# from.sh

#  Emulates the useful 'from' utility in Solaris, BSD, etc.
#  Echoes the "From" header line in all messages
#+ in your e-mail directory.


MAILDIR=~/mail/*               #  No quoting of variable. Why?
# Maybe check if-exists $MAILDIR:   if [ -d $MAILDIR ] . . .
GREP_OPTS="-H -A 5 --color"    #  Show file, plus extra context lines
                               #+ and display "From" in color.
TARGETSTR="^From"              # "From" at beginning of line.

for file in $MAILDIR           #  No quoting of variable.
do
  grep $GREP_OPTS "$TARGETSTR" "$file"
  #    ^^^^^^^^^^              #  Again, do not quote this variable.
  echo
done

exit $?

#  You might wish to pipe the output of this script to 'more'
#+ or redirect it to a file . . .

When invoked with more than one target file given, grep specifies which file contains matches.

bash$ grep Linux osinfo.txt misc.txt
osinfo.txt:This is a file containing information about Linux.
osinfo.txt:The GPL governs the distribution of the Linux operating system.
misc.txt:The Linux operating system is steadily gaining in popularity.

Tip

To force grep to show the filename when searching only one target file, simply give /dev/null as the second file.

bash$ grep Linux osinfo.txt /dev/null
osinfo.txt:This is a file containing information about Linux.
osinfo.txt:The GPL governs the distribution of the Linux operating system.

If there is a successful match, grep returns an exit status of 0, which makes it useful in a condition test in a script, especially in combination with the -q option to suppress output.

SUCCESS=0                      # if grep lookup succeeds
word=Linux
filename=data.file

grep -q "$word" "$filename"    #  The "-q" option
                               #+ causes nothing to echo to stdout.
if [ $? -eq $SUCCESS ]
# if grep -q "$word" "$filename"   can replace lines 5 - 7.
then
  echo "$word found in $filename"
else
  echo "$word not found in $filename"
fi

Example 32-6 demonstrates how to use grep to search for a word pattern in a system logfile.

Example 16-17. Emulating grep in a script

#!/bin/bash
# grp.sh: Rudimentary reimplementation of grep.

E_BADARGS=85

if [ -z "$1" ]    # Check for argument to script.
then
  echo "Usage: `basename $0` pattern"
  exit $E_BADARGS
fi

echo

for file in *     # Traverse all files in $PWD.
do
  output=$(sed -n /"$1"/p $file)  # Command substitution.

  if [ ! -z "$output" ]           # What happens if "$output" is not quoted?
  then
    echo -n "$file: "
    echo "$output"
  fi              #  sed -ne "/$1/s|^|${file}: |p"  is equivalent to above.

  echo
done

echo

exit 0

# Exercises:
# ---------
# 1) Add newlines to output, if more than one match in any given file.
# 2) Add features.

How can grep search for two (or more) separate patterns? What if you want grep to display all lines in a file or files that contain both "pattern1" and "pattern2"?

One method is to pipe the result of grep pattern1 to grep pattern2.

For example, given the following file:

# Filename: tstfile

This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.
Here is some text.

Now, let's search this file for lines containing both "file" and "text" . . .

bash$ grep file tstfile
# Filename: tstfile
This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.

bash$ grep file tstfile | grep text
This is an ordinary text file.
This file does not contain any unusual text.

Now, for an interesting recreational use of grep . . .

Example 16-18. Crossword puzzle solver

#!/bin/bash
# cw-solver.sh
# This is actually a wrapper around a one-liner (line 46).

#  Crossword puzzle and anagramming word game solver.
#  You know *some* of the letters in the word you're looking for,
#+ so you need a list of all valid words
#+ with the known letters in given positions.
#  For example: w...i....n
#               1???5????10
# w in position 1, 3 unknowns, i in the 5th, 4 unknowns, n at the end.
# (See comments at end of script.)


E_NOPATT=71
DICT=/usr/share/dict/word.lst
#                    ^^^^^^^^   Looks for word list here.
#  ASCII word list, one word per line.
#  If you happen to need an appropriate list,
#+ download the author's "yawl" word list package.
#  http://ibiblio.org/pub/Linux/libs/yawl-0.3.2.tar.gz
#  or
#  http://bash.deta.in/yawl-0.3.2.tar.gz


if [ -z "$1" ]   #  If no word pattern specified
then             #+ as a command-line argument . . .
  echo           #+ . . . then . . .
  echo "Usage:"  #+ Usage message.
  echo
  echo ""$0" \"pattern,\""
  echo "where \"pattern\" is in the form"
  echo "xxx..x.x..."
  echo
  echo "The x's represent known letters,"
  echo "and the periods are unknown letters (blanks)."
  echo "Letters and periods can be in any position."
  echo "For example, try:   sh cw-solver.sh w...i....n"
  echo
  exit $E_NOPATT
fi

echo
# ===============================================
# This is where all the work gets done.
grep ^"$1"$ "$DICT"   # Yes, only one line!
#    |    |
# ^ is start-of-word regex anchor.
# $ is end-of-word regex anchor.

#  From _Stupid Grep Tricks_, vol. 1,
#+ a book the ABS Guide author may yet get around
#+ to writing . . . one of these days . . .
# ===============================================
echo


exit $?  # Script terminates here.
#  If there are too many words generated,
#+ redirect the output to a file.

$ sh cw-solver.sh w...i....n

wellington
workingman
workingmen

egrep -- extended grep -- is the same as grep -E. This uses a somewhat different, extended set of Regular Expressions, which can make the search a bit more flexible. It also allows the boolean | (or) operator.

bash$ egrep 'matches|Matches' file.txt
Line 1 matches.
Line 3 Matches.
Line 4 contains matches, but also Matches

fgrep -- fast grep -- is the same as grep -F. It does a literal string search (no Regular Expressions), which generally speeds things up a bit.

Note

On some Linux distros, egrep and fgrep are symbolic links to, or aliases for grep, but invoked with the -E and -F options, respectively.

Example 16-19. Looking up definitions in Webster's 1913 Dictionary

#!/bin/bash
# dict-lookup.sh

#  This script looks up definitions in the 1913 Webster's Dictionary.
#  This Public Domain dictionary is available for download
#+ from various sites, including
#+ Project Gutenberg (http://www.gutenberg.org/etext/247).
#
#  Convert it from DOS to UNIX format (with only LF at end of line)
#+ before using it with this script.
#  Store the file in plain, uncompressed ASCII text.
#  Set DEFAULT_DICTFILE variable below to path/filename.


E_BADARGS=85
MAXCONTEXTLINES=50                        # Maximum number of lines to show.
DEFAULT_DICTFILE="/usr/share/dict/webster1913-dict.txt"
                                          # Default dictionary file pathname.
                                          # Change this as necessary.
#  Note:
#  ----
#  This particular edition of the 1913 Webster's
#+ begins each entry with an uppercase letter
#+ (lowercase for the remaining characters).
#  Only the *very first line* of an entry begins this way,
#+ and that's why the search algorithm below works.



if [[ -z $(echo "$1" | sed -n '/^[A-Z]/p') ]]
#  Must at least specify word to look up, and
#+ it must start with an uppercase letter.
then
  echo "Usage: `basename $0` Word-to-define [dictionary-file]"
  echo
  echo "Note: Word to look up must start with capital letter,"
  echo "with the rest of the word in lowercase."
  echo "--------------------------------------------"
  echo "Examples: Abandon, Dictionary, Marking, etc."
  exit $E_BADARGS
fi


if [ -z "$2" ]                            #  May specify different dictionary
                                          #+ as an argument to this script.
then
  dictfile=$DEFAULT_DICTFILE
else
  dictfile="$2"
fi

# ---------------------------------------------------------
Definition=$(fgrep -A $MAXCONTEXTLINES "$1 \\" "$dictfile")
#                  Definitions in form "Word \..."
#
#  And, yes, "fgrep" is fast enough
#+ to search even a very large text file.


# Now, snip out just the definition block.

echo "$Definition" |
sed -n '1,/^[A-Z]/p' |
#  Print from first line of output
#+ to the first line of the next entry.
sed '$d' | sed '$d'
#  Delete last two lines of output
#+ (blank line and first line of next entry).
# ---------------------------------------------------------

exit $?

# Exercises:
# ---------
# 1)  Modify the script to accept any type of alphabetic input
#   + (uppercase, lowercase, mixed case), and convert it
#   + to an acceptable format for processing.
#
# 2)  Convert the script to a GUI application,
#   + using something like 'gdialog' or 'zenity' . . .
#     The script will then no longer take its argument(s)
#   + from the command-line.
#
# 3)  Modify the script to parse one of the other available
#   + Public Domain Dictionaries, such as the U.S. Census Bureau Gazetteer.

Note

See also Example A-41 for an example of speedy fgrep lookup on a large text file.

agrep (approximate grep) extends the capabilities of grep to approximate matching. The search string may differ by a specified number of characters from the resulting matches. This utility is not part of the core Linux distribution.
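A minimal sketch, assuming agrep is installed (the filename is hypothetical):

agrep -2 'Linux' osinfo.txt
#  Matches "Linux" with up to 2 errors (insertions, deletions,
#+ or substitutions), so misspellings such as "Linix" also match.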

Tip

To search compressed files, use zgrep, zegrep, or zfgrep. These also work on non-compressed files, though slower than plain grep, egrep, fgrep. They are handy for searching through a mixed set of files, some compressed, some not.

To search bzipped files, use bzgrep.
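A minimal sketch (the filenames are hypothetical):

zgrep 'error' logfile.gz       # Works on gzipped and plain files alike.
bzgrep 'error' logfile.bz2     # Same idea for bzip2-compressed files.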

look

The command look works like grep, but does a lookup on a "dictionary," a sorted word list. By default, look searches for a match in /usr/dict/words, but a different dictionary file may be specified.

Example 16-20. Checking words in a list for validity

#!/bin/bash
# lookup: Does a dictionary lookup on each word in a data file.

file=words.data  # Data file from which to read words to test.

echo
echo "Testing file $file"
echo

while [ "$word" != end ]  # Last word in data file.
do               # ^^^
  read word      # From data file, because of redirection at end of loop.
  look $word > /dev/null  # Don't want to display lines in dictionary file.
  #  Searches for words in the file /usr/share/dict/words
  #+ (usually a link to linux.words).
  lookup=$?      # Exit status of 'look' command.

  if [ "$lookup" -eq 0 ]
  then
    echo "\"$word\" is valid."
  else
    echo "\"$word\" is invalid."
  fi

done <"$file"    # Redirects stdin to $file, so "reads" come from there.

echo

exit 0

# ----------------------------------------------------------------
# Code below line will not execute because of "exit" command above.


# Stephane Chazelas proposes the following, more concise alternative:

while read word && [[ $word != end ]]
do if look "$word" > /dev/null
   then echo "\"$word\" is valid."
   else echo "\"$word\" is invalid."
   fi
done <"$file"

exit 0

sed, awk

Scripting languages especially suited for parsing text files and command output. May be embedded singly or in combination in pipes and shell scripts.

sed

Non-interactive "stream editor", permits using many ex commands in batch mode. It finds many uses in shell scripts.
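A minimal sketch of typical sed one-liners (the filename is hypothetical):

sed 's/foo/bar/g' file.txt     # Replace every "foo" with "bar".
sed -n '5,10p' file.txt        # Print only lines 5 through 10.
sed '/^#/d' file.txt           # Delete comment lines.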

awk

Programmable file extractor and formatter, good for manipulating and/or extracting fields (columns) in structured text files. Its syntax is similar to C.
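A minimal sketch (the /etc/passwd field layout is standard; output varies by system):

awk -F: '{ print $1, $6 }' /etc/passwd
#  Prints the username (field 1) and home directory (field 6)
#+ of each account, using ":" as the field separator.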

wc

wc gives a "word count" on a file or I/O stream:

bash$ wc /usr/share/doc/sed-4.1.2/README
13  70  447 README
[13 lines  70 words  447 characters]

wc -w gives only the word count.

wc -l gives only the line count.

wc -c gives only the byte count.

wc -m gives only the character count.

wc -L gives only the length of the longest line.

Using wc to count how many .txt files are in current working directory:

$ ls *.txt | wc -l
#  Will work as long as none of the "*.txt" files
#+ have a linefeed embedded in their name.

#  Alternative ways of doing this are:
#      find . -maxdepth 1 -name \*.txt -print0 | grep -cz .
#      (shopt -s nullglob; set -- *.txt; echo $#)

#  Thanks, S.C.

Using wc to total up the size of all the files whose names begin with letters in the range d - h

bash$ wc [d-h]* | grep total | awk '{print $3}'
71832

Using wc to count the instances of the word "Linux" in the main source file for this book.

bash$ grep Linux abs-book.sgml | wc -l
138

See also Example 16-39 and Example 20-8.

Certain commands include some of the functionality of wc as options.

... | grep foo | wc -l
# This frequently used construct can be more concisely rendered.

... | grep -c foo
# Just use the "-c" (or "--count") option of grep.

# Thanks, S.C.

tr

character translation filter.

Caution

Must use quoting and/or brackets, as appropriate. Quotes prevent the shell from reinterpreting the special characters in tr command sequences. Brackets should be quoted to prevent expansion by the shell.

Either tr "A-Z" "*" <filename or tr A-Z \* <filename changes all the uppercase letters in filename to asterisks (writes to stdout). On some systems this may not work, but tr A-Z '[**]' will.

The -d option deletes a range of characters.

echo "abcdef"                 # abcdef
echo "abcdef" | tr -d b-d     # aef


tr -d 0-9 <filename
# Deletes all digits from the file "filename".

The --squeeze-repeats (or -s) option deletes all but the first instance of a string of consecutive characters. This option is useful for removing excess whitespace.

bash$ echo "XXXXX" | tr --squeeze-repeats 'X'
X

The -c "complement" option inverts the character set to match. With this option, tr acts only upon those characters not matching the specified set.

bash$ echo "acfdeb123" | tr -c b-d +
+c+d+b++++

Note that tr recognizes POSIX character classes. [one]

bash$ echo "abcd2ef1" | tr '[:alpha:]' -
----2--1

Example 16-21. toupper: Transforms a file to all uppercase.

#!/bin/bash
# Changes a file to all uppercase.

E_BADARGS=85

if [ -z "$1" ]  # Standard check for command-line arg.
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

tr a-z A-Z <"$1"

# Same effect as above, but using POSIX character set notation:
#        tr '[:lower:]' '[:upper:]' <"$1"
# Thanks, S.C.

#     Or even . . .
#     cat "$1" | tr a-z A-Z
#     Or dozens of other ways . . .

exit 0

#  Exercise:
#  Rewrite this script to give the option of changing a file
#+ to *either* uppercase or lowercase.
#  Hint: Use either the "case" or "select" command.

Example 16-22. lowercase: Changes all filenames in working directory to lowercase.

#!/bin/bash
#
#  Changes every filename in working directory to all lowercase.
#
#  Inspired by a script of John Dubois,
#+ which was translated into Bash by Chet Ramey,
#+ and considerably simplified by the author of the ABS Guide.


for filename in *                # Traverse all files in directory.
do
   fname=`basename $filename`
   n=`echo $fname | tr A-Z a-z`  # Change name to lowercase.
   if [ "$fname" != "$n" ]       # Rename only files not already lowercase.
   then
     mv $fname $n
   fi
done

exit $?


# Code below this line will not execute because of "exit".
#--------------------------------------------------------#
# To run it, delete script above line.

# The above script will not work on filenames containing blanks or newlines.
# Stephane Chazelas therefore suggests the following alternative:


for filename in *    # Not necessary to use basename,
                     # since "*" won't return any file containing "/".
do n=`echo "$filename/" | tr '[:upper:]' '[:lower:]'`
#                             POSIX char set notation.
#                    Slash added so that trailing newlines are not
#                    removed by command substitution.
   # Variable substitution:
   n=${n%/}          # Removes trailing slash, added above, from filename.
   [[ $filename == $n ]] || mv "$filename" "$n"
                     # Checks if filename already lowercase.
done

exit $?

Example 16-23. du: DOS to UNIX text file conversion.

#!/bin/bash
# Du.sh: DOS to UNIX text file converter.

E_WRONGARGS=85

if [ -z "$1" ]
then
  echo "Usage: `basename $0` filename-to-convert"
  exit $E_WRONGARGS
fi

NEWFILENAME=$1.unx

CR='\015'  # Carriage return.
           # 015 is octal ASCII code for CR.
           # Lines in a DOS text file end in CR-LF.
           # Lines in a UNIX text file end in LF only.

tr -d $CR < $1 > $NEWFILENAME
# Delete CR's and write to new file.

echo "Original DOS text file is \"$1\"."
echo "Converted UNIX text file is \"$NEWFILENAME\"."

exit 0

# Exercise:
# --------
# Change the above script to convert from UNIX to DOS.

Example 16-24. rot13: ultra-weak encryption.

#!/bin/bash
# rot13.sh: Classic rot13 algorithm,
#           encryption that might fool a 3-year old
#           for about 10 minutes.

# Usage: ./rot13.sh filename
# or     ./rot13.sh <filename
# or     ./rot13.sh and supply keyboard input (stdin)

cat "$@" | tr 'a-zA-Z' 'n-za-mN-ZA-M'   # "a" goes to "n", "b" to "o" ...
#  The   cat "$@"   construct
#+ permits input either from stdin or from files.

exit 0

Example 16-25. Generating "Crypto-Quote" Puzzles

#!/bin/bash
# crypto-quote.sh: Encrypt quotes

#  Will encrypt famous quotes in a simple monoalphabetic substitution.
#  The result is similar to the "Crypto Quote" puzzles
#+ seen in the Op Ed pages of the Sunday paper.


key=ETAOINSHRDLUBCFGJMQPVWZYXK
# The "key" is nothing more than a scrambled alphabet.
# Changing the "key" changes the encryption.

# The 'cat "$@"' construction gets input either from stdin or from files.
# If using stdin, terminate input with a Control-D.
# Otherwise, specify filename as command-line parameter.

cat "$@" | tr "a-z" "A-Z" | tr "A-Z" "$key"
#        |  to uppercase  |     encrypt
# Will work on lowercase, uppercase, or mixed-case quotes.
# Passes non-alphabetic characters through unchanged.


# Try this script with something like:
# "Nothing so needs reforming as other people's habits."
# --Mark Twain
#
# Output is:
# "CFPHRCS QF CIIOQ MINFMBRCS EQ FPHIM GIFGUI'Q HETRPQ."
# --BEML PZERC

# To reverse the encryption:
# cat "$@" | tr "$key" "A-Z"


#  This simple-minded cipher can be broken by an average 12-year old
#+ using only pencil and paper.

exit 0

#  Exercise:
#  --------
#  Modify the script so that it will either encrypt or decrypt,
#+ depending on command-line argument(s).

Of course, tr lends itself to code obfuscation.

#!/bin/bash
# jabh.sh

x="wftedskaebjgdBstbdbsmnjgz"
echo $x | tr "a-z" 'oh, turtleneck Phrase Jar!'

# Based on the Wikipedia "Just another Perl hacker" article.

fold

A filter that wraps lines of input to a specified width. This is especially useful with the -s option, which breaks lines at word spaces (see Example 16-26 and Example A-1).
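A minimal sketch (the filename is hypothetical):

fold -s -w 60 longlines.txt
#  Wraps the text at 60 columns, breaking lines at spaces
#+ rather than mid-word.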

fmt

Simple-minded file formatter, used as a filter in a pipe to "wrap" long lines of text output.

Example 16-26. Formatted file listing.

#!/bin/bash

WIDTH=40                    # 40 columns wide.

b=`ls /usr/local/bin`       # Get a file listing...

echo $b | fmt -w $WIDTH

# Could also have been done by
#    echo $b | fold - -s -w $WIDTH

exit 0

See also Example 16-5.

col

This deceptively named filter removes reverse line feeds from an input stream. It also attempts to replace whitespace with equivalent tabs. The chief use of col is in filtering the output from certain text processing utilities, such as groff and tbl.
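A minimal sketch: on systems where the formatted man page output contains overstriking, the -b option strips the backspace sequences as well, leaving plain text.

man ls | col -b > ls.txt
#  Saves a formatted man page as plain text,
#+ with reverse line feeds and overstriking removed.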

column

Column formatter. This filter transforms list-type text output into a "pretty-printed" table by inserting tabs at appropriate places.

Example 16-27. Using column to format a directory listing

#!/bin/bash
# colms.sh
# A minor modification of the example file in the "column" man page.


(printf "PERMISSIONS LINKS OWNER GROUP SIZE MONTH DAY HH:MM PROG-NAME\n" \
; ls -l | sed 1d) | column -t
#         ^^^^^^           ^^

#  The "sed 1d" in the pipe deletes the first line of output,
#+ which would be "total        N",
#+ where "N" is the total number of files found by "ls -l".

# The -t option to "column" pretty-prints a table.

exit 0

colrm

Column removal filter. This removes columns (characters) from a file and writes the file, lacking the range of specified columns, back to stdout. colrm 2 4 <filename removes the second through fourth characters from each line of the text file filename.

Caution

If the file contains tabs or nonprintable characters, this may cause unpredictable behavior. In such cases, consider using expand and unexpand in a pipe preceding colrm.

nl

Line numbering filter: nl filename lists filename to stdout, but inserts consecutive numbers at the beginning of each non-blank line. If filename omitted, operates on stdin.

The output of nl is very similar to cat -b, since, by default, nl does not list blank lines.

Example 16-28. nl: A self-numbering script.

#!/bin/bash
# line-number.sh

# This script echoes itself twice to stdout with its lines numbered.

echo "     line number = $LINENO" # 'nl' sees this as line 4
#                                   (nl does not number blank lines).
#                                   'cat -n' sees it correctly as line #6.

nl `basename $0`

echo; echo  # Now, let's try it with 'cat -n'

cat -n `basename $0`
# The difference is that 'cat -n' numbers the blank lines.
# Note that 'nl -ba' will also do so.

exit 0
# -----------------------------------------------------------------

pr

Print formatting filter. This will paginate files (or stdout) into sections suitable for hard copy printing or viewing on screen. Various options permit row and column manipulation, joining lines, setting margins, numbering lines, adding page headers, and merging files, among other things. The pr command combines much of the functionality of nl, paste, fold, column, and expand.

pr -o 5 --width=65 fileZZZ | more gives a nice paginated listing to screen of fileZZZ with margins set at 5 and 65.

A particularly useful option is -d, forcing double-spacing (same effect as sed -G).

gettext

The GNU gettext package is a set of utilities for localizing and translating the text output of programs into foreign languages. While originally intended for C programs, it now supports quite a number of programming and scripting languages.

The gettext program works on shell scripts. See the info page.
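A minimal sketch of a localized shell script, assuming a message catalog for the hypothetical domain "myscript" has been installed under /usr/share/locale:

#!/bin/bash
export TEXTDOMAIN=myscript             # Name of the message catalog.
export TEXTDOMAINDIR=/usr/share/locale

echo "$(gettext 'Hello, world!')"
#  Prints the translated string for the current locale,
#+ or the original text if no translation is found.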

msgfmt

A program for generating binary message catalogs. It is used for localization.
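A minimal sketch (the .po filename is hypothetical):

msgfmt -o myscript.mo myscript.po
#  Compiles a human-readable .po translation file into the
#+ binary .mo catalog that gettext reads at run time.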

iconv

A utility for converting file(s) to a different encoding (character set). Its chief use is for localization.

# Convert a string from UTF-8 to UTF-16 and print to the BookList
function write_utf8_string {
    STRING=$1
    BOOKLIST=$2
    echo -n "$STRING" | iconv -f UTF8 -t UTF16 | \
    cut -b 3- | tr -d \\n >> "$BOOKLIST"
}

#  From Peter Knowles' "booklistgen.sh" script
#+ for converting files to Sony Librie/PRS-50X format.
#  (http://booklistgensh.peterknowles.com)

recode

Consider this a fancier version of iconv, above. This very versatile utility can convert a file to a different encoding scheme. Note that recode is not part of the standard Linux installation.
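A minimal sketch, assuming recode is installed (the filename is hypothetical):

recode ISO-8859-1..UTF-8 notes.txt
#  Converts the file in place from Latin-1 to UTF-8.
#  The "before..after" pair names the source and target encodings.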

TeX, gs

TeX and Postscript are text markup languages used for preparing copy for printing or formatted video display.

TeX is Donald Knuth's elaborate typesetting system. It is often convenient to write a shell script encapsulating all the options and arguments passed to one of these markup languages.

Ghostscript (gs) is a GPL-ed Postscript interpreter.

texexec

Utility for processing TeX and pdf files. Found in /usr/bin on many Linux distros, it is actually a shell wrapper that calls Perl to invoke TeX.

texexec --pdfarrange --result=Concatenated.pdf *pdf

#  Concatenates all the pdf files in the current working directory
#+ into the merged file, Concatenated.pdf . . .
#  (The --pdfarrange option repaginates a pdf file. See also --pdfcombine.)
#  The above command-line could be parameterized and put into a shell script.

enscript

Utility for converting plain text file to PostScript

For example, enscript filename.txt -p filename.ps produces the PostScript output file filename.ps.

groff, tbl, eqn

Yet another text markup and display formatting language is groff. This is the enhanced GNU version of the venerable UNIX roff/troff display and typesetting package. Manpages use groff.

The tbl table processing utility is considered part of groff, as its function is to convert table markup into groff commands.

The eqn equation processing utility is likewise part of groff, and its function is to convert equation markup into groff commands.

Example 16-29. manview: Viewing formatted manpages

#!/bin/bash
# manview.sh: Formats the source of a man page for viewing.

#  This script is useful when writing man page source.
#  It lets you look at the intermediate results on the fly
#+ while working on it.

E_WRONGARGS=85

if [ -z "$1" ]
then
  echo "Usage: `basename $0` filename"
  exit $E_WRONGARGS
fi

# ---------------------------
groff -Tascii -man $1 | less
# From the man page for groff.
# ---------------------------

#  If the man page includes tables and/or equations,
#+ then the above code will barf.
#  The following line can handle such cases.
#
#   gtbl < "$1" | geqn -Tlatin1 | groff -Tlatin1 -mtty-char -man
#
#   Thanks, S.C.

exit $?   # See also the "maned.sh" script.

See also Example A-39.

lex, yacc

The lex lexical analyzer produces programs for pattern matching. This has been replaced by the nonproprietary flex on Linux systems.

The yacc utility creates a parser based on a set of specifications. This has been replaced by the nonproprietary bison on Linux systems.


Source: https://tldp.org/LDP/abs/html/textproc.html
