Reading Multiple Text Files in Jupyter Notebook

Photo by Sergey Leont'ev on Unsplash

Tips and Tricks to Work with Text Files in Python (Part-1)

Work with Text Files and Get Familiar with Awesome Techniques in Python

Md. Zubair

Computer programs make our lives easy. We can perform a complex task within the blink of an eye with the blessings of a few lines of instructions. Among many complex tasks, working with and manipulating text is one of the most important tasks done by the computer. Today, one of the hot topics is Natural Language Processing (NLP), where text processing is a must. In this article, we will discuss some basic and super easy syntax for formatting and working with texts and text files. It may be considered the first step of learning Natural Language Processing (NLP) with Python. In the upcoming articles, we will discuss NLP step by step.

It's time to move forward to the main agenda of the article. We will get our hands dirty with some basic coding examples which may help us practically. Let's begin.

[Full jupyter notebook link is given at the end of the article]

A. Formatted String Literals (f-strings)

i. f-strings offer several benefits over the older .format() string method.
For one, you can bring outside variables directly into a string rather than pass them through as keyword arguments:

          name = 'Zubair'

          # Using the old .format() method:
          print('My name is {var}.'.format(var=name))
          # Using f-strings:
          print(f'My name is {name}.')

Both print statements produce the same output: My name is Zubair.

If you want the string representation (repr) of the variable, just insert !r inside the {}.

          print(f'My name is {name!r}')

The output will be: My name is 'Zubair'
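To put the two styles side by side, here is a small sketch. The variable names (with_format, with_repr and so on) are only illustrative, and the last line assumes Python 3.8 or newer, where the = specifier is available.

```python
name = 'Zubair'

# The same sentence built with .format() and with an f-string.
with_format = 'My name is {var}.'.format(var=name)
with_fstring = f'My name is {name}.'

# !r applies repr() to the value, so the quotes become part of the text.
with_repr = f'My name is {name!r}.'
with_repr_call = f'My name is {repr(name)}.'

# Python 3.8+ only: a trailing = prints the expression together with its repr.
debug_view = f'{name=}'

print(with_format)
print(with_fstring)
print(with_repr)
print(debug_view)
```

The !r form is handy when you want to see exactly what a variable holds, for example to spot stray whitespace in a string.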

ii. Use of f-string and dictionary.

          d = {'a':123,'b':456}
          print(f"Address: {d['a']} Main Street")

It will show the dictionary element stored under the key 'a'. The output of the code is: Address: 123 Main Street

[N.B. Be careful not to let quotation marks in the replacement fields conflict with the quoting used in the outer string.]

If you use the same kind of quotes ("" or '') both for the outer string and inside the replacement field, Python versions before 3.12 raise a SyntaxError.
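A few ways to stay clear of the quoting conflict are sketched below. The names street_number, address_alt and address_triple are made up for this illustration; all three variants should run on any Python version that supports f-strings.

```python
d = {'a': 123, 'b': 456}

# Outer double quotes, inner single quotes: the two never collide.
address = f"Address: {d['a']} Main Street"

# Pull the value into a local name first, so no quotes are needed in the braces.
street_number = d['a']
address_alt = f'Address: {street_number} Main Street'

# A triple-quoted outer string leaves both ' and " free for the inner key.
address_triple = f'''Address: {d["a"]} Main Street'''

print(address)
print(address_alt)
print(address_triple)
```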

iii. Minimum Widths, Alignment and Padding with f-string

You can pass arguments inside a nested set of curly braces to set a minimum width for the field, the alignment and even padding characters. Consider the following code:

          library = [('Author', 'Topic', 'Pages'), ('Twain', 'Rafting', 601), ('Feynman', 'Physics', 95), ('Hamilton', 'Mythology', 144)]

          for book in library:
              print(f'{book[0]:{10}} {book[1]:{8}} {book[2]:{7}}')

Output:

          Author     Topic    Pages
          Twain      Rafting      601
          Feynman    Physics       95
          Hamilton   Mythology     144

Here the first three lines align, except that Pages follows the default left-alignment for text while the numbers are right-aligned. Also, the fourth line's page number is pushed to the right because Mythology exceeds the minimum field width of 8. When setting minimum field widths, make sure to take the longest item into account.

To set the alignment, use the character < for left-align, ^ for center, > for right.
To set padding, precede the alignment character with the padding character (- and . are common choices).

Let's make some adjustments:

          for book in library:
              print(f'{book[0]:{10}} {book[1]:{10}} {book[2]:>{7}}')
          # here > was added

Output

          Author     Topic        Pages
          Twain      Rafting        601
          Feynman    Physics         95
          Hamilton   Mythology      144

Centring the 3rd column

          for book in library:
              print(f'{book[0]:{10}} {book[1]:{10}} {book[2]:^{7}}')
          # here ^ was added

Output

          Author     Topic       Pages
          Twain      Rafting      601
          Feynman    Physics      95
          Hamilton   Mythology    144

Adding some ....

          for book in library:
              print(f'{book[0]:{10}} {book[1]:{10}} {book[2]:.>{7}}')
          # here .> was added

Output

          Author     Topic      ..Pages
          Twain      Rafting    ....601
          Feynman    Physics    .....95
          Hamilton   Mythology  ....144
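Instead of guessing the minimum widths, they can be worked out from the data so that the longest item always fits. The sketch below is one possible way to do it; the widths and lines variables are assumptions of this example, not part of the original notebook.

```python
library = [('Author', 'Topic', 'Pages'),
           ('Twain', 'Rafting', 601),
           ('Feynman', 'Physics', 95),
           ('Hamilton', 'Mythology', 144)]

# Width of each column = length of its longest entry (numbers are converted to text first).
widths = [max(len(str(row[i])) for row in library) for i in range(3)]

lines = []
for book in library:
    # Left-align the two text columns, right-align the last column with dot padding.
    lines.append(f'{book[0]:<{widths[0]}} {book[1]:<{widths[1]}} {book[2]:.>{widths[2]}}')

print('\n'.join(lines))
```

Because every width comes from the longest entry in its column, no row can push its neighbours out of line.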

iv. Date Formatting with f-string

You can do various kinds of formatting with f-strings. An example with a date is shown below.

          from datetime import datetime

          today = datetime(year=2018, month=1, day=27)
          print(f'{today:%B %d, %Y}')

For more info on formatted string literals visit https://docs.python.org/3/reference/lexical_analysis.html#f-strings
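The part after the colon uses the same format codes as datetime.strftime(). Here is a short sketch with a few common codes; the variable names are illustrative, and the month and weekday names assume the default English (C) locale.

```python
from datetime import datetime

today = datetime(year=2018, month=1, day=27)

long_form = f'{today:%B %d, %Y}'   # full month name, day, four-digit year
iso_form = f'{today:%Y-%m-%d}'     # year-month-day with zero padding
weekday = f'{today:%A}'            # full weekday name

# The f-string form gives the same text as calling strftime() directly.
same_as_strftime = today.strftime('%B %d, %Y')

print(long_form)
print(iso_form)
print(weekday)
```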

B. Working With Text Files

i. Creating a File with IPython

This function is specific to Jupyter notebooks! Alternatively, quickly create a simple .txt file with a text editor such as Sublime Text.

          %%writefile test.txt
          Hi, this is a quick test file.
          This is the second line of the file.

The above code will create a text file named test.txt in the same directory as the Jupyter notebook.
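Outside a notebook the cell magic is not available, but a similar file can be created with plain Python. This is only a sketch: it uses pathlib from the standard library and writes into a temporary folder (an assumption of this example) so that it does not touch your working directory.

```python
import tempfile
from pathlib import Path

# A throwaway folder for the example; in practice you would pick your own path.
folder = Path(tempfile.mkdtemp())
path = folder / 'test.txt'

# write_text() creates the file (or overwrites it) and closes it again.
path.write_text('Hi, this is a quick test file.\nThis is the second line of the file.')

content = path.read_text()
print(content)
```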

ii. Python Opening a File

          # Open the test.txt file we created earlier
          my_file = open('test.txt')

You may get an error message if you mistype the file name or provide the wrong directory, so be careful. Now, read the file.
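The error in question is a FileNotFoundError. A minimal sketch of catching it is shown below; the file name no_such_file.txt and the temporary folder are made up so that the file is guaranteed to be missing.

```python
import os
import tempfile

# Handy when a path is wrong: shows the folder that relative paths are resolved against.
current_dir = os.getcwd()

# A path inside a fresh temporary folder, so the file cannot exist yet.
missing_path = os.path.join(tempfile.mkdtemp(), 'no_such_file.txt')

try:
    my_file = open(missing_path)
except FileNotFoundError as err:
    # err.filename holds the path that could not be opened.
    message = f'Could not open file: {err.filename}'
    print(message)
else:
    my_file.close()
```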

          # We can now read the file
          my_file.read()

Output

          'Hi, this is a quick test file.\nThis is the second line of the file.'

But if you run the my_file.read() code again, it will output only ''. But why?

This happens because the reading "cursor" sits at the end of the file after the file has been read, so there is nothing left to read. We can reset the "cursor" like this:

          # Seek to the start of file (index 0)
          my_file.seek(0)

Now the cursor is reset to the beginning of the text file. If we run the my_file.read() code once more, we will get the output 'Hi, this is a quick test file.\nThis is the second line of the file.'.
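The cursor behaviour can be checked in one go. The sketch below recreates the two-line file in a temporary folder (an assumption of this example, so it runs anywhere) and reads it three times.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('Hi, this is a quick test file.\nThis is the second line of the file.')

my_file = open(path)
first_read = my_file.read()        # reads everything; the cursor ends up at the end
second_read = my_file.read()       # nothing left to read, so this is ''
my_file.seek(0)                    # move the cursor back to the start
start_position = my_file.tell()    # tell() reports the current cursor position
third_read = my_file.read()        # the whole text again
my_file.close()

print(repr(second_read))
print(third_read == first_read)
```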

iii. Reading line by line

You can read a file line by line using the .readlines() method. Use caution with large files, since everything will be held in memory.

          # Readlines returns a list of the lines in the file
          my_file.seek(0)
          my_file.readlines()

Output

          ['Hi, this is a quick test file.\n', 'This is the second line of the file.']
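For large files, one common alternative is to loop over the file object itself, which yields one line at a time instead of building the whole list. The sketch below compares both approaches on a small temporary file; line_count and longest are illustrative names.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('Hi, this is a quick test file.\nThis is the second line of the file.')

# readlines() builds the complete list of lines in memory.
with open(path) as f:
    all_lines = f.readlines()

# Looping over the file object keeps only the current line in memory.
line_count = 0
longest = ''
with open(path) as f:
    for line in f:
        line_count += 1
        stripped = line.rstrip('\n')
        if len(stripped) > len(longest):
            longest = stripped

print(line_count)
print(longest)
```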

iv. Writing to a File

By default, the open() function will only allow us to read the file. We need to pass the argument 'w' to write over the file. For example:

          # Add a second argument to the function, 'w' which stands for write.
          # Passing 'w+' lets us read and write to the file
          my_file = open('test.txt','w+')

Opening a file with 'w' or 'w+' *truncates the original*, meaning that anything that was in the original file **is deleted**!

          # Write to the file
          my_file.write('This is a new first line')
          # Close the file so the text is flushed to disk
          my_file.close()

The above commands write 'This is a new first line' to the file.
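To see the truncation happen, the file size can be checked right after opening, before anything is written. This is a small sketch on a temporary file; the text 'original content' is made up for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('original content')
size_before = os.path.getsize(path)

# Opening with 'w+' empties the file immediately, before any write() call.
my_file = open(path, 'w+')
size_after_open = os.path.getsize(path)

my_file.write('This is a new first line')
my_file.seek(0)              # go back to the start so the new text can be read
content = my_file.read()
my_file.close()

print(size_before, size_after_open)
print(content)
```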

v. Appending to a File

Passing the argument 'a' opens the file and puts the pointer at the end, so anything written is appended. Like 'w+', 'a+' lets us read and write to a file. If the file does not exist, one will be created.

          my_file = open('test.txt','a+')
          my_file.write('\nThis line is being appended to test.txt')
          my_file.write('\nAnd another line here.')
          # Close the file so the appended text is flushed to disk
          my_file.close()

The above code will append the text at the end of the existing text in the test.txt file.
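The sketch below runs the append step on a temporary copy and also checks that 'a' creates a missing file. The name brand_new.txt is hypothetical and only serves the demonstration.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'test.txt')
with open(path, 'w') as f:
    f.write('This is a new first line')

# 'a+' keeps the existing text and adds new text at the end.
my_file = open(path, 'a+')
my_file.write('\nThis line is being appended to test.txt')
my_file.write('\nAnd another line here.')
my_file.seek(0)              # jump back to the start to read everything
content = my_file.read()
my_file.close()
lines = content.split('\n')

# Opening a missing file in append mode creates it.
new_path = os.path.join(folder, 'brand_new.txt')
with open(new_path, 'a') as f:
    f.write('created by append mode')

print(content)
```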

vi. Aliases and Context Managers

You can assign temporary variable names as aliases, and manage the opening and closing of files automatically using a context manager:

          with open('test.txt','r') as txt:
              first_line = txt.readlines()[0]

          print(first_line)

This code will print the first line of the test.txt file. This time the output is: This is a new first line

[N.B. The with ... as ...: context manager automatically closed test.txt after assigning the first line of text to first_line.]

If we now try to read from txt, it will show an error message because the file has been closed automatically.
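The automatic closing can be verified with the closed attribute, and the error can be caught to see what it says. This is a sketch on a temporary file; is_closed and error_text are names chosen for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('This is a new first line\nAnd another line here.')

with open(path, 'r') as txt:
    first_line = txt.readlines()[0]

# The with block has ended, so the file object is already closed.
is_closed = txt.closed

try:
    txt.read()
except ValueError as err:
    # Reading from a closed file raises ValueError.
    error_text = str(err)

print(first_line, end='')
print(is_closed)
print(error_text)
```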

vii. Iterating through a File

          with open('test.txt','r') as txt:
              for line in txt:
                  print(line, end='') # the end='' argument removes extra linebreaks

Output

          This is a new first line
          This line is being appended to test.txt
          And another line here.
          This is more text being appended to test.txt
          And another line here.

The last two lines are not written by any code shown in this article; they presumably come from a further append step in the full notebook.
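Why is end='' needed? Each line read from a file still carries its own '\n', and print() adds another one by default. The sketch below captures the printed text of both variants so they can be compared; the two-line sample file is made up for the example.

```python
import contextlib
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as f:
    f.write('line one\nline two\n')

# Default print(): the line's own newline plus print's newline gives blank lines.
default_out = io.StringIO()
with contextlib.redirect_stdout(default_out), open(path, 'r') as txt:
    for line in txt:
        print(line)

# With end='' only the newline stored in the file is printed.
compact_out = io.StringIO()
with contextlib.redirect_stdout(compact_out), open(path, 'r') as txt:
    for line in txt:
        print(line, end='')

print(repr(default_out.getvalue()))
print(repr(compact_out.getvalue()))
```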

Photo by Saksham Gangwar on Unsplash

Conclusion

The syntax is easy but super helpful. We always jump onto sophisticated learning materials, but there are so many tiny things which can make our lives easy. The above article explains some useful techniques to play with text which might be helpful for text formatting. These techniques will also be very helpful for Natural Language Processing. The write-ups will be continued, heading from basic to advanced NLP.

The full Jupyter notebook for the article is available here.


Zubair Hossain

  • If you enjoyed the article, follow me on Medium for more.
  • Connect with me on LinkedIn for collaboration.


Source: https://towardsdatascience.com/tips-and-tricks-to-work-with-text-files-in-python-89f14a755315
