Reading Multiple Text Files in Jupyter Notebook
Tips and Tricks to Work with Text Files in Python (Part-1)
Work with Text Files and Become Familiar with Awesome Techniques in Python
Computer programs make our lives easy. We can perform a complex task within the blink of an eye with the blessings of a few lines of instructions. Among many complex tasks, working with and manipulating text is one of the most important tasks done by the computer. Today, one of the hot topics is Natural Language Processing (NLP), where text processing is a must. In this article, we will discuss some basic and super easy syntax for formatting and working with text and text files. It may be considered the first step of learning Natural Language Processing (NLP) with Python. In the upcoming articles, we will discuss NLP step by step.
It's time to move forward to the main agenda of the article. We will get our hands dirty with some basic coding examples which may help us practically. Let's begin.
[Full jupyter notebook link is given at the end of the article]
A. Formatted String Literals (f-strings)
i. f-strings offer several benefits over the older .format() string method.
For one, you can bring outside variables immediately into a string rather than pass them through as keyword arguments:
name = 'Zubair'
# Using the old .format() method:
print('My name is {var}.'.format(var=name))
# Using f-strings:
print(f'My name is {name}.')
Both print statements generate the same output: My name is Zubair.
If you want to have a string representation of the variable, just insert !r within {}.
print(f'My name is {name!r}')
The output will be: My name is 'Zubair'
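The difference between the plain replacement field and the !r conversion can be sketched as follows. This is a minimal illustration; the variable names plain, quoted and same are only chosen for this example.

```python
name = 'Zubair'

plain = f'My name is {name}.'           # uses str(name)
quoted = f'My name is {name!r}.'        # uses repr(name), so the quotes appear
same = 'My name is {!r}.'.format(name)  # the same conversion with .format()

print(plain)
print(quoted)
print(quoted == same)
```

The !r conversion is mostly useful for debugging, since it makes it visible that the value is a string.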
ii. Use of f-string and dictionary.
d = {'a':123,'b':456}
print(f"Address: {d['a']} Main Street")
It will show the dictionary element stored under the dictionary key 'a'. The output of the code is: Address: 123 Main Street.
[N.B. Be careful not to let quotation marks in the replacement fields conflict with the quoting used in the outer string.]
If you use only "" or only '' both inside and outside, Python versions before 3.12 will raise a SyntaxError.
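A few ways to keep the inner and outer quotes from clashing are sketched below. The names s1, s2, s3 and street_number are hypothetical and only serve the illustration.

```python
d = {'a': 123, 'b': 456}

# Outer double quotes, inner single quotes
s1 = f"Address: {d['a']} Main Street"

# Outer single quotes, inner double quotes
s2 = f'Address: {d["a"]} Main Street'

# Avoid nested quotes entirely by pulling the value out first
street_number = d['a']
s3 = f'Address: {street_number} Main Street'

print(s1)
print(s1 == s2 == s3)
```

The third form is often the easiest to read when the lookup gets longer.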
iii. Minimum Widths, Alignment and Padding with f-string
You can pass arguments inside a nested set of curly braces to set a minimum width for the field, the alignment and even padding characters. Consider the following code:
library = [('Author', 'Topic', 'Pages'), ('Twain', 'Rafting', 601), ('Feynman', 'Physics', 95), ('Hamilton', 'Mythology', 144)]
for book in library:
    print(f'{book[0]:{10}} {book[1]:{8}} {book[2]:{7}}')
Output:
Author     Topic    Pages
Twain      Rafting      601
Feynman    Physics       95
Hamilton   Mythology     144
Here the first three lines align, except Pages follows a default left-alignment while numbers are right-aligned. Also, the fourth line's page number is pushed to the right as Mythology exceeds the minimum field width of 8. When setting minimum field widths, make sure to take the longest item into account.
To set the alignment, use the character < for left-align, ^ for center, > for right.
To set padding, precede the alignment character with the padding character (- and . are common choices).
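Following the advice to take the longest item into account, one possible approach is to compute each column width from the data instead of hard-coding it. This is only a sketch; the widths and rows names are assumptions made for this example.

```python
library = [('Author', 'Topic', 'Pages'),
           ('Twain', 'Rafting', 601),
           ('Feynman', 'Physics', 95),
           ('Hamilton', 'Mythology', 144)]

# Width of each column = length of its longest item, measured as text
widths = [max(len(str(row[i])) for row in library) for i in range(3)]

rows = []
for book in library:
    # < left-aligns, > right-aligns, and the '.' before > is the padding character
    rows.append(f'{book[0]:<{widths[0]}} {book[1]:<{widths[1]}} {book[2]:.>{widths[2]}}')

print('\n'.join(rows))
```

Because every width comes from the longest entry, no value can overflow its column the way Mythology does with a fixed width of 8.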
Let's make some adjustments:
for book in library:
    print(f'{book[0]:{10}} {book[1]:{10}} {book[2]:>{7}}')  # here > was added
Output:
Author     Topic        Pages
Twain      Rafting        601
Feynman    Physics         95
Hamilton   Mythology      144
Centring the third column:
for book in library:
    print(f'{book[0]:{10}} {book[1]:{10}} {book[2]:^{7}}')  # here ^ was added
Output:
Author     Topic       Pages
Twain      Rafting      601
Feynman    Physics      95
Hamilton   Mythology    144
Adding some ....
for book in library:
    print(f'{book[0]:{10}} {book[1]:{10}} {book[2]:.>{7}}')  # here .> was added
Output:
Author     Topic      ..Pages
Twain      Rafting    ....601
Feynman    Physics    .....95
Hamilton   Mythology  ....144
iv. Date Formatting with f-string
You can do various kinds of formatting with f-strings. An example is shown below.
from datetime import datetime
today = datetime(year=2018, month=1, day=27)
print(f'{today:%B %d, %Y}')
For more info on formatted string literals visit https://docs.python.org/3/reference/lexical_analysis.html#f-strings
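The date example in this section can be varied with other format codes. The sketch below uses numeric codes only, so the result does not depend on the locale; the names iso_style, with_time and same are hypothetical.

```python
from datetime import datetime

today = datetime(year=2018, month=1, day=27)

iso_style = f'{today:%Y-%m-%d}'        # numeric year-month-day
with_time = f'{today:%d/%m/%Y %H:%M}'  # day/month/year plus 24-hour time
same = today.strftime('%Y-%m-%d')      # the f-string spec uses the strftime codes

print(iso_style)
print(with_time)
print(iso_style == same)
```

Anything after the colon in the replacement field is handed to the datetime object, which interprets it the same way strftime() does.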
B. Working With Text Files
i. Creating a File with IPython
This feature is specific to Jupyter notebooks! Alternatively, quickly create a simple .txt file with the Sublime Text editor.
%%writefile test.txt
Hi, this is a quick test file.
This is the second line of the file.
The above code will create a text file named test.txt in the same directory as the Jupyter Notebook.
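Outside of Jupyter, where the %%writefile magic is not available, a similar file could be created with plain Python. The sketch below is one possible way, assuming a temporary folder so that no existing file is overwritten; the folder, path and text names are hypothetical.

```python
import tempfile
from pathlib import Path

text = 'Hi, this is a quick test file.\nThis is the second line of the file.'

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / 'test.txt'  # hypothetical location for this sketch
    path.write_text(text)             # creates the file and writes the text
    created = path.exists()
    content = path.read_text()

print(created)
print(content)
```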
ii. Python Opening a File
# Open the test.txt file we created earlier
my_file = open('test.txt')
You may get an error message if you mistype the file name or provide the wrong directory, so be careful. Now, read the file.
# We can now read the file
my_file.read()
Output:
'Hi, this is a quick test file.\nThis is the second line of the file.'
But if you run the my_file.read() code again, it will output only ''. But why?
This happens because you can imagine that the reading "cursor" is at the end of the file after having read it, so there is nothing left to read. We can reset the "cursor" like this:
# Seek to the start of file (index 0)
my_file.seek(0)
Now the cursor is reset to the beginning of the text file. If we run the my_file.read() code once more, we will get the output 'Hi, this is a quick test file.\nThis is the second line of the file.'.
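The cursor behaviour described above can be checked with a small sketch. It assumes a temporary copy of the file so it runs anywhere, and uses tell() to report the current position; the names start, first_read, second_read and third_read are only illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'test.txt')  # hypothetical temporary copy
    with open(path, 'w') as f:
        f.write('Hi, this is a quick test file.\nThis is the second line of the file.')

    my_file = open(path)
    start = my_file.tell()        # position of the cursor before reading
    first_read = my_file.read()   # reads everything, cursor moves to the end
    second_read = my_file.read()  # nothing left to read
    my_file.seek(0)               # move the cursor back to the start
    third_read = my_file.read()   # the full text again
    my_file.close()

print(start)
print(repr(second_read))
print(first_read == third_read)
```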
iii. Reading line by line
You can read a file line by line using the .readlines() method. Use caution with large files, since everything will be held in memory.
# Readlines returns a list of the lines in the file
my_file.seek(0)
my_file.readlines()
Output:
['Hi, this is a quick test file.\n', 'This is the second line of the file.']
iv. Writing to a File
By default, the open() function will only allow us to read the file. We need to pass the argument 'w' to write over the file. For example:
# Add a second argument to the function, 'w' which stands for write.
# Passing 'w+' lets us read and write to the file
my_file = open('test.txt','w+')
Opening a file with 'w' or 'w+' *truncates the original*, meaning that anything that was in the original file **is deleted**!
# Write to the file
my_file.write('This is a new first line')
The above command writes 'This is a new first line' to the created file.
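The truncating effect of 'w+' can be observed with a short sketch. It works on a temporary file rather than the real test.txt, and the names size_after_open, written and after_write are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'test.txt')  # hypothetical temporary file
    with open(path, 'w') as f:
        f.write('Old first line\nOld second line')

    my_file = open(path, 'w+')               # opening already empties the file
    size_after_open = os.path.getsize(path)
    written = my_file.write('This is a new first line')  # returns characters written
    my_file.seek(0)                          # rewind before reading back
    after_write = my_file.read()
    my_file.close()

print(size_after_open)
print(written)
print(after_write)
```

Note that the old content is gone as soon as the file is opened, even before anything is written.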
v. Appending to a File
Passing the argument 'a' opens the file and puts the pointer at the end, so anything written is appended. Like 'w+', 'a+' lets us read and write to a file. If the file does not exist, one will be created.
my_file = open('test.txt','a+')
my_file.write('\nThis line is being appended to test.txt')
my_file.write('\nAnd another line here.')
The above code will append the text at the end of the existing text in the test.txt file.
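A minimal sketch of both behaviours of append mode follows: existing text is kept, and a missing file is created. The temporary folder and the names lines, new_path and created are assumptions made for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'test.txt')  # hypothetical temporary file
    with open(path, 'w') as f:
        f.write('This is a new first line')

    with open(path, 'a+') as my_file:
        my_file.write('\nThis line is being appended to test.txt')
        my_file.seek(0)                      # 'a+' starts at the end, so rewind to read
        lines = my_file.read().splitlines()

    new_path = os.path.join(folder, 'new.txt')  # does not exist yet
    with open(new_path, 'a') as f:
        f.write('created by append mode')
    created = os.path.exists(new_path)

print(lines)
print(created)
```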
vi. Aliases and Context Managers
You can assign temporary variable names as aliases, and manage the opening and closing of files automatically using a context manager:
with open('test.txt','r') as txt:
    first_line = txt.readlines()[0]
print(first_line)
This code will print the first line of the test.txt file. This time the output is: This is a new first line
[N.B. The with ... as ...: context manager automatically closed test.txt after assigning the first line of text to first_line.]
If we then try to read from txt (for example with txt.read()), it will show an error message because the file has been closed automatically.
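The automatic closing can be verified with a small sketch. It uses a temporary file, and the names closed_after and message are hypothetical; reading from a closed file object raises a ValueError in Python.

```python
import os
import tempfile

message = None
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'test.txt')  # hypothetical temporary file
    with open(path, 'w') as f:
        f.write('This is a new first line\nAnd another line here.')

    with open(path, 'r') as txt:
        first_line = txt.readline()

    closed_after = txt.closed  # the with block has already closed the file
    try:
        txt.read()             # reading a closed file raises ValueError
    except ValueError as err:
        message = str(err)

print(first_line)
print(closed_after)
print(message)
```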
vii. Iterating through a File
with open('test.txt','r') as txt:
    for line in txt:
        print(line, end='')  # the end='' argument removes extra linebreaks
Output:
This is a new first line
This line is being appended to test.txt
And another line here.
This is more text being appended to test.txt
And another line here.
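Iterating over the file object reads one line at a time, so it also suits files that are too large for .readlines(). The sketch below combines this with the f-string width formatting from part A to number the lines; the temporary file and the numbered list are assumptions for this example.

```python
import os
import tempfile

numbered = []
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'test.txt')  # hypothetical temporary file
    with open(path, 'w') as f:
        f.write('This is a new first line\n'
                'This line is being appended to test.txt\n'
                'And another line here.')

    with open(path, 'r') as txt:
        # enumerate() supplies the line number, rstrip() drops the trailing newline
        for number, line in enumerate(txt, start=1):
            numbered.append(f'{number:>2}: {line.rstrip()}')

print('\n'.join(numbered))
```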
Conclusion
The syntax is easy but super helpful. We always jump onto sophisticated learning materials, but there are so many tiny things which can make our life easy. The above article explains some useful techniques to play with text, which might be helpful for text formatting. Moreover, these techniques will be very helpful for Natural Language Processing. The write-ups will be continued, heading from the basics to advanced NLP.
The total jupyter notebook for the article is available here.
Zubair Hossain
- If you enjoy the article, follow me on Medium for more.
- Connect with me on LinkedIn for collaboration.
Source: https://towardsdatascience.com/tips-and-tricks-to-work-with-text-files-in-python-89f14a755315