Say we have already obtained text (.txt) files, whether through web scraping or other means. We now want to load these files into R. An easy way to do that is with the “readtext” package.
library("readtext")
We will work with sample text data of Supreme Court of Canada cases. They are available under the DATASET tab. Please download them into a folder on your hard drive. You must ensure that the folder only contains the texts you want to import into R.
# Change the folder path below to the path of your target folder. Important: don't forget the asterisk (*) at the end. It indicates that you want to import all files in that folder. Also make sure the folder contains only the files you want to import.
folder <- "~/Google Drive/Teaching/Canada/Legal Data Science/2019/Data/Supreme Court Cases/*"
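Before importing, it can help to check what the asterisk actually matches. The following is a minimal sketch using base R only. The temporary folder `demo_dir` and the two sample files are hypothetical stand-ins, not part of the course dataset; to check your own folder, you would pass your `folder` object to `Sys.glob()` instead.

```r
# Hypothetical temporary folder standing in for your target folder
demo_dir <- file.path(tempdir(), "scc_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("SUPREME COURT OF CANADA sample text", file.path(demo_dir, "case_a.txt"))
writeLines("SUPREME COURT OF CANADA sample text", file.path(demo_dir, "case_b.txt"))

# Same pattern style as above: the trailing asterisk matches every file
folder_demo <- file.path(demo_dir, "*")

# Sys.glob() expands the wildcard so you can see which files would be imported
files_found <- Sys.glob(folder_demo)
print(basename(files_found))

# Flag anything that is not a .txt file
not_txt <- files_found[tolower(tools::file_ext(files_found)) != "txt"]
if (length(not_txt) > 0) {
  warning("Non-.txt files found: ", paste(basename(not_txt), collapse = ", "))
}
```

If the warning fires, move the listed files out of the folder before importing.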
Now we can import the texts from that target folder using the readtext() function.
scc_texts <- readtext(folder)
As you can see, the object is a data frame that contains both the file name (doc_id) and the text of each document.
print(scc_texts)
## readtext object consisting of 25 documents and 0 docvars.
## # data.frame [25 x 2]
##   doc_id                    text
##   <chr>                     <chr>
## 1 [2013] 1 S.C.R. 467.txt   "\"SUPREME CO\"..."
## 2 [2013] 1 S.C.R. 61.txt    "\"SUPREME CO\"..."
## 3 [2013] 1 S.C.R. 623.txt   "\"SUPREME CO\"..."
## 4 [2013] 2 S.C.R. 227.txt   "\"SUPREME CO\"..."
## 5 [2013] 3 S.C.R. 1053.txt  "\"SUPREME CO\"..."
## 6 [2013] 3 S.C.R. 1101.txt  "\"SUPREME CO\"..."
## # … with 19 more rows
To work with the texts only, use the $ operator to select the text column. This returns a character vector with one element per document.
scc_texts$text
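To illustrate what you can do once the text column is extracted, here is a small sketch in base R. The data frame `demo_texts` is a hypothetical stand-in with the same two columns as scc_texts (doc_id and text), so the example runs without the court files; the same steps should apply to scc_texts itself.

```r
# Hypothetical stand-in for scc_texts: a data frame with the same two columns
demo_texts <- data.frame(
  doc_id = c("case_a.txt", "case_b.txt"),
  text = c("SUPREME COURT OF CANADA first sample",
           "SUPREME COURT OF CANADA second sample"),
  stringsAsFactors = FALSE
)

# The $ operator returns the text column as a plain character vector
texts <- demo_texts$text
class(texts)

# Square brackets pick out a single document by position
first_text <- texts[1]

# Label each text with its file name so you can look texts up by name
names(texts) <- demo_texts$doc_id

# Count the number of characters in each document
text_lengths <- nchar(texts)
print(text_lengths)
```

Naming the vector by doc_id keeps each text tied to its source file, which is easy to lose once the text is separated from the data frame.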
Last update May 8, 2020.