6 Tools for converting files to open and reusable formats
Discover 6 tools for converting closed files into open data, which you can use for your stories and visualizations.
By Sasha Muñoz Vergara. Published: July 19, 2021
Data journalism is meticulous work that leaves no room for mistakes. If a single number changes, we can end up reporting false data and flawed calculations.
Although fact-checking is indispensable, other kinds of tools can help us copy information exactly as it appears in a document and minimize the errors that come from transcribing data by hand.
One of the tasks data journalists must perform is converting closed data into open data. Closed data is data whose specifications are not publicly available, either because the format is not accessible or because its reuse is restricted. Open data is available to anyone, free of charge and without any limitation.
One technique for this is Optical Character Recognition (OCR). According to Wikipedia’s definition, OCR digitizes text by automatically identifying, in an image, the symbols or characters that belong to a given alphabet and storing them as data. This makes it possible to work with that data in an editing program or similar software.
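As an illustration of what an OCR step can look like in code, here is a minimal sketch in Python. It assumes the third-party pytesseract and Pillow packages and a local Tesseract installation, none of which are mentioned in this article. The function names ocr_image and clean_lines are hypothetical and chosen only for this example.

```python
def ocr_image(path, lang="eng"):
    """Return the text that Tesseract recognizes in the image at `path`.

    Assumption: pytesseract, Pillow and the Tesseract engine are installed.
    They are imported lazily so the helper below works without them.
    """
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def clean_lines(raw_text):
    """Strip surrounding whitespace and drop empty lines from OCR output."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]
```

Calling clean_lines(ocr_image("scanned_page.png")) on a hypothetical scan would return the recognized lines as a list. OCR output can still contain misread characters, so the result should be checked against the original document.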
Below you will find a selection of tools for opening up information:
This software allows you to scan files, edit PDFs, and convert files to formats such as Word, Excel, PowerPoint, and JPG. It also helps you organize and optimize files by merging, splitting, compressing, rotating, and annotating documents. In addition, it can translate documents into multiple languages, and its drag-and-drop interface lets you process multiple files simultaneously from a unified platform.
It is an online tool for converting files to formats such as Word, Excel, PowerPoint, MP4, and MP3, among many others. One of its advantages is that you don’t need to download any program to use it. If there is a file that you cannot convert, you can send an email, and specialized engineers will help you do it. It also supports more than 1,200 formats.
This tool does not require installation on the computer. It can recognize text and characters in scanned PDF documents, photographs, and images captured by digital cameras, and it outputs tables, columns, and graphics in an open and editable format. Without registration, you can convert up to 15 files per hour, or up to 15 pages per hour of a file that has more pages.
This software allows you to convert any PDF (normal or scanned) and import it directly from Google Drive, Dropbox, or OneDrive. The web application converts documents of all sizes. It is one of the tools that makes the fewest errors. No matter how complex your data tables are, they will be transcribed accurately, without formatting. The row and column structure will be the same as in the original file, but editable and ready for reuse.
Tabula is downloadable software that you must install on your computer. It allows you to extract data from a PDF and convert it into a CSV file or a Microsoft Excel spreadsheet through a simple, easy-to-use interface. Tabula works on Mac, Windows, and Linux.
First, upload a PDF file containing a table of data. To extract the table, go to the page you want and select it by clicking and dragging to draw a box around it. Then click “Preview and export the extracted data.”
Tabula will try to extract the data and show a preview. If data is missing, you can re-adjust the selection. When exported, you will have a table in an open and reusable format.
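Once a table has been exported as CSV, it can be loaded and sanity-checked with a few lines of code. The following is a minimal sketch using only Python's standard library. The column names, the sample rows, and the helper names load_table and column_total are hypothetical and stand in for a real export.

```python
import csv
import io


def load_table(csv_file):
    """Read CSV rows into a list of dicts keyed by the header row."""
    return list(csv.DictReader(csv_file))


def column_total(rows, column):
    """Sum a numeric column, tolerating thousands separators such as '1,204'."""
    return sum(float(row[column].replace(",", "")) for row in rows)


# Hypothetical sample standing in for an exported file. With a real export,
# use open("your-export.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO('region,cases\nNorth,"1,204"\nSouth,987\n')
rows = load_table(sample)
print(column_total(rows, "cases"))
```

Comparing a computed total like this with the total printed in the original PDF is a quick way to catch a number that changed during extraction.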
It is a program that allows you to convert PDF files to 16 document formats, including Microsoft Office and iWork. It is specialized software for Mac. It accurately preserves the original PDF's font styles, colors, graphics, tables, and page layouts. It also offers other types of applications that can help you in your journalistic work, such as the ones you can see here:
This version is paid, but you can access the free trial here.
You can access all these tools at Datasketch. So visit us and try them out.