Services

Data-capture

We help you collect and organize information from public data sources and websites through web scraping.

Data-cleaning

We build structured databases by combining information from multiple sources in different formats, organizing the information and standardizing variables.

Visualization apps

We create data visualization applications that let your users explore public databases. We use the latest data visualization technologies to communicate information.

Algorithms

We implement artificial intelligence algorithms to facilitate your work with data, from predictive algorithms to pattern recognition.

Web specials

We develop interactive, data-driven web specials. Each special combines different visual components to guide your readers. See examples of our specials.

About us

Datasketch is a digital platform for investigative and data journalism. Our portal allows journalists, data scientists, social scientists and the general public to learn about and consult data visualizations, tools, software and in-depth research on various current issues. We offer free data tools and run projects that bridge the gap between data and citizens, fostering the democratization of knowledge and a critical review of social realities based on contrasting information.

Our team

Juan Pablo Marín

Electronic engineer with a master's degree in computational statistics. Expert in data science with applications in multiple areas such as economics, hydrology and journalism.

Camila Achuri

Statistician and expert in the R programming language. She has developed various data visualization applications on mobility and open data.

Juliana Galvis

Political scientist and candidate for a master's degree in Digital Humanities. She currently leads the development of the Who Is database and supports journalistic research and the creation of databases.

David Daza

Bachelor's degree in Electronics. Expert in developing applications and websites, with an emphasis on data journalism and content management across multiple databases.

Verónica Toro

Anthropologist and researcher. Responsible for managing and organizing the data community in Colombia and Latin America, and for supporting journalistic investigations and the creation of databases.

Andrea Cervera

Journalist responsible for writing articles and providing investigative support. Also serves as community manager.

Ana Hernández

Mathematician and expert in the R programming language. She has collaborated on various projects, such as Infraestructura Visible, and on the development of visualization tools.

Track the media's most reported phrases

July 15, 2017

Real-time information on news sites, blogs and social networks changes dynamically and spreads rapidly across the web. Developing methods to interrogate information at this scale and uncover the stories within it requires thinking about how information content varies over time, how it is transmitted, and how it mutates as it spreads.

NIFTY is a system that finds mutations of a single piece of information across the daily news cycle. Built on MemeTracker, it parses 3.5 million news articles and 2 million quotes each day to find the top clusters of quotes.

The tool uses incremental clustering, a novel and highly scalable method for efficiently extracting and identifying variants of a single meme.
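To make the idea of incremental clustering concrete, here is a minimal sketch under simplifying assumptions; it is not NIFTY's implementation. Quotes arrive one at a time, and each is either added to the most similar existing cluster or used to open a new one. The similarity measure (word-level Jaccard), the 0.5 threshold, and the names `IncrementalClusterer`, `tokens` and `jaccard` are hypothetical choices made for illustration.

```python
import re


def tokens(quote: str) -> frozenset[str]:
    """Lowercased set of words in a quote; punctuation is ignored."""
    return frozenset(re.findall(r"[a-z0-9']+", quote.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class IncrementalClusterer:
    """Hypothetical simplification of incremental quote clustering.

    Each cluster is represented by the token set of its first quote.
    A new quote joins the most similar cluster if the similarity reaches
    the threshold; otherwise it starts a new cluster. Quotes are never
    re-clustered, which is what makes the process incremental.
    """

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.representatives: list[frozenset[str]] = []
        self.clusters: list[list[str]] = []

    def add(self, quote: str) -> int:
        """Assign one quote to a cluster and return that cluster's index."""
        toks = tokens(quote)
        best_idx, best_sim = -1, 0.0
        for i, rep in enumerate(self.representatives):
            sim = jaccard(toks, rep)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= self.threshold:
            self.clusters[best_idx].append(quote)
            return best_idx
        # No cluster is similar enough: this quote starts a new one.
        self.representatives.append(toks)
        self.clusters.append([quote])
        return len(self.clusters) - 1

    def top_clusters(self, k: int) -> list[list[str]]:
        """Return the k largest clusters, largest first."""
        return sorted(self.clusters, key=len, reverse=True)[:k]


if __name__ == "__main__":
    clusterer = IncrementalClusterer(threshold=0.5)
    for q in ["yes we can", "yes, we can!", "we can do it", "read my lips"]:
        clusterer.add(q)
    print(clusterer.top_clusters(1))
```

Here "yes we can" and "yes, we can!" share the same word set and land in one cluster, while "we can do it" (similarity 0.4) starts its own. Because each new quote is compared only against cluster representatives, the cost per quote grows with the number of clusters rather than with every quote seen so far; a system at NIFTY's scale would likely also need a faster way to find candidate clusters, which this sketch omits.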

With clusters separated into daily, weekly, monthly, and quarterly views, NIFTY provides a streamlined way to identify which phrases and quotes are making the news and to track interest in stories over time.

The project was developed as part of CURIS, Stanford's summer research internship program in Computer Science. It was supported by several organizations and designed by Caroline Suen, Sandy Huang, and Chantat Eksombatchai, advised by Professor Jure Leskovec and research scientist Rok Sosic.



Datasketch

Data team