Gabriel Altay

PROJECTS

Credibility Coalition - Economics of Misinformation Working Group

I joined the Credibility Coalition in early 2020 and became chair of its Economics of Misinformation working group at the beginning of 2021. Most recently, we have been examining how programmatic advertising causes brands to unintentionally pay millions of dollars to spreaders of misinformation.
  • Examining Opaque Programmatic Markets with the Credibility Coalition AdSellers Dataset
  • MozFest 2021 CredCo Update Slides
  • The Case for Using Market Forces to Combat Misinformation and Disinformation (Part II)
  • The Case for Using Market Forces to Combat Misinformation and Disinformation (Part I)

Kensho Derived Wikimedia Dataset

A Kaggle-hosted dataset that serves as a sandbox for Natural Language Processing with Wikimedia data. It includes plain text and link offsets from Wikipedia, along with the Wikidata knowledge graph.
  • Dataset: Kensho Derived Wikimedia Dataset
  • Kernel: Wikipedia Introduction
  • Kernel: Wikidata Introduction
  • Kernel: Entity Aliases and Disambiguation Candidates
  • Blog: Introducing the Kensho Derived Wikimedia Dataset
  • AWS Machine Learning Blog using KDWD for Graph Embeddings
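
To make "link offsets" concrete, here is a minimal sketch of how one might recover the anchor text of each link in an annotated section and tally which pages each anchor string points to, a common starting point for building disambiguation candidates (not necessarily what the kernels above do). It assumes each line of the link-annotated text file is a JSON page with a `sections` list whose entries carry `text`, `link_offsets`, `link_lengths`, and `target_page_ids`, with offsets counted in characters; check the dataset's own description for the exact field names. The function names are hypothetical.

```python
import json
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Iterator, Tuple


def iter_links(section: Dict) -> Iterator[Tuple[str, int]]:
    """Yield (anchor_text, target_page_id) pairs for one annotated section.

    Assumes character offsets into section["text"] and parallel lists of
    offsets, lengths, and target page ids (an assumption about the format).
    """
    text = section["text"]
    for offset, length, target in zip(
        section["link_offsets"], section["link_lengths"], section["target_page_ids"]
    ):
        yield text[offset : offset + length], target


def anchor_candidates(jsonl_lines: Iterable[str]) -> Dict[str, Counter]:
    """Map each anchor string to a Counter of the page ids it links to."""
    candidates: DefaultDict[str, Counter] = defaultdict(Counter)
    for line in jsonl_lines:
        page = json.loads(line)
        for section in page["sections"]:
            for anchor, target in iter_links(section):
                candidates[anchor][target] += 1
    return dict(candidates)
```

Counting targets per anchor string gives a rough prior: an anchor such as "Paris" that links to several different pages is ambiguous, and `candidates["Paris"].most_common()` ranks its possible targets by how often editors chose each one.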

qwikidata Python Package

A Python package that allows you to represent Wikidata items, properties, and lexemes as classes.
  • qwikidata github page
  • qwikidata docs
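
The idea of representing Wikidata entities as classes can be sketched in a few lines. This is not qwikidata's API: the class and function names below (`Entity`, `Item`, `Property`, `Lexeme`, `entity_from_dict`, `claim_item_ids`) are hypothetical, and the code assumes the standard Wikidata JSON entity layout (`type`, `id`, `labels`, `aliases`, `claims`, plus `lemmas` for lexemes). The qwikidata docs describe the package's real interface.

```python
from typing import Any, Dict, List, Optional


class Entity:
    """Hypothetical base class wrapping one Wikidata entity JSON dict."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data

    @property
    def entity_id(self) -> str:
        return self.data["id"]

    def claim_item_ids(self, property_id: str) -> List[str]:
        """IDs of entities referenced by value snaks of `property_id` claims."""
        ids = []
        for claim in self.data.get("claims", {}).get(property_id, []):
            snak = claim["mainsnak"]
            if snak.get("snaktype") != "value":
                continue  # skip "novalue" / "somevalue" snaks
            value = snak["datavalue"]["value"]
            if isinstance(value, dict) and "id" in value:
                ids.append(value["id"])
        return ids


class _Labelled(Entity):
    """Shared accessors for entities that have labels and aliases."""

    def label(self, lang: str = "en") -> Optional[str]:
        entry = self.data.get("labels", {}).get(lang)
        return entry["value"] if entry else None

    def aliases(self, lang: str = "en") -> List[str]:
        return [a["value"] for a in self.data.get("aliases", {}).get(lang, [])]


class Item(_Labelled):
    """A Wikidata item (Q-id)."""


class Property(_Labelled):
    """A Wikidata property (P-id)."""

    @property
    def datatype(self) -> Optional[str]:
        return self.data.get("datatype")


class Lexeme(Entity):
    """A Wikidata lexeme (L-id); lexemes carry lemmas instead of labels."""

    def lemma(self, lang: str = "en") -> Optional[str]:
        entry = self.data.get("lemmas", {}).get(lang)
        return entry["value"] if entry else None


_CLASSES = {"item": Item, "property": Property, "lexeme": Lexeme}


def entity_from_dict(data: Dict[str, Any]) -> Entity:
    """Choose the wrapper class from the entity's `type` field."""
    return _CLASSES[data["type"]](data)
```

Dispatching on the JSON `type` field keeps the raw dict available (`entity.data`) while giving each kind of entity only the accessors that make sense for it; for example, wrapping the JSON for Q42 and calling `label()` returns its English label, "Douglas Adams".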

hilbertcurve Python Package

A Python package for calculating Hilbert curves in an arbitrary number of dimensions using arbitrarily large integers. It supports multiprocessing for converting large numbers of points to distances along the curve, or vice versa.
  • hilbertcurve github page
  • hilbertcurve PyPI page
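
For readers curious about what converting between points and Hilbert distances involves, below is a minimal sketch of John Skilling's transpose algorithm ("Programming the Hilbert curve", 2004), which works in any number of dimensions `n` with `p` bits per coordinate. It is written independently of the hilbertcurve package and should not be read as its implementation: it runs serially (no multiprocessing), and the function names are hypothetical. Python's unbounded integers let it handle large `p` without overflow.

```python
from typing import List, Sequence


def _axes_to_transpose(point: Sequence[int], p: int) -> List[int]:
    """Skilling's AxestoTranspose: coordinates -> transposed Hilbert index."""
    x = list(point)
    n = len(x)
    m = 1 << (p - 1)
    # Inverse undo: walk bit levels from the top, inverting or exchanging low bits.
    q = m
    while q > 1:
        mask = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= mask  # invert low bits of x[0]
            else:
                t = (x[0] ^ x[i]) & mask  # exchange low bits of x[0] and x[i]
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    # Gray encode.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = m
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    return [xi ^ t for xi in x]


def _transpose_to_axes(x: List[int], p: int) -> List[int]:
    """Skilling's TransposetoAxes: transposed Hilbert index -> coordinates."""
    x = list(x)
    n = len(x)
    # Gray decode by H ^ (H / 2).
    t = x[n - 1] >> 1
    for i in range(n - 1, 0, -1):
        x[i] ^= x[i - 1]
    x[0] ^= t
    # Undo excess work, walking bit levels from the bottom up.
    q = 2
    while q != (1 << p):
        mask = q - 1
        for i in range(n - 1, -1, -1):
            if x[i] & q:
                x[0] ^= mask
            else:
                t = (x[0] ^ x[i]) & mask
                x[0] ^= t
                x[i] ^= t
        q <<= 1
    return x


def distance_from_point(point: Sequence[int], p: int) -> int:
    """Hilbert distance of an n-dimensional point with coordinates in [0, 2**p)."""
    x = _axes_to_transpose(point, p)
    h = 0
    # Interleave bits: for each level (MSB first), take one bit from x[0], x[1], ...
    for j in range(p - 1, -1, -1):
        for xi in x:
            h = (h << 1) | ((xi >> j) & 1)
    return h


def point_from_distance(h: int, p: int, n: int) -> List[int]:
    """Inverse of distance_from_point for an n-dimensional curve, p bits per axis."""
    total = n * p
    x = [0] * n
    # De-interleave: bit k (from the MSB) goes to axis k % n at level p - 1 - k // n.
    for k in range(total):
        bit = (h >> (total - 1 - k)) & 1
        x[k % n] |= bit << (p - 1 - k // n)
    return _transpose_to_axes(x, p)
```

For example, `[point_from_distance(h, 1, 2) for h in range(4)]` gives `[[0, 0], [0, 1], [1, 1], [1, 0]]`, visiting the four cells of a 2×2 grid so that consecutive points differ by one step along a single axis. Because each point is converted independently, a batch of conversions parallelizes naturally across processes, which is the kind of workload the package's multiprocessing support targets.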

American Voter Project Collaboration

Steve Tingley-Hock of the American Voter Project (the non-profit behind the Ohio Voter Project) has been collecting voter registration data for years. In collaboration with him, I've produced two Kaggle datasets and associated example notebooks. 
  • Georgia Voter Lists from October - December 2020 and Census Cartographic Boundaries 
  • Ohio Census Data
You can learn more about Steve's work in the following articles:
  • (VICE) This Man Exposed 40K Voters Purged by Mistake
  • (Wired) One IT Guy's Spreadsheet-Fueled Race to Restore Voting Rights
  • (New York Times) Ohio Was Set to Purge 235,000 Voters. It Was Wrong About 20%

WikiWhatsThis

A research project to build a browser extension that helps people reach Wikipedia articles relevant to the stories they are viewing online. It is a perpetual work in progress.
  • WikiWhatsThis github page
  • Blog: WikiWhatsThis: Initial Open Sourcing and Next Steps
  • Blog: WikiWhatsThis Will Battle Misinformation by Grounding Online Stories with Wikipedia Content

Mentor for Harvard IACS Capstone Courses  

Work done in collaboration with students in the Institute for Applied Computational Science Capstone Research course. The students typically write blog posts summarizing their work.
  • Fall 2020: Improving Named Entity Disambiguation using Entity Relatedness within Wikipedia
  • Spring 2020: Context-Based Entity Linking Using KDWD
  • Fall 2019: Named Entity Disambiguation Boosted with Knowledge Graphs
  • Fall 2019: Back-Translation for Named Entity Recognition

Mentor for Cornell Financial Engineering Projects

  • Fall 2020: Comparing Unsupervised Company Clusters to GICS sectors using 10-K filings and Wikimedia.