Introducing Spark NLP: State of the art NLP at Scale


Start

March 7, 2020 - 5:00 pm

End

March 7, 2020 - 6:00 pm

Address

Online Webinar

Categories

Open Class

IDEAS Online Free Webinar

IDEAS & Data Application Lab co-host this live webinar.


International Data Science and Engineering Association (IDEAS)

IDEAS is a global nonprofit organization dedicated to fostering the data engineering and data science ecosystems and to broadening the adoption of their underlying technologies, in order to accelerate the innovations that data can bring to society. Our goal is to build a community that connects AI and Data Science enthusiasts. Every conference IDEAS hosts showcases cutting-edge technology and features a variety of AI and Data Science experts covering topics such as industry trends, real-world applications, open-source software, solution-based case studies, and many others.


Guest Speaker: David Talby

Topic: Introducing Spark NLP: State of the art NLP at Scale

Description: 

Natural language processing is a key component in many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. Building such systems usually requires combining three types of software libraries: NLP annotation frameworks, machine learning frameworks, and deep learning frameworks. This talk introduces Spark NLP, an NLP library for Apache Spark. It natively extends the Spark ML Pipeline API, enabling zero-copy, distributed, combined NLP and ML pipelines that leverage all of Spark’s built-in optimizations. Benchmarks and design best practices for building NLP, ML, and DL pipelines on Spark will be shared. The library implements core NLP algorithms including lemmatization, part-of-speech tagging, dependency parsing, named entity recognition, spell checking, and sentiment detection. The talk will demonstrate how to use these algorithms to build commonly used pipelines in PySpark, using notebooks that will be made publicly available after the talk.

Speaker’s Profile:

David Talby is the chief technology officer at Pacific AI, where he helps fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life sciences, and related fields. Previously, he was with Microsoft’s Bing Group, where he led business operations for Bing Shopping in the US and Europe. Earlier, he worked at Amazon in both Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.


Email

info@DataAppLab.com