Pseudo App

This demo app is part of the document pseudonymization effort led at Etalab's Lab IA. Other Lab IA projects can be found at the Lab IA.

Project Status: [Active]

Intro/Objectives

The purpose of this repo is to provide a quick demo of the pseudonymization tool we developed. The larger goal of the pseudonymization project is to help France's Conseil d'État open its court decisions to the general public, as required by law. More info about pseudonymization and this project can be found in our French pseudonymization guide here. Behind this website, there is an API that does the text tagging and pseudonymization.
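
To make the tagging-then-pseudonymization idea concrete, here is a minimal sketch in plain Python. It is not the API's actual implementation: the span format `(start, end, label)`, the `<LABEL>` placeholder style, and the function name `pseudonymize` are all assumptions made for illustration. In the real project, the spans would come from the NER model.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def pseudonymize(text: str, spans: List[Span]) -> str:
    """Replace each tagged span with a placeholder such as <PER>."""
    result = text
    # Process spans from the end of the text so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        result = result[:start] + f"<{label}>" + result[end:]
    return result


if __name__ == "__main__":
    sentence = "M. Dupont habite à Lyon."
    # Spans a tagger might return (assumed here, not produced by a model).
    tagged = [(3, 9, "PER"), (19, 23, "LOC")]
    print(pseudonymize(sentence, tagged))  # M. <PER> habite à <LOC>.
```

Replacing from the last span backwards avoids recomputing offsets after each substitution. The sketch assumes the spans do not overlap.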

Methods Used

Natural Language Processing: Information Extraction: Named Entity Recognition
Natural Language Processing: Language Modelling / Feature Learning: Word embeddings
Machine Learning: Deep Learning: Recurrent Networks: BiLSTM+CRF

Technologies

Python
Flair, sacremoses
Dash
SQLite
Pandas

Demo Description

The demo consists of four tabs:

Introduction of the project: a brief insight into our pseudonymization project.
Upload of a document to be pseudonymized: allows an imageless .doc, .docx, or .txt file (up to 100 kB) to be uploaded.
Comparison of volume of training data vs. annotation performance: we try to answer the question "How much data do I need to get decent results?"
API Stats: the usage statistics of the API that actually does the work.
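
The upload constraints of the second tab can be sketched as a small check. This is only an illustration, not the app's actual code: the function name `is_acceptable_upload` is hypothetical, and "100 kB" is assumed to mean 100 * 1000 bytes. The check does not cover the "imageless" requirement, which would need the file contents to be inspected.

```python
import os

# Extensions listed in the demo description.
ALLOWED_EXTENSIONS = {".doc", ".docx", ".txt"}
# Assumption: "100 kB" is read as 100 * 1000 bytes.
MAX_UPLOAD_BYTES = 100 * 1000


def is_acceptable_upload(filename: str, size_in_bytes: int) -> bool:
    """Return True if the file has an allowed extension and a valid size."""
    extension = os.path.splitext(filename)[1].lower()
    has_valid_size = 0 < size_in_bytes <= MAX_UPLOAD_BYTES
    return extension in ALLOWED_EXTENSIONS and has_valid_size


if __name__ == "__main__":
    print(is_acceptable_upload("decision.docx", 42_000))
    print(is_acceptable_upload("scan.pdf", 42_000))
```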

This demo depends by default on the pseudo API. The API is automatically pulled from its repo in the docker-compose file.

You do need to train a NER model with the Flair library. Unfortunately, we can share neither the model nor the data it was trained on, as they contain non-public information.

Getting Started

The easiest way to run this application is by using Docker and Docker Compose.

Clone this repo (for help see this tutorial).
Create a .env file in the repo folder and set in it the path of the local model (variable: PSEUDO_MODEL_PATH), the path of the API database (variable: PSEUDO_API_DB_PATH), and the URL of the API (variable: PSEUDO_REST_API_URL). Note that you could also pass this environment variable to the app directly, in which case you would not need to run the API.
Launch the wrapper bash file run_docker.sh. This file will clean and rebuild the required Docker containers using docker-compose.yml.
Go to localhost/pseudo/
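
Before launching run_docker.sh, it can help to confirm that the .env file defines the three variables named in the steps above. The following is a minimal sketch using only the standard library. The helper names `parse_env_text` and `missing_variables` are hypothetical, and the parser assumes simple `KEY=value` lines, which may not match every .env syntax Docker Compose accepts.

```python
from pathlib import Path
from typing import Dict, List

# Variable names taken from the Getting Started steps.
REQUIRED_VARIABLES = [
    "PSEUDO_MODEL_PATH",
    "PSEUDO_API_DB_PATH",
    "PSEUDO_REST_API_URL",
]


def parse_env_text(content: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in content.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_variables(content: str) -> List[str]:
    """List the required variables that are absent or empty."""
    values = parse_env_text(content)
    return [name for name in REQUIRED_VARIABLES if not values.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    content = env_file.read_text(encoding="utf-8") if env_file.exists() else ""
    missing = missing_variables(content)
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

Run it from the repo folder. Any name it prints still needs to be added to the .env file.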

Project Deliverables

Contact

Feel free to contact @pedevineau or @psorianom or other Lab IA team members with any questions or if you are interested in contributing!
