BERT

Model Name: BERT (Bidirectional Encoder Representations from Transformers)
Keywords: Bidirectional, Language-understanding, Transformer-based

Introduction

BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing model developed by Google in 2018 that revolutionized how machines understand human language. Unlike its predecessors, which processed text in a single direction, BERT analyzes text bidirectionally, interpreting each word in the context of all the other words in the sentence rather than only the words that come before or after it.

BERT's key innovation is its pre-training method, where it learns to predict randomly masked words in a sentence and understand relationships between sentences. After this general language understanding is established, BERT can be fine-tuned with additional training data for specific tasks such as classification, named entity recognition, or question answering, often achieving state-of-the-art results with relatively little task-specific data.
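
To illustrate the masked-word objective, the Hugging Face fill-mask pipeline uses BERT's masked language modeling head to fill in a hidden token (a minimal sketch, assuming the transformers library is installed):

from transformers import pipeline

# The fill-mask pipeline loads BERT with its masked language modeling head
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the most likely words for the [MASK] position
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))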

Instructions

1. Choose Implementation Method

  • Hugging Face Transformers: Most popular and easiest method
  • TensorFlow Hub: Google's official implementation
  • PyTorch: Flexible implementation for research
# Using Hugging Face Transformers
from transformers import BertModel, BertTokenizer

# Load pre-trained model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

2. Prepare Your Text

  • Tokenize and encode your input text
  • Handle padding and attention masks
  • Convert to appropriate format (PyTorch tensors, TensorFlow tensors)
# Tokenize and encode
text = "Example sentence to encode."
encoded_input = tokenizer(text, return_tensors='pt', padding=True, truncation=True)
# encoded_input will contain input_ids, token_type_ids, and attention_mask

3. Run Inference or Fine-Tuning

  • Inference: Get embeddings from pre-trained BERT
  • Fine-tuning: Train on a specific task with labeled data (sketched after the inference snippet below)
  • Common tasks: Classification, NER, Q&A, sentiment analysis
# Get embeddings (inference)
outputs = model(**encoded_input)
embeddings = outputs.last_hidden_state  # Shape: [batch_size, sequence_length, hidden_size]
pooled_output = outputs.pooler_output   # Shape: [batch_size, hidden_size]
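
The snippet above covers inference; a minimal fine-tuning sketch for binary classification might look like the following (the texts, labels, and hyperparameters are illustrative placeholders, not a recommended training recipe):

import torch
from torch.optim import AdamW
from transformers import BertForSequenceClassification, BertTokenizer

# Illustrative toy data; replace with your own labeled dataset
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

batch = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # a few passes over the toy batch
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)  # passing labels makes the model return a loss
    outputs.loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={outputs.loss.item():.4f}")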

4. Use the Results

  • Extract contextual word embeddings for token-level tasks
  • Use pooled output for sentence-level tasks
  • Pass to downstream task-specific layers, as in the sketch below
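
For example, a sentence-level classifier can be built by feeding the pooled output into a small task-specific head. This is a sketch reusing the model and encoded_input from the earlier steps; the two-class linear head is an illustrative assumption:

import torch

# Hypothetical task-specific head: map BERT's pooled output to 2 classes
classifier_head = torch.nn.Linear(model.config.hidden_size, 2)

with torch.no_grad():
    outputs = model(**encoded_input)

logits = classifier_head(outputs.pooler_output)  # Shape: [batch_size, 2]
predicted_class = logits.argmax(dim=1)
print(predicted_class)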

Capabilities

Text Classification

  • Sentiment analysis
  • Topic categorization
  • Intent detection

Named Entity Recognition

  • Person, organization, location detection
  • Product and brand identification
  • Domain-specific entity extraction

Question Answering

  • Extractive QA from documents (see the sketch below)
  • Context understanding
  • Information retrieval
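
For extractive QA, a BERT checkpoint fine-tuned on SQuAD can pick the answer span out of a passage. This sketch uses the Hugging Face question-answering pipeline; the checkpoint name is one common choice and should be verified on the Hub:

from transformers import pipeline

# A BERT checkpoint fine-tuned on SQuAD; any QA-tuned model can be swapped in
qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")

context = "BERT was released by Google in 2018 and is pre-trained with masked language modeling."
result = qa(question="Who released BERT?", context=context)
print(result["answer"], result["score"])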

Semantic Analysis

  • Paraphrase detection
  • Textual similarity assessment (sketched below)
  • Text summarization
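
Textual similarity can be approximated by comparing BERT sentence embeddings with cosine similarity. This is a rough sketch: mean-pooling raw BERT outputs is a simple baseline, and dedicated sentence-embedding models usually perform better:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def embed(sentence):
    # Mean-pool the token embeddings into a single sentence vector
    inputs = tokenizer(sentence, return_tensors='pt', truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1)

a = embed("A man is playing a guitar.")
b = embed("Someone is playing an instrument.")
similarity = torch.nn.functional.cosine_similarity(a, b).item()
print(f"cosine similarity: {similarity:.3f}")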

Examples

# Example: Sentiment analysis with BERT
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load fine-tuned model and tokenizer
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "I love this product! It works exactly as described."
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)

# Get predicted class (1-5 stars)
predicted_class = torch.argmax(outputs.logits, dim=1).item() + 1
print(f"Sentiment rating: {predicted_class} stars")

# Example: Named Entity Recognition with BERT
from transformers import BertForTokenClassification, BertTokenizer
import torch

# Load fine-tuned model for NER
model_name = "dslim/bert-base-NER"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name)

# Prepare input
text = "Apple is looking at buying U.K. startup for $1 billion"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=2)

# Map predictions to tokens and labels
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
label_list = model.config.id2label
labels = [label_list[prediction.item()] for prediction in predictions[0]]

# Process results
for token, label in zip(tokens, labels):
    print(f"{token}: {label}")

Key Features

  • Bidirectional Context: Examines words from both directions
  • Pre-training/Fine-tuning: Transfer learning approach
  • Masked Language Modeling: Predicts missing words
  • Next Sentence Prediction: Understands sentence relationships
  • WordPiece Tokenization: Handles rare words effectively by splitting them into subword pieces (see the example below)
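
To see WordPiece in action, the tokenizer breaks a rare word into known subword pieces (a small illustration; the exact split depends on the vocabulary of the checkpoint used):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Rare words are split into subword pieces, continuation pieces are prefixed with '##'
print(tokenizer.tokenize("electroencephalography"))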

Available Variants

  • BERT-Base: 12 layers, 110M parameters
  • BERT-Large: 24 layers, 340M parameters
  • DistilBERT: Lightweight, 40% smaller, 60% faster
  • RoBERTa: Optimized BERT training
  • Multilingual BERT: Supports 104 languages (loading any of these variants is sketched below)
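
In the Hugging Face ecosystem, switching between these variants is usually just a matter of changing the checkpoint name. This sketch uses the Auto classes; the checkpoint IDs listed are common Hub names and should be verified against the Hub:

from transformers import AutoModel, AutoTokenizer

# Common Hub checkpoint names for the variants above (verify on the Hub)
checkpoints = {
    "BERT-Base": "bert-base-uncased",
    "BERT-Large": "bert-large-uncased",
    "DistilBERT": "distilbert-base-uncased",
    "RoBERTa": "roberta-base",
    "Multilingual BERT": "bert-base-multilingual-cased",
}

name = checkpoints["DistilBERT"]
tokenizer = AutoTokenizer.from_pretrained(name)  # picks the matching tokenizer class
model = AutoModel.from_pretrained(name)          # picks the matching model class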