BERT
Introduction
BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing model developed by Google in 2018 that revolutionized how machines understand human language. Unlike its predecessors, which read text in a single direction, BERT analyzes text bidirectionally, interpreting each word in the context of all the other words in a sentence rather than only the words that come before or after it in sequence.
BERT's key innovation is its pre-training method, in which it learns to predict randomly masked words in a sentence and to judge whether one sentence follows another. Once this general language understanding is established, BERT can be fine-tuned with additional training data for specific tasks such as classification, named entity recognition, or question answering, often achieving state-of-the-art results with relatively little task-specific data.
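As a small illustration of the masked-word objective, the sketch below (assuming the Hugging Face transformers package and the bert-base-uncased checkpoint) asks BERT to fill in a masked token from its surrounding context:

```python
# Masked language modeling demo: BERT predicts the [MASK] token from context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], f"{prediction['score']:.3f}")
# The top prediction is typically "paris".
```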
Instructions
1. Choose Implementation Method
- Hugging Face Transformers: Most popular and easiest method
- TensorFlow Hub: Google's official implementation
- PyTorch: Flexible implementation for research
2. Prepare Your Text
- Tokenize and encode your input text
- Handle padding and attention masks
- Convert to the appropriate format (PyTorch or TensorFlow tensors); see the sketch after this list
3. Run Inference or Fine-Tuning
- Inference: Get embeddings from pre-trained BERT
- Fine-tuning: Train on specific task with labeled data
- Common tasks: Classification, NER, Q&A, sentiment analysis
4. Use the Results
- Extract contextual word embeddings for token-level tasks
- Use pooled output for sentence-level tasks
- Pass to downstream task-specific layers
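A minimal sketch of steps 2-4, assuming the Hugging Face Transformers route from step 1 with PyTorch and the bert-base-uncased checkpoint:

```python
# Tokenize two sentences, run pre-trained BERT, and pull out token-level
# and sentence-level representations.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

sentences = ["BERT reads text bidirectionally.", "It was pre-trained on masked words."]

# Step 2: tokenize, pad to a common length, and build attention masks as PyTorch tensors.
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

# Step 3: inference only, so no gradients are needed.
with torch.no_grad():
    outputs = model(**inputs)

# Step 4: token embeddings for token-level tasks, pooled output for sentence-level tasks.
token_embeddings = outputs.last_hidden_state  # shape: (batch, seq_len, 768)
sentence_vectors = outputs.pooler_output      # shape: (batch, 768)
print(token_embeddings.shape, sentence_vectors.shape)
```

Either tensor can then be passed to downstream task-specific layers.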
Capabilities
Text Classification
- Sentiment analysis
- Topic categorization
- Intent detection
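For example, sentiment analysis can be run through the transformers pipeline API; the snippet below is a sketch that uses the pipeline's default fine-tuned checkpoint (a DistilBERT model) rather than one you trained yourself:

```python
# Sentiment analysis with a BERT-family checkpoint via the pipeline API.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("The new update made the app much faster."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```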
Named Entity Recognition
- Person, organization, location detection
- Product and brand identification
- Domain-specific entity extraction
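A hedged sketch of token-level NER; dslim/bert-base-NER is one commonly used community checkpoint and is assumed here, not required by BERT itself:

```python
# Named entity recognition with a BERT checkpoint fine-tuned for NER.
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

for entity in ner("Sundar Pichai announced the model at Google I/O in Mountain View."):
    print(entity["entity_group"], entity["word"], f"{entity['score']:.3f}")
```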
Question Answering
- Extractive QA from documents
- Context understanding
- Information retrieval
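A sketch of extractive QA, assuming a SQuAD-fine-tuned BERT checkpoint (the name below is one such publicly available model):

```python
# Extractive question answering: the answer span is pulled from the provided context.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)
result = qa(
    question="When was BERT released?",
    context="BERT is a language representation model released by Google in 2018.",
)
print(result["answer"], f"{result['score']:.3f}")  # expected answer: "2018"
```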
Semantic Analysis
- Paraphrase detection
- Textual similarity assessment
- Text summarization
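One simple way to score textual similarity is to mean-pool BERT's token embeddings into sentence vectors and compare them with cosine similarity; the pooling choice here is an assumption, and dedicated sentence-embedding models usually work better in practice:

```python
# Sentence similarity from mean-pooled BERT embeddings.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def embed(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # mean-pool over tokens

a = embed("A man is playing a guitar.")
b = embed("Someone is strumming an instrument.")
print(torch.cosine_similarity(a, b, dim=0).item())  # closer to 1.0 means more similar
```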
Examples
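The following is a minimal fine-tuning sketch for sentence classification, assuming the transformers and torch packages; the two toy sentences and labels are made up purely for illustration:

```python
# Fine-tune BERT for binary sentiment classification on a toy dataset.
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["I loved this movie.", "The plot was a mess."]  # toy data
labels = torch.tensor([1, 0])                            # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    outputs = model(**batch, labels=labels)  # forward pass returns the loss when labels are given
    outputs.loss.backward()                  # backpropagate
    optimizer.step()
    print(f"epoch {epoch}: loss {outputs.loss.item():.4f}")
```

A real workflow would batch a labeled dataset, hold out a validation split, and evaluate after each epoch; the Trainer API in transformers wraps most of that boilerplate.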
Key Features
- Bidirectional Context: Examines words from both directions
- Pre-training/Fine-tuning: Transfer learning approach
- Masked Language Modeling: Predicts missing words
- Next Sentence Prediction: Understands sentence relationships
- WordPiece Tokenization: Handles rare words effectively
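To see WordPiece in action, the snippet below tokenizes a rare word; the exact subword split depends on the checkpoint's vocabulary:

```python
# WordPiece splits out-of-vocabulary words into known subword pieces (marked with "##").
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("electroencephalography"))
# The long word is split into several "##"-prefixed subword pieces.
```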
Available Variants
- BERT-Base: 12 layers, 110M parameters
- BERT-Large: 24 layers, 340M parameters
- DistilBERT: Lightweight, 40% smaller, 60% faster
- RoBERTa: Optimized BERT training
- Multilingual BERT: Supports 104 languages
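These variants correspond to checkpoint names on the Hugging Face Hub (for example bert-base-uncased, bert-large-uncased, distilbert-base-uncased, roberta-base, and bert-base-multilingual-cased), and the Auto classes load any of them with the same two calls; the sketch below assumes the transformers package:

```python
# Load a variant by checkpoint name; AutoTokenizer/AutoModel pick the right classes.
from transformers import AutoModel, AutoTokenizer

checkpoint = "bert-base-uncased"  # swap in any of the checkpoint names above
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

print(model.config.num_hidden_layers, model.num_parameters())  # 12 layers, ~110M parameters
```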