BERTopic: An Overview of Speech Conferences and Topic Representations
BERTopic is a powerful topic modeling technique that offers several topic representations to choose from. They differ considerably from one another, and each gives a different perspective on what a topic is about. If the default keyword-based representation is not descriptive enough, however, you might want to use something more powerful to describe your clusters.
By default, topic modeling with BERTopic runs four main steps in sequence: sentence-transformers, UMAP, HDBSCAN, and c-TF-IDF. The pipeline assumes some independence between these steps, which makes BERTopic quite modular: you can swap out any of these models or even remove a step entirely.
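As a minimal sketch, the default sequence can be spelled out by passing each component to the `BERTopic` constructor. The parameter values below mirror the library defaults as far as I know them and should be read as assumptions, not as a guaranteed match for every version. `build_default_topic_model` is a hypothetical helper name, and the `CountVectorizer` is the tokenization stage that feeds c-TF-IDF.

```python
def build_default_topic_model():
    """Build a BERTopic model with each default step written out explicitly."""
    # Imported here so the heavy dependencies load only when a model is built.
    from bertopic import BERTopic
    from bertopic.vectorizers import ClassTfidfTransformer
    from hdbscan import HDBSCAN
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    # Step 2: reduce the document embeddings to a handful of dimensions.
    umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine")

    # Step 3: cluster the reduced embeddings; label -1 marks outliers.
    hdbscan_model = HDBSCAN(
        min_cluster_size=10,
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,
    )

    # Step 4: count words per topic, then weight them with c-TF-IDF.
    vectorizer_model = CountVectorizer()
    ctfidf_model = ClassTfidfTransformer()

    return BERTopic(
        # Step 1: name of a sentence-transformers model (assumed default for English).
        embedding_model="all-MiniLM-L6-v2",
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        vectorizer_model=vectorizer_model,
        ctfidf_model=ctfidf_model,
    )
```

Calling `build_default_topic_model().fit_transform(docs)` on a list of strings runs the four steps in order and returns a topic assignment per document together with probabilities.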
BERTopic has many functions, which can quickly become overwhelming. After you have trained your BERTopic model, several attributes are saved within the model.
Here's a breakdown of the default steps in BERTopic:
- Sentence-Transformers: This step embeds your documents into a high-dimensional space.
- UMAP: UMAP (Uniform Manifold Approximation and Projection) reduces the dimensionality of the embeddings while preserving much of their local and global structure.
- HDBSCAN: HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) clusters the reduced embeddings into topics.
- c-TF-IDF: c-TF-IDF (Class-based Term Frequency-Inverse Document Frequency) identifies the most important words in each topic.
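
To make the last step more concrete, here is a small pure-Python sketch of c-TF-IDF. It follows the formula given in the BERTopic documentation, `W(x, c) = tf(x, c) * log(1 + A / f(x))`, where `tf(x, c)` is the normalized frequency of word `x` in class `c`, `f(x)` is the frequency of `x` across all classes, and `A` is the average number of words per class. The whitespace tokenizer and the function names `c_tf_idf` and `top_words` are simplifying assumptions for illustration; BERTopic itself works on a `CountVectorizer` bag-of-words matrix.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_topic):
    """Score words per topic. `docs_per_topic` maps topic id -> list of strings."""
    # Join all documents of a topic into one "class document" and count its words.
    class_counts = {
        topic: Counter(" ".join(docs).lower().split())
        for topic, docs in docs_per_topic.items()
    }

    # f(x): frequency of each word across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # A: average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    scores = {}
    for topic, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[topic] = {
            # Normalized term frequency times the class-based IDF term.
            word: (freq / n_words) * math.log(1 + avg_words / total_counts[word])
            for word, freq in counts.items()
        }
    return scores


def top_words(scores, n=3):
    """Return the n highest-scoring words for each topic."""
    return {
        topic: [w for w, _ in sorted(words.items(), key=lambda kv: kv[1], reverse=True)[:n]]
        for topic, words in scores.items()
    }
```

Words that are frequent inside one topic but rare across the others get the highest scores, which is why a word shared by every topic (such as "the") ranks below a topic-specific one.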

Because of the modularity of BERTopic, you can customize each step to fit your specific needs. For example, you might want to use a different dimensionality reduction technique or a different clustering algorithm. You can even use pre-trained embeddings or custom scoring functions.
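A hedged sketch of such a customization: the helpers below (hypothetical names) replace UMAP with PCA and HDBSCAN with k-means from scikit-learn, and show one way to skip dimensionality reduction altogether. The component choices and parameter values are illustrative assumptions, not recommendations.

```python
def build_custom_topic_model(n_topics=20):
    """BERTopic with PCA instead of UMAP and k-means instead of HDBSCAN."""
    # Imported here so the heavy dependencies load only when a model is built.
    from bertopic import BERTopic
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA

    dim_model = PCA(n_components=5)
    # k-means assigns every document to a topic, so there is no outlier topic.
    cluster_model = KMeans(n_clusters=n_topics, random_state=42)

    # The parameter names keep their default-component names,
    # but accept any compatible model.
    return BERTopic(umap_model=dim_model, hdbscan_model=cluster_model)


def build_model_without_reduction():
    """BERTopic that clusters the raw embeddings, skipping the reduction step."""
    from bertopic import BERTopic
    from bertopic.dimensionality import BaseDimensionalityReduction

    # BaseDimensionalityReduction passes the embeddings through unchanged.
    return BERTopic(umap_model=BaseDimensionalityReduction())
```

To reuse embeddings computed elsewhere, pass them as the second argument when fitting, for example `topic_model.fit_transform(docs, embeddings)`, so the embedding step is not run again.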
Model Information Storage: During fitting, BERTopic stores information about the model as attributes on the fitted estimator, such as `topics_`.
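A minimal sketch of reading some of that stored information back, assuming a model that has already been fitted. `topics_`, `topic_sizes_`, and `topic_representations_` are attributes BERTopic sets during fitting; the helper `summarize_fitted_model` is a hypothetical name used only for illustration.

```python
def summarize_fitted_model(topic_model, n_words=3):
    """Collect a few attributes that are stored on a fitted BERTopic model."""
    topics = topic_model.topics_                          # one topic id per document
    sizes = topic_model.topic_sizes_                      # topic id -> number of documents
    representations = topic_model.topic_representations_  # topic id -> [(word, score), ...]

    return {
        "n_documents": len(topics),
        # Topic -1 collects outliers, so it is not counted as a real topic.
        "n_topics": sum(1 for topic in sizes if topic != -1),
        "n_outliers": sizes.get(-1, 0),
        "top_words": {
            topic: [word for word, _ in words[:n_words]]
            for topic, words in representations.items()
            if topic != -1
        },
    }
```

For a tabular view of the same information, a fitted model also provides `get_topic_info()`, which returns a DataFrame with one row per topic.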
Use Cases: Topic modeling can be applied to many different problems.
The following scenarios are examples:
| Scenario | Example |
|---|---|
| Analyzing Customer Feedback | Identify common themes in customer reviews to improve product development. |
| Organizing Research Papers | Group research papers based on their topics for easier literature review. |
| Content Recommendation | Recommend articles or videos to users based on their interests. |