Home
THIS WIKI IS A WORK IN PROGRESS, SOME THINGS MENTIONED AREN'T YET TRUE OR IMPLEMENTED. PLEASE REFER TO THE README FOR NOW UNTIL THIS WARNING IS REMOVED
Welcome to the WearableIntelligenceSystem (WIS) documentation wiki! Here, we provide an overview of what the system is, what it can do, how it works, how you can contribute, etc. If you're keen to start and just want a quick overview, go ahead and read the Getting Started section below.
This is mainly written for developers and researchers. If you just want to use the WIS as a user, check out the README, which describes how to get it set up.
If you read through this and can't find an answer to your question, please create an issue.
Check out the main README for a high-level view of what the system is and what it's trying to accomplish.
As a user, the Wearable Intelligence System is the homepage for your smart glasses, with tons of built-in smart glasses apps, always-available HUD information, and the ability to launch any third party app, all with voice control. It's an answer to the problem that good smart glasses hardware is being released, but there are no useful apps, no easy way to control the device, and no simple way to develop applications for these glasses. It's also an attempt at an egocentric operating system (OS), an OS which needs to work very differently because of the form factor, interface, and use cases that wearable applications demand.
Ok, if you're here, you've already read the README. You're a developer, an industry or start-up user, or otherwise a super user who doesn't just want the stock system.
If you are planning on making changes to the system, you'll need some basic knowledge of Android and Android Studio. If you've never used Android Studio before, we recommend you go run through this tutorial and then come back.
If you are planning on running your own server, too, you'll need knowledge of the Unix/Linux command line, web servers (Apache/Nginx), and Python. None of this is very complex or hard to learn, but you'll need some basic background to build it. If that's an issue, never fear: you can just use the Emex Labs backend, which is already set up and ready to go.
Finally, if you upgrade the system in any way, please consider making a pull request, so everyone else can benefit from that change, too.
These abbreviations will be used everywhere, so read them twice.
WIS - Wearable Intelligence System
ASP - Android Smart Phone
ASG - Android Smart Glasses
GLBOX - GNU/Linux 'Single Board Computer'/Laptop (just a regular old computer server)
You'll need at least two pieces of hardware (consumer electronics) in order to run the system. We have three abbreviations for these devices that we'll use throughout the documentation:
- (REQUIRED) ASP - Android Smart Phone, currently supporting Android 9+
- (REQUIRED) ASG - Android Smart Glasses, currently supporting Android 5.1 (version update coming soon)
- (OPTIONAL) GLBOX - GNU/Linux Box, a linux laptop, single board computer, or cloud server
Please follow the instructions in the README for how to get the system up and running. Note that the consumer facing application uses a server hosted by Emex Labs. If you want to use your own server, you'll need to follow the developer setup and install instructions.
If you're a developer and you want to build the system on your own machine, follow the instructions here. Remember that there are 3 main hardware components to this system, and each has its own build process.
To install the system, you have three options:
- USER INSTALL - Install the pre-built APKs on your ASP and ASG, and use the Emex Labs public GLBOX.
- DEVELOPER INSTALL - Build your own APKs for ASP and ASG, and use the Emex Labs public GLBOX
- DEVELOPER+ INSTALL - Build your own APKs for ASP and ASG, and setup your own GLBOX
Head on back to the README Install section for instructions on how to install without any modifications to the application.
- Clone this repo:
git clone git@github.com:emexlabs/WearableIntelligenceSystem.git #clone main repo
git submodule update --init --recursive #clone submodules
- Setup and build the ASG app with Android Studio
- Setup and build the ASP app with Android Studio
- Start the ASG app on the ASG and the ASP app on the ASP, then follow the WiFi hotspot section in README Install section to get the system running.
This system provides the foundation for a wearable computing suite consisting of connected Android Smart Glasses (ASG), an Android Smart Phone (ASP), and a GNU/Linux box (GLBOX).
The ASG acts as wearable sensors (camera, microphone, etc.) and a wearable display (waveguides, birdbath, etc.). The ASP runs a server which the ASG connects to, streaming POV camera video to it over WiFi. The ASP also runs a MediaPipe machine learning pipeline on incoming sensory data. The GLBOX also connects to the ASG using a TCP socket and handles transcription, voice commands, and programmatic control of the ASG from within a Linux development environment.
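For orientation, here is a minimal Java sketch of what a device-to-device TCP connection like the ones described above could look like. The host, port, and length-prefixed framing are illustrative assumptions, not the project's actual comms code:

```java
// Hypothetical sketch: a client (e.g. the ASG) opening a TCP connection to a
// server on the hotspot network and sending one length-prefixed payload.
// The address, port, and framing are placeholders, not the real WIS protocol.
import java.io.DataOutputStream;
import java.net.Socket;

public class TcpConnectionSketch {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("192.168.43.1", 4567); // assumed hotspot gateway + port
             DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            byte[] payload = new byte[]{/* e.g. an encoded camera frame */};
            out.writeInt(payload.length);   // simple length prefix so the receiver knows how much to read
            out.write(payload);
            out.flush();
        }
    }
}
```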
- Turn on the WiFi hotspot on the ASP.
- Connect the GLBOX and ASG to the ASP WiFi hotspot.
- Start the life_live_captions Python server on the GLBOX.
- Start the Mobile Compute app on the ASP.
- Start the smart glasses app on the ASG.
- Follow Setup in gnu_linux_box/glbox_main_app/README.md
- Activate virtualenv and launch app
cd gnu_linux_box
source venv/bin/activate #activate virtualenv
python3 main.py
The GLBOX (GNU/Linux Box: a computer running a GNU/Linux distribution operating system) is the part of the Wearable Intelligence System that handles transcription, running commands, and saving memories.
Live Linux programmatic control of Android smart glasses running the Wearable Intelligence System app.
Run this on any laptop or single-board-computer running GNU/Linux.
- Open android_smart_glasses/smart_glasses_app Android app in Android studio.
- Plug in Android Smart Glasses and build + flash to device using Android Studio.
The latest system is developed on a Vuzix Blade 1.5.
Running on another pair of Android AR/MR glasses may work without issue; otherwise, porting could happen within 48 hours if the hardware is supplied.
- Follow commands here to setup MediaPipe: https://google.github.io/mediapipe/getting_started/install.html#installing-on-debian-and-ubuntu
Run the following commands to build and run the app for the ASP:
cd android_smart_phone/mediapipe
./build_single_android.sh mediapipe/examples/android/src/java/com/google/mediapipe/apps/wearableai
- Run the android_smart_phone/mobile_compute_app on any modern-ish Android smart phone (a good CPU/GPU is recommended for the MediaPipe graph) that can make a WiFi hotspot.
All voice commands must be preceded by a wakeword. A wakeword is any word you choose to "wake up" the system and start listening for commands; commands will only be run if they follow a wakeword. Set your own wakeword by adding it to wakewords.txt, or just use an existing wakeword from wakewords.txt. Choose a wakeword that the Speech-To-Text system can reliably recognize.
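As an illustration of the wakeword gating described above, here is a minimal Java sketch. The wakewords and helper names are made up for this example and are not the project's actual parsing code:

```java
// Hypothetical sketch: scan a transcript for a wakeword and treat everything
// after it as the command. In the real system, wakewords come from wakewords.txt.
import java.util.List;

public class WakewordSketch {
    static final List<String> WAKEWORDS = List.of("hey computer", "licklider"); // illustrative

    static String extractCommand(String transcript) {
        String lower = transcript.toLowerCase();
        for (String w : WAKEWORDS) {
            int idx = lower.indexOf(w);
            if (idx != -1) {
                return transcript.substring(idx + w.length()).trim(); // text after the wakeword
            }
        }
        return null; // no wakeword, so nothing is run
    }

    public static void main(String[] args) {
        System.out.println(extractCommand("okay hey computer save speech")); // -> "save speech"
    }
}
```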
ask Wolfram <query> - ask Wolfram Alpha a natural language query
add wake word <wakeword> - add a new wakeword to wakewords.txt
save speech - save the transcribed speech to a file. This can be used to save ideas, thoughts, notes, reminders, etc.
switch mode <arg> - switch the current mode of the smart glasses app.
Currently available modes:
- live life captions
- blank screen
- social mode
Closed captions of everything you and those around you say. Live view of commands and command output. Nouns in transcripts are highlighted. Soon to be extended to give definition, summary, encyclopedia, and NLP functionalities.
A social-emotional intelligence tool to be used in social situations. Live metrics about the social environment (eye contact, facial emotion, high-level psychological metrics (e.g. stress, confidence, etc.)) overlaid on the user's vision. Soon to be extended with facial recognition (tying in to the memory stream "Personal Person Database"), amalgamation (combining social information about everyone in the room), and more metrics (drowsiness, believability (both ways), interest, etc.).
Blanks out the screen, sleep mode.
To show your ASG or ASP screen to others (e.g. over video chat with "share screen"), you can open a window on your computer that will mirror the ASG or ASP display. The steps to do so are:
- Install scrcpy: https://github.com/Genymobile/scrcpy
- Run scrcpy
We use Vosk for automatic speech recognition (ASR). This is because the system is high accuracy, runs locally on Android, and is almost completely open source. The audio data is streamed from the ASG microphone (a connected Bluetooth SCO microphone) to the ASP, where it's transcribed by Vosk.
Audio streaming from ASG - android_smart_glasses/.../AudioSystem.java
Audio receiving on ASP - android_smart_phone/.../comms/AudioSystem.java
Vosk Speech recognition system - android_smart_phone/.../speechrecognition/SpeechRecVosk.java
In case it wasn't clear - all speech recognition runs locally, with no audio streamed over the internet.
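For reference, here is a minimal sketch of feeding audio to Vosk through its Java bindings (org.vosk). The model path and chunk size are illustrative, and this is not the project's actual SpeechRecVosk implementation:

```java
// Hypothetical sketch: run a chunk of 16 kHz, 16-bit mono PCM audio through a
// Vosk Recognizer and print the JSON result, all locally on the device.
import org.vosk.Model;
import org.vosk.Recognizer;

public class VoskSketch {
    public static void main(String[] args) throws Exception {
        try (Model model = new Model("model");                      // path to an unpacked Vosk model (assumed)
             Recognizer recognizer = new Recognizer(model, 16000.0f)) {
            byte[] audioChunk = new byte[3200];                     // e.g. ~100 ms of audio from the ASG mic
            if (recognizer.acceptWaveForm(audioChunk, audioChunk.length)) {
                System.out.println(recognizer.getResult());         // final JSON result
            } else {
                System.out.println(recognizer.getPartialResult());  // partial JSON result
            }
        }
    }
}
```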
This app runs on any Android 9+ smart phone. We recommend significant computing power for the ASP, something like a Snapdragon 855+ or better, and something that supports WiFi sharing.
android_smart_phone/main/ is the main Android application. Open and run this in Android Studio.
mediapipe/ is the mobile/edge AIML system, which is a (stickied) fork of the Google MediaPipe library and Android example program. The holistic Android app and holistic graph have been extended to include a number of new neural networks and processes.
If you want to edit the application, go here: main/
If you want to edit mediapipe library, go here: mediapipe/
You can either use the officially released APK (on GitHub or <emexwearables.com>) or build your own locally after following the instructions in the main README.md and then the instructions below:
Open, build, and run the app in main/ from Android Studio, just like any other Android app.
- Follow these instructions to setup Bazel and MediaPipe: https://google.github.io/mediapipe/getting_started/android.html (including the external link on this page on how to install MediaPipe)
- don't forget to follow these instructions on that same page: https://google.github.io/mediapipe/getting_started/install.html
- Change the SDK and NDK paths in ./main/WORKSPACE to point to your own Android SDK and NDK installs (if you don't have them, install Android Studio and download an SDK and NDK)
- Run this command:
bazel build -c opt --config=android_arm64 --java_runtime_version=1.8 --noincremental_dexing --verbose_failures mediapipe/examples/android/src/java/com/google/mediapipe/apps/wearableai:wearableai;
- You have now built the application!
- For subsequent builds where you don't change anything in WORKSPACE file, use the following command for faster build:
bazel build -c opt --config=android_arm64 --java_runtime_version=1.8 --noincremental_dexing --verbose_failures --fetch=false mediapipe/examples/android/src/java/com/google/mediapipe/apps/wearableai:wearableai;
The system uses JSON IPC between the ASP and ASG.
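As a rough illustration (not the project's actual message schema; the real keys are defined in comms/MessageTypes.java), a JSON IPC message can be built and parsed with org.json, which Android bundles:

```java
// Hypothetical sketch: serialize a message on one device and parse it on the
// other. The key names and type string below are made up for this example.
import org.json.JSONObject;

public class JsonIpcSketch {
    public static void main(String[] args) throws Exception {
        JSONObject msg = new JSONObject();
        msg.put("MESSAGE_TYPE_LOCAL", "FINAL_TRANSCRIPT");   // assumed type key/value
        msg.put("TRANSCRIPT_TEXT", "hey computer save speech");
        String wire = msg.toString();                         // string sent over the socket

        JSONObject received = new JSONObject(wire);           // parsed on the receiving device
        System.out.println(received.getString("TRANSCRIPT_TEXT"));
    }
}
```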
MainActivity.java - in charge of the UI and launching the background service
WearableAiAspService.java - where everything happens. This launches connections, moves data around, and stays alive in the background.
ASGRepresentative.java - a system that communicates with the ASG
GLBOXRepresentative.java - a system that communicates with the GLBOX
Data and function calls are passed around the application on an event bus. Right now, we are using RxJava as the event bus, with our own custom parsing, and all event keys can be found in /comms/MessageTypes.java.
Instead of calling functions directly, which requires passing many objects around and becomes too complex with a big system like this, we only pass around the "dataObservable" rxjava object, which handles sending data and triggering messages anywhere in the app. These events are multicast, so multiple different systems can respond to the same message.
- Soon, we'll move from RxJava to Android EventBus
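Here is a minimal sketch of that multicast event-bus pattern, using an RxJava PublishSubject as the shared dataObservable. The import assumes RxJava 3, and the message-type strings are illustrative stand-ins for the constants in MessageTypes.java:

```java
// Hypothetical sketch: multiple subsystems subscribe to one shared observable
// and react only to the message types they care about (multicast events).
import io.reactivex.rxjava3.subjects.PublishSubject;
import org.json.JSONObject;

public class EventBusSketch {
    public static void main(String[] args) throws Exception {
        PublishSubject<JSONObject> dataObservable = PublishSubject.create();

        dataObservable
                .filter(msg -> "FINAL_TRANSCRIPT".equals(msg.optString("type")))
                .subscribe(msg -> System.out.println("UI system: " + msg.optString("text")));
        dataObservable
                .filter(msg -> "FINAL_TRANSCRIPT".equals(msg.optString("type")))
                .subscribe(msg -> System.out.println("NLP system: " + msg.optString("text")));

        JSONObject event = new JSONObject();
        event.put("type", "FINAL_TRANSCRIPT");
        event.put("text", "hey computer ask Wolfram what is the speed of light");
        dataObservable.onNext(event);   // both subscribers receive the same event
    }
}
```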
For right now, the number we send references to is hardcoded in nlp/WearableReferencerAutocite.java. Change the line that says "PUT NUMBER HERE" to contain the phone number you wish to send to. This will soon be changed to a user-supplied value in the ASP UI.
Further, for now, to add references to your database, update the CSV in assets/assets/wearable_referencer_references.csv. This will also be moved to a user-facing UI shortly.
There may be some issues with hard links to Android Studio executables in some Bazel configs in the main/ app. We are working on making dev setup easy, but if you get weird errors when running the above command, try changing the hard links to point to your Android Studio / Android SDK / Android NDK install, then make an issue on GitHub or reach out to cayden@emexwearables.com
We are using the vosk-api for local speech recognition.
Since we use our own model, we have our own android library with an assets folder that holds the model, and we import that android library as a dependency in our main app.
We can (obviously) use the main Vosk Android model: vosk-model-small-en-us-0.15. However, we have successfully tested using both vosk-model-en-us-0.22 and vosk-model-en-us-0.22-lgraph. The problem with vosk-model-en-us-0.22 is that it makes the build time ~10 minutes, which is unreasonable. For now we will use vosk-model-en-us-0.22-lgraph, and later we will change the system to automatically download vosk-model-en-us-0.22 from our server, so the model doesn't have to be packed into the APK but we can still get the best recognition possible. We may also want to give users the option of which model to use, as older/slower devices may have a hard time with the 2GB+ vosk-model-en-us-0.22 model.
Google MediaPipe is a way to define intelligence graphs ("perception pipelines") which take input and do intelligence processing by creating a flow of data between machine learning models and hard-coded functions known as "Calculators". This app is built on Google MediaPipe, even though ./main/ is not currently tracking the Google MediaPipe repo. In the future, if we want to pull in new work from the main MediaPipe repository, we will set things up again to track Google MediaPipe.
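As a heavily simplified sketch of how a MediaPipe graph is typically driven from Java (following the pattern in Google's MediaPipe example apps, with placeholder graph and stream names rather than the actual wearableai configuration):

```java
// Hypothetical Android-side sketch: a FrameProcessor loads a binary graph and
// exposes callbacks for its output streams. All names below are assumptions.
import android.content.Context;
import com.google.mediapipe.components.FrameProcessor;
import com.google.mediapipe.glutil.EglManager;

public class MediaPipeSketch {
    FrameProcessor setupProcessor(Context context) {
        EglManager eglManager = new EglManager(null);
        FrameProcessor processor = new FrameProcessor(
                context,
                eglManager.getNativeContext(),
                "wearableai_graph.binarypb",   // assumed binary graph asset name
                "input_video",                 // assumed input stream name
                "output_video");               // assumed output stream name
        // React to packets emitted by one of the graph's output streams.
        processor.addPacketCallback("face_landmarks",   // assumed stream name
                packet -> {/* pull landmark data out of the packet here */});
        return processor;
    }
}
```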
Keras-VGG16-places365/ is the Places365 system converted to a TensorFlow Lite model for our WearableAI graph that is currently running on the ASP.
- save face rec bounding box in database and in face encoding object
- and display cropped face in the face rec UI, so we can tag multiple people
- Build system taken from MediaPipe
- Facial recognition from: https://github.com/shubham0204/FaceRecognition_With_FaceNet_Android
Cayden Pierce - emexwearables.com